
Introduction to 2016 Exam C Study Guides


Howard C. Mahler, FCAS, MAAA

In 2013 and prior years the SOA and CAS jointly administered Exam 4/C.
Starting in 2014 the SOA administered Exam C without the CAS.

How much detail is needed and how many problems need to be done varies by person and topic.
In order to help you to concentrate your efforts:
1. About 1/6 of the many problems are labeled “highly recommended”,
while another 1/6 are labeled “recommended.”
2. Important Sections are listed in bold in the table of contents.
Extremely important Sections are listed in larger type and in bold.
3. Important ideas and formulas are in bold.
4. Each Study Guide has a Section of Important Ideas and Formulas.
5. Each Study Guide has a chart of past exam questions by Section.
6. There is a breakdown of percent of questions on each past exam by Study Guide.

My Study Aids are a thick stack of paper.1 However, many students find they do not need to look
at the textbooks. For those who have trouble getting through the material, concentrate on
the introductions and sections in bold.

Highly Recommended problems (about 1/6 of the total) are double underlined.
Recommended problems (about 1/6 of the total) are underlined.
Do at least the Highly Recommended problems your first time through.
It is important that you do problems when learning a subject and then some more problems
a few weeks later.

Be sure to do all the questions from the recent Exams at some point.

I have written some easy and tougher problems.2 The former exam questions are arranged in
chronological order. The more recent exam questions are on average more similar to what you will
be asked on your exam than are less recent questions.

All of the Sample Exam questions are included (there are 299, prior to deletions).
Their locations are shown in my final study guide, Breakdown of Past Exams.

Each of my study guides is divided into sections as shown in its table of contents.
The solutions to the problems in a section of a study guide are at the end of that section.
1
The number of pages is not as important as how long it takes you to understand the material. One page in a
textbook might take someone as long to understand as ten pages in my Study Guides.
2
Points are based on 100 points = a 4 hour exam.

In the electronic version use the bookmarks / table of contents in the Navigation Panel
in order to help you find what you want.

You may find it helpful to print out selected portions, such as the Table of Contents and Important
Ideas Section in each of my study guides.

Mahlerʼs Guides for Exam C have 14 parts, which are listed below, along with my estimated
percent of the exam.3

Study Guides for Exam C

1 6% Mahler's Guide to Frequency Distributions


2 8% Mahler's Guide to Loss Distributions 
3 6% Mahler's Guide to Aggregate Distributions 
4 3% Mahler's Guide to Risk Measures 
5 4% Mahler's Guide to Fitting Frequency Distributions
6 25% Mahler's Guide to Fitting Loss Distributions
7 12% Mahler's Guide to Survival Analysis
8 3% Mahler's Guide to Classical Credibility
9 13% Mahler's Guide to Buhlmann Credibility & Bayesian Analysis
10 7% Mahler's Guide to Conjugate Priors
11 1% Mahler's Guide to Semiparametric Estimation
12 3% Mahler's Guide to Nonparametric Credibility
13 9% Mahler's Guide to Simulation
14 Breakdown of Past Exams

My Practice Exams are sold separately.

My Seminar Style Slides are sold separately.

3
This is my best estimate, which should be used with appropriate caution, particularly in light of the changes in the
syllabus. In any case, the number of questions by topic varies from exam to exam.

Syllabus Changes for the October 2013 Exam:

Various sections of the Fourth Edition of Loss Models have been added to the syllabus.
I estimate that they have added approximately 4% to the prior syllabus.

For the (Kaplan-Meier) Approximation for Large Data Sets, Section 12.4 from the 4th edition of
Loss Models replaces Section 14.4 from the 3rd edition.
There is new material as well as new notation.4
Additions to the syllabus from the Fourth Edition of Loss Models:
Section 5.3.4 Two Heavy Tailed Distributions5
Section 13.5 Maximum Likelihood Estimation of Decrement Probabilities6
Section 20.2 Simulation for Specific Distributions7
Changes made to Study Note C-09-00 (Sample Questions), in July 2013:
Questions and solutions 73A and 290-299 were added.8
Question and solution 73 (4, 11/01, Q.27) were modified.
Question 261 (4, 11/06, Q.17) was deleted.

Syllabus Changes for the June 2015 Exam:

The Anderson-Darling test has been dropped.

Author Biography:

Howard C. Mahler is a Fellow of the Casualty Actuarial Society,
and a Member of the American Academy of Actuaries.
He has taught actuarial exam seminars and published study guides since 1994.

He spent over 20 years in the insurance industry, the last 15 as Vice President and Actuary at the
Workers' Compensation Rating and Inspection Bureau of Massachusetts.
He has published dozens of major research papers and won the 1987 CAS Dorweiler prize.
He served 12 years on the CAS Examination Committee including three years as head of the
whole committee (1990-1993).

Mr. Mahler has taught live seminars and/or classes for Exam C, Exam MFE,
CAS Exam ST, CAS Exam 5, and CAS Exam 8.
He has written study guides for all of the above.
hmahler@mac.com www.howardmahler.com/Teaching
4
See Sections 8, 9, 10, and 11 of “Mahlerʼs Guide to Survival Analysis.”
5
See Section 41 of “Mahlerʼs Guide to Loss Distributions.”
6
See Section 8 of “Mahlerʼs Guide to Survival Analysis.”
7
See Sections 6, 7, 11, 13, and 18 of “Mahlerʼs Guide to Simulation.”
8
Included in my guides to Survival Analysis and Simulation.

All references in my study guides are to the fourth edition of Loss Models.

Loss Models, Fourth Edition9                  Mahler Study Guides

Chapter 3.1                   Loss Distributions: Sections 2-4, 7-8.
Chapter 3.2                   Loss Distributions: Section 19.
Chapter 3.3                   Freq. Dists.: Section 9, Aggregate Dists.: Sections 4-5.
Chapter 3.4                   Loss Distributions: Sections 30, 33, 34.
Chapter 3.5                   Risk Measures.
Chapter 4                     Loss Distributions: Sections 21, 38.
Chapter 5.2                   Loss Distributions: Sections 29, 39, 40.
Chapter 5.3                   Loss Distributions: Sections 22, 24-28, 41.
Chapter 5.4                   Conjugate Priors: Section 11.
Chapter 6                     Frequency Distributions: Sections 1-6, 9, 11-14, 19.
Chapter 8                     Loss Dists.: Sections 6, 15-18, 36. Freq. Dists.: Sections 3-6.
Chapter 9.1-9.7, 9.8.1-9.8.2 (footnote 10)    Aggregate Distributions.
Chapter 10                    Fitting Loss Dists.: Sections 14, 26. Freq. Dists.: Section 7.
Chapter 11                    Loss Dists.: Section 4, Fitting Loss Dists.: Sections 5-6,
                              Survival Analysis: Sections 1, 4.
Chapter 12                    Survival Analysis: Sections 1-3, 5-6, 8-11.
                              Loss Dists.: Sections 16-17, Fitting Loss Dists.: Sections 5, 6.
Chapter 13                    Fit Loss Dists.: Secs. 7-11, 20-25, 27-31. Surv. Anal.: Sec. 7, 8.
Chapter 14.1-14.4, 14.6       Fitting Freq. Dists.: Sections 2, 3, 6.
Chapter 15                    Buhlmann Cred.: Sections 4-6, 16. Conj. Priors: Section 11.
Chapter 16 (footnote 11)      Fitting Freq. Dists.: Secs 4-5, Fitting Loss Dists.: Secs 12-18.
Chapter 17.2-17.7             Classical Credibility.
Chapter 18                    Buhlmann Credibility, Conjugate Priors.
Chapter 19                    Nonparametric Estimation, Semiparametric Estimation.
Chapter 20                    Simulation, Risk Measures Section 7.


9
Chapters 17-19 of Loss Models are not on the syllabus, but cover material that is on the syllabus.
See instead “Credibility” by Mahler & Dean, and “Topics in Credibility” by Dean.
10
Excluding 9.6.1.
11
Excluding 16.4.2

Besides many past exam questions from the CAS and SOA, my study guides include some past
questions from exams given by the Institute of Actuaries and Faculty of Actuaries in Great Britain.
These questions are copyright by the Institute of Actuaries and Faculty of Actuaries, and are
reproduced here solely to aid students studying for actuarial exams. These IOA questions are
somewhat different in format than those on your exam, but should provide some additional
perspective on the syllabus material.

Your exam will be 3.5 hours and will consist of approximately 35 multiple choice questions, each of
equal value.12 The examination will be offered via computer-based testing.

Download from the SOA website a copy of the tables to be attached to your exam.13
Read the “Hints on Study and Exam Techniques” in the CAS Syllabus.14
Read “Tips for Taking Exams.”15

Some students have reported success with the following guessing strategy.
When you are ready to guess (a few minutes before time is finished for the exam), count up how
many you have answered of each letter.
Then fill in the least used letter, at each stage.
For example, if the fewest were A, fill in A's until some other letter is fewest.
Now fill in that letter, etc.
Remember that for every question you should fill in a letter answer.16

On Exam C, the following rule applies to the use of the Normal Table:
When using the normal distribution, choose the nearest z-value to find the probability, or
if the probability is given, choose the nearest z-value. No interpolation should be used.
Example: If the given z-value is 0.759, and you need to find Pr(Z < 0.759) from the normal
distribution table, then choose the probability value for z-value = 0.76; Pr(Z < 0.76) = 0.7764.
When using the Normal Approximation to a discrete distribution, use the continuity correction.

12
Equivalent to “2.5 points” each, if 100 points = a 4 hour exam.
13
http://www.soa.org
14
http://www.casact.org/admissions/syllabus/index.cfm?fa=hints
15
www.casact.org/admissions/index.cfm?fa=tips
16
Nothing will be added for an unanswered question and nothing will be subtracted for an incorrect answer.
Therefore put down an answer, even a total guess, for every question.

I suggest you buy and use the TI-30XS Multiview calculator. You will save time doing repeated
calculations using the same formula. Examples include calculating process variances to calculate an
EPV, constructing a Distribution Table of a frequency distribution, simulating from the same
continuous distribution several times, etc.

While studying, you should do as many problems as possible. Going back and forth between
reading and doing problems is the only way to pass this exam. The only way to learn to solve
problems is to solve lots of problems. You should not feel satisfied with your study of a subject until
you can solve a reasonable number of the problems.

There are two manners in which you should be doing problems. First you can do problems in order
to learn the material. Take as long on each problem as you need to fully understand the concepts
and the solution. Reread the relevant syllabus material. Carefully go over the solution to see if you
really know what to do. Think about what would happen if one or more aspects of the question were
revised.17 This manner of doing problems should be gradually replaced by the following manner as
you get closer to the exam.

The second manner is to do a series of problems under exam conditions, with the items you will
have when you take the exam. Decide in advance how many points to attempt, based on the time
available. For example, if you have an uninterrupted hour, then one might try either
60/2.5 = 24 points or 60/3 = 20 points of problems. Do problems as you would on an exam in any
order, skipping some and coming back to some, until you run out of time. I suggest you leave time
to double check your work.

Expose yourself somewhat to everything on the syllabus. Concentrate on sections and items in
bold. Do not read sections or material in italics your first time through the material.18 Each study guide
has a chart of where the past exam questions have been; this may also help you to direct your
efforts.19 Try not to get bogged down on a single topic. On hard subjects, try to learn at least the
simplest important idea. The first time through do enough problems in each section, but leave some
problems in each section to do closer to the exam. At least every few weeks review the important
ideas and formulas sections of those study guides you have already completed.

Make a schedule and stick to it. Spend a minimum of one hour every day.
I recommend at least two study sessions every day, each of at least 1/2 hour.

17
Some may also find it useful to read about a dozen questions on an important subject, thinking about how to set
up the solution to each one, but only working out in detail any questions they do not quickly see how to solve.
18
Material in italics is provided for those who want to know more about a particular subject and/or to be prepared for
more challenging exam questions. Material in italics could be directly needed to answer perhaps one or two
questions on an exam.
19
While this may indicate what ideas questions on your exam are likely to cover, every exam contains a few questions
on ideas that have yet to be asked.

Use whatever order to go through the material that works best for you.
Here is a schedule that may work for some people.20

A 14 week Study Schedule for Exam C:

1. Frequency Distributions

2. Start of Loss Distributions: sections 1 to 30.

3. Rest of Loss Distributions: Remainder.

4. Aggregate Distributions

5. Fitting Frequency Distributions


Classical Credibility

6. Start of Buhlmann Credibility and Bayesian Analysis: sections 1-6, 12, and 7.

7. Start of Fitting Loss Distributions: sections 1 to 10.

8. Rest of Buhlmann Credibility and Bayesian Analysis: Remainder.

9. More Fitting Loss Distributions: sections 11 to 18.

10. Rest of Fitting Loss Distributions: Remainder.

11. Conjugate Priors

12. Survival Analysis

13. Semiparametric Estimation


Nonparametric Credibility
Risk Measures

14. Simulation

20
This is just an example of one possible schedule. Adjust it to suit your needs or make one up yourself.

Most of you will need to spend a total of 300 or more hours of study time on the entire syllabus; this
means an average of at least 2 hours a day.

Throughout do Exam Problems and Practice Problems in my study guides. At least 50% of your
time should be spent doing problems. As you get closer to the Exam, the portion of time spent
doing problems should increase.

Review the important formulas and ideas section at the end of each study guide.

During the last several weeks do my practice exams, sold separately.


The SOA has posted a preview of the tables for Computer Based Testing:
http://www.beanactuary.org/exams/4C/Split.html
I would suggest you use them if possible when doing practice exams.

Past studentsʼ helpful suggestions and questions have greatly improved these Study Aids.
I thank them.

Feel free to send me any questions or suggestions:


Howard Mahler, Email: hmahler@mac.com

Please do not copy the Study Aids, except for your own personal use. Giving them to others is
unfair to yourself, your fellow students who have paid for them, and myself.21
If you found them useful, tell a friend to buy his own.

Please send me any suspected errors by Email.
(Please specify as carefully as possible the page, Study Guide, and Course.)
The errata sheet will be posted on my webpage: www.howardmahler.com/Teaching

21
These study aids represent thousands of hours of work.

Pass Marks and Passing Percentages for Past Exams:22

Exam 4/C       Pass Mark   Number of Exams   Effective Number of Exams   Number Passing   Percent Passing   % Effective Passing
Fall 2009      58%23       2198              2004                        959              43.6%            47.9%
Spring 2010    66%         1674              1559                        702              41.9%            45.0%
Aug. 2010      66%24       1252              1163                        552              44.1%            47.5%
Nov. 2010      64%         1512              1358                        612              40.5%            45.1%
Feb. 2011      64%         1470              1304                        598              40.7%            45.9%
June 2011      64%         1890              1681                        745              39.4%            44.3%
Oct. 2011      64%         1962              1723                        858              43.7%            49.8%
Feb. 2012      67%         1461              1318                        665              45.5%            50.5%
May 2012       67%         2008              1798                        904              45.0%            50.3%
Oct. 2012      67%         1969              1800                        908              46.1%            50.4%
Feb. 2013      67%         1728              1579                        786              45.5%            49.8%
June 2013      67%         2195              1984                        934              42.6%            47.1%
Oct. 2013      67%         1909              1723                        749              39.2%            43.5%
Feb. 2014      67%         1669              1513                        753              45.1%            49.8%
June 2014      67%         2540              2290                        1224             48.2%            53.4%
Oct. 2014      67%         2062              1883                        983              47.7%            52.2%
Feb. 2015      67%         1741              1600                        869              49.9%            54.3%
June 2015      67%         2117              1915                        1005             47.5%            52.5%

22
Information taken from the SOA website. Check the website for updated information.
23
Starting in Fall 2009, there was computer-based testing. All versions of the exam are constructed to be of
comparable difficulty to one another. Apparently, the passing percentage varies somewhat by version of the exam.
On average, 58% correct was needed to pass the exam.
24
Examination C/4 is administered using computer-based testing (CBT). Under CBT, it is not possible to schedule
everyone to take the examination at the same time. As a result, each administration consists of multiple versions of
the examination given over a period of several days. The examinations are constructed and scored using Item
Response Theory (IRT). Under IRT, each operational item that appears on an examination has been calibrated for
difficulty and other test statistics and the pass mark for each examination is determined before the examination is
given. All versions of the examination are constructed to be of comparable difficulty to one another.
For the August 2010 administration of Examination C/4, an average of 66% correct was needed to pass the exam.
Mahlerʼs Guide to
Frequency Distributions
Exam C

prepared by
Howard C. Mahler, FCAS
Copyright 2016 by Howard C. Mahler.

Study Aid 2016-C-1

Howard Mahler
hmahler@mac.com
www.howardmahler.com/Teaching

Mahlerʼs Guide to Frequency Distributions


Copyright 2016 by Howard C. Mahler.

Information in bold or sections whose title is in bold are more important for passing the exam.
Larger bold type indicates it is extremely important.
Information presented in italics (or sections whose title is in italics) should not be needed to directly
answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs
overall understanding of the subject, and to be useful in practical applications.

Highly Recommended problems are double underlined.


Recommended problems are underlined.1

Solutions to the problems in each section are at the end of that section.

Section # Pages Section Name


A 1 4 Introduction
2 5-15 Basic Concepts
3 16-41 Binomial Distribution
4 42-74 Poisson Distribution
B 5 75-96 Geometric Distribution
6 97-122 Negative Binomial Distribution
7 123-150 Normal Approximation
C 8 151-163 Skewness
9 164-179 Probability Generating Functions
10 180-192 Factorial Moments

11 193-214 (a, b, 0) Class of Distributions


12 215-228 Accident Profiles
D 13 229-252 Zero-Truncated Distributions
14 253-274 Zero-Modified Distributions
15 275-289 Compound Frequency Distributions
16 290-310 Moments of Compound Distributions
17 311-356 Mixed Frequency Distributions
E 18 357-368 Gamma Function
19 369-411 Gamma-Poisson Frequency Process
20 412-422 Tails of Frequency Distributions
F 21 423-430 Important Formulas and Ideas

1
Note that problems include both some written by me and some from past exams. The latter are copyright by the
CAS and SOA, and are reproduced here solely to aid students in studying for exams. The solutions and comments
are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of
the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is
intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In some
cases Iʼve rewritten these questions in order to match the notation in the current Syllabus.

Past Exam Questions by Section of this Study Aid2

Section   Course 3   Course 3   Course 3   Course 3   Course 3   Course 3   CAS 3   SOA 3   CAS 3
          Sample     5/00       11/00      5/01       11/01      11/02      11/03   11/03   5/04

1
2
3 14
4 16
5
6 18
7
8 28
9
10
11 25 28 32
12
13
14 37
15
16 2 16 36 30 27 26
17 13
18
19 12 4 3 15 27 5 15
20

The CAS/SOA did not release the 5/02 and 5/03 exams.
From 5/00 to 5/03, the Course 3 Exam was jointly administered by the CAS and SOA.
Starting in 11/03, the CAS and SOA gave separate exams. (See the next page.)

2
Excluding any questions that are no longer on the syllabus.

Section   CAS 3   SOA 3   CAS 3   SOA M   CAS 3   SOA M   CAS 3   CAS 3   SOA M   4/C
          11/04   11/04   5/05    5/05    11/05   11/05   5/06    11/06   11/06   5/07
1
2
3 22 24 8 15
4 23 39 24 32
5
6 21 28 32 23 24 31 22
7
8
9
10 25
11 16 19 31
12
13
14
15 27
16 18 35 30
17 32 19 39
18
19 10
20

The SOA did not release its 5/04 and 5/06 exams.

This material was moved to Exam 4/C in 2007.

The CAS/SOA did not release the 11/07 and subsequent exams.

Section 1, Introduction

This Study Aid will review what a student needs to know about the frequency distributions in
Loss Models. Much of the first seven sections you should have learned on Exam P.

In actuarial work, frequency distributions are applied to the number of losses, the number of claims,
the number of accidents, the number of persons injured per accident, etc.

Frequency Distributions are discrete functions on the nonnegative integers: 0, 1, 2, 3, ...

There are three named frequency distributions you should know:


Binomial, with special case Bernoulli
Poisson
Negative Binomial, with special case Geometric.

Most of the information you need to know about each of these distributions is shown in
Appendix B, attached to the exam. Nevertheless, since they appear often in exam questions, it is
desirable to know these frequency distributions well, particularly the Poisson Distribution.

In addition, one can make up a frequency distribution.


How to work with such unnamed frequency distributions is discussed in the next section.

In later sections, the important concepts of Compound Distributions and Mixed Distributions will be
discussed.3
The most important case of a mixed frequency distribution is the Gamma-Poisson frequency
process.

3
Compound Distributions are mathematically equivalent to Aggregate Distributions, which are discussed in
“Mahlerʼs Guide to Aggregate Distributions.”

Section 2, Basic Concepts

The probability density function4 f(i) can be non-zero at either a finite or infinite number of points.
In the former case, the probability density function is determined by a table of its values at these
finite number of points.


The f(i) can take on any values provided they satisfy 0 ≤ f(i) ≤ 1 and Σ f(i) = 1,
where the sum runs over i = 0, 1, 2, ...

For example:
Number Probability Cumulative
of Claims Density Function Distribution Function
0 0.1 0.1
1 0.2 0.3
2 0 0.3
3 0.1 0.4
4 0 0.4
5 0 0.4
6 0.1 0.5
7 0 0.5
8 0 0.5
9 0.1 0.6
10 0.3 0.9
11 0.1 1
Sum 1

The Distribution Function5 is the cumulative sum of the probability density function:
F(j) = f(0) + f(1) + ... + f(j).

In the above example, F(3) = f(0) + f(1) + f(2) + f(3) = 0.1 + 0.2 + 0 + 0.1 = 0.4.

4
Loss Models calls the probability density function of frequency the “probability function” or
p.f. and uses the notation pk for f(k), the density at k.
5
Also called the cumulative distribution function.

Moments:

One can calculate the moments of such a distribution.

For example, the first moment or mean is:


(0)(0.1) + (1)(0.2) + (2)(0) + (3)(0.1) + (4)(0) + (5)(0) + (6)(0.1) + (7)(0) + (8)(0) + (9)(0.1)
+ (10)(0.3) + (11)(0.1) = 6.1.

Probability x
Number Probability Probability x Square of
of Claims Density Function # of Claims # of Claims
0 0.1 0 0
1 0.2 0.2 0.2
2 0 0 0
3 0.1 0.3 0.9
4 0 0 0
5 0 0 0
6 0.1 0.6 3.6
7 0 0 0
8 0 0 0
9 0.1 0.9 8.1
10 0.3 3 30
11 0.1 1.1 12.1
Sum 1 6.1 54.9

E[X] = Σ i f(i) = Average of X = 1st moment about the origin = 6.1.

E[X2 ] = Σ i2 f(i) = Average of X2 = 2nd moment about the origin = 54.9.

The second moment is:


(02 )(0.1) + (12 )(0.2) + (22 )(0) + (32 )(0.1) + (42 )(0) + (52 )(0) + (62 )(0.1) + (72 )(0) + (82 )(0)
+ (92 )(0.1) + (102 )(0.3) + (112 )(0.1) = 54.9.

Mean = E[X] = 6.1.

Variance = second central moment = E[(X - E[X])2 ] = E[X2 ] - E[X]2 = 17.69.

Standard Deviation = Square Root of Variance = 4.206.

The mean is the average or expected value of the random variable. For the above example, the
mean is 6.1 claims.

In general means add; E[X+Y] = E[X] + E[Y]. Also multiplying a variable by a constant multiplies the
mean by the same constant; E[kX] = kE[X].
The mean is a linear operator: E[aX + bY] = aE[X] + bE[Y].

The mean of a frequency distribution can also be computed as a sum of its survival functions:6
E[X] = Σ Prob[X > i] = Σ {1 - F(i)}, where the sums run over i = 0, 1, 2, ...
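
For those who like to check such calculations with a short script, here is a minimal Python sketch
(my own illustration, not part of the syllabus) that reproduces the moments above and verifies the
survival-function formula for the mean:

# Minimal sketch: moments of the example distribution, and the mean as a sum of survival functions.
f = {0: 0.1, 1: 0.2, 2: 0.0, 3: 0.1, 4: 0.0, 5: 0.0,
     6: 0.1, 7: 0.0, 8: 0.0, 9: 0.1, 10: 0.3, 11: 0.1}
mean = sum(i * p for i, p in f.items())                 # 6.1
second_moment = sum(i * i * p for i, p in f.items())    # 54.9
variance = second_moment - mean ** 2                    # 17.69
std_dev = variance ** 0.5                               # 4.206
# Mean as the sum of the survival function S(i) = Prob[X > i], i = 0, 1, ..., 10:
S = [sum(p for j, p in f.items() if j > i) for i in range(11)]
print(mean, second_moment, variance, std_dev, sum(S))   # sum(S) also equals 6.1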

Mode and Median:

The mean differs from the mode which represents the value most likely to occur. The mode is the
point at which the density function reaches its maximum. The mode for the above example is 10
claims.

For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p.7
The 80th percentile for the above example is 10; F(9) = 0.6, F(10) = 0.9.

The median is the 50th percentile. For frequency distributions, and other discrete distributions,
the median is the first value at which the distribution function is greater than or equal to 0.5. The
median for the above example is 6 claims; F(6) = 0.5.
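
A short Python sketch (again just my own illustration) of this rule, applied to the example
distribution above:

f = [0.1, 0.2, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.3, 0.1]

def percentile(p):
    # Return the first value x at which the distribution function F(x) >= p.
    F = 0.0
    for x, prob in enumerate(f):
        F += prob
        if F >= p:
            return x

print(percentile(0.5), percentile(0.8))   # 6 and 10: the median and the 80th percentile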

Definitions:

Exposure Base: The basic unit of measurement upon which premium is determined.

For example, the exposure base could be car-years, $100 of payrolls, number of insured lives, etc.
The rate for Workersʼ Compensation Insurance might be $3.18 per $100 of payroll, with $100 of
payroll being one exposure.

Frequency: The number of losses or number of payments random variable, (unless indicated
otherwise) stated per exposure unit.

For example the frequency could be the number of losses per (insured) house-year.

Mean Frequency: Expected value of the frequency.

For example, the mean frequency might be 0.03 claims per insured life per year.
6
This is analogous to the situation for a continuous loss distributions; the mean of a Loss Distribution can be
computed as the integral of its survival function.
7
Definition 3.6 in Loss Models. F(πp -) ≤ p ≤ F(πp ).

Problems:

Use the following frequency distribution for the next 5 questions:


Number of Claims Probability
0 0.02
1 0.04
2 0.14
3 0.31
4 0.36
5 0.13

2.1 (1 point) What is the mean of the above frequency distribution?


A. less than 3
B. at least 3.1 but less than 3.2
C. at least 3.2 but less than 3.3
D. at least 3.3 but less than 3.4
E. at least 3.4

2.2 (1 point) What is the mode of the above frequency distribution?


A. 2 B. 3 C. 4 D. 5 E. None of the above.

2.3 (1 point) What is the median of the above frequency distribution?


A. 2 B. 3 C. 4 D. 5 E. None of the above.

2.4 (1 point) What is the standard deviation of the above frequency distribution?
A. less than 1.1
B. at least 1.1 but less than 1.2
C. at least 1.2 but less than 1.3
D. at least 1.3 but less than 1.4
E. at least 1.4

2.5 (1 point) What is the 80th percentile of the above frequency distribution?
A. 2 B. 3 C. 4 D. 5 E. None of A, B, C, or D.

2.6 (1 point) The number of claims, N, made on an insurance portfolio follows the following
distribution:
n Pr(N=n)
0 0.7
1 0.2
2 0.1
What is the variance of N?
A. less than 0.3
B. at least 0.3 but less than 0.4
C. at least 0.4 but less than 0.5
D. at least 0.5 but less than 0.6
E. at least 0.6

Use the following information for the next 8 questions:


V and X are each given by the result of rolling a six-sided die.
V and X are independent of each other.
Y= V + X.
Z = 2X.
Hint: The mean of X is 3.5 and the variance of X is 35/12.

2.7 (1 point) What is the mean of Y?


A. less than 7.0
B. at least 7.0 but less than 7.1
C. at least 7.1 but less than 7.2
D. at least 7.2 but less than 7.3
E. at least 7.4

2.8 (1 point) What is the mean of Z?


A. less than 7.0
B. at least 7.0 but less than 7.1
C. at least 7.1 but less than 7.2
D. at least 7.2 but less than 7.3
E. at least 7.4

2.9 (1 point) What is the standard deviation of Y?


A. less than 2.0
B. at least 2.0 but less than 2.3
C. at least 2.3 but less than 2.6
D. at least 2.9 but less than 3.2
E. at least 3.2

2.10 (1 point) What is the standard deviation of Z?


A. less than 2.0
B. at least 2.0 but less than 2.3
C. at least 2.3 but less than 2.6
D. at least 2.9 but less than 3.2
E. at least 3.2

2.11 (1 point) What is the probability that Y = 8?


A. less than .10
B. at least .10 but less than .12
C. at least .12 but less than .14
D. at least .14 but less than .16
E. at least .16

2.12 (1 point) What is the probability that Z = 8?


A. less than .10
B. at least .10 but less than .12
C. at least .12 but less than .14
D. at least .14 but less than .16
E. at least .16

2.13 (1 point) What is the probability that X = 5 if Y≥10?


A. less than .30
B. at least .30 but less than .32
C. at least .32 but less than .34
D. at least .34 but less than .36
E. at least .36

2.14 (1 point) What is the expected value of X if Y≥10?


A. less than 5.0
B. at least 5.0 but less than 5.2
C. at least 5.2 but less than 5.4
D. at least 5.4 but less than 5.6
E. at least 5.6

2.15 (3 points) N is uniform and discrete from 0 to b; Prob[N = n] = 1/(b+1), n = 0, 1, 2, ... , b.


N ∧ 10 ≡ Minimum[N, 10].
If E[N ∧ 10] = 0.875 E[N], determine b.
A. 13 B. 14 C. 15 D. 16 E. 17

2.16 (2 points) What is the variance of the following distribution?


Claim Count: 0 1 2 3 4 5 >5
Percentage of Insureds: 60.0% 24.0% 9.8% 3.9% 1.6% 0.7% 0%
A. 0.2 B. 0.4 C. 0.6 D. 0.8 E. 1.0

2.17 (3 points) N is uniform and discrete from 1 to S; Prob[N = n] = 1/S, n = 1, 2, ... , S.


Determine the variance of N, as a function of S.

2.18 (4, 5/88, Q.31) (1 point) The following table represents data observed for a certain class of
insureds. The regional claims office is being set up to service a group of 10,000 policyholders from
this class.
Number of Claims Probability of a Policyholder
n Making n Claims in a Year
0 0.84
1 0.07
2 0.05
3 0.04

If each claims examiner can service a maximum of 500 claims in a year, and you want to staff the
office so that you can handle a number of claims equal to two standard deviations more than the
mean, how many examiners do you need?
A. 5 or less B. 6 C. 7 D. 8 E. 9 or more

2.19 (4B, 11/99, Q.7) (2 points) A player in a game may select one of two fair, six-sided dice.
Die A has faces marked with 1, 2, 3, 4, 5, and 6. Die B has faces marked with 1, 1, 1, 6, 6, and 6.
If the player selects Die A, the payoff is equal to the result of one roll of Die A. If the player selects
Die B, the payoff is equal to the mean of the results of n rolls of Die B.
The player would like the variance of the payoff to be as small as possible.
Determine the smallest value of n for which the player should select Die B.
A. 1 B. 2 C. 3 D. 4 E. 5

2.20 (1, 11/01, Q.32) (1.9 points) The number of injury claims per month is modeled by a random
variable N with P[N = n] = 1 / {(n+1)(n+2)}, where n ≥ 0.
Determine the probability of at least one claim during a particular month, given
that there have been at most four claims during that month.
(A) 1/3 (B) 2/5 (C) 1/2 (D) 3/5 (E) 5/6

Solutions to Problems:

2.1. D. mean = (0)(.02) + (1)(.04) + (2)(.14) + (3)(.31) + (4)(.36) + (5)(.13) = 3.34.


Comment: Let S(n) = Prob[N > n] = survival function at n.
S(0) = 0.98. S(1) = 0.94.
E[N] = Σ S(i) = 0.98 + 0.94 + 0.80 + 0.49 + 0.13 + 0 = 3.34, summing over i = 0, 1, 2, ...

2.2. C. f(4) = 36% which is the greatest value attained by the probability density function, therefore
the mode is 4.

2.3. B. Since F(2) = 0.2 < 0.5 and F(3) = 0.51 ≥ 0.5 the median is 3.
Number of Claims Probability Distribution
0 2% 2%
1 4% 6%
2 14% 20%
3 31% 51%
4 36% 87%
5 13% 100%

2.4. B. Variance = (second moment) - (mean)^2 = 12.4 - 3.34^2 = 1.244.
Standard Deviation = √1.244 = 1.116.

2.5. C. Since F(3) = 0.51 < 0.8 and F(4) = 0.87 ≥ 0.8, the 80th percentile is 4.

2.6. C. Mean = (.7)(0) + (.2)(1) + (.1)(2) = 0.4.
Variance = (.7)(0 - 0.4)^2 + (.2)(1 - 0.4)^2 + (.1)(2 - 0.4)^2 = 0.44.
Alternately, Second Moment = (.7)(0^2) + (.2)(1^2) + (.1)(2^2) = 0.6. Variance = 0.6 - 0.4^2 = 0.44.

2.7. B. E[Y] = E[V + X] = E[V] + E[X] = 3.5 + 3.5 = 7.

2.8. B. E[Z] = E[2X] = 2 E{X] = (2)(3.5) = 7.

2.9. C. Var[Y] = Var[V+X] = Var[V] + Var[X] = (35/12) + (35/12) = 35/6 = 5.83.
Standard Deviation[Y] = √5.83 = 2.41.

2.10. E. Var[Z] = Var[2X] = 2^2 Var[X] = (4)(35/12) = 35/3 = 11.67.
Standard Deviation[Z] = √11.67 = 3.42.

2.11. C. For Y = 8 we have the following possibilities: V=2, X=6; V=3, X=5; V=4, X=4; V=5, X=3;
V=6, X=2. Each of these has a (1/6)(1/6) = 1/36 chance, so the total chance that Y = 8 is 5/36 =
0.139.
Comment: The density function for Y is:
y 2 3 4 5 6 7 8 9 10 11 12
f(y) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

2.12. E. Z = 8 when X = 4, which has probability 1/6.


Comment: The density function for Z is:
z 2 4 6 8 10 12
f(z) 1/6 1/6 1/6 1/6 1/6 1/6
Note that even though Z has the same mean as Y, it has a significantly different distribution function.
This illustrates the difference between adding the result of several independent identically distributed
variables, and just multiplying a single result by a constant. If the variable has a finite variance, the
Central Limit Theorem applies to the former situation, but not the latter. The sum of N independent dice starts to
look like a Normal Distribution as N gets large.
N times a single die has a flat distribution similar to that of X or Z, regardless of N.

2.13. C. If Y ≥ 10, then we have the possibilities V=4, X=6; V=5, X=5; V=5, X=6;
V=6, X=4; V=6, X=5; V=6, X=6. Out of these 6 equally likely possibilities, for 2 of them X=5.
Therefore if Y ≥ 10, there is a 2/6 = 0.333 chance that X = 5.
Alternately, Prob[Y ≥ 10] = 6/36 = 1/6.
Prob[X = 5 and Y ≥ 10] = 2/36 = 1/18.
Prob[X = 5 | Y ≥ 10] = (1/18) / (1/6) = 1/3.
Comment: This is an example of a conditional distribution.
The distribution of f(x | y ≥10) is:
x 4 5 6
f(x | y ≥10) 1/6 2/6 3/6
The distribution of f(x | y =10) is:
x 4 5 6
f(x | y =10) 1/3 1/3 1/3

2.14. C. The distribution of f(x | y ≥10) is:


x 4 5 6
f(x | y ≥10) 1/6 2/6 3/6
(1/6)(4) + (2/6)(5) + (3/6)(6) = 32 / 6 = 5.33.

2.15 C. E[N] = (0 + 1 + 2 + ... + b)/(b + 1) = {b(b+1)/2}/(b + 1).


For b ≥ 10, E[N ∧ 10] = {0 + 1 + 2 + ... + 9 + (b-9)(10)}/(b + 1) = (45 + 10b - 90)/(b + 1).
E[N ∧ 10] = 0.875 E[N]. ⇒ 10b - 45 = 0.875b(b+1)/2. ⇒ 0.875b^2 - 19.125b + 90 = 0.
b = {19.125 ± √(19.125^2 - (4)(0.875)(90))}/1.75 = (19.125 ± 7.125)/1.75 = 15 or 6.857.
However, b has to be integer and at least 10, so b = 15.
Comment: The limited expected value is discussed in “Mahlerʼs Guide to Loss Distributions.”
If b = 15, then there are 6 terms that enter the limited expected value as 10:
E[N ∧ 10] = (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 10 + 10 + 10 + 10 + 10)/16 = 105/16.
E[N] = (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15)/16 = 15/2.
Their ratio is 0.875.

2.16. E. Mean = 0.652 and the variance = 1.414 - 0.652^2 = 0.989.


Number A Priori Number Times Number Squared
of Claims Probability Probability Times Probability
0 0.60000 0.00000 0.00000
1 0.24000 0.24000 0.24000
2 0.09800 0.19600 0.39200
3 0.03900 0.11700 0.35100
4 0.01600 0.06400 0.25600
5 0.00700 0.03500 0.17500
Sum 1 0.652 1.41400

2.17. E[N] = (1 + 2 + ... + S)/S = {S(S+1)/2}/S = (S + 1)/2.
E[N^2] = (1^2 + 2^2 + ... + S^2)/S = {S(S+1)(2S + 1)/6}/S = (S + 1)(2S + 1)/6.
Var[N] = E[N^2] - E[N]^2 = (S + 1)(2S + 1)/6 - {(S + 1)/2}^2 = {(S + 1)/12}{2(2S + 1) - 3(S + 1)}
= {(S + 1)/12}(S - 1) = (S^2 - 1)/12.
Comment: For S = 6, a six-sided die, Var[N] = 35/12.

2.18. C. The first moment is: (.84)(0) + (.07)(1) + (.05)(2) + (.04)(3) = 0.29.
The 2nd moment is: (.84)(0^2) + (.07)(1^2) + (.05)(2^2) + (.04)(3^2) = 0.63. Thus the variance is:
0.63 - 0.29^2 = 0.5459 for a single policyholder. For 10,000 independent policyholders, the variance
of the sum is (10000)(0.5459) = 5459. The standard deviation is: √5459 = 73.9.
The mean number of claims is (10000)(0.29) = 2900. Adding two standard deviations one gets
3047.8. This requires 7 claims examiners (since 6 can only handle 3000 claims).

2.19. C. Both Die A and Die B have a mean of 3.5.


The variance of Die A is: (2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2) / 6 = 35/12.
The variance of Die B is: 2.5^2 = 6.25.
The variance of an average of n rolls of Die B is 6.25/n. We want 6.25/n < 35/12.
Thus n > (6.25)(12/35) = 2.14. Thus the smallest n is 3.

2.20. B. Prob[N ≥ 1 | N ≤ 4] = Prob[1 ≤ N ≤ 4] / Prob[N ≤ 4] =
(1/6 + 1/12 + 1/20 + 1/30) / (1/2 + 1/6 + 1/12 + 1/20 + 1/30) = 20/50 = 2/5.
Comment: For integer a and b, such that 0 < a < b:
1/a + 1/(a+1) + ... + 1/(b-1) = (b-a) Σ 1/{(n+a)(n+b)}, where the sum runs over n = 0, 1, 2, ...
Therefore, f(n) = {(b-a) / [1/a + 1/(a+1) + ... + 1/(b-1)]} / {(n+a)(n+b)}, n ≥ 0, is a frequency distribution.
This is a heavy-tailed distribution without a finite mean.
If b = a + 1, then f(n) = a/{(n+a)(n+a+1)}, n ≥ 0.
In this question, a = 1, b = 2, and f(n) = 1/{(n+1)(n+2)}, n ≥ 0.

Section 3, Binomial Distribution

Assume one has five independent lives, each of which has a 10% chance of dying over the next
year. What is the chance of observing two deaths? This is given by the product of three factors. The
first is the chance of death to the power two. The second factor is the chance of not dying to the
power 3 = 5 - 2. The final factor is the ways to pick two lives out of five, or the binomial coefficient of:
(5 choose 2) = 5! / (2! 3!) = 10.

The chance of observing two deaths is: (0.1^2) (0.9^3) (5 choose 2) = 7.29%.

The chance of observing other numbers of deaths in this case is:

Number Chance Binomial


of Deaths of Observation Coefficient
0 59.049% 1
1 32.805% 5
2 7.290% 10
3 0.810% 10
4 0.045% 5
5 0.001% 1
Sum 1

This is just an example of a Binomial distribution, for q = 0.1 and m = 5.

For the binomial distribution: f(x) = m! q^x (1-q)^(m-x) / {x! (m-x)!}, x = 0, 1, 2, 3, ..., m.

Note that the binomial density function is only positive for x ≤ m; there are at most m claims. The
Binomial has two parameters m and q. m is the maximum number of claims and q is the chance of
success.8

Written in terms of the binomial coefficient, the Binomial density function is:

f(x) = (m choose x) q^x (1-q)^(m-x), x = 0, 1, 2, 3, ..., m.

8
I will use the notation in Loss Models and the tables attached to your exam. Many of you are familiar with the
notation in which the parameters for the Binomial Distribution are n and p rather than m and q as in Loss Models.

Bernoulli Distribution:

The Bernoulli is a distribution with q chance of 1 claim and 1-q chance of 0 claims. There are only two
possibilities: either a success or a failure. The Bernoulli is a special case of the Binomial for m = 1.
The mean of the Bernoulli is q. The second moment of the Bernoulli is (0^2)(1-q) + (1^2)(q) = q.
Therefore the variance is q - q^2 = q(1-q).

Binomial as a Sum of Independent Bernoullis:

The example of five independent lives was the sum of five variables each of which was a Bernoulli
trial with chance of a claim 10%. In general, the Binomial can be thought of as the sum of the
results of m independent Bernoulli trials, each with a chance of success q. Therefore, the
sum of two independent Binomial distributions with the same chance of success q, is another
Binomial distribution; if X is Binomial with parameters q and m1 , while Y is Binomial with parameters
q and m2 , then X+Y is Binomial with parameters q and m1 + m2 .

Mean and Variance:

Since the Binomial is a sum of the results of m identical Bernoulli trials, the mean of the Binomial is m
times the mean of a Bernoulli, which is mq.
The mean of the Binomial is mq.

Similarly the variance of a Binomial is m times the variance of the corresponding Bernoulli, which is
mq(1-q).
The variance of a Binomial is mq(1-q).

For the case m = 5 and q = 0.1 presented previously:


Probability x Probability x
Number Probability Probability x Square of Cube of
of Claims Density Function # of Claims # of Claims # of Claims
0 59.049% 0.00000 0.00000 0.00000
1 32.805% 0.32805 0.32805 0.32805
2 7.290% 0.14580 0.29160 0.58320
3 0.810% 0.02430 0.07290 0.21870
4 0.045% 0.00180 0.00720 0.02880
5 0.001% 0.00005 0.00025 0.00125
Sum 1 0.50000 0.70000 1.16000

The mean is: 0.5 = (5)(0.1) = mq.
The variance is: E[X^2] - E[X]^2 = 0.7 - 0.5^2 = 0.45 = (5)(0.1)(0.9) = mq(1-q).
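
A minimal Python sketch (my own illustration, using the standard library's math.comb) reproduces
this table and checks that the mean and variance equal mq and mq(1-q):

from math import comb

m, q = 5, 0.1
f = [comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]
mean = sum(x * p for x, p in enumerate(f))               # 0.5  = mq
second_moment = sum(x * x * p for x, p in enumerate(f))  # 0.7
variance = second_moment - mean**2                       # 0.45 = mq(1-q)
print([round(p, 5) for p in f])    # [0.59049, 0.32805, 0.0729, 0.0081, 0.00045, 1e-05]
print(mean, variance)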

Properties of the Binomial Distribution:

Since 0 < q < 1: mq(1-q) < mq.


Therefore, the variance of any Binomial is less than its mean.

A Binomial Distribution with parameters m and q, is the sum of m independent Bernoullis, each with
parameter q. Therefore, if one sums independent Binomials with the same q, then one gets
another Binomial, with the same q parameter and the sum of their m parameters.

Exercise: X is a Binomial with q = 0.4 and m = 8. Y is a Binomial with q = 0.4 and m = 22.
Z is a Binomial with q = 0.4 and m = 17. X, Y, and Z are independent of each other.
What form does X + Y + Z have?
[Solution: X + Y + Z is a Binomial with q = 0.4 and m = 8 + 22 + 17 = 47.]

Specifically, the sum of n independent identically distributed Binomial variables, with the same
parameters q and m, is a Binomial with parameters q and nm.

Exercise: X is a Binomial with q = 0.4 and m = 8.


What is the form of the sum of 25 independent random draws from X?
[Solution: A random draw from a Binomial Distribution with q = 0.4 and m = (25)(8) = 200.]

Thus if one had 25 exposures, each of which had an independent Binomial frequency process with
q = 0.4 and m = 8, then the portfolio of 25 exposures has a Binomial frequency process with q = 0.4
and m = 200.

Thinning a Binomial:

If one selects only some of the claims, in a manner independent of frequency, then if all claims are
Binomial with parameters m and q, the selected claims are also Binomial with parameters m and
qʼ = q(expected portion of claims selected).

For example, assume that the number of claims is given by a Binomial Distribution with
m = 9 and q = 0.3. Assume that on average 1/3 of claims are large.
Then the number of large losses is also Binomial, but with parameters m = 9 and q = 0.3/3 = 0.1.
The number of small losses is also Binomial, but with parameters m = 9 and q = (0.3)(2/3) = 0.2.9

9
The number of small and large losses are not independent; in the case of a Binomial they are negatively correlated.
In the case of a Poisson, they are independent.
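
The thinning result can be checked numerically. Here is a minimal Python sketch (my own
illustration): condition on the total number of claims, and compare with the Binomial obtained by
multiplying q by the expected portion of claims selected.

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

m, q, t = 9, 0.3, 1/3    # t = expected portion of claims that are large
for k in range(m + 1):
    # Probability of k large losses, conditioning on the total number of claims n:
    thinned = sum(binom_pmf(n, m, q) * binom_pmf(k, n, t) for n in range(k, m + 1))
    direct = binom_pmf(k, m, q * t)    # Binomial with m = 9 and q = 0.3/3 = 0.1
    assert abs(thinned - direct) < 1e-12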

Binomial Distribution

Support: x = 0, 1, 2, 3, ..., m. Parameters: 1 > q > 0, m ≥ 1. m is integer.

m = 1 is a Bernoulli Distribution.

D. f. : F(x) = 1 - β(x+1, m-x ; q) = β(m-x, x+1 ; 1-q) Incomplete Beta Function

P. d. f. : f(x) = m! q^x (1-q)^(m-x) / {x! (m-x)!} = (m choose x) q^x (1-q)^(m-x).

Mean = mq

Variance = mq(1-q) Variance / Mean = 1 - q < 1.

Coefficient of Variation = √[(1 - q) / (mq)].

Skewness = (1 - 2q) / √[mq(1 - q)].

Kurtosis = 3 + 1/{mq(1 - q)} - 6/m.

Mode = largest integer in mq + q. (If mq + q is an integer, then f(mq + q) = f(mq + q - 1),
and both mq + q and mq + q - 1 are modes.)

Probability Generating Function: P(z) = {1 + q(z-1)}^m

f(x+1)/f(x) = a + b/(x+1), a = -q/(1-q), b = (m+1)q/(1-q), f(0) = (1-q)^m.

Moment Generating Function: M(s) = (q e^s + 1 - q)^m
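
The relation f(x+1)/f(x) = a + b/(x+1) is the (a, b, 0) recursion discussed in a later section; a
minimal Python sketch (my own illustration) uses it to generate the densities for m = 5 and q = 0.1:

m, q = 5, 0.1
a = -q / (1 - q)
b = (m + 1) * q / (1 - q)
f = [(1 - q) ** m]                       # f(0) = (1-q)^m
for x in range(m):
    f.append(f[x] * (a + b / (x + 1)))   # f(x+1) = f(x) {a + b/(x+1)}
print([round(p, 5) for p in f])    # [0.59049, 0.32805, 0.0729, 0.0081, 0.00045, 1e-05]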



Binomial Distribution with m = 8 and q = 0.7:
[Graph of the probability density f(x), for x = 0 to 8.]

Binomial Distribution with m = 8 and q = 0.2:
[Graph of the probability density f(x), for x = 0 to 8.]

Binomial Distribution with m = 8 and q = 0.5:
[Graph of the probability density f(x), for x = 0 to 8.]

Binomial Coefficients:

The binomial coefficient of x out of n trials is:

(n choose x) = n! / {x! (n-x)!} = {n(n-1)(n-2) ... (n+1-x)} / {x(x-1)(x-2) ... (1)} = Γ(n+1) / {Γ(x+1) Γ(n+1-x)}.

Below are some examples of Binomial Coefficients:

n x=0 x=1 x=2 x=3 x=4 x=5 x=6 x=7 x=8 x=9 x=10 x=11
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
7 1 7 21 35 35 21 7 1
8 1 8 28 56 70 56 28 8 1
9 1 9 36 84 126 126 84 36 9 1
10 1 10 45 120 210 252 210 120 45 10 1
11 1 11 55 165 330 462 462 330 165 55 11 1

It is interesting to note that the entries in a row sum to 2^n.
For example, 1 + 6 + 15 + 20 + 15 + 6 + 1 = 64 = 2^6.
Also note that for x=0 or x=n the binomial coefficient is one.
The entries in a row can be computed from the previous row. For example, the entry 45 in the
row n =10 is the sum of 9 and 36 the two entries above it and to the left. Similarly, 120 = 36+84.

Note that: (n choose x) = (n choose n-x).

For example, (11 choose 5) = 11! / {5! (11-5)!} = 39,916,800 / {(120)(720)} = 462
= 11! / {6! (11-6)!} = (11 choose 6).
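
These properties are easy to confirm with the standard library's math.comb; a minimal Python
sketch (my own illustration):

from math import comb

row = [comb(10, x) for x in range(11)]
print(row)                        # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(row), 2 ** 10)          # 1024 1024: the entries in a row sum to 2^n
print(comb(9, 1) + comb(9, 2))    # 45: each entry is the sum of the two entries above it
print(comb(11, 5), comb(11, 6))   # 462 462: the symmetry (n choose x) = (n choose n-x)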

Using the Functions of the Calculator to Compute Binomial Coefficients:

Using the TI-30X-IIS, the binomial coefficient (n choose i) can be calculated as follows:

n
PRB

nCr
Enter
i
Enter

For example, in order to calculate (10 choose 3) = 10! / (3! 7!) = 120:
10
PRB

nCr
Enter
3
Enter

Using instead the BA II Plus Professional, in order to calculate (10 choose 3) = 10! / (3! 7!) = 120:

10
2nd
nCr
3
=

The TI-30XS Multiview calculator saves time doing repeated calculations using the same formula.

For example constructing a table of the densities of a Binomial distribution, with m = 5 and q = 0.1:10

f(x) = (5 choose x) 0.1^x 0.9^(5-x).

table
y = (5 nCr x) * 0.1^x * 0.9^(5-x)
Enter
Start = 0
Step = 1
Auto
OK

x=0 y = 0.59049
x=1 y = 0.32805
x=2 y = 0.07290
x=3 y = 0.00810
x=4 y = 0.00045
x=5 y = 0.00001

10
Note that to get Binomial coefficients hit the prb key and select nCr.

Relation to the Beta Distribution:

The binomial coefficient looks almost like 1 over a complete Beta function.11
The incomplete Beta distribution for integer parameters can be used to compute the sum of terms
from the Binomial Distribution.12

β(a, b; x) = Σ (a+b-1 choose i) x^i (1-x)^(a+b-1-i), where the sum runs over i = a to a+b-1.

For example, β(6, 9; 0.3) = 0.21948 = Σ (14 choose i) 0.3^i 0.7^(14-i), summed over i = 6 to 14.

By taking appropriate differences of two Betas one can get any sum of binomial terms.
For example:
(n choose a) q^a (1-q)^(n-a) = β(a, n-(a-1); q) - β(a+1, n-a; q).

For example, (10 choose 3) 0.2^3 0.8^7 = (120) (0.2^3) (0.8^7) = 0.20133 = β(3, 8; 0.2) - β(4, 7; 0.2).

β(a, b; x) = 1 - β(b, a; 1-x) = F(2a, 2b)[bx / {a(1-x)}], where F is the distribution function of the
F-distribution with 2a and 2b degrees of freedom.

For example, β(4, 7; 0.607) = 0.950 = F(8, 14)[(7)(0.607) / {(4)(0.393)}] = F(8, 14)[2.70].13
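
A minimal Python sketch (my own illustration, assuming SciPy is available; scipy.stats.beta.cdf
gives the incomplete Beta function β(a, b; x)) checking two of the identities above:

from math import comb
from scipy.stats import beta

a, b, x = 6, 9, 0.3
binomial_sum = sum(comb(a + b - 1, i) * x**i * (1 - x)**(a + b - 1 - i)
                   for i in range(a, a + b))
print(binomial_sum, beta.cdf(x, a, b))    # both 0.21948...

# A single Binomial term as a difference of two incomplete Betas:
print(comb(10, 3) * 0.2**3 * 0.8**7,
      beta.cdf(0.2, 3, 8) - beta.cdf(0.2, 4, 7))    # both 0.20133...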

11
The complete Beta Function is defined as Γ(a)Γ(b) / Γ(a+b).
It is the divisor in front of the incomplete Beta function and is equal to the integral from 0 to 1 of x^(a-1) (1-x)^(b-1).
12
For a discussion of the Beta Distribution, see “Mahlerʼs Guide to Loss Distributions”. On the exam you should
either compute the sum of binomial terms directly or via the Normal Approximation. Note that the use of the Beta
Distribution is an exact result, not an approximation. See for example the Handbook of Mathematical Functions, by
Abramowitz, et. al.
13
If one did an F-Test with 8 and 14 degrees of freedom, then there would be a 5% chance that the value exceeds
2.7.

Problems:

Use the following information for the next seven questions:


One observes 9 independent lives, each of which has a 20% chance of death over the coming
year.

3.1 (1 point) What is the mean number of deaths over the coming year?
A. less than 1.8
B. at least 1.8 but less than 1.9
C. at least 1.9 but less than 2.0
D. at least 2.0 but less than 2.1
E. at least 2.1

3.2 (1 point) What is the variance of the number of deaths observed over the coming year?
A. less than 1.5
B. at least 1.5 but less than 1.6
C. at least 1.6 but less than 1.7
D. at least 1.7 but less than 1.8
E. at least 1.8

3.3 (1 point) What is the chance of observing 4 deaths over the coming year?
A. less than 7%
B. at least 7% but less than 8%
C. at least 8% but less than 9%
D. at least 9% but less than 10%
E. at least 10%

3.4 (1 point) What is the chance of observing no deaths over the coming year?
A. less than 13%
B. at least 13% but less than 14%
C. at least 14% but less than 15%
D. at least 15% but less than 16%
E. at least 16%

3.5 (3 points) What is the chance of observing 6 or more deaths over the coming year?
A. less than .1%
B. at least .1% but less than .2%
C. at least .2% but less than .3%
D. at least .3% but less than .4%
E. at least .4%

3.6 (1 point) What is the median number of deaths per year?


A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D

3.7 (1 point) What is the mode of the distribution of deaths per year?
A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D

3.8 (1 point) Assume that each year that Joe starts alive, there is a 20% chance that he will die over
the coming year. What is the chance that Joe will die over the next 5 years?
A. less than 67%
B. at least 67% but less than 68%
C. at least 68% but less than 69%
D. at least 69% but less than 70%
E. at least 70%

3.9 (2 points) One insures 10 independent lives for 5 years. Assume that each year that an insured
starts alive, there is a 20% chance that he will die over the coming year.
What is the chance that 6 of these 10 insureds will die over the next 5 years?
A. less than 20%
B. at least 20% but less than 21%
C. at least 21% but less than 22%
D. at least 22% but less than 23%
E. at least 23%

3.10 (1 point) You roll 13 six-sided dice. What is the chance of observing exactly 4 sixes?
A. less than 10%
B. at least 10% but less than 11%
C. at least 11% but less than 12%
D. at least 12% but less than 13%
E. at least 13%

3.11 (1 point) You roll 13 six-sided dice. What is the average number of sixes observed?
A. less than 1.9
B. at least 1.9 but less than 2.0
C. at least 2.0 but less than 2.1
D. at least 2.1 but less than 2.2
E. at least 2.2

3.12 (1 point) You roll 13 six-sided dice.


What is the mode of the distribution of the number of sixes observed?
A. 1 B. 2 C. 3 D. 4 E. None of A, B, C, or D

3.13 (3 points) You roll 13 six-sided dice.


What is the median of the distribution of the number of sixes observed?
A. 1 B. 2 C. 3 D. 4 E. None of A, B, C, or D

3.14 (1 point) You roll 13 six-sided dice. What is the variance of the number of sixes observed?
A. less than 1.9
B. at least 1.9 but less than 2.0
C. at least 2.0 but less than 2.1
D. at least 2.1 but less than 2.2
E. at least 2.2

3.15 (2 points) The number of losses is Binomial with q = 0.4 and m = 90.
The sizes of loss are Exponential with mean 50, F(x) = 1 - e^(-x/50).
The number of losses and the sizes of loss are independent.
What is the probability of seeing exactly 3 losses of size greater than 100?
A. 9% B. 11% C. 13% D. 15% E. 17%

3.16 (2 points) Total claim counts generated from Policy A follow a Binomial distribution with
parameters m = 2 and q = 0.1. Total claim counts generated from Policy B follow a Binomial
distribution with parameters m = 2 and q = 0.6. Policy A is independent of Policy B.
For the two policies combined, what is the probability of observing 2 claims in total?
A. 32% B. 34% C. 36% D. 38% E. 40%

3.17 (2 points) Total claim counts generated from a portfolio follow a Binomial distribution with
parameters m = 9 and q = 0.1. Total claim counts generated from another independent portfolio
follow a Binomial distribution with parameters m = 15 and q = 0.1.
For the two portfolios combined, what is the probability of observing exactly 4 claims in total?
A. 11% B. 13% C. 15% D. 17% E. 19%

3.18 (3 points) The number of losses follows a Binomial distribution with m = 6 and q = 0.4.
Sizes of loss follow a Pareto Distribution with α = 4 and θ = 50,000.
There is a deductible of 5000, and a coinsurance of 80%.
Determine the probability that there are exactly two payments of size greater than 10,000.
A. 11% B. 13% C. 15% D. 17% E. 19%

Use the following information for the next two questions:


• A state holds a lottery once a week.
• The cost of a ticket is 1.
• 1,000,000 tickets are sold each week.
• The prize is 1,000,000.
• The chance of each ticket winning the prize is 1 in 1,400,000, independent of any other ticket.
• In a given week, there can be either no winner, one winner, or multiple winners.
• If there are multiple winners, each winner gets a 1,000,000 prize.
• The lottery commission is given a reserve fund of 2,000,000 at the beginning of the year.
• In any week where no prize is won, the lottery commission sends its receipts of 1 million to the
state department of revenue.
• In any week in which prize(s) are won, the lottery commission pays the prize(s) from receipts
and if necessary the reserve fund.
• If any week there is insufficient money to pay the prizes, the lottery commissioner must call
the governor of the state, in order to ask the governor to authorize the state department of
revenue to provide money to pay owed prizes and reestablish the reserve fund.

3.19 (3 points) What is the probability that the lottery commissioner has to call the governor the first
week?
A. 0.5% B. 0.6% C. 0.7% D. 0.8% E. 0.9%

3.20 (4 points) What is the probability that the lottery commissioner does not have to call the
governor the first year (52 weeks)?
A. 0.36% B. 0.40% C. 0.44% D. 0.48% E. 0.52%

3.21 (3 points) The number of children per family follows a Binomial Distribution m = 4 and q = 0.5.
For a child chosen at random, how many siblings (brothers and sisters) does he have on average?
A. 1.00 B. 1.25 C. 1.50 D. 1.75 E. 2.00

3.22 (2, 5/85, Q.2) (1.5 points) Suppose 30 percent of all electrical fuses manufactured by a
certain company fail to meet municipal building standards. What is the probability that in a random
sample of 10 fuses, exactly 3 will fail to meet municipal building standards?
A. (10 choose 3) (0.3^7) (0.7^3)    B. (10 choose 3) (0.3^3) (0.7^7)    C. 10 (0.3^3) (0.7^7)
D. Σ (10 choose i) (0.3^i) (0.7^(10-i)), summed over i = 0 to 3    E. 1

3.23 (160, 11/86, Q.14) (2.1 points) In a certain population 40p25 = 0.9.
From a random sample of 100 lives at exact age 25, the random variable X is the number of lives
who survive to age 65. Determine the value one standard deviation above the mean of X.
(A) 90 (B) 91 (C) 92 (D) 93 (E) 94

3.24 (160, 5/91, Q.14) (1.9 points)


From a study of 100 independent lives over the interval (x, x+1], you are given:
(i) The underlying mortality rate, qx, is 0.1.
(ii) lx+s is linear over the interval.
(iii) There are no unscheduled withdrawals or intermediate entrants.
(iv) Thirty of the 100 lives are scheduled to end observation, all at age x + 1/3.
(v) Dx is the random variable for the number of observed deaths.
Calculate Var(Dx).
(A) 6.9 (B) 7.0 (C) 7.1 (D) 7.2 (E) 7.3

3.25 (2, 2/96, Q.10) (1.7 points) Let X1, X2, and X3 be independent discrete random variables
with probability functions
P[Xi = k] = (ni choose k) p^k (1-p)^(ni - k) for i = 1, 2, 3, where 0 < p < 1.
Determine the probability function of S = X1 + X2 + X3, where positive.
A. (n1+n2+n3 choose s) p^s (1-p)^(n1+n2+n3 - s)
B. Σ {ni / (n1+n2+n3)} (ni choose s) p^s (1-p)^(ni - s), summed over i = 1 to 3
C. Π (ni choose s) p^s (1-p)^(ni - s), the product over i = 1 to 3
D. Σ (ni choose s) p^s (1-p)^(ni - s), summed over i = 1 to 3
E. (n1 n2 n3 choose s) p^s (1-p)^(n1 n2 n3 - s)

3.26 (2, 2/96, Q.44) (1.7 points) The probability that a particular machine breaks down on any day
is 0.2 and is independent of the breakdowns on any other day.
The machine can break down only once per day.
Calculate the probability that the machine breaks down two or more times in ten days.
A. 0.0175 B. 0.0400 C. 0.2684 D. 0.6242 E. 0.9596

3.27 (4B, 11/96, Q.23) (2 points) Two observations are made of a random variable having a
binomial distribution with parameters m = 4 and q = 0.5.
Determine the probability that the sample variance is zero.
A. 0
B. Greater than 0, but less than 0.05
C. At least 0.05, but less than 0.15
D. At least 0.15, but less than 0.25
E. At least 0.25

3.28 (Course 1 Sample Exam, Q.40) (1.9 points) A small commuter plane has 30 seats.
The probability that any particular passenger will not show up for a flight is 0.10, independent of
other passengers. The airline sells 32 tickets for the flight. Calculate the probability that more
passengers show up for the flight than there are seats available.
A. 0.0042 B. 0.0343 C. 0.0382 D. 0.1221 E. 0.1564

3.29 (1, 5/00, Q.40) (1.9 points)


A company prices its hurricane insurance using the following assumptions:
(i) In any calendar year, there can be at most one hurricane.
(ii) In any calendar year, the probability of a hurricane is 0.05 .
(iii) The number of hurricanes in any calendar year is independent of the number of
hurricanes in any other calendar year.
Using the companyʼs assumptions, calculate the probability that there are fewer
than 3 hurricanes in a 20-year period.
(A) 0.06 (B) 0.19 (C) 0.38 (D) 0.62 (E) 0.92

3.30 (1, 5/01, Q.13) (1.9 points) A study is being conducted in which the health of two
independent groups of ten policyholders is being monitored over a one-year period of time.
Individual participants in the study drop out before the end of the study with probability 0.2
(independently of the other participants). What is the probability that at least 9 participants complete
the study in one of the two groups, but not in both groups?
(A) 0.096 (B) 0.192 (C) 0.235 (D) 0.376 (E) 0.469

3.31 (1, 5/01, Q.37) (1.9 points) A tour operator has a bus that can accommodate 20 tourists. The
operator knows that tourists may not show up, so he sells 21 tickets. The probability that an
individual tourist will not show up is 0.02, independent of all other tourists.
Each ticket costs 50, and is non-refundable if a tourist fails to show up. If a tourist shows
up and a seat is not available, the tour operator has to pay 100, the ticket cost plus a penalty of 50,
to the tourist. What is the expected revenue of the tour operator?
(A) 935 (B) 950 (C) 967 (D) 976 (E) 985

3.32 (1, 11/01, Q.27) (1.9 points) A company establishes a fund of 120 from which it wants to
pay an amount, C, to any of its 20 employees who achieve a high performance level during the
coming year. Each employee has a 2% chance of achieving a high performance level during the
coming year, independent of any other employee.
Determine the maximum value of C for which the probability is less than 1% that the
fund will be inadequate to cover all payments for high performance.
(A) 24 (B) 30 (C) 40 (D) 60 (E) 120

3.33 (CAS3, 11/03, Q.14) (2.5 points) The Independent Insurance Company insures 25 risks,
each with a 4% probability of loss. The probabilities of loss are independent.
On average, how often would 4 or more risks have losses in the same year?
A. Once in 13 years
B. Once in 17 years
C. Once in 39 years
D. Once in 60 years
E. Once in 72 years

3.34 (CAS3, 11/04, Q.22) (2.5 points) An insurer covers 60 independent risks.
Each risk has a 4% probability of loss in a year.
Calculate how often 5 or more risks would be expected to have losses in the same year.
A. Once every 3 years
B. Once every 7 years
C. Once every 11 years
D. Once every 14 years
E. Once every 17 years

3.35 (CAS3, 11/04, Q.24) (2.5 points) A pharmaceutical company must decide how many
experiments to run in order to maximize its profits.
• The company will receive a grant of $1 million if one or more of its experiments is successful.
• Each experiment costs $2,900.
• Each experiment has a 2% probability of success, independent of the other experiments.
• All experiments are run simultaneously.
• Fixed expenses are $500,000.
• Ignore investment income.
The company performs the number of experiments that maximizes its expected profit.
Determine the company's expected profit before it starts the experiments.
A. 77,818 B. 77,829 C. 77,840 D. 77,851 E. 77,862

3.36 (SOA3, 11/04, Q.8 & 2009 Sample Q.124) (2.5 points)
For a tyrannosaur with a taste for scientists:
(i) The number of scientists eaten has a binomial distribution with q = 0.6 and m = 8.
(ii) The number of calories of a scientist is uniformly distributed on (7000, 9000).
(iii) The numbers of calories of scientists eaten are independent, and are independent of
the number of scientists eaten.
Calculate the probability that two or more scientists are eaten and exactly two of those eaten
have at least 8000 calories each.
(A) 0.23 (B) 0.25 (C) 0.27 (D) 0.30 (E) 0.33

3.37 (CAS3, 5/05, Q.15) (2.5 points) A service guarantee covers 20 television sets.
Each year, each set has a 5% chance of failing. These probabilities are independent.
If a set fails, it is replaced with a new set at the end of the year of failure.
This new set is included under the service guarantee.
Calculate the probability of no more than 1 failure in the first two years.
A. Less than 40.5%
B. At least 40.5%, but less than 41.0%
C. At least 41.0%, but less than 41.5%
D. At least 41.5%, but less than 42.0%
E. 42.0% or more

Solutions to Problems:

3.1. B. Binomial with q = 0.2 and m = 9. Mean = (9)(0.2) = 1.8.

3.2. A. Binomial with q = 0.2 and m = 9. Variance = (9)(.2)(1-.2) = 1.44.

3.3. A. Binomial with q = 0.2 and m = 9. f(4) = {9!/(4! 5!)} (0.2^4) (0.8^5) = 6.61%.

3.4. B. Binomial with q = 0.2 and m = 9. f(0) = {9!/(0! 9!)} (0.2^0) (0.8^9) = 13.4%.

3.5. D. Binomial with q = 0.2 and m = 9.


The chance of observing different numbers of deaths is:
Number Chance Binomial
of Deaths of Observation Coefficient
0 13.4218% 1
1 30.1990% 9
2 30.1990% 36
3 17.6161% 84
4 6.6060% 126
5 1.6515% 126
6 0.2753% 84
7 0.0295% 36
8 0.0018% 9
9 0.0001% 1
Adding the chances of having 6, 7, 8, or 9 claims the answer is 0.307%.
Alternately one can add the chances of having 0, 1, 2, 3, 4 or 5 claims and subtract this sum from
unity.
Comment: Although you should not do so for the exam, one could also answer this question using
the Incomplete Beta Function. The chance of more than x claims is β(x+1, m-x; q).
The chance of more than 5 claims is: β(5+1, 9-5; .2) = β(6, 4; .2) = 0.00307.
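For readers who like to verify such tables numerically, here is a minimal Python sketch (my own illustration, not from the syllabus texts) that recomputes the Binomial densities with m = 9 and q = 0.2 and the tail probability above; the variable names are arbitrary.

from math import comb

m, q = 9, 0.2
f = [comb(m, n) * q**n * (1 - q)**(m - n) for n in range(m + 1)]
print(sum(f[6:]))   # chance of 6, 7, 8, or 9 claims: about 0.00307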

3.6. C. For a discrete distribution such as we have here, employ the convention that the median is
the first value at which the distribution function is greater than or equal to .5.
F(1) = 0.134 + 0.302 = 0.436 < 50%, F(2) = 0.134 + 0.302 + 0.302 = 0.738 > 50%,
and therefore the median is 2.

3.7. E. The mode is the value at which f(n) is a maximum; f(1) = .302 = f(2) and both 1 and 2 are
modes. Alternately, in general for the Binomial the mode is the largest integer in mq + q; the largest
integer in 2 is 2, but when mq + q is an integer both it and the integer one less are modes.
Comment: This is a somewhat unfair question. While it seems to me that E is the best single
answer, one could also argue for B or C. If you are unfortunate enough to have an apparently unfair
question on your exam, do not let it upset you while taking the exam.

3.8. B. The chance that Joe is alive at the end of 5 years is (1 - 0.2)^5 = 0.32768. Therefore, the chance
that he died is 1 - 0.32768 = 0.67232.

3.9. D. Based on the solution of the previous problem, for each life the chance of dying during the
five year period is 0.67232. Therefore, the number of deaths for the 10 independent lives is
Binomial with m = 10 and q = 0.67232.
f(6) = {10!/(6! 4!)} (0.67232^6) (0.32768^4) = (210)(0.0924)(0.01153) = 0.224.
The chances of other numbers of deaths are as follows:
Number Chance Binomial
of Deaths of Observation Coefficient
0 0.001% 1
1 0.029% 10
2 0.270% 45
3 1.479% 120
4 5.312% 210
5 13.078% 252
6 22.360% 210
7 26.216% 120
8 20.171% 45
9 9.197% 10
10 1.887% 1
Sum 1 1024

3.10. B. The chance of observing a six on an individual six-sided die is 1/6. Assuming the results
of the dice are independent, one has a Binomial distribution with q = 1/6 and m = 13.
f(4) = {13!/(4! 9!)} (1/6)^4 (5/6)^9 = 10.7%.

3.11. D, 3.12. B, & 3.13. B. Binomial with q =1/6 and m =13. Mean = (1/6)(13) = 2.17.
For the Binomial the mode is the largest integer in mq + q = (13)(1/6) + (1/6) = 2.33; the largest
integer in 2.33 is 2. Alternately compute all of the possibilities and 2 is the most likely.
F(1) = .336 <.5 and F(2) = .628 ≥ .5, therefore the median is 2.
Number Chance Binomial Cumulative
of Deaths of Observation Coefficient Distribution
0 9.3463879% 1 9.346%
1 24.3006085% 13 33.647%
2 29.1607302% 78 62.808%
3 21.3845355% 286 84.192%
4 10.6922678% 715 94.885%
5 3.8492164% 1287 98.734%
6 1.0264577% 1716 99.760%
7 0.2052915% 1716 99.965%
8 0.0307937% 1287 99.996%
9 0.0034215% 715 100.000%
10 0.0002737% 286 100.000%
11 0.0000149% 78 100.000%
12 0.0000005% 13 100.000%
13 0.0000000% 1 100.000%
Sum 1 8192

3.14. A. Binomial with q =1/6 and m =13. Variance = (13)(1/6)(1-1/6) = 1.806.

3.15. D. S(100) = e^{-100/50} = 0.1353. Therefore, thinning the original Binomial, the number of large
losses is Binomial with m = 90 and q = (0.1353)(0.4) = 0.05413.
f(3) = {(90)(89)(88)/3!} (0.05413^3) (1 - 0.05413)^87 = 0.147.

3.16. D. Prob[2 claims in total] =
Prob[A = 0] Prob[B = 2] + Prob[A = 1] Prob[B = 1] + Prob[A = 2] Prob[B = 0] =
(0.9^2)(0.6^2) + {(2)(0.1)(0.9)}{(2)(0.6)(0.4)} + (0.1^2)(0.4^2) = 37.96%.
Comment: The sum of A and B is not a Binomial distribution, since their q parameters differ.

3.17. B. For the two portfolios combined, total claim counts follow a Binomial distribution with
parameters m = 9 + 15 = 24 and q = 0.1.
f(4) = C(24, 4) q^4 (1-q)^20 = {(24)(23)(22)(21)/4!} (0.1^4)(0.9^20) = 12.9%.

3.18. B. A payment is of size greater than 10,000 if the loss is of size greater than:
10,000/0.8 + 5,000 = 17,500.
The probability of a loss of size greater than 17,500 is: {50,000/(50,000 + 17,500)}^4 = 30.1%.
The large losses are Binomial with m = 6 and q = (0.301)(0.4) = 0.1204.
f(2) = C(6, 2) (0.1204^2) (1 - 0.1204)^4 = 13.0%.
Comment: An example of thinning a Binomial.
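As a numerical check on this thinning calculation, here is a short Python sketch under the same assumptions (Pareto severity, 5000 deductible, 80% coinsurance); the variable names are my own.

from math import comb

alpha, theta = 4, 50000
loss_threshold = 10000 / 0.8 + 5000                    # payment > 10,000 <=> loss > 17,500
p_large = (theta / (theta + loss_threshold))**alpha    # Pareto survival function, about 0.301

m, q = 6, 0.4
q_thin = q * p_large                                   # thinned Binomial parameter, about 0.1204
print(comb(m, 2) * q_thin**2 * (1 - q_thin)**(m - 2))  # about 0.130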

3.19. B. The number of prizes is Binomial with m = 1 million and q = 1/1,400,000.

f(0) = (1 - 1/1,400,000)^{1,000,000} = 48.95%.
f(1) = (1,000,000)(1 - 1/1,400,000)^{999,999} (1/1,400,000) = 34.97%.
f(2) = {(1,000,000)(999,999)/2}(1 - 1/1,400,000)^{999,998} (1/1,400,000)^2 = 12.49%.
f(3) = {(1,000,000)(999,999)(999,998)/6}(1 - 1/1,400,000)^{999,997} (1/1,400,000)^3 = 2.97%.
n f(n)
0 48.95%
1 34.97%
2 12.49%
3 2.97%
4 0.53%
5 0.08%
6 0.01%
Sum 100.00%
The first week, the lottery has enough money to pay 3 prizes,
(1 million in receipts + 2 million in the reserve fund.)
The probability of more than 3 prizes is: 1 - (48.95% + 34.97% + 12.49% + 2.97%) = 0.62%.

3.20. C. Each week there is a .4895 + .3497 = .8392 chance of no need for the reserve fund.
Each week there is a .1249 chance of a 1 million need from the reserve fund.
Each week there is a .0297 chance of a 2 million need from the reserve fund.
Each week there is a .0062 chance of a 3 million or more need from the reserve fund.
The governor will be called if there is at least 3 weeks with 2 prizes each (since each such week
depletes the reserve fund by 1 million), or if there is 1 week with 2 prizes plus 1 week with 3 prizes,
or if there is a week with 4 prizes.
Prob[Governor not called] = Prob[no weeks with more than 1 prize] +
Prob[1 week @2, no weeks more than 2] + Prob[2 weeks @2, no weeks more than 2] +
Prob[0 weeks @2, 1 week @3, no weeks more than 3] =
0.8392^52 + (52)(0.1249)(0.8392^51) + {(52)(51)/2}(0.1249^2)(0.8392^50) + (52)(0.0297)(0.8392^51) =
0.00011 + 0.00085 + 0.00323 + 0.00020 = 0.00439.
Comment: The lottery can not keep receipts from good weeks in order to build up the reserve fund.
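The bookkeeping in 3.19 and 3.20 is easy to mishandle, so here is a small Python sketch that reproduces the same figures. It uses the Poisson approximation to the weekly Binomial (mean 1,000,000/1,400,000), which matches the weekly densities quoted above to four decimal places; this is my own check, not part of the exam-style solution.

from math import comb, exp, factorial

lam = 1_000_000 / 1_400_000
p = [exp(-lam) * lam**n / factorial(n) for n in range(4)]   # weekly prize counts 0, 1, 2, 3
p_le1, p2, p3 = p[0] + p[1], p[2], p[3]

weeks = 52
prob_ok = (p_le1**weeks                                   # never more than 1 winner in a week
           + weeks * p2 * p_le1**(weeks - 1)              # one week with exactly 2 winners
           + comb(weeks, 2) * p2**2 * p_le1**(weeks - 2)  # two weeks with exactly 2 winners
           + weeks * p3 * p_le1**(weeks - 1))             # one week with exactly 3 winners
print(prob_ok)   # about 0.0044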

3.21. C. Let n be the number of children in a family.


The probability that the child picked is in a family of size n is proportional to the product of the size of
family and the proportion of families of that size: n f(n).
Thus, Prob[child is in a family of size n] = n f(n) / Σ n f(n) = n f(n) / E[N].
For n > 0, the number of siblings the child has is n - 1.

Thus the mean number of siblings is:
Σ_{n=1} n f(n) (n-1) / E[N] = Σ (n^2 - n) f(n) / E[N] = {E[N^2] - E[N]} / E[N] = E[N^2]/E[N] - 1
= {Var[N] + E[N]^2} / E[N] - 1 = Var[N]/E[N] + E[N] - 1 = mq(1-q)/(mq) + mq - 1 = 1 - q + mq - 1
= (m - 1)q = (3)(0.5) = 1.5.


Alternately, assume for example 10,000 families.
Number Binomial Number Number Number Product of # of Children
of children Density of Families of Children of Siblings Times # of Siblings
0 0.0625 625 0 0 0
1 0.2500 2,500 2,500 0 0
2 0.3750 3,750 7,500 1 7,500
3 0.2500 2,500 7,500 2 15,000
4 0.0625 625 2,500 3 7,500
Total 1.0000 10,000 20,000 6 30,000
Mean of number of siblings for a child chosen at random is: 30,000 / 20,000 = 1.5.
Comment: The average size family has two children; each of these children has one sibling.
However, a child chosen at random is more likely to be from a large family.
For example, suppose we only had two families, one with one child and the other with 3 children.
Then by picking children at random, there is a 75% chance of picking a child from the second family.
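Here is a brief Python sketch (my own) that confirms the (m - 1)q result by the same size-biased averaging used in the 10,000-family table above.

from math import comb

m, q = 4, 0.5
f = [comb(m, n) * q**n * (1 - q)**(m - n) for n in range(m + 1)]
children = sum(n * f[n] for n in range(m + 1))             # E[N] = mq = 2 children per family
siblings = sum(n * (n - 1) * f[n] for n in range(m + 1))   # E[N(N-1)], the sibling-weighted total
print(siblings / children)                                 # (m - 1)q = 1.5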

3.22. B. A Binomial Distribution with m = 10 and q = 0.3. f(3) = C(10, 3) (0.3^3) (0.7^7).

3.23. D. The number of people who survive is Binomial with m = 100 and q = 0.9.
Mean = (100)(0.9) = 90. Variance = (100)(0.9)(0.1) = 9. Mean + Standard Deviation = 93.

3.24. E. We have thirty lives, and for each one the chance of dying while we observe is 10%/3,
since we observe them only for 1/3 of a year. We have seventy lives, and for each one the
chance of dying while we observe is 10%, since we observe them each for a full year.
The number of deaths is sum of two Binomials, one with m = 30 and q = 0.1/3, and the other with
m = 70 and q = 0.1.
The sum of their variances is: (30)(0.03333)(1 - 0.03333) + (70)(0.1)(0.9) = 0.967 + 6.3 = 7.267.

3.25. A. Each Xi is Binomial with parameters ni and p.


The sum is Binomial with parameters n1 + n2 + n3 and p.

3.26. D. Binomial with m = 10 and q = 0.2. 1 - f(0) - f(1) = 1 - 0.8^10 - (10)(0.2)(0.8^9) = 0.624.

3.27. E. The sample variance is the average squared deviation from the mean; thus the sample
variance is positive unless all the observations are equal. In this case, the sample variance is zero if
and only if the two observations are equal. For this Binomial the chance of observing a given
number of claims is:
number of claims: 0 1 2 3 4
probability: 1/16 4/16 6/16 4/16 1/16
Thus the chance that the two observations are equal is:
(1/16)^2 + (4/16)^2 + (6/16)^2 + (4/16)^2 + (1/16)^2 = 70/256 = 0.273.
Comment: For example, the chance of 3 claims is: {m!/(3! (m-3)!)} q^3 (1-q)^{m-3} =
{4!/(3! 1!)} (0.5^3)(1 - 0.5) = 4/16.

3.28. E. The number of passengers that show up for a flight is Binomial with m = 32 and
q = 0.90. Prob[more show up than seats] = f(31) + f(32) = (32)(0.1)(0.9^31) + 0.9^32 = 0.1564.

3.29. E. The number of hurricanes is Binomial with m = 20 and q = 0.05.


Prob[< 3 hurricanes] = f(0) + f(1) + f(2) = 0.95^20 + (20)(0.05)(0.95^19) + (190)(0.05^2)(0.95^18) =
0.9245.

3.30. E. Each group is Binomial with m = 10 and q = 0.8.


Prob[at least 9 complete] = f(9) + f(10) = (10)(0.2)(0.8^9) + 0.8^10 = 0.376.
Prob[one group has at least 9 and one group does not] = (2)(.376)(1 - 0.376) = 0.469.

3.31. E. The bus driver collects (21)(50) = 1050 for the 21 tickets he sells. However, he may be
required to refund 100 to one passenger if all 21 ticket holders show up. The number of tourists who
show up is Binomial with m = 21 and q = 0.98.
Expected penalty is: 100 f(21) = 100(0.98^21) = 65.43.
Expected revenue is: (21)(50) - 65.425 = 984.6.

3.32. D. The fund will be inadequate if there are more than 120/C payments.
The number of payments is Binomial with m = 20 and q = .02.
x f F
0 0.66761 0.66761
1 0.27249 0.94010
2 0.05283 0.99293
3 0.00647 0.99940
There is a 1 - .94010 = 5.990% chance of needing more than one payment.
There is a 1 - .992930 = 0.707% chance of needing more than two payments.
Thus we need to require that the fund be able to make two payments. 120/C = 2. ⇒ C = 60.

3.33. D. This is the sum of 25 independent Bernoullis, each with q = .04.


The number of losses per year is Binomial with m = 25 and q = .04.
f(0) = (1 - q)^m = (1 - 0.04)^25 = 0.3604.
f(1) = mq(1 - q)^{m-1} = (25)(0.04)(1 - 0.04)^24 = 0.3754.
f(2) = {m(m-1)/2!} q^2 (1 - q)^{m-2} = {(25)(24)/2}(0.04^2)(1 - 0.04)^23 = 0.1877.
f(3) = {m(m-1)(m-2)/3!} q^3 (1 - q)^{m-3} = {(25)(24)(23)/6}(0.04^3)(1 - 0.04)^22 = 0.0600.
Prob[at least 4] = 1 - {f(0) + f(1) + f(2) + f(3)} = 1 - 0.9835 = 0.0165.
4 or more risks have losses in the same year on average once in: 1/0.0165 = 60.6 years.

3.34. C. A Binomial Distribution with m = 60 and q = 0.04.

f(0) = 0.96^60 = 0.08635. f(1) = (60)(0.04)(0.96^59) = 0.21588.
f(2) = {(60)(59)/2}(0.04^2)(0.96^58) = 0.26535. f(3) = {(60)(59)(58)/6}(0.04^3)(0.96^57) = 0.21376.
f(4) = {(60)(59)(58)(57)/24}(0.04^4)(0.96^56) = 0.12692.
1 - f(0) - f(1) - f(2) - f(3) - f(4) = 1 - 0.08635 - 0.21588 - 0.26535 - 0.21376 - 0.12692 = 0.09174.
1/0.09174 = Once every 11 years.

3.35. A. Assume n experiments are run. Then the probability of no successes is 0.98^n.
Thus the probability of at least one success is: 1 - 0.98^n.
Expected profit is:
(1,000,000)(1 - 0.98^n) - 2900n - 500,000 = 500,000 - (1,000,000)(0.98^n) - 2900n.
Setting the derivative with respect to n equal to zero:
0 = -ln(0.98)(1,000,000)(0.98^n) - 2900. ⇒ 0.98^n = 0.143545. ⇒ n = 96.1.
Taking n = 96, the expected profit is 77,818.
Comment: For n = 97, the expected profit is 77,794.
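Rather than setting a derivative to zero, one can simply search over the integer number of experiments. A minimal Python sketch (my own) under the stated assumptions:

def expected_profit(n):
    # grant times probability of at least one success, less experiment costs and fixed expenses
    return 1_000_000 * (1 - 0.98**n) - 2_900 * n - 500_000

best_n = max(range(1, 201), key=expected_profit)
print(best_n, round(expected_profit(best_n)))   # 96 experiments, expected profit about 77,818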

3.36. D. (9000 - 8000)/(9000 - 7000) = 1/2. Half the scientists are “large”.
Therefore, thinning the original Binomial, the number of large scientists is Binomial with m = 8 and
q = 0.6/2 = 0.3. f(2) = {(8)(7)/2} (0.7^6)(0.3^2) = 0.2965.
Alternately, this is a compound frequency distribution, with primary distribution a Binomial with
q = 0.6 and m = 8, and secondary distribution a Bernoulli with q = 1/2 (half chance a scientist is large.)
One can use the Panjer Algorithm. For the primary Binomial Distribution,
a = -q/(1-q) = -0.6/0.4 = -1.5. b = (m+1)q/(1-q) = (9)(1.5) = 13.5. P(z) = {1 + q(z-1)}^m.
c(0) = Pp(s(0)) = Pp(0.5) = {1 + (0.6)(0.5-1)}^8 = 0.057648.
c(x) = [1/{1 - a s(0)}] Σ_{j=1}^{x} (a + j b/x) s(j) c(x-j) = (1/1.75) Σ_{j=1}^{x} (-1.5 + 13.5 j/x) s(j) c(x-j).

c(1) = (1/1.75)(-1.5 + 13.5) s(1) c(0) = (1/1.75)(12)(1/2)(.057648) = .197650.


c(2) = (1/1.75){(-1.5 + 13.5/2) s(1) c(1) + (-1.5 + (2)13.5/2) s(2) c(0)} =
(1/1.75){(5.25)(1/2)(.197650) + (12)(0)(.057648)} = 0.296475.
Alternately, one can list all the possibilities:
Number Binomial Given the number of Extension
of Probability Scientist, the Probability
Scientist that exactly two are large
0 0.00066 0 0.00000
1 0.00786 0 0.00000
2 0.04129 0.25 0.01032
3 0.12386 0.375 0.04645
4 0.23224 0.375 0.08709
5 0.27869 0.3125 0.08709
6 0.20902 0.234375 0.04899
7 0.08958 0.1640625 0.01470
8 0.01680 0.109375 0.00184
Sum 1.00000 0.29648
For example, if 6 scientists have been eaten, then the chance that exactly two of them are large is:
(0.5^6) 6!/(4! 2!) = 0.234375. In algebraic form, this solution is:
Σ_{n=2}^{8} {8!/(n! (8-n)!)} (0.6^n)(0.4^{8-n}) {n!/(2! (n-2)!)} (0.5^n)
= (1/2) Σ_{n=2}^{8} {8!/((n-2)! (8-n)!)} (0.3^n)(0.4^{8-n})
= (1/2)(8)(7)(0.3^2) Σ_{i=0}^{6} {6!/(i! (6-i)!)} (0.3^i)(0.4^{6-i}) = (28)(0.09)(0.3 + 0.4)^6 = 0.2965.

Comment: The Panjer Algorithm (Recursive Method) is discussed in


“Mahlerʼs Guide to Aggregate Distributions.”

“two or more scientists are eaten and exactly two of those eaten have at least 8000 calories each”
⇔ exactly two “large” scientists are eaten as well as some unknown number of “small” scientists.
At least 2 claims of which exactly two are large.
⇔ exactly 2 large claims and some unknown number of small claims.
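For those who want to experiment with the Panjer (recursive) method used above, here is a minimal Python sketch for a Binomial primary and a discrete secondary distribution; the function name and arguments are my own, and the call below reproduces c(2) of about 0.2965 for this problem.

def panjer_binomial(m, q, s, n_max):
    # Compound densities c(0), ..., c(n_max) for a Binomial(m, q) primary
    # and a secondary distribution with densities s[0], s[1], ...
    a = -q / (1 - q)
    b = (m + 1) * q / (1 - q)
    c = [(1 + q * (s[0] - 1))**m]                   # c(0) = P_primary(s(0))
    for x in range(1, n_max + 1):
        total = sum((a + j * b / x) * (s[j] if j < len(s) else 0.0) * c[x - j]
                    for j in range(1, x + 1))
        c.append(total / (1 - a * s[0]))
    return c

c = panjer_binomial(m=8, q=0.6, s=[0.5, 0.5], n_max=2)   # Bernoulli secondary with q = 1/2
print(c[2])                                              # about 0.2965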

3.37. A. One year is Binomial Distribution with m = 20 and q = 0.05.


The years are independent of each other.
Therefore, the number of failures over 2 years is Binomial Distribution with m = 40 and q = 0.05.
Prob[0 or 1 failures] = 0.95^40 + (40)(0.95^39)(0.05) = 39.9%.
Comment: In this question, when a TV fails it is replaced. Therefore, we can have a failure in both
years for a given customer. A somewhat different question than asked would be,
assuming each customer owns one set, calculate the probability that no more than one customer
suffers a failure during the two years.
For a given customer, the probability of no failure in the first two years is: 0.95^2 = 0.9025.
The probability of 0 or 1 customers suffering a failure is:
0.9025^20 + (20)(0.0975)(0.9025^19) = 40.6%.

Section 4, Poisson Distribution

The Poisson Distribution is the single most important frequency distribution to study for the exam.14
The density function for the Poisson is:
f(x) = λ^x e^{-λ} / x!, x ≥ 0.

Note that unlike the Binomial, the Poisson density function is positive for all x ≥ 0; there is no limit on
the possible number of claims. The Poisson has a single parameter λ.

The Distribution Function is 1 at infinity, since Σ λ^x / x! is the series for e^λ.

For example, hereʼs a Poisson for λ = 2.5:


n 0 1 2 3 4 5 6 7 8 9 10
f(n) 0.082 0.205 0.257 0.214 0.134 0.067 0.028 0.010 0.003 0.001 0.000
F(n) 0.082 0.287 0.544 0.758 0.891 0.958 0.986 0.996 0.999 1.000 1.000

[Figure: bar chart of the Poisson densities f(n), for λ = 2.5 and n = 0 to 10.]

For example, the chance of 4 claims is: f(4) = λ^4 e^{-λ} / 4! = 2.5^4 e^{-2.5} / 4! = 0.1336.
Remember, there is a small chance of a very large number of claims.
For example, f(15) = 2.5^15 e^{-2.5} / 15! = 6 x 10^{-8}.
Such large numbers of claims can contribute significantly to the higher moments of the distribution.
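These densities can be checked with a couple of lines of Python (a sketch of my own):

from math import exp, factorial

lam = 2.5
f = lambda x: lam**x * exp(-lam) / factorial(x)
print(f(4))    # about 0.1336
print(f(15))   # about 6 x 10^-8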

14
The Poisson comes up among other places in the Gamma-Poisson frequency process, to be discussed in a
subsequent section.

Letʼs calculate the first two moments for this Poisson distribution with λ = 2.5:

Number of Claims    Probability Density Function    Probability x # of Claims    Probability x Square of # of Claims    Distribution Function
0 0.08208500 0.00000000 0.00000000 0.08208500
1 0.20521250 0.20521250 0.20521250 0.28729750
2 0.25651562 0.51303124 1.02606248 0.54381312
3 0.21376302 0.64128905 1.92386716 0.75757613
4 0.13360189 0.53440754 2.13763017 0.89117802
5 0.06680094 0.33400471 1.67002357 0.95797896
6 0.02783373 0.16700236 1.00201414 0.98581269
7 0.00994062 0.06958432 0.48709021 0.99575330
8 0.00310644 0.02485154 0.19881233 0.99885975
9 0.00086290 0.00776611 0.06989496 0.99972265
10 0.00021573 0.00215725 0.02157252 0.99993837
11 0.00004903 0.00053931 0.00593244 0.99998740
12 0.00001021 0.00012257 0.00147085 0.99999762
13 0.00000196 0.00002554 0.00033196 0.99999958
14 0.00000035 0.00000491 0.00006875 0.99999993
15 0.00000006 0.00000088 0.00001315 0.99999999
16 0.00000001 0.00000015 0.00000234 1.00000000
17 0.00000000 0.00000002 0.00000039 1.00000000
18 0.00000000 0.00000000 0.00000006 1.00000000
19 0.00000000 0.00000000 0.00000001 1.00000000
20 0.00000000 0.00000000 0.00000000 1.00000000
Sum 1.00000000 2.50000000 8.75000000

The mean is 2.5 = λ. The variance is: E[X^2] - E[X]^2 = 8.75 - 2.5^2 = 2.5 = λ.
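The column sums above can be reproduced with a few lines of Python (a sketch of my own; truncating the sum at 20 claims loses nothing at this precision):

from math import exp, factorial

lam = 2.5
f = [lam**x * exp(-lam) / factorial(x) for x in range(21)]
mean = sum(x * p for x, p in enumerate(f))
second_moment = sum(x * x * p for x, p in enumerate(f))
print(mean, second_moment - mean**2)   # both are about 2.5 = lambda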

In general, the mean of the Poisson is λ and the variance is λ.

In this case the mode is 2, since f(2) = .2565, larger than any other value of the probability density
function. In general, the mode of the Poisson is the largest integer in λ.15 This follows from the fact that
for the Poisson f(x+1) / f(x) = λ / (x+1). Thus for the Poisson the mode is less than or equal to the
mean λ.

The median in this case is 2, since F(2) = .544 ≥ .5, while F(1) = .287 < .5. The median as well as
the mode are less than the mean, which is typical for distributions skewed to the right.

15
If λ is an integer then f(λ) = f(λ-1), and both λ and λ-1 are modes.

Claim Intensity, Derivation of the Poisson:

Assume one has a claim intensity of ξ. The chance of having a claim over an extremely small period
of time Δt is approximately ξ (Δt). (The claim intensity is analogous to the force of mortality in Life
Contingencies.) If the claim intensity is a constant over time and the chance of having a claim in any
interval is independent of the chance of having a claim in any other disjoint interval, then the number
of claims observed over a period time t is given by a Poisson Distribution, with parameter ξt.

A Poisson is characterized by a constant independent claim intensity and vice versa. For
example, if the chance of a claim each month is 0.1%, and months are independent of each other,
the distribution of number of claims over a 5 year period (60 months) is Poisson with mean = 6%.
For the Poisson, the parameter λ = mean =
(claim intensity)(total time covered). Therefore, if for example one has a Poisson in each of five
years with parameter λ, then over the entire 5 year period one has a Poisson with parameter 5λ.

Adding Poissons:

The sum of two independent variables each of which is Poisson with parameters λ1 and

λ 2 is also Poisson, with parameter λ1 + λ2 . 16 This follows from the fact that for a very small
time interval the chance of a claim is the sum of the chance of a claim from either variable, since they
are independent.17 If the total time interval is one, then the chance of a claim from either variable over
a very small time interval Δt is λ1 Δt + λ2 Δt = (λ1 + λ2 )Δt. Thus the sum of the variables has constant
claim intensity (λ1 + λ2 ) over a time interval of one, and is therefore a Poisson with parameter

λ 1 + λ2 .

For example, the sum of a two independent Poisson variables with means 3% and 5%
is a Poisson variable with mean 8%. So if a portfolio consists of one risk Poisson with mean 3% and
one risk Poisson with mean 5%, the number of claims observed for the whole portfolio is Poisson
with mean 8%.
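Here is a small Python sketch (mine) that checks this numerically: the convolution of two independent Poissons with means 3% and 5% matches a single Poisson with mean 8%, term by term.

from math import exp, factorial

def poisson_pmf(lam, x):
    return lam**x * exp(-lam) / factorial(x)

lam1, lam2 = 0.03, 0.05
for k in range(4):
    convolution = sum(poisson_pmf(lam1, i) * poisson_pmf(lam2, k - i) for i in range(k + 1))
    print(k, convolution, poisson_pmf(lam1 + lam2, k))   # the last two columns agree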

16
See Theorem 6.1 in Loss Models.
17
This can also be shown from simple algebra, by summing over i + j = k the terms (λ^i e^{-λ} / i!) (µ^j e^{-µ} / j!) =
e^{-(λ+µ)} λ^i µ^j / (i! j!). By the Binomial Theorem, these terms sum to e^{-(λ+µ)} (λ+µ)^k / k!.

Exercise: Assume one had a portfolio of 25 exposures. Assume each exposure has an
independent Poisson frequency process, with mean 3%. What is the frequency distribution for the
claims from the whole portfolio?
[Solution: A Poisson Distribution with mean: (25)(3%) = 0.75.]

If one has a large number of independent events each with a small probability of occurrence,
then the number of events that occurs has approximately a constant claims intensity and is
thus approximately Poisson Distributed. Therefore the Poisson Distribution can be useful in
modeling such situations.

Thinning a Poisson:18

Sometimes one selects only some of the claims. This is sometimes referred to as “thinning” the
Poisson distribution. For example, if frequency is given by a Poisson and severity is
independent of frequency, then the number of claims above a certain amount (in
constant dollars) is also a Poisson.

For example, assume that we have a Poisson with mean frequency of 30 and that the size of loss
distribution is such that 20% of the losses are greater than $1 million (in constant dollars). Then the
number of losses observed greater than $1 million (in constant dollars) is also Poisson but with a
mean of (20%)(30) = 6. Similarly, losses observed smaller than $1 million (in constant dollars) is
also Poisson, but with a mean of (80%)(30) = 24.

Exercise: Frequency is Poisson with λ = 5.


Sizes of loss are Exponential with mean = 100: F(x) = 1 - e^{-x/100}.
Frequency and severity are independent.
What is the distribution of the number of losses of size less than 50?
What is the distribution of the number of losses of size more than 200?
What is the distribution of the number of losses of size between 50 and 200?
[Solution: For the Exponential, F(50) = 1 - e^{-50/100} = 0.393.
Thus the number of small losses is Poisson with mean: 5 F(50) = (5)(0.393) = 1.97.
For the Exponential, 1 - F(200) = e^{-200/100} = 0.135.
Thus the number of large losses is Poisson with mean: (0.135)(5) = 0.68.
For the Exponential, F(200) - F(50) = e^{-0.5} - e^{-2} = 0.471.
Thus the number of medium sized losses is Poisson with mean: (0.471)(5) = 2.36.
Comment: As will be discussed, these three Poisson Distributions are independent of each other.]
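The thinned means in this exercise are easy to reproduce; a minimal Python sketch (my own) under the same Exponential severity assumption:

from math import exp

lam, theta = 5, 100
S = lambda x: exp(-x / theta)      # Exponential survival function
print(lam * (1 - S(50)))           # small losses:  Poisson mean about 1.97
print(lam * (S(50) - S(200)))      # medium losses: Poisson mean about 2.36
print(lam * S(200))                # large losses:  Poisson mean about 0.68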

18
See Theorem 6.2 in Loss Models.
See for example, 3, 5/00, Q.2.

In this example, the total number of losses are Poisson and therefore has a constant independent
claims intensity of 5. Since frequency and severity are independent, the large losses also have a
constant independent claims intensity of 5 {1 - F(200)}, which is therefore Poisson with this mean.
Similarly, the small losses have constant independent claims intensity of 5F(50) and therefore are
Poisson. Also, these two processes are independent of each other.

If in this example we had a deductible of 200, then only losses of size greater than 200 would result
in a (non-zero) payment. Loss Models refers to the number of payments as NP, in contrast to NL
the number of losses.19 In this example, NL is Poisson with mean 5, while for a 200 deductible NP is
Poisson with mean: 5 {1 - F(200)} = (5)(0.135) = 0.68.

Thinning a Poisson based on size of loss is a special case of decomposing Poisson frequencies.
The key idea is that there is some way to divide the claims up into mutually exclusive types that are
independent. Then each type is also Poisson, and the Poisson Distributions for each
type are independent.

Exercise: Claim frequency follows a Poisson Distribution with a mean of 20% per year. 1/4 of all
claims involve attorneys. If attorney involvement is independent between different claims, what is
the probability of getting 2 claims involving attorneys in the next year?
[Solution: Claims with attorney involvement are Poisson with mean frequency 20%/4 = 5%.
Thus f(2) = (0.05^2) e^{-0.05} / 2! = 0.00119.]

Derivation of Results for Thinning Poissons:20

If losses are Poisson with mean λ, and one selects a portion, t, of the losses in a manner
independent of the frequency, then the selected losses are also Poisson but with mean λt.

Prob[# selected losses = n] = Σ_{m=n}^{∞} Prob[m total # losses] Prob[n of m losses are selected]

= Σ_{m=n}^{∞} {e^{-λ} λ^m / m!} {m! / (n! (m-n)!)} t^n (1-t)^{m-n} = {e^{-λ} t^n λ^n / n!} Σ_{m=n}^{∞} λ^{m-n} (1-t)^{m-n} / (m-n)!

= {e^{-λ} t^n λ^n / n!} Σ_{i=0}^{∞} λ^i (1-t)^i / i! = {e^{-λ} t^n λ^n / n!} e^{λ(1-t)} = e^{-λt} (tλ)^n / n! = f(n) for a Poisson with mean tλ.

In a similar manner, the number not selected follows a Poisson with mean (1-t)λ.
19
I do not regard this notation as particularly important, although it is possible that it will be used in an exam question.
See Section 8.6 of Loss Models.
20
I previously discussed how these results follow from the constant, independent claims intensity.

Prob[# selected losses = n | # not selected losses = j] =
Prob[total # = n + j and # not selected losses = j] / Prob[# not selected losses = j] =
Prob[total # = n + j] Prob[# not selected losses = j | total # = n + j] / Prob[# not selected losses = j] =

[{e^{-λ} λ^{n+j} / (n+j)!} {(n+j)! / (n! j!)} (1-t)^j t^n] / [e^{-(1-t)λ} {(1-t)λ}^j / j!]

= e^{-λt} (tλ)^n / n! = f(n) for a Poisson with mean tλ = Prob[# selected losses = n].


Thus the number selected and the number not selected are independent. They are independent
Poisson distributions. The same result follows when dividing into more than 2 disjoint subsets.
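This independence can also be seen by simulation. The following Python sketch (my own, and assuming the numpy package is available) thins a Poisson with mean 5 using a selection probability t = 0.3; the selected and unselected counts each show mean equal to variance, and essentially zero correlation.

import numpy as np

rng = np.random.default_rng(0)
lam, t = 5.0, 0.3
n_total = rng.poisson(lam, size=200_000)       # total losses in each simulated period
n_selected = rng.binomial(n_total, t)          # each loss independently selected with probability t
n_rest = n_total - n_selected

print(n_selected.mean(), n_selected.var())     # both near t * lam = 1.5
print(n_rest.mean(), n_rest.var())             # both near (1 - t) * lam = 3.5
print(np.corrcoef(n_selected, n_rest)[0, 1])   # near 0, consistent with independence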

Effect of Exposures:21

Assume one has 100 exposures with independent, identically distributed frequency distributions. If
each one is Poisson, then so is the sum, with mean 100λ. If we change the number of exposures to
for example 150, then the sum is Poisson with mean 150λ, or 1.5 times the mean in the first case. In
general, as the exposures change, the distribution remains Poisson with the mean changing in
proportion.

Exercise: The total number of claims from a portfolio of private passenger automobile insured has a
Poisson Distribution with λ = 60. If next year the portfolio has only 80% of the current exposures,
what is its frequency distribution?
[Solution: Poisson with λ = (.8)(60) = 48.]

This same result holds for a Compound Frequency Distribution, to be discussed subsequently, with
a primary distribution that is Poisson.

21
See Section 7.4 of Loss Models, not on the syllabus.

Poisson Distribution

Support: x = 0, 1, 2, 3, ... Parameters: λ > 0

D. f. : F(x) = 1 - Γ(x+1 ; λ) Incomplete Gamma Function22

P. d. f. : f(x) = λ^x e^{-λ} / x!

Mean = λ    Variance = λ    Variance / Mean = 1.

Coefficient of Variation = 1/√λ.    Skewness = 1/√λ = CV.    Kurtosis = 3 + 1/λ = 3 + CV^2.

Mode = largest integer in λ (if λ is an integer then both λ and λ-1 are modes.)
nth Factorial Moment = λ^n.

Probability Generating Function: P(z) = e^{λ(z-1)}, λ > 0.

Moment Generating Function: M(s) = exp[λ(e^s - 1)].

f(x+1) / f(x) = a + b / (x+1),   a = 0,   b = λ,   f(0) = e^{-λ}.
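As a check on the factorial moments quoted above, one can differentiate the probability generating function symbolically; a small sketch of my own, assuming the sympy package is available:

import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
P = sp.exp(lam * (z - 1))                 # Poisson probability generating function
print(sp.diff(P, z, 1).subs(z, 1))        # first factorial moment: lam
print(sp.diff(P, z, 2).subs(z, 1))        # second factorial moment: lam**2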

A Poisson Distribution for λ = 10:


[Figure: bar chart of the Poisson densities f(x), for λ = 10 and x = 0 to 25.]

22
x+1 is the shape parameter of the Incomplete Gamma which is evaluated at the point λ. Thus one can get the sum
of terms for the Poisson Distribution by using the Incomplete Gamma Function.

Problems:

Use the following information for the next five questions:


The density function for n is: f(n) = 6.9^n e^{-6.9} / n!, n = 0, 1, 2, ...

4.1 (1 point) What is the mean of the distribution?


A. less than 6.9
B. at least 6.9 but less than 7.0
C. at least 7.0 but less than 7.1
D. at least 7.1 but less than 7.2
E. at least 7.2

4.2 (1 point) What is the variance of the distribution?


A. less than 6.9
B. at least 6.9 but less than 7.0
C. at least 7.0 but less than 7.1
D. at least 7.1 but less than 7.2
E. at least 7.2

4.3 (2 points) What is the chance of having less than 4 claims?


A. less than 9%
B. at least 9% but less than 10%
C. at least 10% but less than 11%
D. at least 11% but less than 12%
E. at least 12%

4.4 (2 points) What is the mode of the distribution?


A. 5 B. 6 C. 7 D. 8 E. None of the Above.

4.5 (2 points) What is the median of the distribution?


A. 5 B. 6 C. 7 D. 8 E. None of the Above.

4.6 (2 points) The male drivers in the State of Grace each have their annual claim frequency given
by a Poisson distribution with parameter equal to 0.05.
The female drivers in the State of Grace each have their annual claim frequency given by a Poisson
distribution with parameter equal to 0.03.
You insure in the State of Grace 20 male drivers and 10 female drivers.
Assume the claim frequency distributions of the individual drivers are independent.
What is the chance of observing 3 claims in a year?
A. less than 9.6%
B. at least 9.6% but less than 9.7%
C. at least 9.7% but less than 9.8%
D. at least 9.8% but less than 9.9%
E. at least 9.9 %

4.7 (2 points) Assume that the frequency of hurricanes hitting the State of Windiana is given by a
Poisson distribution, with an average annual claim frequency of 82%. Assume that the losses in
millions of constant 1998 dollars from such a hurricane are given by a Pareto Distribution with
α = 2.5 and θ = 400 million. Assuming frequency and severity are independent, what is chance of
two or more hurricanes each with more than $250 million (in constant 1998 dollars) of loss hitting the
State of Windiana next year?
(There may or may not be hurricanes of other sizes.)
A. less than 2.1%
B. at least 2.1% but less than 2.2%
C. at least 2.2% but less than 2.3%
D. at least 2.3% but less than 2.4%
E. at least 2.4%

Use the following information for the next 3 questions:


The claim frequency follows a Poisson Distribution with a mean of 10 claims per year.

4.8 (2 points) What is the chance of having more than 5 claims in a year?
A. 92% B. 93% C. 94% D. 95% E. 96%

4.9 (2 points) What is the chance of having more than 8 claims in a year?
A. 67% B. 69% C. 71% D. 73% E. 75%

4.10 (1 point) What is the chance of having 6, 7, or 8 claims in a year?


A. 19% B. 21% C. 23% D. 25% E. 27%

4.11 (2 points) You are given the following:


• Claims follow a Poisson Distribution, with a mean of 27 per year.
• The size of claims are given by a Weibull Distribution with θ = 1000 and τ = 3.
• Frequency and severity are independent.
Given that during a year there are 7 claims of size less than 500, what is the expected number of
claims of size greater than 500 during that year?
(A) 20 (B) 21 (C) 22 (D) 23 (E) 24

4.12 (1 point) Frequency follows a Poisson Distribution with λ = 7.


20% of losses are of size greater than $50,000.
Frequency and severity are independent.
Let N be the number of losses of size greater than $50,000.
What is the probability that N = 3?
A. less than 9%
B. at least 9% but less than 10%
C. at least 10% but less than 11%
D. at least 11% but less than 12%
E. at least 12%

4.13 (1 point) N follows a Poisson Distribution with λ = 0.1. What is Prob[N = 1 | N ≤ 1]?
A. 8% B. 9% C. 10% D. 11% E. 12%

4.14 (1 point) N follows a Poisson Distribution with λ = 0.1. What is Prob[N = 1 | N ≥ 1]?
A. 91% B. 92% C. 93% D. 94% E. 95%

4.15 (2 points) N follows a Poisson Distribution with λ = 0.2. What is E[1/(N+1)]?


A. less than 0.75
B. at least 0.75 but less than 0.80
C. at least 0.80 but less than 0.85
D. at least 0.85 but less than 0.90
E. at least 0.90

4.16 (2 points) N follows a Poisson Distribution with λ = 2. What is E[N | N > 1]?
A. 2.6 B. 2.7 C. 2.8 D. 2.9 E. 3.0

4.17 (2 points) The total number of claims from a book of business with 500 exposures has a
Poisson Distribution with λ = 27. Next year, this book of business will have 600 exposures.
Next year, what is the probability of this book of business having a total of 30 claims?
A. 5.8% B. 6.0% C. 6.2% D. 6.4% E. 6.6%

Use the following information for the next two questions:


N follows a Poisson Distribution with λ = 1.3. Define (N-j)+ = N - j if N ≥ j, and 0 otherwise.

4.18 (2 points) Determine E[(N-1)+].


A. 0.48 B. 0.51 C. 0.54 D. 0.57 E. 0.60

4.19 (2 points) Determine E[(N-2)+].


A. 0.19 B. 0.20 C. 0.21 D. 0.22 E. 0.23

4.20 (2 points) The total number of non-zero payments from a policy with a $500 deductible
follows a Poisson Distribution with λ = 3.3.
The ground up losses follow a Weibull Distribution with τ = 0.7 and θ = 2000.
If this policy instead had a $1000 deductible, what would be the probability of having 4
non-zero payments?
A. 14% B. 15% C. 16% D. 17% E. 18%

4.21 (3 points) The number of major earthquakes that hit the state of Allshookup is given by a
Poisson Distribution with 0.05 major earthquakes expected per year.
• Allshookup establishes a fund that will pay 1000/major earthquake.
• The fund charges an annual premium, payable at the start of each year, of 60.
• At the start of this year (before the premium is paid) the fund has 300.
• Claims are paid immediately when there is a major earthquake.
• If the fund ever runs out of money, it immediately ceases to exist.
• Assume no investment income and no expenses.
What is the probability that the fund is still functioning in 40 years?
A. Less than 40%
B. At least 40%, but less than 41%
C. At least 41%, but less than 42%
D. At least 42%, but less than 43%
E. At least 43%

4.22 (2 points) You are given the following:


• A business has bought a collision policy to cover its fleet of automobiles.
• The number of collision losses per year follows a Poisson Distribution.
• The size of collision losses follows an Exponential Distribution with a mean of 600.
• Frequency and severity are independent.
• This policy has an ordinary deductible of 1000 per collision.
• The probability of no payments on this policy during a year is 74%.
Determine the probability that of the collision losses this business has during a year, exactly three of
them result in no payment on this policy.
(A) 8% (B) 9% (C) 10% (D) 11% (E) 12%

4.23 (2 points) A Poisson Distribution has a coefficient of variation 0.5.


Determine the probability of exactly seven claims.
(A) 4% (B) 5% (C) 6% (D) 7% (E) 8%

4.24 (2 points) X and Y are each Poisson random variables.


Sample values of X and Y are drawn together in pairs.
Z i = Xi + Yi.
E[Z] = 10. Var[Z] = 12.
Find the covariance of X and Y.
A. 0 B. 0.5 C. 1.0 D. 1.5 E. 2.0

4.25 (2 points) The size of losses prior to the effect of any deductible follow a Pareto Distribution
with α = 3 and θ = 1000.
With a deductible of 100, the number of non-zero payments is Poisson with mean 5.
If instead there is a deductible of 250, what is the probability of exactly 3 non-zero payments?
(A) 16% (B) 18% (C) 20% (D) 22% (E) 24%

4.26 (2 points) The number of losses is Poisson with mean 10.


Loss sizes follow an Exponential Distribution with mean 400.
The number of losses and the size of those losses are independent.
What is the probability of exactly 5 losses of size less than 200 and exactly 6 losses of size greater
than 200?
(A) 2.0% (B) 2.5% (C) 3.0% (D) 3.5% (E) 4.0%

4.27 (2, 5/83, Q.4) (1.5 points) If X is the mean of a random sample of size n from a Poisson
distribution with parameter λ, then which of the following statements is true?
A. X has a Normal distribution with mean λ and variance λ.
B. X has a Normal distribution with mean λ and variance λ/n.
C. X has a Poisson distribution with parameter λ.
D. n X has a Poisson distribution with parameter λn .
E. n X has a Poisson distribution with parameter nλ.

4.28 (2, 5/83, Q.28) (1.5 points) The number of traffic accidents per week in a small city has a
Poisson distribution with mean equal to 3.
What is the probability of exactly 2 accidents in 2 weeks?
A. 9e^{-6}   B. 18e^{-6}   C. 25e^{-6}   D. 4.5e^{-3}   E. 9.5e^{-3}

4.29 (2, 5/83, Q.45) (1.5 points) Let X have a Poisson distribution with parameter λ = 1.
What is the probability that X ≥ 2, given that X ≤ 4?
A. 5/65 B. 5/41 C. 17/65 D. 17/41 E. 3/5

4.30 (2, 5/85, Q.9) (1.5 points) The number of automobiles crossing a certain intersection during
any time interval of length t minutes between 3:00 P.M. and 4:00 P.M. has a Poisson distribution
with mean t. Let W be time elapsed after 3:00 P.M. before the first automobile crosses the
intersection. What is the probability that W is less than 2 minutes?
A. 1 - 2e^{-1} - e^{-2}   B. e^{-2}   C. 2e^{-1}   D. 1 - e^{-2}   E. 2e^{-1} + e^{-2}

4.31 (2, 5/85, Q.16) (1.5 points) In a certain communications system, there is an average of 1
transmission error per 10 seconds. Let the distribution of transmission errors be Poisson.
What is the probability of more than 1 error in a communication one-half minute in duration?
A. 1 - 2e^{-1}   B. 1 - e^{-1}   C. 1 - 4e^{-3}   D. 1 - 3e^{-3}   E. 1 - e^{-3}

4.32 (2, 5/88, Q.49) (1.5 points) The number of power surges in an electric grid has a Poisson
distribution with a mean of 1 power surge every 12 hours.
What is the probability that there will be no more than 1 power surge in a 24-hour period?
A. 2e^{-2}   B. 3e^{-2}   C. e^{-1/2}   D. (3/2)e^{-1/2}   E. 3e^{-1}

4.33 (4, 5/88, Q.48) (1 point) An insurer's portfolio is made up of 3 independent policyholders
with expected annual frequencies of 0.05, 0.1, and 0.15.
Assume that each insured's number of claims follows a Poisson distribution.
What is the probability that the insurer experiences fewer than 2 claims in a given year?
A. Less than 0.9
B. At least 0.9, but less than 0.95
C. At least 0.95, but less than 0.97
D. At least 0.97, but less than 0.99
E. Greater than 0.99

4.34 (2, 5/90, Q.39) (1.7 points) Let X, Y, and Z be independent Poisson random variables with
E(X) = 3, E(Y) = 1, and E(Z) = 4. What is P[X + Y + Z ≤ 1]?
A. 13e^{-12}   B. 9e^{-8}   C. (13/12)e^{-1/12}   D. 9e^{-1/8}   E. (9/8)e^{-1/8}

4.35 (4B, 5/93, Q.1) (1 point) You are given the following:
• A portfolio consists of 10 independent risks.
• The distribution of the annual number of claims for each risk in the portfolio is given
by a Poisson distribution with mean µ = 0.1.
Determine the probability of the portfolio having more than 1 claim per year.
A. 5% B. 10% C. 26% D. 37% E. 63%

4.36 (4B, 11/94, Q.19) (3 points) The density function for a certain parameter, α, is

f(α) = 4.6^α e^{-4.6} / α!, α = 0, 1, 2, ...


Which of the following statements are true concerning the distribution function for α?
1. The mode is less than the mean.
2. The variance is greater than the mean.
3. The median is less than the mean.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

4.37 (4B, 5/95, Q.9) (2 points) You are given the following:
• The number of claims for each risk in a group of identical risks follows a Poisson distribution.
• The expected number of risks in the group that will have no claims is 96.
• The expected number of risks in the group that will have 2 claims is 3.
Determine the expected number of risks in the group that will have 4 claims.
A. Less than .01
B. At least .01, but less than .05
C. At least .05, but less than .10
D. At least .10, but less than .20
E. At least .20

4.38 (2, 2/96, Q.21) (1.7 points) Let X be a Poisson random variable with E(X) = ln(2).
Calculate E[cos(πX)].
A. 0 B. 1/4 C. 1/2 D. 1 E. 2ln(2)

4.39 (4B, 11/98, Q.1) (1 point) You are given the following:
• The number of claims follows a Poisson distribution.
• Claim sizes follow a Pareto distribution.
Determine the type of distribution that the number of claims with sizes greater than 1,000 follows.
A. Poisson B. Pareto C. Gamma D. Binomial E. Negative Binomial

4.40 (4B, 11/98, Q.2) (2 points) The random variable X has a Poisson distribution with mean
n - 1/2, where n is a positive integer greater than 1. Determine the mode of X.
A. n-2 B. n-1 C. n D. n+1 E. n+2

4.41 (4B, 11/98, Q.18) (2 points) The number of claims per year for a given risk follows a
distribution with probability function p(n) = λ^n e^{-λ} / n!, n = 0, 1, ..., λ > 0.
Determine the smallest value of λ for which the probability of observing three or more claims during
two given years combined is greater than 0.1.
A. Less than 0.7
B. At least 0.7, but less than 1.0
C. At least 1.0, but less than 1.3
D. At least 1.3, but less than 1.6
E. At least 1.6

4.42 (4B, 5/99, Q.8) (3 points) You are given the following:
• Each loss event is either an aircraft loss or a marine loss.
• The number of aircraft losses has a Poisson distribution with a mean of 0.1 per year.
Each loss is always 10,000,000.
• The number of marine losses has a Poisson distribution with a mean of 0.2 per year.
Each loss is always 20,000,000.
• Aircraft losses occur independently of marine losses.
• From the first two events each year,
the insurer pays the portion of the combined losses that exceeds 10,000,000.
Determine the insurer's expected annual payments.
A. Less than 1,300,000
B. At least 1,300,000, but less than 1,800,000
C. At least 1,800,000, but less than 2,300,000
D. At least 2,300,000, but less than 2,800,000
E. At least 2,800,000

4.43 (IOA 101, 4/00, Q.5) (2.25 points) An insurance companyʼs records suggest that
experienced drivers (those aged over 21) submit claims at a rate of 0.1 per year, and
inexperienced drivers (those 21 years old or younger) submit claims at a rate of 0.15 per year.
A driver can submit more than one claim a year.
The company has 40 experienced and 20 inexperienced drivers insured with it.
The number of claims for each driver can be modeled by a Poisson distribution, and claims are
independent of each other.
Calculate the probability the company will receive three or fewer claims in a year.

4.44 (1, 5/00, Q.24) (1.9 points) An actuary has discovered that policyholders are three times as
likely to file two claims as to file four claims.
If the number of claims filed has a Poisson distribution, what is the variance of the
number of claims filed?
(A) 1/√3   (B) 1   (C) √2   (D) 2   (E) 4

4.45 (3, 5/00, Q.2) (2.5 points) Lucky Tom finds coins on his way to work at a Poisson rate of 0.5
coins/minute. The denominations are randomly distributed:
(i) 60% of the coins are worth 1;
(ii) 20% of the coins are worth 5; and
(iii) 20% of the coins are worth 10.
Calculate the conditional expected value of the coins Tom found during his one-hour walk today,
given that among the coins he found exactly ten were worth 5 each.
(A) 108 (B) 115 (C) 128 (D) 165 (E) 180

4.46 (1, 11/00, Q.23) (1.9 points) A company buys a policy to insure its revenue in the event of
major snowstorms that shut down business. The policy pays nothing for the first such snowstorm of
the year and 10,000 for each one thereafter, until the end of the year.
The number of major snowstorms per year that shut down business is assumed to have a Poisson
distribution with mean 1.5.
What is the expected amount paid to the company under this policy during a one-year period?
(A) 2,769 (B) 5,000 (C) 7,231 (D) 8,347 (E) 10,578

4.47 (3, 11/00, Q.29) (2.5 points) Job offers for a college graduate arrive according to a Poisson
process with mean 2 per month. A job offer is acceptable if the wages are at least 28,000.
Wages offered are mutually independent and follow a lognormal distribution,
with µ = 10.12 and σ = 0.12.
Calculate the probability that it will take a college graduate more than 3 months to receive an
acceptable job offer.
(A) 0.27 (B) 0.39 (C) 0.45 (D) 0.58 (E) 0.61

4.48 (1, 11/01, Q.19) (1.9 points) A baseball team has scheduled its opening game for April 1.
If it rains on April 1, the game is postponed and will be played on the next day that it does not rain.
The team purchases insurance against rain. The policy will pay 1000 for each day, up to 2 days, that
the opening game is postponed. The insurance company determines that the number of
consecutive days of rain beginning on April 1 is a Poisson random variable with mean 0.6.
What is the standard deviation of the amount the insurance company will have to pay?
(A) 668 (B) 699 (C) 775 (D) 817 (E) 904

4.49 (CAS3, 11/03, Q.31) (2.5 points) Vehicles arrive at the Bun-and-Run drive-thru at a Poisson
rate of 20 per hour. On average, 30% of these vehicles are trucks.
Calculate the probability that at least 3 trucks arrive between noon and 1:00 PM.
A. Less than 0.80
B. At least 0.80, but less than 0.85
C. At least 0.85, but less than 0.90
D. At least 0.90, but less than 0.95
E. At least 0.95

4.50 (CAS3, 5/04, Q.16) (2.5 points) The number of major hurricanes that hit the island nation of
Justcoast is given by a Poisson Distribution with 0.100 storms expected per year.
• Justcoast establishes a fund that will pay 100/storm.
• The fund charges an annual premium, payable at the start of each year, of 10.
• At the start of this year (before the premium is paid) the fund has 65.
• Claims are paid immediately when there is a storm.
• If the fund ever runs out of money, it immediately ceases to exist.
• Assume no investment income and no expenses.
What is the probability that the fund is still functioning in 10 years?
A. Less than 60%
B. At least 60%, but less than 61%
C. At least 61%, but less than 62%
D. At least 62%, but less than 63%
E. At least 63%

4.51 (CAS3, 11/04, Q.17) (2.5 points) You are given:


• Claims are reported at a Poisson rate of 5 per year.
• The probability that a claim will settle for less than $100,000 is 0.9.
What is the probability that no claim of $100,000 or more is reported during the next 3 years?
A. 20.59% B. 22.31% C. 59.06% D. 60.65% E. 74.08%

4.52 (CAS3, 11/04, Q.23) (2.5 points) Dental Insurance Company sells a policy that covers two
types of dental procedures: root canals and fillings.
There is a limit of 1 root canal per year and a separate limit of 2 fillings per year.
The number of root canals a person needs in a year follows a Poisson distribution with λ = 1,
and the number of fillings a person needs in a year is Poisson with λ = 2.
The company is considering replacing the single limits with a combined limit of 3 claims per year,
regardless of the type of claim.
Determine the change in the expected number of claims per year if the combined limit is adopted.
A. No change
B. More than 0.00, but less than 0.20 claims
C. At least 0.20, but less than 0.25 claims
D. At least 0.25, but less than 0.30 claims
E. At least 0.30 claims

4.53 (SOA M, 5/05, Q.5) (2.5 points)


Kings of Fredonia drink glasses of wine at a Poisson rate of 2 glasses per day.
Assassins attempt to poison the kingʼs wine glasses. There is a 0.01 probability that any
given glass is poisoned. Drinking poisoned wine is always fatal instantly and is the only
cause of death. The occurrences of poison in the glasses and the number of glasses drunk are
independent events.
Calculate the probability that the current king survives at least 30 days.
(A) 0.40 (B) 0.45 (C) 0.50 (D) 0.55 (E) 0.60

4.54 (CAS3, 11/05, Q.24) (2.5 points) For a compound loss model you are given:
• The claim count follows a Poisson distribution with λ = 0.01.
• Individual losses are distributed as follows:
x F(x)
100 0.10
300 0.20
500 0.25
600 0.40
700 0.50
800 0.70
900 0.80
1,000 0.90
1,200 1.00
Calculate the probability of paying at least one claim after implementing a $500 deductible.
A. Less than 0.005
B. At least 0.005, but less than 0.010
C. At least 0.010, but less than 0.015
D. At least 0.015, but less than 0.020
E. At least 0.020

4.55 (CAS3, 11/05, Q.31) (2.5 points) The Toronto Bay Leaves attempt shots in a hockey game
according to a Poisson process with mean 30. Each shot is independent.
For each attempted shot, the probability of scoring a goal is 0.10.
Calculate the standard deviation of the number of goals scored by the Bay Leaves in a game.
A. Less than 1.4
B. At least 1.4, but less than 1.6
C. At least 1.6, but less than 1.8
D. At least 1.8, but less than 2.0
E. At least 2.0

4.56 (CAS3, 11/06, Q.32) (2.5 points) You are given:


• Annual frequency follows a Poisson distribution with mean 0.3.
• Severity follows a normal distribution with F(100,000) = 0.6.
Calculate the probability that there is at least one loss greater than 100,000 in a year.
A. Less than 11 %
B. At least 11%, but less than 13%
C. At least 13%, but less than 15%
D. At least 15%, but less than 17%
E. At least 17%

4.57 (SOA M, 11/06, Q.9) (2.5 points) A casino has a game that makes payouts at a Poisson rate
of 5 per hour and the payout amounts are 1, 2, 3,… without limit.
The probability that any given payout is equal to i is 1/2^i. Payouts are independent.
Calculate the probability that there are no payouts of 1, 2, or 3 in a given 20 minute period.
(A) 0.08 (B) 0.13 (C) 0.18 (D) 0.23 (E) 0.28

4.58 (CAS3L, 5/09, Q.8) (2.5 points) Bill receives mail at a Poisson rate of 10 items per day.
The contents of the items are randomly distributed:
• 50% of the items are credit card applications.
• 30% of the items are catalogs.
• 20% of the items are letters from friends.
Bill has received 20 credit card applications in two days.
Calculate the probability that for those same two days, he receives at least 3 letters from friends and
exactly 5 catalogs.
A. Less than 6%
B. At least 6%, but less than 10%
C. At least 10%, but less than 14%
D. At least 14%, but less than 18%
E. At least 18%

4.59 (CAS3L, 5/09, Q.9) (2.5 points) You are given the following information:
• Policyholder calls to a call center follow a homogenous Poisson process with λ = 250 per day.
• Policyholders may call for 3 reasons: Endorsement, Cancellation, or Payment.
• The distribution of calls is as follows:
Call Type Percent of Calls
Endorsement 50%
Cancellation 10%
Payment 40%
Using the normal approximation with continuity correction, calculate the probability of receiving more
than 156 calls in a day that are either endorsements or cancellations.
A. Less than 27%
B. At least 27%, but less than 29%
C. At least 29%, but less than 31%
D. At least 31%, but less than 33%
E. At least 33%

4.60 (CAS3L, 11/09, Q.11) (2.5 points) You are given the following information:
• Claims follow a compound Poisson process.
• Claims occur at the rate of λ = 10 per day.
• Claim severity follows an exponential distribution with θ = 15,000.
• A claim is considered a large loss if its severity is greater than 50,000.
What is the probability that there are exactly 9 large losses in a 30-day period?
A. Less than 5%
B. At least 5%, but less than 7.5%
C. At least 7.5%, but less than 10%
D. At least 10%, but less than 12.5%
E. At least 12.5%

Solutions to Problems:

4.1. B. This is a Poisson distribution with a parameter of 6.9. The mean is therefore 6.9.

4.2. B. This is a Poisson distribution with a parameter of 6.9.


The variance is therefore 6.9.

4.3. A. One needs to sum the chances of having 0, 1, 2, and 3 claims:


n 0 1 2 3
f(n) 0.001 0.007 0.024 0.055
F(n) 0.001 0.008 0.032 0.087
For example, f(3) = 6.9^3 e-6.9 / 3! = (328.5)(0.001008)/6 = 0.055.

4.4. B. The mode is the value at which f(n) is a maximum; f(6) = .151 and the mode is therefore 6.
n 0 1 2 3 4 5 6 7 8
f(n) 0.001 0.007 0.024 0.055 0.095 0.131 0.151 0.149 0.128
Alternately, in general for the Poisson the mode is the largest integer in the parameter; the largest
integer in 6.9 is 6.

4.5. C. For a discrete distribution such as we have here, employ the convention that the median is
the first value at which the distribution function is greater than or equal to .5.
F(7) ≥ 50% and F(6) < 50%, and therefore the median is 7.
n 0 1 2 3 4 5 6 7 8
f(n) 0.001 0.007 0.024 0.055 0.095 0.131 0.151 0.149 0.128
F(n) 0.001 0.008 0.032 0.087 0.182 0.314 0.465 0.614 0.742

4.6. E. The sum of Poisson variables is a Poisson with the sum of the parameters.
The sum has a Poisson parameter of (20)(.05) + (10)(.03) = 1.3.
The chance of three claims is: 1.3^3 e-1.3 / 3! = 9.98%.

4.7. E. For the Pareto Distribution, S(x) = 1 - F(x) = {θ/(θ+x)}^α.
S(250) = {400/(400+250)}^2.5 = 0.2971.
Thus the distribution of hurricanes with more than $250 million of loss is Poisson with mean frequency
of (82%)(.2971) = 24.36%.
The chance of zero such hurricanes is e-0.2436 = 0.7838.
The chance of one such hurricane is: (0.2436)e-0.2436 = 0.1909.
The chance of more than one such hurricane is: 1 - (0.7838 + 0.1909) = 0.0253.

4.8. B. f(n) = e−λ λ n / n! = e-10 10n / n!


n 0 1 2 3 4 5
f(n) 0.0000 0.0005 0.0023 0.0076 0.0189 0.0378
F(n) 0.0000 0.0005 0.0028 0.0103 0.0293 0.0671
Thus the chance of having more than 5 claims is 1 - .0671 = .9329.
Comment: Although one should not do so on the exam, one can also solve this using the
Incomplete Gamma Function. The chance of having more than 5 claims is
the Incomplete Gamma with shape parameter 5+1 =6 at the value 10: Γ(6;10) = .9329.

4.9. A. f(n) = e−λ λ n / n! = e-10 10n / n!


n 0 1 2 3 4 5 6 7 8
f(n) 0.0000 0.0005 0.0023 0.0076 0.0189 0.0378 0.0631 0.0901 0.1126
F(n) 0.0000 0.0005 0.0028 0.0103 0.0293 0.0671 0.1301 0.2202 0.3328
Thus the chance of having more than 8 claims is 1 - .3328 = .6672.
Comment: The chance of having more than 8 claims is the incomplete Gamma with shape
parameter 8+1 = 9 at the value 10: Γ(9;10) = 0.6672.

4.10. E. One can add up: f(6) + f(7) + f(8) = 0.0631 + 0.0901 + 0.1126 = 0.2657.
Alternately, one can use the solutions to the two previous questions.
F(8) - F(5) = {1-F(5)} - {1-F(8)} = 0.9239 - 0.6672 = 0.2657.
Comment: Prob[6, 7, or 8 claims] = Γ(6;10) - Γ(9;10) = 0.9239 - 0.6672 = 0.2657.

4.11. E. The large and small claims are independent Poisson Distributions. Therefore, the
observed number of small claims has no effect on the expected number of large claims.
S(500) = exp(-(500/1000)^3) = 0.8825. Expected number of large claims is: (27)(0.8825) = 23.8.

4.12. D. Frequency of large losses follows a Poisson Distribution with λ = (20%)(7) = 1.4.
f(3) = 1.4^3 e-1.4/3! = 11.3%.

4.13. B. Prob[N = 1 | N ≤ 1] = Prob[N=1]/Prob[N ≤ 1] = λe−λ/(e−λ + λe−λ) = λ/(1 + λ) = 0.0909.

4.14. E. Prob[N = 1 | N ≥ 1] = Prob[N = 1]/Prob[N ≥ 1] = λe−λ/(1 - e−λ) = λ/(e^λ - 1) = 0.9508.


4.15. E. E[1/(N+1)] = Σ_{n=0}^∞ f(n)/(n+1) = Σ_{n=0}^∞ {e^(-0.2) 0.2^n/n!}/(n+1) = (e^(-0.2)/0.2) Σ_{n=0}^∞ 0.2^(n+1)/(n+1)!
= (e^(-0.2)/0.2) Σ_{i=1}^∞ 0.2^i/i! = (e^(-0.2)/0.2) {Σ_{i=0}^∞ 0.2^i/i! - 0.2^0/0!} = (e^(-0.2)/0.2)(e^0.2 - 1) = (1 - e^(-0.2))/0.2 = 0.906.

Comment: The densities of a Poisson with λ = 0.2 add to one. ⇒ Σ_{i=0}^∞ 0.2^i/i! = e^0.2.

4.16. D. E[N] = P[N = 0]0 + P[N = 1]1 + P[N > 1]E[N | N > 1].
2 = 2e-2 + (1 - e-2 - 2e-2)E[N | N > 1]. E[N | N > 1] = (2 - 2e-2)/(1 - e-2 - 2e-2) = 2.91.
Alternately, E[N | N > 1] = Σ_{n=2}^∞ n e^(-λ) λ^n/n! / Σ_{n=2}^∞ e^(-λ) λ^n/n! = Σ_{n=2}^∞ e^(-λ) λ^n/(n-1)! / Prob[N > 1]
= λ Σ_{i=1}^∞ e^(-λ) λ^i/i! / (1 - e^(-λ) - λe^(-λ)) = λ(1 - e^(-λ)) / (1 - e^(-λ) - λe^(-λ)).
Plugging in λ = 2, the result is 2.91.
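Both of the last two results are easy to confirm numerically by truncating the Poisson sums. A minimal sketch, assuming Python (not part of the original solutions):

    # Numeric check of Solutions 4.15 and 4.16 by truncating the Poisson sums.
    from math import exp, factorial

    def pois(n, lam):
        return exp(-lam) * lam**n / factorial(n)

    # E[1/(N+1)] for lambda = 0.2; should be close to (1 - e^-0.2)/0.2 = 0.906.
    lam = 0.2
    print(sum(pois(n, lam) / (n + 1) for n in range(100)))

    # E[N | N > 1] for lambda = 2; should be close to 2.91.
    lam = 2.0
    num = sum(n * pois(n, lam) for n in range(2, 100))
    den = 1 - pois(0, lam) - pois(1, lam)
    print(num / den)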

4.17. E. Next year the frequency is Poisson with λ = (600/500)(27) = 32.4.


f(30) = e-32.4 32.4^30/30! = 6.63%.

4.18. D. E[N | N ≥ 1] Prob[N ≥ 1] + 0 Prob[N = 0] = E[N] = λ. ⇒ E[N | N ≥ 1] Prob[N ≥ 1] = λ.


E[(N-1)+] = E[N - 1 | N ≥ 1] Prob[N ≥ 1] + 0 Prob[N = 0] = (E[N | N ≥ 1] - 1) Prob[N ≥ 1] =
E[N | N ≥ 1] Prob[N ≥ 1] - Prob[N ≥ 1] = λ - (1 - e−λ) = λ + e−λ - 1 = 1.3 + e-1.3 - 1 = 0.5725.
Alternately, E[(N-1)+] = Σ_{n=1}^∞ (n-1) f(n) = Σ_{n=1}^∞ n f(n) - Σ_{n=1}^∞ f(n) = E[N] - Prob[N ≥ 1] = λ + e−λ - 1.

Alternately, E[(N-1)+] = E[N] - E[N ∧ 1] = λ - Prob[N ≥ 1] = λ + e−λ - 1.

Alternately, E[(N-1)+] = E[(1-N)+] + E[N] - 1 = Prob[N = 0] + λ - 1 = e−λ + λ - 1.


Comment: For the last two alternate solutions, see “Mahlerʼs Guide to Loss Distributions.”

4.19. B. E[N | N ≥ 2] Prob[N ≥ 2] + 1 Prob[N = 1] + 0 Prob[N = 0] = E[N] = λ.

⇒ E[N | N ≥ 2] Prob[N ≥ 2] = λ − λe−λ.


E[(N-2)+] = E[N - 2 | N ≥ 2] Prob[N ≥ 2] + 0 Prob[N < 2] = (E[N | N ≥ 2] - 2) Prob[N ≥ 2] =
E[N | N ≥ 2] Prob[N ≥ 2] - 2Prob[N ≥ 2] = λ - λe−λ - 2(1 - e−λ - λe−λ) = λ + 2e−λ + λe−λ - 2 =
1.3 + 2e-1.3 + 1.3e-1.3 - 2 = 0.199.
Alternately, E[(N-2)+] = Σ_{n=2}^∞ (n-2) f(n) = Σ_{n=2}^∞ n f(n) - 2 Σ_{n=2}^∞ f(n) = E[N] - λe−λ - 2 Prob[N ≥ 2]
= λ - λe−λ - 2(1 - e−λ - λe−λ) = λ + 2e−λ + λe−λ - 2.

Alternately, E[(N-2)+] = E[N] - E[N ∧ 2] = λ - (Prob[N = 1] + 2Prob[N ≥ 2]) = λ + 2e−λ + λe−λ - 2.

Alternately, E[(N-2)+] = E[(2-N)+] + E[N] - 2 = 2Prob[N = 0] + Prob[N = 1] + λ - 2 =

2e−λ + λe−λ + λ - 2.
Comment: For the last two alternate solutions, see “Mahlerʼs Guide to Loss Distributions.”

4.20. A. For the Weibull, S(500) = exp[-(500/2000)^0.7] = 0.6846.
S(1000) = exp[-(1000/2000)^0.7] = 0.5403. Therefore, with the $1000 deductible, the non-zero
payments are Poisson, with λ = (0.5403/0.6846)(3.3) = 2.60. f(4) = e-2.6 2.6^4/4! = 14.1%.

4.21. D. In the absence of losses, by the beginning of year 12, the fund would have:
300 + (12)(60) = 1020 > 1000.
In the absence of losses, by the beginning of year 29, the fund would have:
300 + (29)(60) = 2040 > 2000.
Thus in order to survive for 40 years there have to be 0 events in the first 11 years,
at most one event during the first 28 years, and at most two events during the first 40 years.
Prob[survival through 40 years] =
Prob[0 in first 11 years]{Prob[0 in next 17 years]Prob[0, 1, or 2 in final 12 years] +
Prob[1 in next 17 years]Prob[0 or 1 in final 12 years]} =
e-0.55 {(e-0.85)(e-0.6 + 0.6e-0.6 + 0.6^2 e-0.6/2) + (0.85e-0.85)(e-0.6 + 0.6e-0.6)} = 3.14 e^(-2) = 0.425
Comment: Similar to CAS3, 5/04, Q.16.
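As a rough numerical check (not part of the original solution), the same survival probability can be rebuilt from the three independent Poisson counts. The sketch below assumes an event rate of 0.05 per year, which is what the exponents 0.55, 0.85, and 0.6 above imply:

    # Check of Solution 4.21: multiply Poisson probabilities over the three periods.
    from math import exp, factorial

    def pois(n, lam):
        return exp(-lam) * lam**n / factorial(n)

    lam = 0.05  # assumed events per year, as implied by the exponents above
    p11 = pois(0, 11 * lam)                      # no events in the first 11 years
    p17 = [pois(k, 17 * lam) for k in range(2)]  # 0 or 1 events in the next 17 years
    p12 = [pois(k, 12 * lam) for k in range(3)]  # 0, 1, or 2 events in the final 12 years

    prob = p11 * (p17[0] * sum(p12[:3]) + p17[1] * sum(p12[:2]))
    print(prob)  # approximately 0.425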

4.22. C. The percentage of large losses is: e-1000/600 = 18.89%.


Let λ be the mean of the Poisson distribution of all losses.
Then the large losses, those of size greater than 1000, are also Poisson with mean 0.1889λ.

74% = Prob[no payments] = Prob[0 large losses] = exp[-0.1889λ]. ⇒ λ = 1.59.


The small losses, those of size less than 1000, are also Poisson with mean:
(1 - 0.1889) (1.59) = 1.29.
Prob[3 small losses] = 1.29^3 e-1.29 / 6 = 9.8%.
Comment: Only those losses of size greater than 1000 result in a payment.
There can be any number of small losses without affecting the payments.

4.23. C. The coefficient of variation is the ratio of the standard deviation to the mean, which for a
Poisson Distribution is: √λ/λ = 1/√λ. 1/√λ = 0.5. ⇒ λ = 4. ⇒ f(7) = 4^7 e-4 / 7! = 5.95%.

4.24. C. 10 = E[Z] = E[X + Y] = E[X] + E[Y] = Var[X] + Var[Y].


12 = Var[Z] = Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y] = 10 + 2Cov[X, Y].
Therefore, Cov[X, Y] = (12 - 10)/2 = 1.

4.25. D. The survival function of the Pareto is: S(x) = {1000/(1000+x)}^3.


S(100) = 0.7513. S(250) = 0.5120.
Thus with a deductible of 250 the number of non-zero payments is Poisson with mean:
(5)(0.5120/0.7513) = 3.407.
Probability of exactly 3 non-zero payments is: 3.407^3 e-3.407 / 6 = 21.8%.

4.26. B. For the Exponential, F(200) = 1 - exp[-200/400] = 0.3935.


Thus the small losses are Poisson with mean (0.3935)(10) = 3.935,
the large losses are Poisson with mean (1 - 0.3935)(10) = 6.065,
and the numbers of small and large losses are independent.
Probability of 5 small losses is: 3.935^5 e-3.935 / 5! = 0.1537.
Probability of 6 large losses is: 6.065^6 e-6.065 / 6! = 0.1606.
Probability of 5 small losses and 6 large losses is: (0.1537)(0.1606) = 2.47%.

4.27. E. nX̄, the sum of n Poissons each with mean λ, is a Poisson with mean nλ.
Comment: X̄ can be non-integer, and therefore it cannot have a Poisson distribution.
As n → ∞, X̄ → a Normal distribution with mean λ and variance λ/n.



4.28. B. Over two weeks, the number of accidents is Poisson with mean 6.
f(2) = e−λ λ^2/2 = e-6 6^2/2 = 18e-6.

4.29. C. Prob[X ≥ 2 | X ≤ 4] = {f(2) + f(3) + f(4)}/{f(0) + f(1) + f(2) + f(3) + f(4)} =


e -1(1/2 + 1/6 + 1/24)/{e-1(1 + 1 + 1/2 + 1/6 + 1/24)} = (12 + 4 + 1)/(24 + 24 + 12 + 4 + 1)
= 17/65.

4.30. D. Prob[W ≤ 2] = 1 - Prob[no cars by time 2] = 1 - e- 2.

4.31. C. Prob[0 errors in 30 seconds] = e-30/10 = e-3. Prob[1 error in 30 seconds] = 3e-3.
Prob[more than one error in 30 seconds] = 1 - 4e- 3.

4.32. B. Prob[0 or 1 surges] = e-24/12 + 2e-2 = 3e- 2.

4.33. C. The sum of three independent Poissons is also a Poisson, whose mean is the sum of the
individual means. Thus the portfolio of three insureds has a Poisson distribution with mean
0.05 + 0.10 + 0.15 = 0.30. For a Poisson distribution with mean θ, the chance of zero claims is e−θ
and that of 1 claim is θe−θ. Thus the chance of fewer than 2 claims is: (1+θ)e−θ.
Thus for this portfolio, the chance of fewer than 2 claims is: (1 + 0.3)e-.3 = 0.963.

4.34. B. X + Y + Z is Poisson with λ = 3 + 1 + 4 = 8. f(0) + f(1) = e-8 + 8e-8 = 9e- 8.

4.35. C. The sum of independent Poissons is a Poisson, with a parameter the sum of the
individual Poisson parameters. In this case the portfolio is Poisson with a parameter = (10)(.1) = 1.
Chance of zero claims is e-1. Chance of one claim is (1)e-1.
Chance of more than one claim is: 1 - (e-1 + e-1) = 0.264.

4.36. E. This is a Poisson distribution with a parameter of 4.6. The mean is therefore 4.6. The
mode is the value at which f(n) is a maximum; f(4) = 0.188 and the mode is 4. Therefore statement
1 is true. For the Poisson the variance equals the mean, and therefore statement 2 is false.
For a discrete distribution such as we have here, the median is the first value at which the distribution
function is greater than or equal to .5. F(4) > 50%, and therefore the median is 4 and less than the
mean of 4.6. Therefore statement 3 is true.
n 0 1 2 3 4 5 6 7 8
f(n) 0.010 0.046 0.106 0.163 0.188 0.173 0.132 0.087 0.050
F(n) 0.010 0.056 0.163 0.326 0.513 0.686 0.818 0.905 0.955
Comment: For a Poisson with parameter λ, the mode is the largest integer in λ. In this case
λ = 4.6 so the mode is 4. Items 1 and 3 can be answered by computing enough values of the
density and adding them up. Alternately, since the distribution is skewed to the right (has positive
skewness), both the peak of the curve and the 50% point are expected to be to the left of the
mean. The median is less affected by the few extremely large values than is the mean, and therefore
for curves skewed to the right the median is smaller than the mean. For curves skewed to the right,
the largest single probability most commonly occurs at less than the mean, but this is not true of all
such curves.

4.37. B. Assume we have R risks in total. The Poisson distribution is given by:
f(n) = e−λλ n / n!, n=0,1,2,3,... Thus for n=0 we have R e−λ = 96. For n = 2 we have

R e−λ λ^2/2 = 3. By dividing these two equations we can solve for λ = (6/96)^0.5 = 1/4.
The number of risks we expect to have 4 claims is: R e−λ λ^4/4! = (96)(1/4)^4/24 = 0.0156.

4.38. B. cos(0) = 1. cos(π) = -1. cos(2π) = 1. cos(3π) = -1.

E[cos(πx)] = Σ (-1)^x f(x) = e−λ{1 - λ + λ^2/2! - λ^3/3! + λ^4/4! - λ^5/5! + ...} =
e−λ{1 + (-λ) + (-λ)^2/2! + (-λ)^3/3! + (-λ)^4/4! + (-λ)^5/5! + ...} = e−λ e−λ = e−2λ.
For λ = ln(2), e−2λ = 2^(-2) = 1/4.

4.39. A. If frequency is given by a Poisson and severity is independent of frequency, then the
number of claims above a certain amount (in constant dollars) is also a Poisson.

4.40. B. The mode of the Poisson with mean λ is the largest integer in λ.
The largest integer in n - 1/2 is n-1.
Alternately, for the Poisson f(x)/ f(x-1) = λ/x. Thus f increases when λ > x and decreases for λ < x.
Thus f increases for x < λ = n - .5. For integer n, x < n - .5 for x ≤ n -1.
Thus the density increases up to n - 1 and decreases thereafter. Therefore the mode is n - 1.

4.41. A. We want the chance of less than 3 claims to be less than .9. For a Poisson with mean λ,

the probability of 0, 1 or 2 claims is: e−λ(1 + λ + λ^2/2). Over two years we have a Poisson with mean
2λ. Thus we want e−2λ(1 + 2λ + 2λ^2) < 0.9. Trying the endpoints of the given intervals we determine
that the smallest such λ must be less than 0.7.
Comment: In fact the smallest such λ is about 0.56.

4.42. D. If there are more than two losses, we are not concerned about those beyond the first two.
Since the sum of two independent Poissons is a Poisson, the portfolio has a Poisson frequency
distribution with mean of 0.3. Therefore, the chance of zero claims is e-0.3 = 0.7408, one claim is
0.3 e-0.3 = 0.2222, and of two or more claims is 1 - 0.7408 - 0.2222 = 0.0370.
0.1/0.3 = 1/3 of the losses are aircraft , and 0.2/0.3 = 2/3 of the losses are marine.
Thus the probability of the first two events, in the case of two or more events, is divided up as
(1/3)(1/3) = 1/9, 2(1/3)(2/3) = 4/9, (2/3)(2/3) = 4/9, between 2 aircraft, 1 aircraft and 1 marine, and 2
marine, using the binomial expansion for two events.
⇒ (0.2222)(1/3) = 0.0741 = probability of one aircraft, (0.2222)(2/3) = 0.1481 = probability of one marine,
(0.0370)(1/9) = 0.0041 = probability of 2 aircraft, (0.0370)(4/9) = 0.0164 = probability of one of each type,
(0.0370)(4/9) = 0.0164 = probability of 2 marine.
If there are zero claims, the insurer pays nothing. If there is one aircraft loss, the insurer pays nothing.
If there is one marine loss, the insurer pays 10 million. If there are two or more events there are three
possibilities for the first two events. If the first two events are aircraft, the insurer pays 10 million. If the
first two events are one aircraft and one marine, the insurer pays 20 million.
If the first two events are marine, the insurer pays 30 million.
Events Probability Losses from First 2 events Amount Paid by the Insurer
(first 2 only) ($ million) ($ million)
None 0.7408 0 0
1 Aircraft 0.0741 10 0
1 Marine 0.1481 20 10
2 Aircraft 0.0041 20 10
1 Aircraft, 1 Marine 0.0164 30 20
2 Marine 0.0164 40 30
1 2.345
Thus the insurerʼs expected annual payment is $2.345 million.
Comment: Beyond what you can expect to be asked on your exam.

4.43. The total number of claims from inexperienced drivers is Poisson with mean: (20)(.15) = 3.
The total number of claims from experienced drivers is Poisson with mean: (40)(.1) = 4.
The total number of claims from all drivers is Poisson with mean: 3 + 4 = 7.
Prob[# claims ≤ 3] = e-7(1 + 7 + 7^2/2 + 7^3/6) = 8.177%.

4.44. D. f(2) = 3 f(4). ⇒ e−λ λ^2/2 = 3 e−λ λ^4/24. ⇒ λ = 2. Variance = λ = 2.

4.45. C. The finding of the three different types of coins are independent Poisson processes.
Over the course of 60 minutes, Tom expects to find (.6)(.5)(60) = 18 coins worth 1 each and
(.2)(.5)(60) = 6 coins worth 10 each. Tom finds 10 coins worth 5 each. The expected worth of the
coins he finds is: (18)(1) + (10)(5) + (6)(10) = 128.

4.46. C. E[(X-1)+] = E[X] - E[X ∧ 1] = 1.5 - {0f(0) + 1(1 - f(0))} = .5 + f(0) = .5 + e-1.5 = .7231.
Expected Amount Paid is: 10,000E[(X-1)+] = 7231.
Alternately, Expected Amount Paid is: 10,000{1f(2) + 2f(3) + 3f(4) + 4f(5) + 5f(6) + ...} =
(10,000)e-1.5{1.5^2/2 + (2)(1.5^3/6) + (3)(1.5^4/24) + (4)(1.5^5/120) + (5)(1.5^6/720) + ...} =
2231{1.125 + 1.125 + 0.6328 + 0.2531 + 0.0791 + 0.0203 + ...} ≅ 7200.

4.47. B. For this Lognormal Distribution, S(28,000) = 1 - Φ[(ln(28000) - 10.12)/0.12] =
1 - Φ(1) = 1 - 0.8413 = 0.1587. Acceptable offers arrive via a Poisson Process at rate
2 S(28000) = (2)(0.1587) = 0.3174 per month. Thus the number of acceptable offers over the first
3 months is Poisson distributed with mean (3)(0.3174) = 0.9522.
The probability of no acceptable offers over the first 3 months is: e -0.9522 = 0.386.
Alternately, the probability of no acceptable offers in a month is: e-0.3174.
Probability of no acceptable offers in 3 months is: (e-0.3174)3 = e -0.9522 = 0.386.

4.48. B. E[X ∧ 2] = E[Min[X, 2]] = (0)f(0) + 1f(1) + 2{1 - f(0) - f(1)}


= 0.6e-0.6 + 2{1 - e-0.6 - 0.6e-0.6} = 0.573.
E[(X ∧ 2)^2] = E[Min[X, 2]^2] = (0)f(0) + 1f(1) + 4{1 - f(0) - f(1)}
= 0.6e-0.6 + 4{1 - e-0.6 - 0.6e-0.6} = 0.8169.
Var[X ∧ 2] = Var[Min[X, 2]] = 0.8169 - 0.573^2 = 0.4886.
Var[1000(X ∧ 2)] = (1000^2)(0.4886) = 488,600.
StdDev[1000(X ∧ 2)] = √488,600 = 699.
Comment: The limited expected value, E[X ∧ x], is discussed in “Mahlerʼs Guide to Loss Dists.”

4.49. D. Trucks arrive at a Poisson rate of: (30%)(20) = 6 per hour.


f(0) = e-6. f(1) = 6e-6. f(2) = 6^2 e-6/2. 1 - {f(0) + f(1) + f(2)} = 1 - 25e-6 = 0.938.

4.50. D. If there is a storm within the first three years, then there is ruin, since the fund would have
only 65 + 30 = 95 or less. If there are two or more storms in the first ten years, then the fund is
ruined. Thus survival requires no storms during the first three years and at most one storm during the
next seven years. Prob[survival through 10 years] =
Prob[0 storms during 3 years] Prob[0 or 1 storm during 7 years] = (e-.3)(e-.7 + .7e-.7) = 0.625.

4.51. B. Claims of $100,000 or more are Poisson with mean: (5)(1 - 0.9) = 0.5 per year.
The number of large claims during 3 years is Poisson with mean: (3)(0.5) = 1.5.
f(0) = e-1.5 = 0.2231.

4.52. C. For λ = 1, E[N ∧ 1] = 0f(0) + 1(1 - f(0)) = 1 - e-1 = .6321.


For λ = 2, E[N ∧ 2] = 0f(0) + 1f(1) + 2(1 - f(0) - f(1)) = 2e-2 + 2(1 - e-2 - 2e-2) = 1.4587.
Expected number of claims before change: .6321 + 1.4587 = 2.091.
The sum of number of root canals and the number of fillings is Poisson with λ = 3.
For λ = 3, E[N ∧ 3] = 0f(0) + 1f(1) + 2f(2) + 3(1 - f(0) - f(1) - f(2)) =
3e-3 + (2)(9e-3/2) + 3(1 - e-3 - 3e-3 - 4.5e-3) = 2.328. Change is: 2.328 - 2.091 = 0.237.
Comment: Although it is not stated, we must assume that the number of root canals and the number
of fillings are independent.

4.53. D. Poisoned glasses of wine are Poisson with mean: (0.01)(2) = 0.02 per day.
The probability of no poisoned glasses over 30 days is: e-(30)(0.02) = e-0.6 = 0.549.
Comment: Survival corresponds to zero poisoned glasses of wine.
The king can drink any number of non-poisoned glasses of wine.
The poisoned and non-poisoned glasses are independent Poisson Processes.

4.54. B. After implementing a $500 deductible, only losses of size greater than 500 result in a
claim payment. Prob[loss > 500] = 1 - F(500) = 1 - .25 = .75.
Via thinning, large losses are Poisson with mean: (.75)(.01) = .0075.
Prob[at least one large loss] = 1 - e-.0075 = 0.00747.

4.55. C. By thinning, the number of goals is Poisson with mean (30)(0.1) = 3.


This Poisson has variance 3, and standard deviation: √3 = 1.732.

4.56. B. Large losses are Poisson with mean: (1 - .6)(.3) = 0.12.


Prob[at least one large loss] = 1 - e-.12 = 11.3%.

4.57. D. Payouts of size one are Poisson with λ = (1/2)(5) = 2.5 per hour.
Payouts of size two are Poisson with λ = (1/4)(5) = 1.25 per hour.
Payouts of size three are Poisson with λ = (1/8)(5) = 0.625 per hour.
Prob[0 of size 1 over 1/3 of an hour] = e-2.5/3.
Prob[0 of size 2 over 1/3 of an hour] = e-1.25/3.
Prob[0 of size 3 over 1/3 of an hour] = e-0.625/3.
The three Poisson Processes are independent, so we can multiply the above probabilities:
e-2.5/3e-1.25/3e-0.625/3 = e-1.458 = 0.233.
Alternately, payouts of sizes one, two, or three are Poisson with
λ = (1/2 + 1/4 + 1/8)(5) = 4.375 per hour.
Prob[0 of sizes 1, 2, or 3, over 1/3 of an hour] = e-4.375/3 = 0.233.
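A quick numeric check of the thinning argument (a sketch, not part of the original solution):

    # Check of Solution 4.57: thin the Poisson rate and use the zero-count probability.
    from math import exp

    rate = 5.0                       # payouts per hour
    p_small = 1/2 + 1/4 + 1/8        # probability a payout is 1, 2, or 3
    lam = rate * p_small * (20/60)   # expected payouts of size 1-3 in 20 minutes
    print(exp(-lam))                 # approximately 0.233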

4.58. C. Catalogs are Poisson with mean over two days of: (2)(30%)(10) = 6.
Letters are Poisson with mean over two days of: (2)(20%)(10) = 4.
The Poisson processes are all independent. Therefore, knowing he got 20 applications tells us
nothing about the number of catalogs or letters.
Prob[at least 3 letters] = 1 - e-4 - 4e-4 - e-4 4^2/2 = 0.7619.
Prob[5 catalogs] = e-6 6^5/120 = 0.1606.
Prob[ at least 3 letters and exactly 5 catalogs] = (0.7619)(0.1606) = 12.2%.

4.59. C. The number of endorsements and cancellations is Poisson with λ = (250)(60%) = 150.
Applying the normal approximation with mean and variance equal to 150:
Prob[more than 156] ≅ 1 - Φ[(156.5 - 150)/√150] = 1 - Φ[0.53] = 29.8%.
Comment: Using the continuity correction, 156 is out and 157 is in; the boundary is drawn at 156.5, and everything to its right is included.
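The normal approximation itself is easy to reproduce; a minimal sketch assuming Python's statistics.NormalDist (not part of the original solution):

    # Check of Solution 4.59: normal approximation with continuity correction.
    from statistics import NormalDist

    lam = 250 * 0.60                 # endorsements or cancellations per day
    z = NormalDist(mu=lam, sigma=lam ** 0.5)
    print(1 - z.cdf(156.5))          # approximately 0.298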

4.60. D. S(50) = e-50/15 = 0.03567. Large losses are Poisson with mean: 10 S(50) = 0.3567.
Over 30 days, large losses are Poisson with mean: (30)(0.3567) = 10.70.
Prob[exactly 9 large losses in a 30-day period] = e-10.7 10.7^9 / 9! = 11.4%.

Section 5, Geometric Distribution

The Geometric Distribution, a special case of the Negative Binomial Distribution, will be discussed
first.

Geometric Distribution

Support: x = 0, 1, 2, 3, ... Parameters: β > 0.

D. f.: F(x) = 1 - {β/(1+β)}^(x+1)

P. d. f.: f(x) = β^x / (1+β)^(x+1)

f(0) = 1/(1+β). f(1) = β/(1+β)^2. f(2) = β^2/(1+β)^3. f(3) = β^3/(1+β)^4.

Mean = β

Variance = β(1+β). Variance / Mean = 1 + β > 1.

Coefficient of Variation = √{(1+β)/β}. Skewness = (1 + 2β)/√{β(1+β)}.

Kurtosis = 3 + (6β^2 + 6β + 1)/{β(1+β)}.

Mode = 0.

Probability Generating Function: P(z) = 1/{1 - β(z-1)}, z < 1 + 1/β.

f(x+1)/f(x) = a + b/(x+1), a = β/(1+β), b = 0, f(0) = 1/(1+β).

Moment Generating Function: M(s) = 1/{1 - β(e^s - 1)}, s < ln(1+β) - ln(β).



Using the notation in Loss Models, the Geometric Distribution is:


f(x) = {β/(1+β)}^x / (1+β) = β^x / (1+β)^(x+1), x = 0, 1, 2, 3, ...

For example, for β = 4, f(x) = 4^x/5^(x+1), x = 0, 1, 2, 3, ...

A Geometric Distribution for β = 4:

[Graph: the probability f(x) starts at f(0) = 0.2 and declines geometrically toward zero as x runs from 0 to 20.]

The densities decline geometrically by a factor of β/(1+β); f(x+1)/f(x) = β/(1+β).

This is similar to the Exponential Distribution, f(x+1)/f(x) = e−1/θ.


The Geometric Distribution is the discrete analog of the continuous Exponential Distribution.
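A minimal sketch, assuming Python (not part of the original text), showing the density, its constant factor of decline, and the mean for β = 4, as in the graph above:

    # Geometric density f(x) = beta^x / (1+beta)^(x+1) and its constant ratio of decline.
    beta = 4.0

    def f(x, beta):
        return beta**x / (1 + beta)**(x + 1)

    print(f(0, beta))                                # 0.2 = 1/(1+beta)
    print(f(5, beta) / f(4, beta))                   # 0.8 = beta/(1+beta), the geometric factor
    print(sum(x * f(x, beta) for x in range(1000)))  # approximately beta = 4, the mean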

For q = 0.3 or β = 0.7/0.3 = 2.333, the Geometric distribution is:


Number of Claims (x)    f(x)    F(x)    x times f(x)    x^2 times f(x)
0 0.30000 0.30000 0.00000 0.00000
1 0.21000 0.51000 0.21000 0.21000
2 0.14700 0.65700 0.29400 0.58800
3 0.10290 0.75990 0.30870 0.92610
4 0.07203 0.83193 0.28812 1.15248
5 0.05042 0.88235 0.25211 1.26052
6 0.03529 0.91765 0.21177 1.27061
7 0.02471 0.94235 0.17294 1.21061
8 0.01729 0.95965 0.13836 1.10684
9 0.01211 0.97175 0.10895 0.98059
10 0.00847 0.98023 0.08474 0.84743
11 0.00593 0.98616 0.06525 0.71777
12 0.00415 0.99031 0.04983 0.59794
13 0.00291 0.99322 0.03779 0.49123
14 0.00203 0.99525 0.02849 0.39880
15 0.00142 0.99668 0.02136 0.32046
16 0.00100 0.99767 0.01595 0.25523
17 0.00070 0.99837 0.01186 0.20169
18 0.00049 0.99886 0.00879 0.15828
19 0.00034 0.99920 0.00650 0.12345
20 0.00024 0.99944 0.00479 0.09575
21 0.00017 0.99961 0.00352 0.07390
22 0.00012 0.99973 0.00258 0.05677
23 0.00008 0.99981 0.00189 0.04343
24 0.00006 0.99987 0.00138 0.03311
25 0.00004 0.99991 0.00101 0.02515
Sum 2.33 13.15

As computed above, the mean is about 2.33. The second moment is about 13.15, so that the
variance is about 13.15 - 2.33^2 = 7.72. Since the Geometric has a significant tail, the terms involving
the number of claims greater than 25 would have to be taken into account in order to compute a
more accurate value of the variance or higher moments. Rather than taking additional terms it is better
to have a general formula for the moments.
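The truncation error mentioned above can be seen directly. A short sketch, assuming Python (not part of the text), that redoes the table's sums through x = 25 and then with a much larger cutoff:

    # Reproduce the table's truncated sums for beta = 7/3, then extend the cutoff.
    beta = 7/3

    def f(x):
        return beta**x / (1 + beta)**(x + 1)

    for max_x in (25, 2000):
        mean = sum(x * f(x) for x in range(max_x + 1))
        second = sum(x * x * f(x) for x in range(max_x + 1))
        print(max_x, round(mean, 2), round(second, 2), round(second - mean**2, 2))
    # Through x = 25 this reproduces roughly 2.33, 13.15, and 7.72;
    # the larger cutoff gives the exact mean beta = 2.33 and variance beta(1+beta) = 7.78.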

The mean can be computed as follows:


E[X] = Σ_{j=0}^∞ Prob[X > j] = Σ_{j=0}^∞ {β/(1+β)}^(j+1) = {β/(1+β)} / {1 - β/(1+β)} = β.

Thus the mean for the Geometric distribution is β. For the example, β = 2.333 = mean.
The variance of the Geometric is β(1+β), which for β = 2.333 is 7.78.23

Survival Function:

Note that there is a small but positive chance of any very large number of claims.
Specifically for the Geometric distribution the chance of x > j is:
Σ_{i=j+1}^∞ β^i/(1+β)^(i+1) = {1/(1+β)} {β/(1+β)}^(j+1) / {1 - β/(1+β)} = {β/(1+β)}^(j+1).

1 - F(x) = S(x) = {β/(1+β)}^(x+1).

For example, the chance of more than 19 claims is 0.7^20 = 0.00080, so that
F(19) = 1 - 0.00080 = 0.99920, which matches the result above.

Thus for a Geometric Distribution, for n > 0, the chance of at least n claims is {β/(1+β)}^n.
The survival function decreases geometrically. The chance of 0 claims from a Geometric is:
1/(1+β) = 1 - β/(1+β) = 1 - geometric factor of decline of the survival function.

Exercise: There is a 0.25 chance of 0 claims, 0.75 chance of at least one claim,
0.75^2 chance of at least 2 claims, 0.75^3 chance of at least 3 claims, etc. What distribution is this?
[Solution: This is a Geometric Distribution with 1/(1+β) = 0.25, β/(1+β) = 0.75, or β = 3.]

For the Geometric, F(x) = 1 - {β/(1+β)}^(x+1). Thus the Geometric distribution is the discrete analog of
the continuous Exponential Distribution which has F(x) = 1 - e^(-x/θ) = 1 - (exp[-1/θ])^x.
In each case the density function decreases by a constant multiple as x increases.
For the Geometric Distribution: f(x) = {β/(1+β)}^x / (1+β),
while for the Exponential Distribution: f(x) = e^(-x/θ)/θ = (exp[-1/θ])^x / θ.


23
The variance is shown in Appendix B attached to the exam. One way to get the variance as well as higher moments
is via the probability generating function and factorial moments, as will be discussed subsequently.

Memoryless Property:

The geometric shares with the exponential distribution, the “memoryless property.”24 “Given that
there are at least m claims, the probability distribution of the number of claims in excess of m does
not depend on m.” In other words, if one were to truncate and shift a Geometric
Distribution, then one obtains the same Geometric Distribution.

Exercise: Let the number of claims be given by an Geometric Distribution with β = 1.7.
Eliminate from the data all instances where there are 3 or fewer claims and subtract 4
from each of the remaining data points. (Truncate and shift at 4.)
What is the resulting distribution?
[Solution: Due to the memoryless property, the result is a Geometric Distribution with β = 1.7.]

Generally, let f(x) be the original Geometric Distribution. Let g(x) be the truncated and shifted
distribution. Take as an example a truncation point of 4, as in the exercise.
Then g(x) = f(x+4) / {1 - (f(0) + f(1) + f(2) + f(3))} = f(x+4)/S(3) = {β^(x+4)/(1+β)^(x+5)} / {β^4/(1+β)^4} = β^x/(1+β)^(x+1),
which is again a Geometric Distribution with the same parameter β.
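A small simulation, assuming Python (not part of the text), illustrating the truncate-and-shift result: conditioning a Geometric with β = 1.7 on being at least 4 and subtracting 4 leaves the sample mean at about β.

    # Simulate truncate-and-shift at 4 for a Geometric with beta = 1.7.
    import random

    random.seed(1)
    beta = 1.7
    p_stop = 1 / (1 + beta)   # probability the run of "continuations" ends at each step

    def geometric():
        n = 0
        while random.random() > p_stop:
            n += 1
        return n

    sample = [geometric() for _ in range(200_000)]
    shifted = [n - 4 for n in sample if n >= 4]
    print(sum(sample) / len(sample))     # approximately beta = 1.7
    print(sum(shifted) / len(shifted))   # also approximately 1.7, by the memoryless property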

Constant Force of Mortality:

Another application where the Geometric Distribution arises is constant force of mortality, when one
only looks at regular time intervals rather than at time continuously.25

Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year.
If Jim is currently alive, what is the distribution of his curtate future lifetime?
[Solution: There is a 10% chance he dies during the first year, and has a curtate future lifetime of 0.
If he survives the first year, there is a 10% chance he dies during the second year. Thus there is a
(0.9)(0.1) = 0.09 chance he dies during the second year, and has a curtate future lifetime of 1. If he
survives the second year, which has probability 0.9^2, there is a 10% chance he dies during the third
year. Prob[curtate future lifetime = 2] = (0.9^2)(0.1).
Similarly, Prob[curtate future lifetime = n] = (0.9^n)(0.1).
This is a Geometric Distribution with β/(1+β) = 0.9 or β = 0.9/0.1 = 9.]

24
See Section 6.3 of Loss Models. It is due to this memoryless property of the Exponential and Geometric
distributions, that they have constant mean residual lives, as discussed subsequently.
25
When one has a constant force of mortality and looks at time continuously, one gets the Exponential Distribution,
the continuous analog of the Geometric Distribution.

In general, if there is a constant probability of death each year q, then the curtate future lifetime,26 K(x),
follows a Geometric Distribution, with β = (1-q)/q =
probability of continuing sequence / probability of ending sequence.
Therefore, for a constant probability of death each year, q, the curtate expectation of life,27 ex, is
β = (1-q)/q, the mean of this Geometric Distribution. The variance of the curtate future lifetime is:
β(1+β) = {(1-q)/q}(1/q) = (1-q)/q^2.

Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year.
What is Jimʼs curtate expectation of life and variance of his curtate future lifetime?
[Solution: Jimʼs curtate future lifetime is Geometric, with mean = β = (1 - 0.1)/0.1 = 9,
and variance β(1+β) = (9)(10) = 90.]

Exercise: Jim has a constant force of mortality, µ = 0.10536.


What is the distribution of Jimʼs future lifetime.
What is Jimʼs complete expectation of life?
What is the variance of Jimʼs future lifetime?
[Solution: It is Exponential, with mean θ = 1/µ = 9.49, and variance θ^2 = 90.1.
Comment: Jim has a 1 - e-0.10536 = 10% chance of dying each year in which he starts off alive.
However, here we look at time continuously.]

With a constant force of mortality:


observe continuously ⇔ Exponential Distribution

observe at discrete intervals ⇔ Geometric Distribution.

Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year.
Jimʼs estate will be paid $1 at the end of the year of his death.
At a 5% annual rate of interest, what is the present value of this benefit?

26
The curtate future lifetime is the number of whole years completed prior to death.
See page 54 of Actuarial Mathematics.
27
The curtate expectation of life is the expected value of the curtate future lifetime.
See page 69 of Actuarial Mathematics.

[Solution: If Jimʼs curtate future lifetime is n, there is a payment of 1 at time n + 1.
Present value = Σ_{n=0}^∞ f(n) v^(n+1) = Σ_{n=0}^∞ (9^n/10^(n+1)) 0.9524^(n+1) = (0.09524)/(1 - 0.8572) = 0.667.]

In general, if there is a constant probability of death each year q, the present value of $1 paid at the
end of the year of death is: Σ_{n=0}^∞ {β^n/(1+β)^(n+1)} v^(n+1) = {v/(1+β)} / {1 - vβ/(1+β)} = 1/{(1+β)/v - β}
= 1/{(1+β)(1+i) - β} = 1/{(1+i)/q - (1-q)/q} = q/(q + i).28

Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year.
At a 5% annual rate of interest, what is the present value of the benefits from an annuity immediate
with annual payment of 1?
[Solution: If Jimʼs curtate future lifetime is n, there are n payments of 1 each at times:
1, 2, 3, ..., n. The present value of these payments is: v + v^2 + ... + v^n = (1 - v^n)/i.
Present value = Σ_{n=0}^∞ f(n)(1 - v^n)/i = (1/i){Σ f(n) - Σ (9^n/10^(n+1)) v^n} =
20{1 - (0.1)/(1 - 0.9/1.05)} = 6.]

In general, if there is a constant probability of death each year q, the present value of the benefits
from an annuity immediate with annual payment of 1:
Σ_{n=0}^∞ f(n)(1 - v^n)/i = {Σ_{n=0}^∞ f(n) - Σ_{n=0}^∞ {β^n/(1+β)^(n+1)} v^n}/i =
(1/i){1 - {1/(1+β)} / {1 - vβ/(1+β)}} = (1/i){1 - 1/(1 + β - vβ)} = (1/i){(β - vβ)/(1 + β - vβ)} =
(1/(1+i)) {β/(1 + β - vβ)} = (1/(1+i)) {1/(1/β + 1 - v)} = (1/(1+i)) {1/(q/(1-q) + 1 - v)} =
(1/(1+i)) {(1-q)/(q + (1 - v)(1-q))} = (1-q)/{q(1+i) + i(1-q)} = (1-q)/(q+i).29

In the previous exercise, the present value of benefits is: (1 - 0.1)/(0.1 + 0.05) = 0.9/0.15 = 6.

For i = 0, (1-q)/(q+i) becomes (1-q)/q, the mean of the Geometric Distribution of curtate future
lifetimes. For q = 0, (1-q)/(q+i) becomes 1/i, the present value of a perpetuity, with the first
payment one year from now.
28
With a constant force of mortality µ, the present value of $1 paid at the time of death is: µ/(µ + δ).
See page 99 of Actuarial Mathematics.
29
With a constant force of mortality µ, the present value of a life annuity paid continuously is: 1/(µ + δ).
See page 136 of Actuarial Mathematics.
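The two closed forms, q/(q + i) for the insurance and (1-q)/(q + i) for the annuity immediate, can be checked by brute-force summation over curtate future lifetimes. A sketch, assuming Python (not part of the text), with q = 0.1 and i = 0.05 as in the exercises:

    # Check q/(q+i) and (1-q)/(q+i) by summing over curtate future lifetimes.
    q, i = 0.10, 0.05
    v = 1 / (1 + i)

    def f(n):                       # Geometric: P[curtate future lifetime = n]
        return (1 - q)**n * q

    terms = range(2000)
    insurance = sum(f(n) * v**(n + 1) for n in terms)
    annuity = sum(f(n) * (1 - v**n) / i for n in terms)
    print(insurance, q / (q + i))         # both approximately 0.667
    print(annuity, (1 - q) / (q + i))     # both approximately 6.0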

Series of Bernoulli Trials:

For a series of Bernoulli trials with chance of success 0.3, the probability that there are no successes in
the first four trials is: (1 - 0.3)^4 = 0.24.

Exercise: What is the probability that there is no success in the first four trials and the fifth trial is a
success?
[Solution: (1 - 0.3)^4 (0.3) = 0.072 = the probability of the first success occurring on the fifth trial.]

In general, the chance of the first success after x failures is: (1 - 0.3)^x (0.3).

More generally, take a series of Bernoulli trials with chance of success q. The probability of the first
success on trial x+1 is: (1-q)^x q.

f(x) = (1-q)^x q, x = 0, 1, 2, 3,...

This is the Geometric distribution. It is a special case of the Negative Binomial Distribution.30
Loss Models uses the notation β, where q = 1/(1+β).
β = (1-q) / q = probability of a failure / probability of a success.

For a series of independent identical Bernoulli trials, the chance of the first success
following x failures is given by a Geometric Distribution with mean:
β = chance of a failure / chance of a success.
The number of trials = 1 + number of failures = 1 + Geometric.

The Geometric Distribution shows up in many applications, including Markov Chains and Ruin
Theory. In many contexts:
β = probability of continuing sequence / probability of ending sequence
= probability of remaining in the loop / probability of leaving the loop.

30
The Geometric distribution with parameter β is the Negative Binomial Distribution with parameters β and r=1.
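As a quick check (not part of the text), the first-success description and the Loss Models parameterization give the same probabilities; a minimal sketch assuming Python, with chance of success q = 0.3:

    # The first-success description and the Loss Models parameterization agree:
    # (1-q)^x q equals beta^x / (1+beta)^(x+1) when beta = (1-q)/q.
    q = 0.3
    beta = (1 - q) / q

    for x in range(5):
        bernoulli_form = (1 - q)**x * q
        loss_models_form = beta**x / (1 + beta)**(x + 1)
        print(x, round(bernoulli_form, 5), round(loss_models_form, 5))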

Problems:

The following five questions all deal with a Geometric distribution with β = 0.6.

5.1 (1 point) What is the mean?


(A) 0.4 (B) 0.5 (C) 0.6 (D) 2/3 (E) 1.5

5.2 (1 point) What is the variance?


A. less than 1.0
B. at least 1.0 but less than 1.1
C. at least 1.1 but less than 1.2
D. at least 1.2 but less than 1.3
E. at least 1.3

5.3 (2 points) What is the chance of having 3 claims?


A. less than 3%
B. at least 3% but less than 4%
C. at least 4% but less than 5%
D. at least 5% but less than 6%
E. at least 6%

5.4 (2 points) What is the mode?


A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D.

5.5 (2 points) What is the chance of having 3 claims or more?


A. less than 3%
B. at least 3% but less than 4%
C. at least 4% but less than 5%
D. at least 5% but less than 6%
E. at least 6%

5.6 (1 point) The variable N is generated by the following algorithm:


(1) N = 0.
(2) 25% chance of exiting.
(3) N = N + 1.
(4) Return to step #2.
What is the variance of N?
A. less than 10
B. at least 10 but less than 15
C. at least 15 but less than 20
D. at least 20 but less than 25
E. at least 25

5.7 (2 points) Use the following information:


• Assume a Rating Bureau has been making Workersʼ Compensation classification
rates for a very, very long time.
• Assume every year the rate for the Carpenters class is based on a credibility
weighting of the indicated rate based on the latest year of data and the current rate.
• Each year, the indicated rate for the Carpenters class is given 20% credibility.
• Each year, the rate for year Y, was based on the data from year Y-3 and the rate
in the year Y-1. Specifically, the rate in the year 2001 is based on the data from
1998 and the rate in the year 2000.
What portion of the rate in the year 2001 is based on the data from the year 1990?
A. less than 1%
B. at least 1% but less than 2%
C. at least 2% but less than 3%
D. at least 3% but less than 4%
E. at least 4%

5.8 (3 points) An insurance company has stopped writing new general liability insurance policies.
However, the insurer is still paying claims on previously written policies. Assume for simplicity that
payments are made at the end of each quarter of a year. It is estimated that at the end of each
quarter of a year the insurer pays 8% of the total amount remaining to be paid. The next payment
will be made today.
Let X be the total amount the insurer has remaining to pay.
Let Y be the present value of the total amount the insurer has remaining to pay.
If the annual rate of interest is 5%, what is the Y/X?
A. 0.80 B. 0.82 C. 0.84 D. 0.86 E. 0.88

Use the following information for the next 4 questions:


There is a constant force of mortality of 3%.
There is an annual interest rate of 4%.

5.9 (1 point) What is the curtate expectation of life?


(A) 32.0 (B) 32.2 (C) 32.4 (D) 32.6 (E) 32.8

5.10 (1 point) What is variance of the curtate future lifetime?


(A) 900 (B) 1000 (C) 1100 (D) 1200 (E) 1300

5.11 (2 points) What is the actuarial present value of a life insurance which pays 100,000 at the end
of the year of death?
(A) 41,500 (B) 42,000 (C) 42,500 (D) 43,000 (E) 43,500

5.12 (2 points) What is the actuarial present value of an annuity immediate which pays 10,000 per
year?
(A) 125,000 (B) 130,000 (C) 135,000 (D) 140,000 (E) 145,000

5.13 (1 point) After each time Mark Orfe eats at a restaurant, there is 95% chance he will eat there
again at some time in the future. Mark has eaten today at the Phoenicia Restaurant.
What is the probability that Mark will eat at the Phoenicia Restaurant precisely 7 times in the future?
A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0%

5.14 (3 points) Use the following information:


• The number of days of work missed by a work related injury to a workersʼ arm is
Geometrically distributed with β = 4.
• If a worker is disabled for 5 days or less, nothing is paid for his lost wages under
workers compensation insurance.
• If he is disabled for more than 5 days due to a work related injury, workers compensation insurance
pays him his wages for all of the days he was out of work.
What is the average number of days of wages reimbursed under workers compensation insurance
for a work related injury to a workersʼ arm?
(A) 2.2 (B) 2.4 (C) 2.6 (D) 2.8 (E) 3.0

Use the following information for the next two questions:


The variable X is generated by the following algorithm:
(1) X = 0.
(2) Roll a fair die with six sides and call the result Y.
(3) X = X + Y.
(4) If Y = 6 return to step #2.
(5) Exit.

5.15 (2 points) What is the mean of X?


A. less than 4.0
B. at least 4.0 but less than 4.5
C. at least 4.5 but less than 5.0
D. at least 5.0 but less than 5.5
E. at least 5.5

5.16 (2 points) What is the variance of X?


A. less than 8
B. at least 8 but less than 9
C. at least 9 but less than 10
D. at least 10 but less than 11
E. at least 11

5.17 (1 point) N follows a Geometric Distribution with β = 0.2. What is Prob[N = 1 | N ≤ 1]?
A. 8% B. 10% C. 12% D. 14% E. 16%

5.18 (1 point) N follows a Geometric Distribution with β = 0.4. What is Prob[N = 2 | N ≥ 2]?
A. 62% B. 65% C. 68% D. 71% E. 74%

5.19 (2 points) N follows a Geometric Distribution with β = 1.5. What is E[1/(N+1)]?


Hint: x + x^2/2 + x^3/3 + x^4/4 + ... = -ln(1-x), for 0 < x < 1.
A. 0.5 B. 0.6 C. 0.7 D. 0.8 E. 0.9

5.20 (2 points) N follows a Geometric Distribution with β = 0.8. What is E[N | N > 1]?
A. 2.6 B. 2.7 C. 2.8 D. 2.9 E. 3.0

5.21 (2 points) Use the following information:


• Larry, his brother Darryl, and his other brother Darryl are playing as a three man
basketball team at the school yard.
• Larry, Darryl, and Darryl have a 20% chance of winning each game, independent of any
other game.
• When a teamʼs turn to play comes, they play the previous winning team.
• Each time a team wins a game it plays again.
• Each time a team loses a game it sits down and waits for its next chance to play.
• It is currently the turn of Larry, Darryl, and Darryl to play again after sitting for a while.
Let X be the number of games Larry, Darryl, and Darryl play until they sit down again.
What is the variance of X?
A. 0.10 B. 0.16 C. 0.20 D. 0.24 E. 0.31

Use the following information for the next two questions:


N follows a Geometric Distribution with β = 1.3.
Define (N - j)+ = N - j if N ≥ j, and 0 otherwise.

5.22 (2 points) Determine E[(N - 1)+].


A. 0.73 B. 0.76 C. 0.79 D. 0.82 E. 0.85

5.23 (2 points) Determine E[(N-2)+].


A. 0.30 B. 0.33 C. 0.36 D. 0.39 E. 0.42

Use the following information for the next three questions:


Ethan is an unemployed worker. Ethan has a 25% probability of finding a job each week.

5.24 (2 points) What is the probability that Ethan is still unemployed after looking for a job for 6
weeks?
A. 12% B. 14% C. 16% D. 18% E. 20%

5.25 (1 point) If Ethan finds a job the first week he looks, count this as being unemployed 0 weeks.
If Ethan finds a job the second week he looks, count this as being unemployed 1 week, etc.
What is the mean number of weeks that Ethan remains unemployed?
A. 2 B. 3 C. 4 D. 5 E. 6

5.26 (1 point) What is the variance of the number of weeks that Ethan remains unemployed?
A. 12 B. 13 C. 14 D. 15 E. 16


5.27 (3 points) For a discrete density pk, define the entropy as: - ∑ pk ln[pk].
k=0

Determine the entropy for a Geometric Distribution as per Loss Models.


Hint: Why is the mean of the Geometric Distribution β?

5.28 (2, 5/85, Q.44) (1.5 points) Let X denote the number of independent rolls of a fair die
required to obtain the first "3". What is P[X ≥ 6]?
A. (1/6)^5 (5/6) B. (1/6)^5 C. (5/6)^5 (1/6) D. (5/6)^6 E. (5/6)^5

5.29 (2, 5/88, Q.22) (1.5 points) Let X be a discrete random variable with probability function
P[X = x] = 2/3^x for x = 1, 2, 3, . . . What is the probability that X is even?
A. 1/4 B. 2/7 C. 1/3 D. 2/3 E. 3/4

5.30 (2, 5/90, Q.5) (1.7 points) A fair die is tossed until a 2 is obtained. If X is the number of trials
required to obtain the first 2, what is the smallest value of x for which P[X ≤ x] ≥ 1/2?
A. 2 B. 3 C. 4 D. 5 E. 6

5.31 (2, 5/92, Q.35) (1.7 points) Ten percent of all new businesses fail within the first year. The
records of new businesses are examined until a business that failed within the first year is found. Let
X be the total number of businesses examined which did not fail within the first year, prior to finding a
business that failed within the first year. What is the probability function for X?
A. 0.1(0.9x) for x = 0, 1, 2, 3,... B. 0.9x(0.1x) for x = 1, 2, 3,... C. 0.1x(0.9x) for x = 0, 1, 2, 3,...
D. 0.9x(0.1x) for x = 1, 2,3,... E. 0.1(x - 1)(0.9x) for x = 2, 3, 4,...

5.32 (Course 1 Sample Exam, Q. 7) (1.9 points) As part of the underwriting process for
insurance, each prospective policyholder is tested for high blood pressure. Let X represent the
number of tests completed when the first person with high blood pressure is found.
The expected value of X is 12.5.
Calculate the probability that the sixth person tested is the first one with high blood pressure.
A. 0.000 B. 0.053 C. 0.080 D. 0.316 E. 0.394

5.33 (1, 5/00, Q.36) (1.9 points) In modeling the number of claims filed by an individual under an
automobile policy during a three-year period, an actuary makes the simplifying assumption that for all
integers n ≥ 0, pn+1 = pn /5, where pn represents the probability that the policyholder files n claims
during the period. Under this assumption, what is the probability that a policyholder files more than
one claim during the period?
(A) 0.04 (B) 0.16 (C) 0.20 (D) 0.80 (E) 0.96

5.34 (1, 11/01, Q.33) (1.9 points) An insurance policy on an electrical device pays a benefit of
4000 if the device fails during the first year. The amount of the benefit decreases by 1000 each
successive year until it reaches 0. If the device has not failed by the beginning of any given year,
the probability of failure during that year is 0.4.
What is the expected benefit under this policy?
(A) 2234 (B) 2400 (C) 2500 (D) 2667 (E) 2694

Solutions to Problems:

5.1. C. mean = β = 0.6.

5.2. A. variance = β(1+ β) = (0.6)(1.6) = 0.96.

5.3. B. f(x) = β^x / (1+β)^(x+1). f(3) = (0.6)^3 / (1.6)^4 = 3.30%.

5.4. A. The mode is 0, since f(0) is larger than any other value.
n 0 1 2 3 4 5 6 7
f(n) 0.6250 0.2344 0.0879 0.0330 0.0124 0.0046 0.0017 0.0007
Comment: Just as with the Exponential Distribution, the Geometric Distribution always has a mode
of zero.

5.5. D. 1 - {f(0) + f(1) + f(2)} = 1 - (0.6250 + 0.2344 + 0.0879) = 5.27%.


Alternately, S(x) = {β/(1+β)}^(x+1). S(2) = (0.6/1.6)^3 = 5.27%.

5.6. B. This is a loop, in which each time through there is a 25% of exiting and a 75% chance of
staying in the loop. Therefore, N is Geometric with
β = probability of remaining in the loop / probability of leaving the loop = 0.75/0.25 = 3.
Variance = β(1 + β) = (3)(4) = 12.

5.7. D. In making the rate for the year 2000, we give 20% weight to the data for the year 1997 and
the remaining weight of 80% to the then current rate, that for 1999.
The weight given to the data from year 1997 in the rate for year 2000 is 0.20.
In making the rate for the year 2001, we give 20% weight to the data for the year 1998 and the
remaining weight of 80% to the then current rate, that for year 2000.
Therefore, the weight given to the data from year 1997 in the rate for year 2001 is: (1 - 0.2)(0.2).
One could go through and do similar reasoning to determine how much weight the data for 1996
gets in the year 2001 rate.
Similarly, the weight given to the data from year 1996 in the rate for year 2001 is: (1 - 0.2)^2 (0.2).
Given the pattern, we infer that the weight given to the data from year 1990 in the rate for year 2001
is: (1 - 0.2)^8 (0.2) = 3.4%.
Comment: The weights are from a geometric distribution with β = 1/Z - 1, and β/(1+β) = 1 - Z.
The weights are: (1-Z)^n Z for n = 0, 1, 2,... Older years of data get less weight.
This is a simplification of a real world application, as discussed in “An Example of Credibility
and Shifting Risk Parameters”, by Howard C. Mahler, PCAS 1990.

5.8. E. Let Z be the amount remaining to be paid prior to quarter n. Then the payment in quarter n
is 0.08Z. This leaves 0.92Z remaining to be paid prior to quarter n+1. Thus the payment in quarter
n+1 is (0.08)(0.92)Z. The payment in quarter n+1 is 0.92 times the payment in quarter n.
The payments each quarter decline by a factor of 0.92.
Therefore, the proportion of the total paid in each quarter is a Geometric Distribution with
β/(1+β) = 0.92. ⇒ β = 0.92/(1-0.92) = 11.5. The payment at the end of quarter n is:
X f(n) = X β^n/(1+β)^(n+1), n = 0, 1, 2, ... (The sum of these payments is X.)
The present value of the payment at the end of quarter n is:
X f(n)/(1.05)^(n/4) = X (0.9879^n) β^n/(1+β)^(n+1), n = 0, 1, 2, ...
Y, the total present value, is:
Σ_{n=0}^∞ X (0.9879^n) β^n/(1+β)^(n+1) = {X/(1+β)} Σ_{n=0}^∞ {0.9879 β/(1+β)}^n = (X/12.5) Σ_{n=0}^∞ 0.9089^n
= (X/12.5) / (1 - 0.9089) = 0.878X. Y/X = 0.878.


Comment: In “Measuring the Interest Rate Sensitivity of Loss Reserves,” by Richard Gorvett
and Stephen DʼArcy, PCAS 2000 , a geometric payment pattern is used in order to estimate
Macaulay durations, modified durations, and effective durations.
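A direct summation, assuming Python (not part of the original solution), reproduces the ratio:

    # Check of Solution 5.8: present value of a geometrically declining payment stream.
    pay_rate = 0.08                 # fraction of the remaining reserve paid each quarter
    v = 1.05 ** (-1 / 4)            # quarterly discount factor at 5% annual interest

    y_over_x = sum(pay_rate * (1 - pay_rate)**n * v**n for n in range(5000))
    print(y_over_x)                 # approximately 0.878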

5.9. E. If a person is alive at the beginning of the year, the chance they die during the next year is:
1 - e−µ = 1 - e-0.03 = 0.02955. Therefore, the distribution of curtate future lifetimes is Geometric with
mean β = (1-q)/q = 0.97045/0.02955 = 32.84 years.
Alternately, the complete expectation of life is: 1/µ = 1/0.03 = 33.33.
The curtate future lifetime is on average about 1/2 less; 33.33 - 1/2 = 32.83.

5.10. C. The variance of the Geometric Distribution is: β(1+β) = (32.84)(33.84) = 1111.
Alternately, the future lifetime is Exponentially distributed with mean θ = 1/µ = 1/.03 = 33.33, and

variance θ^2 = 33.33^2 = 1111. Since the two differ by approximately a constant, 1/2, the variance of the
curtate future lifetimes is approximately that of the future lifetimes, 1111.

5.11. C. With constant probability of death, q, the present value of the insurance is: q/(q + i).
(100,000)q/(q + i) = (100000)(0.02955) / (0.02955 + 0.04) = 42,487.
Alternately, the present value of an insurance that pays at the moment of death is:
µ/(µ+δ) = 0.03 / (0.03 + ln(1.04)) = 0.03 / (0.03 + 0.03922) = 0.43340.

(100000)(0.43340) = 43,340.
The insurance paid at the end of the year of death is paid on average about 1/2 year later;
43,340/(1.04^0.5) = 42,498.

5.12. D. With constant probability of death, q, the present value of an annuity immediate is:
(1-q)/(q +i).
(10000)(1-q)/(q +i) = (10000)(1 - .02955)/(.02955 + .04) = 139,533.
Alternately, the present value of an annuity that pays continuously is:
1/(µ+δ) = 1/(.03 + ln(1.04)) = 1/(.03 + .03922) = 14.4467. (10000)(14.4467) = 144,467.
Discounting for another half year of interest and mortality, the present value of the annuity immediate
is approximately: 144,467/{(1.04^0.5)(1.03^0.5)} = 139,583.

5.13. D. There is a 95% chance Mark will return. If he returns, there is another 95% chance he will
return again, etc. The chance of returning 7 times and then not returning an 8th time is:
(0.95^7)(0.05) = 3.5%.
Comment: The number of future visits is a Geometric Distribution with β =
probability of continuing sequence / probability of ending sequence = .95/.05 = 19.
f(7) = β^7/(1+β)^8 = 19^7/20^8 = 3.5%.

5.14. C. If he is disabled for n days, then he is paid 0 if n ≤ 5, and n days of wages if n ≥ 6.


Therefore, the mean number of days of wages paid is:
∞ ∞ 5
∑ n f(n) = ∑ n f(n) - ∑ n f(n) = E[N] - {0f(0) + 1f(1) + 2f(2) + 3f(3) + 4f(4) + 5f(5)} =
n=6 n=0 n=0

4 - {(1)(0.2)(0.8) + (2)(0.2)(0.82 ) + (3)(0.2)(0.83 ) + (4)(0.2)(0.84 ) + (5)(0.2)(0.85 )} = 2.62.


Alternately, due to the memoryless property of the Geometric Distribution (analogous to its
continuous analog the Exponential), truncated and shifted from below at 6, we get the same
Geometric Distribution. Thus if only those days beyond 6 were paid for, the average nonzero
payment is 4. However, in each case where we have at least 6 days of disability we pay the full
length of disability which is 6 days longer, so the average nonzero payment is: 4 + 6 = 10.
The probability of a nonzero payment is: 1 - {f(0) + f(1) + f(2) + f(3) + f(4) + f(5)} =
1 - {0.2 + (0.2)(0.8) + (0.2)(0.8^2) + (0.2)(0.8^3) + (0.2)(0.8^4) + (0.2)(0.8^5)} = 0.262.
Thus the average payment (including zero payments) is: (0.262)(10 days) = 2.62 days.
Comment: Just an exam type question, not intended as a model of the real world.
Asks for the average payment per injury, including the zeros.
The solution is: E[X | X > 5] Prob[X > 5] + 0 Prob[X ≤ 5].

5.15. B. & 5.16. D. The number of additional dies rolled beyond the first is Geometric with
β = probability of remaining in the loop / probability of leaving the loop = (1/6)/(5/6) = 1/5.
Let N be the number of dies rolled, then N - 1 is Geometric with β = 1/5.
X = 6(N - 1) + the result of the last 6-sided die rolled.
The result of the last six sided die to be rolled is equally likely to be a 1, 2, 3, 4 or 5 (it canʼt be a six
or we would have rolled an additional die.)
E[X] = (6)(mean of a Geometric with β = 1/5) + (average of 1,2,3,4,5) = (6)(1/5) + 3 = 4.2.
Variance of the distribution equally likely to be 1, 2, 3, 4, or 5 is: (2^2 + 1^2 + 0^2 + 1^2 + 2^2)/5 = 2.
Var[X] = 6^2 (variance of a Geometric with β = 1/5) + 2 = (36)(1/5)(6/5) + 2 = 10.64.
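A quick simulation of the algorithm, assuming Python (not part of the original solutions), confirms both answers:

    # Simulate the die-rolling algorithm of Problems 5.15 and 5.16.
    import random

    random.seed(3)

    def run():
        x = 0
        while True:
            y = random.randint(1, 6)
            x += y
            if y != 6:
                return x

    sample = [run() for _ in range(300_000)]
    mean = sum(sample) / len(sample)
    var = sum((s - mean)**2 for s in sample) / len(sample)
    print(mean, var)   # approximately 4.2 and 10.64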

5.17. D. Prob[N = 1 | N ≤ 1] = Prob[N = 1]/Prob[N ≤ 1] = {β/(1+β)^2} / {1/(1+β) + β/(1+β)^2} =
β/(1 + 2β) = 0.2/1.4 = 0.143.

5.18. D. Prob[N = 2 | N ≥ 2] = Prob[N = 2]/Prob[N ≥ 2] = {β^2/(1+β)^3} / {β^2/(1+β)^2} = 1/(1+β).
Alternately, from the memoryless property, Prob[N = 2 | N ≥ 2] = Prob[N = 0] = 1/(1+β) = 0.714.

5.19. B. E[1/(N+1)] = Σ_{n=0}^∞ f(n)/(n+1) = Σ_{m=1}^∞ f(m-1)/m = (1/β) Σ_{m=1}^∞ {β/(1+β)}^m / m
= (1/β) {-ln(1 - β/(1+β))} = ln(1+β)/β = ln(2.5)/1.5 = 0.611.

5.20. C. E[N | N > 1]Prob[N > 1] + (1) Prob[N = 1] + (0) Prob[N = 0] = E[N] = β.

E[N | N > 1] = {β - β/(1+β)^2} / {β^2/(1+β)^2} = 2 + β = 2.8.

5.21. E. X is 1 + a Geometric Distribution with


β = (chance of remaining in the loop)/(chance of leaving the loop) = .2/.8 = 1/4.
Variance of X is: β(1+β) = (1/4)(5/4) = 5/16 = 0.3125.
Comment: Prob[X = 1] = 1 - 0.2 = 0.8. Prob[X = 2] = (0.2)(0.8). Prob[X = 3] = (0.2^2)(0.8).
Prob[X = 4] = (0.2^3)(0.8). Prob[X = x] = (0.2^(x-1))(0.8). While this is a series of Bernoulli trials, it ends
when the team has its first failure. X is the number of trials through the first failure.

5.22. A. E[(N - 1)+] = E[N] - E[N ∧ 1] = β - Prob[N ≥ 1] = β - β/(1+β) = β^2/(1+β) = 0.7348.
Alternately, E[(N-1)+] = E[(1-N)+] + E[N] - 1 = Prob[N = 0] + β - 1 =
β + 1/(1+β) - 1 = 1.3 + 1/2.3 - 1 = 0.7348.
Alternately, the memoryless property of the Geometric ⇒ E[(N-1)+]/Prob[N ≥ 1] = E[N] = β. ⇒
E[(N-1)+] = β Prob[N ≥ 1] = β·β/(1+β) = β^2/(1+β) = 0.7348.

5.23. E. E[(N - 2)+] = E[N] - E[N ∧ 2] = β - (Prob[N = 1] + 2 Prob[N ≥ 2]) =
β - β/(1+β)^2 - 2β^2/(1+β)^2 = {β(1+β)^2 - β - 2β^2}/(1+β)^2 = β^3/(1+β)^2 = 1.3^3/2.3^2 = 0.415.
Alternately, E[(N-2)+] = E[(2-N)+] + E[N] - 2 = 2Prob[N = 0] + Prob[N = 1] + β - 2 =
β + 2/(1+β) + β/(1+β)^2 - 2 = 1.3 + 2/2.3 + (1.3)/2.3^2 - 2 = 0.415.
Alternately, the memoryless property of the Geometric ⇒ E[(N-2)+]/Prob[N≥2] = E[N] = β. ⇒
E[(N-2)+] = β Prob[N≥2] = β·β^2/(1+β)^2 = β^3/(1+β)^2 = 0.415.
Comment: For integral j, for the Geometric, E[(N - j)+] = β^(j+1)/(1+β)^j.

5.24. D. Probability of finding a job within six weeks is:
(0.25){1 + 0.75 + 0.75^2 + 0.75^3 + 0.75^4 + 0.75^5} = 0.822. 1 - 0.822 = 17.8%.

5.25. B. The number of weeks he remains unemployed is Geometric with


β = (chance of failure) / ( chance of success) = 0.75/0.25 = 3. Mean = β = 3.

5.26. A. Variance of this Geometric is: β (1 + β) = (3)(4) = 12.




5.27. The mean of a Geometric Distribution is β. ⇒ ∑k=0 to ∞ k pk = β.
pk = β^k/(1+β)^(k+1). ⇒ ln[pk] = k ln[β] - (k+1) ln[1 + β] = {ln[β] - ln[1 + β]} k - ln[1 + β].
- ∑k=0 to ∞ pk ln[pk] = {ln[1 + β] - ln[β]} ∑k=0 to ∞ k pk + ln[1 + β] ∑k=0 to ∞ pk
= {ln[1 + β] - ln[β]} β + ln[1 + β] (1) = (1 + β) ln[1 + β] - β ln[β].
Comment: The Shannon entropy from information theory, except there the log is to the base 2.

5.28. E. Prob[X ≥ 6] = Prob[first 5 rolls each ≠ 3] = (5/6)^5.
Alternately, the number of failures before the first success, X - 1, is Geometric with
β = chance of failure / chance of success = (5/6)/(1/6) = 5.
Prob[X ≥ 6] = Prob[# failures ≥ 5] = 1 - F(4) = {β/(1+β)}^(4+1) = (5/6)^5.

5.29. A. Prob[X is even] = 2/3^2 + 2/3^4 + 2/3^6 + ... = (2/9)/(1 - 1/9) = 1/4.
Comment: X - 1 follows a Geometric Distribution with β = 2.

5.30. C. X - 1 is Geometric with β = chance of failure / chance of success = (5/6)/(1/6) = 5.
Prob[X > x] = Prob[X - 1 ≥ x] = {β/(1+β)}^x = (5/6)^x.
We want to find where the distribution function is at least 1/2,
in other words the place where the survival function drops to 1/2.
Set Prob[X > x] = 1/2: 1/2 = (5/6)^x.
x = ln(1/2)/ln(5/6) = 3.8. The next greatest integer is 4. P[X ≤ 4] = 1 - (5/6)^4 = 0.518 ≥ 1/2.
Alternately, Prob[X = x] = Prob[first x - 1 tosses ≠ 2] Prob[xth toss = 2] = (5/6)^(x-1)/6.
X Probability Cumulative
1 0.1667 0.1667
2 0.1389 0.3056
3 0.1157 0.4213
4 0.0965 0.5177
5 0.0804 0.5981
6 0.0670 0.6651

5.31. A. X has a Geometric with β = chance of continuing / chance of ending = (0.9)/(0.1) = 9.
f(x) = 9^x/10^(x+1) = (0.1)(0.9^x), for x = 0, 1, 2, 3,...

5.32. B. This is a series of Bernoulli trials, and X - 1 is the number of failures before the first success.
Thus X - 1 is Geometric. β = E[X - 1] = 12.5 - 1 = 11.5.

Prob[X = 6] = Prob[X - 1 = 5] = f(5) = β^5/(1+β)^6 = 11.5^5/12.5^6 = 0.0527.
Alternately, Prob[person has high blood pressure] = 1/E[X] = 1/12.5 = 8%.
Prob[sixth person is the first one with high blood pressure]
= Prob[first five donʼt have high blood pressure] Prob[sixth has high blood pressure]
= (1 - 0.08)^5 (0.08) = 0.0527.

5.33. A. The densities are declining geometrically.


Therefore, this is a Geometric Distribution, with β/(1+β) = 1/5. ⇒ β = 1/4.
Prob[more than one claim] = 1 - f(0) - f(1) = 1 - 1/(1+β) - β/(1+β)^2 = 1 - 4/5 - 4/25 = 0.04.

5.34. E. Expected Benefit =
(4000)(0.4) + (3000)(0.6)(0.4) + (2000)(0.6^2)(0.4) + (1000)(0.6^3)(0.4) = 2694.
Alternately, the benefit is 1000(4 - N)+, where N is the number of years before the device fails.
N is Geometric, with 1/(1 + β) = 0.4. ⇒ β = 1.5.
E[N ∧ 4] = 0f(0) + 1f(1) + 2f(2) + 3f(3) + 4{1 - f(0) - f(1) - f(2) - f(3)} = 4 - 4f(0) - 3f(1) - 2f(2) - f(3).
Expected Benefit = 1000E[(4 - N)+] = 1000(4 - E[N ∧ 4]) = 1000{4f(0) + 3f(1) + 2f(2) + f(3)}
= 1000{4(0.4) + 3(0.4)(0.6) + 2(0.4)(0.6^2) + (0.4)(0.6^3)} = 2694.

Section 6, Negative Binomial Distribution

The third and final important frequency distribution is the Negative Binomial, which has the Geometric
as a special case.

Negative Binomial Distribution

Support: x = 0, 1, 2, 3, ...      Parameters: β > 0, r > 0. r = 1 is a Geometric Distribution.

D. f.:  F(x) = β(r, x+1; 1/(1+β)) = 1 - β(x+1, r; β/(1+β))      Incomplete Beta Function

P. d. f.:  f(x) = {r(r + 1)...(r + x - 1)/x!} β^x/(1+β)^(x+r) = C(x+r-1, x) β^x/(1+β)^(x+r).

f(0) = 1/(1+β)^r      f(1) = rβ/(1+β)^(r+1)

f(2) = r(r+1)β^2 / {2(1+β)^(r+2)}      f(3) = r(r+1)(r+2)β^3 / {6(1+β)^(r+3)}

Mean = rβ Variance = rβ(1+β) Variance / Mean = 1 + β > 1.

Coefficient of Variation = √[(1+β)/(rβ)]      Skewness = (1+2β)/√[rβ(1+β)] = CV(1+2β)/(1+β).

Kurtosis = 3 + (6β^2 + 6β + 1) / {rβ(1+β)}.

Mode = largest integer in (r-1)β (if (r-1)β is an integer,


then both (r-1)β and (r-1)β - 1 are modes.)

Probability Generating Function: P(z) = {1 - β(z-1)}^(-r), z < 1 + 1/β.

Moment Generating Function: M(s) = {1 - β(e^s - 1)}^(-r), s < ln(1+β) - ln(β).

f(x+1)/f(x) = a + b/(x+1), a = β/(1+β), b = (r-1)β/(1+β), f(0) = (1+β)^(-r).
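To make these formulas concrete, here is a short Python sketch (illustrative only; the function name is my own) that evaluates the density for non-integer r and checks the mean formula numerically:

import math

def neg_binomial_pmf(x, r, beta):
    # f(x) = C(x+r-1, x) beta^x / (1+beta)^(x+r), computed on the log scale to avoid overflow.
    log_coeff = math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
    return math.exp(log_coeff + x * math.log(beta) - (x + r) * math.log(1.0 + beta))

r, beta = 2.0, 4.0
print(r * beta, r * beta * (1.0 + beta))                            # mean 8 and variance 40
print(sum(x * neg_binomial_pmf(x, r, beta) for x in range(400)))   # numerically close to 8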

A Negative Binomial Distribution with r = 2 and β = 4:

[Graph: the probabilities f(x), x from 0 to 30.]

A Negative Binomial Distribution with r = 0.5 and β = 10:

[Graph: the probabilities f(x), x from 0 to 30.]

Here is a Negative Binomial Distribution with parameters β = 2/3 and r = 8:31


Number of Claims x      f(x)      F(x)      x times f(x)      x^2 times f(x)
0 0.0167962 0.01680 0.00000 0.00000
1 0.0537477 0.07054 0.05375 0.05375
2 0.0967459 0.16729 0.19349 0.38698
3 0.1289945 0.29628 0.38698 1.16095
4 0.1418940 0.43818 0.56758 2.27030
5 0.1362182 0.57440 0.68109 3.40546
6 0.1180558 0.69245 0.70833 4.25001
7 0.0944446 0.78690 0.66111 4.62779
8 0.0708335 0.85773 0.56667 4.53334
9 0.0503705 0.90810 0.45333 4.08001
10 0.0342519 0.94235 0.34252 3.42519
11 0.0224194 0.96477 0.24661 2.71275
12 0.0141990 0.97897 0.17039 2.04465
13 0.0087378 0.98771 0.11359 1.47669
14 0.0052427 0.99295 0.07340 1.02757
15 0.0030757 0.99603 0.04614 0.69204
16 0.0017685 0.99780 0.02830 0.45275
17 0.0009987 0.99879 0.01698 0.28863
18 0.0005548 0.99935 0.00999 0.17977
19 0.0003037 0.99965 0.00577 0.10964
20 0.0001640 0.99982 0.00328 0.06560
21 0.0000875 0.99990 0.00184 0.03857
22 0.0000461 0.99995 0.00101 0.02232
23 0.0000241 0.99997 0.00055 0.01273
24 0.0000124 0.99999 0.00030 0.00716
25 0.0000064 0.99999 0.00016 0.00398
26 0.0000032 1.00000 0.00008 0.00218
27 0.0000016 1.00000 0.00004 0.00119
28 0.0000008 1.00000 0.00002 0.00064
29 0.0000004 1.00000 0.00001 0.00034
30 0.0000002 1.00000 0.00001 0.00018
Sum 1.00000 5.33333 37.33314

For example, f(5) = {(2/3)^5 / (1 + 2/3)^(8+5)} (12!) / {(5!)(7!)}
= (0.000171993)(479,001,600)/{(120)(5040)} = 0.136.

The mean is: rβ = 8(2/3) = 5.333. The variance is: 8(2/3)(1+2/3) = 8.89.
The variance can also be computed as: (mean)(1+β) = 5.333(5/3) = 8.89.
The variance is indeed = E[X^2] - E[X]^2 = 37.333 - 5.3333^2 = 8.89.
According to the formula given previously, the mode should be the largest integer in (r-1)β =
(8-1)(2/3) = 4.67, which contains the integer 4. In fact, f(4) = 14.2% is the largest value of the
probability density function. Since F(5) = 0.57 ≥ 0.5 and F(4) = 0.44 < 0.5, 5 is the median.
31
The values for the Negative Binomial probability density function in the table were computed using:
f(0) = 1/(1+β)^r and f(x+1) / f(x) = β(x+r) / {(x+1)(1+β)}.
For example, f(12) = f(11)β(11+r) / {12(1+β)} = (0.02242)(2/3)(19) / 20 = 0.01420.
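The recursion in footnote 31 is easy to program; the following Python sketch (my own, for illustration) reproduces the first column of the table above:

r, beta = 8, 2.0 / 3.0
f = 1.0 / (1.0 + beta)**r      # f(0) = 0.0167962
F = f
for x in range(13):
    print(x, round(f, 7), round(F, 5))
    f *= beta * (x + r) / ((x + 1) * (1.0 + beta))   # f(x+1) computed from f(x)
    F += f
# The line for x = 12 shows f(12) close to 0.0141990, matching the table.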

Mean and Variance of the Negative Binomial Distribution:

The mean of a Geometric distribution is β and its variance is β(1+β). Since the Negative Binomial is
a sum of r Geometric Distributions, it follows that the mean of the Negative Binomial is rβ and
the variance of the Negative Binomial is rβ(1+β).
Since β > 0, 1+ β > 1, for the Negative Binomial Distribution the variance is greater than
the mean.

For the Negative Binomial, the ratio of the variance to the mean is 1+β, while variance/mean = 1 for
the Poisson Distribution.
Thus (β)(mean) is the “extra variance” for the Negative Binomial compared to the Poisson.

Non-Integer Values of r:

Note that even if r is not integer, the binomial coefficient in the front of the Negative Binomial Density
can be calculated as: C(x+r-1, x) = (x+r-1)! / {x! (r-1)!} = (x+r-1)(x+r-2)...(r) / x!.

For example with r = 6.2, if one wanted to compute f(4), then the binomial coefficient in front is:
C(4 + 6.2 - 1, 4) = C(9.2, 4) = 9.2! / {5.2! 4!} = (9.2)(8.2)(7.2)(6.2)/4! = 140.32.

Note that the numerator has 4 factors; in general it will have x factors. These four factors are:
9.2! / (9.2-4)! = 9.2!/5.2!, or if you prefer: Γ(10.2) / Γ(6.2) = (9.2)(8.2)(7.2)(6.2).

As shown in Loss Models, in general one can rewrite the density of the Negative Binomial as:
f(x) = {r(r + 1)...(r + x - 1)/x!} β^x/(1+β)^(x+r), where there are x factors in the product in the numerator.

Exercise: For a Negative Binomial with parameters r = 6.2 and β = 7/3, compute f(4).
[Solution: f(4) = {(9.2)(8.2)(7.2)(6.2)/4!} (7/3)^4 / (1 + 7/3)^(6.2+4) = 0.0193.]
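A quick numerical check of this exercise (an illustrative sketch, not from the text):

import math

r, beta, x = 6.2, 7.0 / 3.0, 4
coeff = 1.0
for i in range(x):
    coeff *= r + i             # builds the x factors (6.2)(7.2)(8.2)(9.2)
coeff /= math.factorial(x)     # the binomial coefficient, about 140.32
f4 = coeff * beta**x / (1.0 + beta)**(x + r)
print(coeff, f4)               # about 140.32 and 0.0193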

Negative Binomial as a Mixture of Poissons:

As discussed subsequently, when Poissons are mixed via a Gamma Distribution, the mixed
distribution is always a Negative Binomial Distribution, with r = α = shape parameter of the Gamma
and β = θ = scale parameter of the Gamma. The mixture of Poissons via a Gamma distribution
produces a Negative Binomial Distribution and increases the variance above the mean.

Series of Bernoulli Trials:

Return to the situation that resulted in the Geometric distribution, involving a series of independent
Bernoulli trials each with chance of success 1/(1 + β), and chance of failure of β/(1 + β).
What is the probability of two successes and four failures in the first six trials?

It is given by the Binomial Distribution:
C(6, 2) {1/(1+β)}^2 {1 - 1/(1+β)}^4 = C(6, 2) β^4/(1+β)^6.

The chance of having the third success on the seventh trial is given by 1/(1 + β) times the above
probability:
C(6, 2) β^4/(1+β)^7.

Similarly the chance of the third success on trial x + 3 is given by 1/(1 + β) times the probability of
3 - 1 = 2 successes and x failures on the first x + 3 - 1 = x + 2 trials:
C(x+2, 2) β^x/(1+β)^(x+3).

More generally, the chance of the rth success on trial x + r is given by 1/(1 + β) times the probability
of r - 1 successes and x failures on the first x + r - 1 trials:
f(x) = {1/(1+β)} C(x+r-1, r-1) β^x/(1+β)^(x+r-1) = C(x+r-1, x) β^x/(1+β)^(x+r), x = 0, 1, 2, 3...

This is the Negative Binomial Distribution. Thus we see that one source of the Negative Binomial is
the chance of experiencing failures on a series of independent Bernoulli trials prior to getting a certain
number of successes.32 Note that in the derivation, 1/(1 + β) is the chance of success on each
Bernoulli trial.
Thus, β = {β/(1+β)} / {1/(1+β)} = (chance of a failure) / (chance of a success).

For a series of independent identical Bernoulli trials, the chance of success number r
following x failures is given by a Negative Binomial Distribution with parameters
β = (chance of a failure) / (chance of a success), and r.

Exercise: One has a series of independent Bernoulli trials, each with chance of success 0.3.
What is the distribution of the number of failures prior to the 5th success?
[Solution: A Negative Binomial Distribution, as per Loss Models, with parameters β = 0.7/0.3 = 7/3,
and r = 5.]

While this is one derivation of the Negative Binomial distribution, note that the Negative Binomial
Distribution is used to model claim counts in many situations that have no relation to this derivation.
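A small simulation along the lines of the exercise above can make this derivation concrete. The following Python sketch (my own illustration, with p = 0.3 and r = 5 as in the exercise) counts failures before the 5th success and compares the sample mean to rβ = (5)(7/3) ≅ 11.67:

import random

def failures_before_rth_success(p=0.3, r=5):
    failures = successes = 0
    while successes < r:
        if random.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

random.seed(1)
sample = [failures_before_rth_success() for _ in range(100000)]
print(sum(sample) / len(sample))   # close to r * beta = 11.67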

32
Even though the Negative Binomial Distribution was derived here for integer values of r, as has been discussed,
the Negative Binomial Distribution is well defined for r non-integer as well.

Negative Binomial as a Sum of Geometric Distributions:

The number of claims for a Negative Binomial Distribution was modeled as the number of failures
prior to getting a total of r successes on a series of independent Bernoulli trials. Instead one can add
up the number of failures associated with getting a single success r times independently of each
other. As seen before, each of these is given by a Geometric distribution. Therefore, obtaining r
successes is the sum of r separate independent variables each involving getting a single success.

Number of Failures until the third success has a Negative Binomial Distribution: r = 3, β = (1 - q)/q.

[Diagram: a timeline from Time 0 to Success #1 to Success #2 to Success #3; the number of failures
in each of the three segments is Geometric with β = (1 - q)/q.]

Therefore, the Negative Binomial Distribution with parameters β and r, with r integer, can
be thought of as the sum of r independent Geometric distributions with parameter β.

The Negative Binomial Distribution for r = 1 is a Geometric Distribution.

Since the Geometric distribution is the discrete analog of the Exponential distribution, the Negative
Binomial distribution is the discrete analog of the continuous Gamma Distribution33.

The parameter r in the Negative Binomial is analogous to the parameter α in the Gamma
Distribution.34

(1+β)/β in the Negative Binomial Distribution is analogous to e^(1/θ) in the Gamma Distribution.

33
Recall that the Gamma Distribution is a sum of α independent Exponential Distributions, just as the Negative
Binomial is the sum of r independent Geometric Distributions.
34
Note that the mean and variance of the Negative Binomial and the Gamma are proportional respectively to r and α.

Adding Negative Binomial Distributions:

Since the Negative Binomial is a sum of Geometric Distributions, if one sums independent Negative
Binomials with the same β, then one gets another Negative Binomial, with the same β parameter
and the sum of their r parameters.35

Exercise: X is a Negative Binomial with β = 1.4 and r = 0.8. Y is a Negative Binomial with
β = 1.4 and r = 2.2. Z is a Negative Binomial with β = 1.4 and r = 1.7. X, Y, and Z are independent
of each other. What form does X + Y + Z have?
[Solution: X + Y + Z is a Negative Binomial with β = 1.4 and r = .8 + 2.2 + 1.7 = 4.7.]

If X is Negative Binomial with parameters β and r1 , and Y is Negative Binomial with

parameters β and r2 , X and Y independent, then X + Y is Negative Binomial with

parameters β and r1 + r2 .

Specifically, the sum of n independent identically distributed Negative Binomial variables, with the
same parameters β and r, is a Negative Binomial with parameters β and nr.

Exercise: X is a Negative Binomial with β = 1.4 and r = 0.8.


What is the form of the sum of 25 independent random draws from X?
[Solution: A random draw from a Negative Binomial with β = 1.4 and r = (25)(.8) = 20.]

Thus if one had 25 exposures, each of which had an independent Negative Binomial frequency
process with β = 1.4 and r = 0.8, then the portfolio of 25 exposures has a
Negative Binomial frequency process with β = 1.4 and r = 20.
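As a check on this adding-up property, here is a simulation sketch (illustrative only) that draws each exposure as a Gamma-mixed Poisson, which is one way to generate a Negative Binomial with non-integer r, and compares the portfolio total to the Negative Binomial with β = 1.4 and r = 20 (mean 28, variance 67.2):

import numpy as np

rng = np.random.default_rng(0)
r, beta, n_exposures, n_sims = 0.8, 1.4, 25, 200000
lambdas = rng.gamma(shape=r, scale=beta, size=(n_sims, n_exposures))
totals = rng.poisson(lambdas).sum(axis=1)
print(totals.mean(), totals.var())   # close to 28 and 67.2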

35
This holds whether or not r is integer. This is analogous to adding independent Gammas with the same θ
parameter. One obtains a Gamma, with the same θ parameter, but with the new α parameter equal to the sum of the
individual α parameters.

Effect of Exposures:

Assume one has 100 exposures with independent, identically distributed frequency distributions.
If each one is Negative Binomial with parameters β and r, then so is the sum, with parameters β and
100r. If we change the number of exposures to for example 150, then the sum is Negative Binomial
with parameters β and 150r, or 1.5 times the r parameter in the first case.
In general, as the exposures change, the r parameter changes in proportion.36

Exercise: The total number of claims from a portfolio of insureds has a Negative Binomial Distribution
with β = 0.2 and r = 30.
If next year the portfolio has 120% of the current exposures, what is its frequency distribution?
[Solution: Negative Binomial with β = 0.2 and r = (1.2)(30) = 36.]

Thinning Negative Binomial Distributions:

Thinning can also be applied to the Negative Binomial Distribution.37


The β parameter of the Negative Binomial Distribution is multiplied by the thinning factor.

Exercise: Claim frequency follows a Negative Binomial Distribution with parameters


β = 0.20 and r = 1.5. One quarter of all claims involve attorneys. If attorney involvement is
independent between different claims, what is the probability of getting two claims involving
attorneys in the next year?
[Solution: Claims with attorney involvement are Negative Binomial Distribution with
β = (0.20)(25%) = 0.05 and r = 1.5.
Thus f(2) = r(r+1)β^2 / {2! (1+β)^(r+2)} = (1.5)(2.5)(0.05)^2 / {2 (1.05)^3.5} = 0.395%.]
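A quick numerical check of this thinning exercise (my own illustration):

r, beta = 1.5, 0.20 * 0.25     # thinned beta = 0.05
f2 = (r * (r + 1) / 2.0) * beta**2 / (1.0 + beta)**(r + 2)
print(f2)                      # about 0.00395, i.e. 0.395%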

Note that when thinning the parameter β is altered, while when adding the r parameter is affected.
As discussed previously, if one adds two independent Negative Binomial Distributions with the
same β, then the result is also a Negative Binomial Distribution, with the sum of the r parameters.

36
See Section 7.4 of Loss Models, not on the syllabus. This same result holds for a Compound Frequency
Distribution, to be discussed subsequently, with a primary distribution that is Negative Binomial.
37
See Table 8.3 in Loss Models. However, unlike the Poisson case, the large and small accidents are not
independent processes.

Problems:

The following six questions all deal with a Negative Binomial distribution with parameters
β = 0.4 and r = 3.

6.1 (1 point) What is the mean?


A. less than .9
B. at least .9 but less than 1.0
C. at least 1.0 but less than 1.1
D. at least 1.1 but less than 1.2
E. at least 1.2

6.2 (1 point) What is the variance?


A. less than 1.8
B. at least 1.8 but less than 1.9
C. at least 1.9 but less than 2.0
D. at least 2.0 but less than 2.1
E. at least 2.1

6.3 (2 points) What is the chance of having 4 claims?


A. less than 3%
B. at least 3% but less than 4%
C. at least 4% but less than 5%
D. at least 5% but less than 6%
E. at least 6%

6.4 (2 points) What is the mode?


A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D.

6.5 (2 points) What is the median?


A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D.

6.6 (2 points) What is the chance of having 4 claims or less?


A. 90% B. 92% C. 94% D. 96% E. 98%

6.7 (2 points) Bud and Lou play a series of games. Bud has a 60% chance of winning each game.
Lou has a 40% chance of winning each game. The outcome of each game is independent of any
other. Let N be the number of games Bud wins prior to Lou winning 5 games.
What is the variance of N?
A. less than 14
B. at least 14 but less than 16
C. at least 16 but less than 18
D. at least 18 but less than 20
E. at least 20

6.8 (1 point) For a Negative Binomial distribution with β = 2/9 and r = 1.5, what is the chance of
having 3 claims?
A. 1% B. 2% C. 3% D. 4% E. 5%

6.9 (2 points) In baseball a team bats in an inning until it makes 3 outs. Assume each batter has a
40% chance of getting on base and a 60% chance of making an out. Then what is the chance of a
team sending exactly 8 batters to the plate in an inning? (Assume no double or triple plays.
Assume nobody is picked off base, caught stealing or thrown out on the bases. Assume each
batterʼs chance of getting on base is independent of whether another batter got on base.)
A. less than 1%
B. at least 1% but less than 2%
C. at least 2% but less than 3%
D. at least 3% but less than 4%
E. at least 4%

6.10 (1 point) Assume each exposure has a Negative Binomial frequency distribution, as per Loss
Models, with β = 0.1 and r = 0.27. You insure 20,000 independent exposures.
What is the frequency distribution for your portfolio?
A. Negative Binomial with β = 0.1 and r = 0.27.
B. Negative Binomial with β = 0.1 and r = 5400.
C. Negative Binomial with β = 2000 and r = 0.27.
D. Negative Binomial with β = 2000 and r = 5400.
E. None of the above.

6.11 (3 points) Frequency is given by a Negative Binomial distribution with β = 1.38 and r = 3.
Severity is given by a Weibull Distribution with τ = 0.3 and θ = 1000.
Frequency and severity are independent.
What is chance of two losses each of size greater than $25,000?
A. 1% B. 2% C. 3% D. 4% E. 5%

Use the following information for the next two questions:


Six friends each have their own phone.
The number of calls each friend gets per night from telemarketers is Geometric with β = 0.3.
The number of calls each friend gets is independent of the others.

6.12 (2 points) Tonight, what is the probability that three of the friends get one or more calls from
telemarketers, while the other three do not?
A. 11% B. 14% C. 17% D. 20% E. 23%

6.13 (2 points) Tonight, what is the probability that the friends get a total of three calls from
telemarketers?
A. 11% B. 14% C. 17% D. 20% E. 23%

6.14 (2 points) The total number of claims from a group of 80 drivers has a
Negative Binomial Distribution with β = 0.5 and r = 4.
What is the probability that a group of 40 similar drivers have a total of 2 or more claims?
A. 22% B. 24% C. 26% D. 28% E. 30%

6.15 (2 points) The total number of non-zero payments from a policy with a $1000 deductible
follows a Negative Binomial Distribution with β = 0.8 and r = 3.
The ground up losses follow an Exponential Distribution with θ = 2500.
If this policy instead had a $5000 deductible, what would be the probability of having no
non-zero payments?
A. 56% B. 58% C. 60% D. 62% E. 64%

6.16 (3 points) The mathematician Stefan Banach smoked a pipe.


In order to light his pipe, he carried a matchbox in each of two pockets.
Each time he needs a match, he is equally likely to take it from either matchbox.
Assume that he starts the month with two matchboxes each containing 20 matches.
Eventually Banach finds that when he tries to get a match from one of his matchboxes it is empty.
What is the probability that when this occurs, the other matchbox has exactly 5 matches in it?
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%

6.17 (2 points) Total claim counts generated from a portfolio of 400 policies follow a Negative
Binomial distribution with parameters r = 3 and β = 0.4. If the portfolio increases to 500 policies,
what is the probability of observing exactly 2 claims in total?
A. 21% B. 23% C. 25% D. 27% E. 29%

Use the following information for the next three questions:


Two teams are playing against one another in a seven game series.
The results of each game are independent of the others.
The first team to win 4 games wins the series.

6.18 (3 points) The Flint Tropics have a 45% chance of winning each game.
What is the Flint Tropics chance of winning the series?
A. 33% B. 35% C. 37% D. 39% E. 41%

6.19 (3 points) The Durham Bulls have a 60% chance of winning each game.
What is the Durham Bulls chance of winning the series?
A. 67% B. 69% C. 71% D. 73% E. 75%

6.20 (3 points) The New York Knights have a 40% chance of winning each game.
The Knights lose the first game. The opposing manager offers to split the next two games with the
Knights (each team would win one of the next two games.)
Should the Knights accept this offer?

6.21 (3 points) The number of losses follows a Negative Binomial distribution with r = 4 and
β = 3. Sizes of loss are uniform from 0 to 15,000.
There is a deductible of 1000, a maximum covered loss of 10,000, and a coinsurance of 90%.
Determine the probability that there are exactly six payments of size greater than 5000.
A. 9.0% B. 9.5% C. 10.0% D. 10.5% E. 11.0%

6.22 (2 points) Define (N - j)+ = N - j if N ≥ j, and 0 otherwise.


N follows a Negative Binomial distribution with r = 5 and β = 0.3. Determine E[(N - 2)+].
A. 0.25 B. 0.30 C. 0.35 D. 0.40 E. 0.45

6.23 (3 points) The number of new claims the State House Insurance Company receives in a day
follows a Negative Binomial Distribution r = 5 and β = 0.8. For a claim chosen at random, on
average how many other claims were also made on the same day?
A. 4.0 B. 4.2 C. 4.4 D. 4.6 E. 4.8

6.24 (2, 5/83, Q.44) (1.5 points) If a fair coin is tossed repeatedly, what is the probability that the
third head occurs on the nth toss?
A. (n-1)/2^(n+1)      B. (n-1)(n-2)/2^(n+1)      C. (n-1)(n-2)/2^n      D. (n-1)/2^n      E. C(n, 3)/2^n

6.25 (2, 5/90, Q.45) (1.7 points) A coin is twice as likely to turn up tails as heads. If the coin is
tossed independently, what is the probability that the third head occurs on the fifth trial?
A. 8/81 B. 40/243 C. 16/81 D. 80/243 E. 3/5

6.26 (2, 2/96, Q.28) (1.7 points) Let X be the number of independent Bernoulli trials performed
until a success occurs. Let Y be the number of independent Bernoulli trials performed until 5
successes occur. A success occurs with probability p and Var(X) = 3/4.
Calculate Var(Y).
A. 3/20      B. 3/(4√5)      C. 3/4      D. 15/4      E. 75/4

6.27 (1, 11/01, Q.11) (1.9 points) A company takes out an insurance policy to cover accidents that
occur at its manufacturing plant. The probability that one or more accidents will occur during any given
month is 3/5. The number of accidents that occur in any given month is independent of the number
of accidents that occur in all other months.
Calculate the probability that there will be at least four months in which no accidents
occur before the fourth month in which at least one accident occurs.
(A) 0.01 (B) 0.12 (C) 0.23 (D) 0.29 (E) 0.41

6.28 (1, 11/01, Q.21) (1.9 points) An insurance company determines that N, the number of claims
received in a week, is a random variable with P[N = n] = 1/2^(n+1), where n ≥ 0.
The company also determines that the number of claims received in a given week is independent of
the number of claims received in any other week. Determine the probability that exactly seven
claims will be received during a given two-week period.
(A) 1/256 (B) 1/128 (C) 7/512 (D) 1/64 (E) 1/32

6.29 (CAS3, 11/03, Q.18) (2.5 points) A new actuarial student analyzed the claim frequencies of a
group of drivers and concluded that they were distributed according to a negative binomial
distribution and that the two parameters, r and β, were equal.
An experienced actuary reviewed the analysis and pointed out the following:
"Yes, it is a negative binomial distribution. The r parameter is fine, but the value of the β parameter is
wrong. Your parameters indicate that 1/9 of the drivers should be claim-free, but
in fact, 4/9 of them are claim-free."
Based on this information, calculate the variance of the corrected negative binomial distribution.
A. 0.50 B. 1.00 C. 1.50 D. 2.00 E. 2.50

6.30 (CAS3, 11/04, Q.21) (2.5 points) The number of auto claims for a group of 1,000 insured
drivers has a negative binomial distribution with β = 0.5 and r = 5.
Determine the parameters β and r for the distribution of the number of auto claims for a group of
2,500 such individuals.
A. β = 1.25 and r = 5
B. β = 0.20 and r = 5
C. β = 0.50 and r = 5
D. β = 0.20 and r= 12.5
E. β = 0.50 and r = 12.5

6.31 (CAS3, 5/05, Q.28) (2.5 points)


You are given a negative binomial distribution with r = 2.5 and β = 5.
For what value of k does pk take on its largest value?
A. Less than 7 B. 7 C. 8 D. 9 E. 10 or more

6.32 (CAS3, 5/06, Q.32) (2.5 points) Total claim counts generated from a portfolio of 1,000
policies follow a Negative Binomial distribution with parameters r = 5 and β = 0.2.
Calculate the variance in total claim counts if the portfolio increases to 2,000 policies.
A. Less than 1.0
B. At least 1.0 but less than 1.5
C. At least 1.5 but less than 2.0
D. At least 2.0 but less than 2.5
E. At least 2.5

6.33 (CAS3, 11/06, Q.23) (2.5 points) An actuary has determined that the number of claims
follows a negative binomial distribution with mean 3 and variance 12.
Calculate the probability that the number of claims is at least 3 but less than 6.
A. Less than 0.20
B. At least 0.20, but less than 0.25
C. At least 0.25, but less than 0.30
D. At least 0.30, but less than 0.35
E. At least 0.35

6.34 (CAS3, 11/06, Q.24) (2.5 points) Two independent random variables, X1 and X2 , follow the
negative binomial distribution with parameters (r1 , β1) and (r2 , β2), respectively.
Under which of the following circumstances will X1 + X2 always be negative binomial?
1. r1 = r2 .
2. β1 = β2.
3. The coefficients of variation of X1 and X2 are equal.
A. 1 only B. 2 only C. 3 only D. 1 and 3 only E. 2 and 3 only

6.35 (CAS3, 11/06, Q.31) (2.5 points)


You are given the following information for a group of policyholders:
• The frequency distribution is negative binomial with r = 3 and β = 4.
• The severity distribution is Pareto with α = 2 and θ = 2,000.
Calculate the variance of the number of payments if a $500 deductible is introduced.
A. Less than 30
B. At least 30, but less than 40
C. At least 40, but less than 50
D. At least 50, but less than 60
E. At least 60

6.36 (SOA M, 11/06, Q.22 & 2009 Sample Q.283) (2.5 points) The annual number of doctor
visits for each individual in a family of 4 has a geometric distribution with mean 1.5.
The annual numbers of visits for the family members are mutually independent.
An insurance pays 100 per doctor visit beginning with the 4th visit per family.
Calculate the expected payments per year for this family.
(A) 320 (B) 323 (C) 326 (D) 329 (E) 332

Solutions to Problems:

6.1. E. mean = rβ = (3)(.4) = 1.2.

6.2. A. variance = rβ(1+ β) = (3)(.4)(1.4) = 1.68.

6.3. B. C(x+r-1, x) β^x / (1+β)^(x+r) = C(6, 4) (0.4)^4 / (1.4)^(4+3) = 15 (0.0256)/(10.54) = 0.0364.

6.4. A. & 6.5. B. The mode is 0, since f(0) is larger than any other value.
n 0 1 2 3 4
f(n) 0.3644 0.3124 0.1785 0.0850 0.0364
F(n) 0.364 0.677 0.855 0.940 0.977
The median is 1, since F(0) <.5 and F(1) ≥ .5.
Comment: Iʼve used the formulas: f(0) = 1/(1+β)^r and f(x+1) / f(x) = β(x+r) / {(x+1)(1+β)}.
Just as with the Gamma Distribution, the Negative Binomial can have either a mode of zero or a
positive mode. For r < 1 + 1/β, as is the case here, the mode is zero, and the Negative Binomial
looks somewhat similar to an Exponential Distribution.

6.6. E. F(4) = f(0) + f(1) + f(2) + f(3) + f(4) = 97.7%.


n 0 1 2 3 4
f(n) 0.3644 0.3124 0.1785 0.0850 0.0364
F(n) 0.3644 0.6768 0.8553 0.9403 0.9767
Comment: Using the Incomplete Beta Function: F(4) = 1- β(4+1, r; β/(1+β)) = 1 - β(5,3; .4/1.4) =
1 - 0.0233 = 0.9767.

6.7. D. This is series of Bernoulli trials. Treating Louʼs winning as a “success”, then chance of
success is 40%. N is the number of failures prior to the 5th success.
Therefore N has a Negative Binomial Distribution with r = 5 and
β = chance of failure / chance of success = 60%/40% = 1.5.
Variance is: rβ(1+β) = (5)(1.5)(2.5) = 18.75.

6.8. A. f(3) = {r(r+1)(r+2)/3!} β^3 / (1+β)^(3+r) = {(1.5)(2.5)(3.5)/6} (2/9)^3 (11/9)^(-4.5) = 0.0097.



6.9. E. For the defense a batter reaching base is a failure and an out is a success. The number of
batters reaching base is the number of failures prior to 3 successes for the defense. The chance of a
success for the defense is 0.6. Therefore the number of batters who reach base is given by a
Negative Binomial with r = 3 and
β = (chance of failure for the defense)/(chance of success for the defense) = 0.4/0.6 = 2/3.
If exactly 8 batters come to the plate, then 5 reach base and 3 make out. The chance of exactly 5
batters reaching base is f(5) for r = 3 and β = 2/3: {(3)(4)(5)(6)(7)/5!} β^5 / (1+β)^(5+r) =
(21)(0.13169)/59.537 = 0.0464.
Alternately, for there to be exactly 8 batters, the last one has to make an out, and exactly two of the
first 7 must make an out. Prob[2 of 7 make an out] ⇔
density at 2 of a Binomial Distribution with m = 7 and q = 0.6 ⇔ {(7)(6)/2} (0.6^2)(0.4^5) = 0.0774.
Prob[8th batter makes an out] Prob[2 of 7 make an out] = (0.6)(0.0774) = 0.0464.
Comment: Generally, one can use either a Negative Binomial Distribution or some reasoning and a
Binomial Distribution in order to answer these type of questions.
For there to be exactly 8 batters, the last one has to make an out, and exactly two of the first 7 must
make an out. The team at bat sits down when the third batter makes out.
If instead 6 batters get on base and 2 batters make out, then the ninth batter would get up.

6.10. B. The sum of independent Negative Binomials, each with the same β, is another Negative
Binomial, with the sum of the r parameters. In this case we get a Negative Binomial with β = 0.1 and
r = (0.27)(20,000) = 5400.

6.11. D. S(25,000) = exp[-(25000/1000)^0.3] = 0.0723. The number of losses greater than $25,000 is
another Negative Binomial with r = 3 and β = (1.38)(0.0723) = 0.0998.
For a Negative Binomial, f(2) = {r(r+1)/2} β^2/(1+β)^(r+2) = {(3)(4)/2} (0.0998^2)/(1.0998)^5 = 3.71%.
Comment: An example of thinning a Negative Binomial.

6.12. A. For the Geometric, f(0) = 1/(1+β) = 1/1.3. 1 - f(0) = 0.3/1.3.
Prob[3 with 0 and 3 not with 0] = {6! / (3! 3!)} (1/1.3)^3 (0.3/1.3)^3 = 0.112.

6.13. B. The total number of calls is Negative Binomial with r = 6 and β = 0.3.
f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(6)(7)(8)/3!} (0.3^3)/(1.3^9) = 0.143.

6.14. C. The frequency for the 40 drivers is Negative Binomial Distribution with parameters
r = (40/80)(4) = 2 and β = 0.5.
f(0) = 1/1.5^2 = 44.44%. f(1) = 2(0.5/1.5^3) = 29.63%. 1 - f(0) - f(1) = 25.9%.

6.15. E. For the Exponential, S(1000) = exp[-1000/2500] = .6703.


S(5000) = exp[-5000/2500] = .1353. Therefore, with the $5000 deductible, the non-zero
payments are Negative Binomial Distribution with r = 3 and β = (.1353/.6703)(0.8) = .16.
f(0) = 1/1.16^3 = 64%.

6.16. E. Let us assume the righthand matchbox is the one discovered to be empty.
Call a “success” choosing the righthand box and a “failure” choosing the lefthand box.
Then we have a series of Bernoulli trials, with chance of success 1/2.
The number of “failures” prior to the 21st “success” (looking in the righthand matchbox 20 times and
getting a match and once more finding no matches are left) is Negative Binomial with r = 21 and
β = (chance of failure)/(chance of success) = (1/2)/(1/2) = 1.
For the lefthand matchbox to then have 5 matches, we must have had 15 “failures”.
Density at 15 for this Negative Binomial is: {(21)(22)...(35) / 15!} 1^15/(1 + 1)^(15+21) = 4.73%.
However, it is equally likely that the lefthand matchbox is the one discovered to be out of matches.
Thus we double this probability: (2)(4.73%) = 9.5%.
Comment: Difficult. The famous Banach Match problem.

6.17. A. When one changes the number of exposures, the r parameter changes in proportion.
For 500 policies, total claim counts follow a Negative Binomial distribution with parameters
r = 3(500/400) = 3.75 and β = 0.4.

f(2) = {r(r+1)/2} β^2/(1+β)^(r+2) = (3.75)(4.75)(0.5)(0.4^2)/(1.4^5.75) = 20.6%.


Comment: Similar to CAS3, 5/06, Q.32.

6.18. D. Ignoring the fact that once a team wins four games, the final games of the series will not be
played, the total number of games won out of seven by the Tropics is Binomial with q = 0.45 and
m = 7. We want the sum of the densities of this Binomial from 4 to 7:
35(0.45^4)(0.55^3) + 21(0.45^5)(0.55^2) + 7(0.45^6)(0.55) + 0.45^7
= 0.2388 + 0.1172 + 0.0320 + 0.0037 = 0.3917.
Alternately, the number of failures by the Tropics prior to their 4th success is Negative Binomial with
r = 4 and β = .55/.45 = 11/9.
For the Tropics to win the series they have to have 3 or fewer loses prior to their 4th win.
The probability of this is the sum of the densities of the Negative Binomial at 0 to 3:
1/(20/9)^4 + 4(11/9)/(20/9)^5 + {(4)(5)(11/9)^2/2!}/(20/9)^6 + {(4)(5)(6)(11/9)^3/3!}/(20/9)^7
= 0.0410 + 0.0902 + 0.1240 + 0.1364 = 0.3916.
Comment: The question ignores any effect of home field advantage.

6.19. C. Ignoring the fact that once a team wins four games, the final games of the series will not be
played, the total number of games won out of seven by the Bulls is Binomial with q = 0.60 and
m = 7. We want the sum of the densities of this Binomial from 4 to 7:
35(0.6^4)(0.4^3) + 21(0.6^5)(0.4^2) + 7(0.6^6)(0.4) + 0.6^7
= 0.2903 + 0.2613 + 0.1306 + 0.0280 = 0.7102.
Alternately, the number of failures by the Bulls prior to their 4th success is Negative Binomial with
r = 4 and β = .4/.6 = 2/3.
For the Bulls to win the series they have to have 3 or fewer loses prior to their 4th win.
The probability of this is the sum of the densities of the Negative Binomial at 0 to 3:
1/(5/3)^4 + 4(2/3)/(5/3)^5 + {(4)(5)(2/3)^2/2!}/(5/3)^6 + {(4)(5)(6)(2/3)^3/3!}/(5/3)^7
= 0.1296 + 0.2074 + 0.2074 + 0.1659 = 0.7103.
Comment: According to Bill James, “A useful rule of thumb is that the advantage doubles in a
seven-game series. In other words, if one team would win 51% of the games between two
opponents, then they would win 52% of the seven-game series. If one team would win 55% of the
games, then they would win 60% of the series.” 
Here is a graph of the chance of winning the seven game series, as a function of the chance of
winning each game:

[Graph: chance of winning the series (vertical axis, up to 0.8) versus chance of winning each game (horizontal axis, 0.3 to 0.8).]

6.20. If the Knights do not accept the offer, then they need to win four of six games.
We want the sum of the densities from 4 to 6 of a Binomial with q = .4 and m = 6:
15(0.4^4)(0.6^2) + 6(0.4^5)(0.6) + 0.4^6 = 0.1382 + 0.0369 + 0.0041 = 0.1792.
If the Knights accept the offer, then they need to win three of four games.
We want the sum of the densities from 3 to 4 of a Binomial with q = .4 and m = 4:
4(0.4^3)(0.6) + 0.4^4 = 0.1536 + 0.0256 = 0.1792.
Thus the Knights are indifferent between accepting this offer or not.
Alternately, if the Knights do not accept the offer, then they need to win four of six games.
The number of failures by the Knights prior to their 4th success is Negative Binomial with
r = 4 and β = .6/.4 = 1.5. The Knights win the series if they have 2 or fewer failures:
1/2.5^4 + 4(1.5)/2.5^5 + {(4)(5)(1.5^2)/2!}/2.5^6 = 0.0256 + 0.0614 + 0.0922 = 0.1792.
If the Knights accept the offer, then they need to win three of four games.
The number of failures by the Knights prior to their 3rd success is Negative Binomial with
r = 3 and β = .6/.4 = 1.5. The Knights win the series if they have 1 or fewer failures:
1/2.5^3 + 3(1.5)/2.5^4 = 0.0640 + 0.1152 = 0.1792.
Thus the Knights are indifferent between accepting this offer or not.
Comment: A comparison of their chances of winning the series as a function of their chance of
winning a game, accepting the offer (dashed) and not accepting the offer (solid):
[Graph: chance of winning the series (vertical axis) versus chance of winning each game (horizontal axis, 0.3 to 0.5); accepting the offer shown dashed, not accepting shown solid.]
The Knights should accept the offer if their chance of winning each game is less than 40%.

6.21. C. A payment is of size greater than 5000 if the loss is of size greater than:
5000/.9 + 1000 = 6556. Probability of a loss of size greater than 6556 is: 1 - 6556/15000 =
56.3%. The large losses are Negative Binomial with r = 4 and β = (56.3%)(3) = 1.69.

f(6) = {r(r+1)(r+2)(r+3)(r+4)(r+5)/6!} β^6/(1+β)^(r+6) = {(4)(5)(6)(7)(8)(9)/720} (1.69^6)/(2.69^10) = 9.9%.


Comment: An example of thinning a Negative Binomial.

6.22. C. f(0) = 1/1.3^5 = 0.2693. f(1) = (5)(0.3)/1.3^6 = 0.3108.


E[N] = 0f(0) + 1f(1) + 2f(2) + 3f(3) + 4f(4) + 5f(5) + ... E[(N - 2)+] = 1f(3) + 2f(4) + 3f(5) + ....
E[N] - E[(N - 2)+] = f(1) + 2f(2) + 2f(3) + 2f(4) + 2f(5) + ... = f(1) + 2{1 - f(0) - f(1)} = 2 - 2f(0) - f(1).
E[(N - 2)+] = E[N] - {2 - 2f(0) - f(1)} = (5)(.3) - {2 - (2)(.2693) -.3108} = 0.3494.
Alternately, E[N ∧ 2] = 0f(0) + 1f(1) + 2{1 - f(0) - f(1)} = 1.1506.
E[(N - 2)+] = E[N] - E[N ∧ 2] = (5)(0.3) - 1.1506 = 0.3494.
Alternately, E[(N - 2)+] = E[(2-N)+] + E[N] - 2 = 2f(0) + f(1) + (5)(0.3) - 2 = 0.3494.
Comment: See the section on Limited Expected Values in “Mahlerʼs Guide to Fitting Loss
Distributions.”

6.23. E. Let n be the number of claims made on a day.


The probability that the claim picked is on a day of size n is proportional to the product of the
number of claims on that day and the proportion of days of that size: n f(n).
Thus, Prob[claim is from a day with n claims] = n f(n) / Σ n f(n) = n f(n) / E[N].
For n > 0, the number of other claims on the same day is n - 1.

Average number of other claims is: ∑n≥1 n f(n) (n-1) / E[N] = ∑ (n^2 - n) f(n) / E[N] = {E[N^2] - E[N]} / E[N]
= E[N^2]/E[N] - 1 = {Var[N] + E[N]^2}/E[N] - 1 = Var[N]/E[N] + E[N] - 1 = 1 + β + rβ - 1 = (r + 1)β = (6)(0.8) = 4.8.

Comment: The average day has four claims; on the average day there are three other claims.
However, a claim chosen at random is more likely to be from a day that had a lot of claims.
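For those who want to see this size-biased average verified numerically, here is a simulation sketch (my own illustration, again generating the Negative Binomial as a Gamma-mixed Poisson): pick a claim at random, so that a day is chosen with probability proportional to its claim count, and average the number of other claims on that day.

import numpy as np

rng = np.random.default_rng(0)
r, beta = 5, 0.8
lambdas = rng.gamma(shape=r, scale=beta, size=500000)
counts = rng.poisson(lambdas)            # claim counts for many simulated days
weights = counts / counts.sum()          # a random claim lands on day i with probability counts[i]/total
print((weights * (counts - 1)).sum())    # close to (r + 1) * beta = 4.8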

6.24. B. This is a Negative Binomial with r = 3, β = chance of failure / chance of success = 1,
and x = number of failures = n - 3.
f(x) = {r(r+1)...(r+x-1)/x!} β^x/(1+β)^(r+x) = {(3)(4)...(x+2)/x!}/2^(3+x) = (x+1)(x+2)/2^(4+x).
f(n-3) = (n-2)(n-1)/2^(n+1).
Alternately, for the third head to occur on the nth toss, for n ≥ 3, we have to have had two heads out of
the first n - 1 tosses, which has probability C(n-1, 2)/2^(n-1) = (n-2)(n-1)/2^n, and a head on the nth toss,
which has probability 1/2. Thus the total probability is: (n-2)(n-1)/2^(n+1).

6.25. A. The number of tails before the third head is Negative Binomial, with r = 3 and
β = chance of failure / chance of success = chance of tail / chance of head = 2.
Prob[third head occurs on the fifth trial] = Prob[2 tails when we get the 3rd head] = f(2) =
{r(r+1)/2} β^2/(1+β)^(r+2) = (6)(4)/3^5 = 8/81.
Alternately, we need 2 heads and 2 tails out of the first 4 tosses, and then a head on the fifth toss:
{4!/(2! 2!)} (1/3)^2 (2/3)^2 (1/3) = 8/81.

6.26. D. X - 1 is Geometric with β = chance of failure / chance of success = (1 - p)/p = 1/p - 1.
Therefore, 3/4 = Var(X) = Var(X-1) = β(1 + β) = (1/p - 1)(1/p).
0.75p^2 + p - 1 = 0. ⇒ p = {-1 + √(1 + 3)}/1.5 = 2/3.
β = 3/2 - 1 = 1/2. Y - 5 is Negative Binomial with r = 5 and β = 1/2.
Var[Y - 5] = Var[Y] = (5)(1/2)(3/2) = 15/4.
Alternately, once one has gotten the first success, the number of additional trials until the second
success is independent of and has the same distribution as X, the number of additional trials until the
first success. ⇒ Y = X + X + X + X + X. ⇒ Var[Y] = 5Var[X] = (5)(3/4) = 15/4.

6.27. D. Define a “success” as a month in which at least one accident occurs.


We have a series of independent Bernoulli trials, and we stop upon the fourth success.
The number of failures before the fourth success is Negative Binomial with r = 4 and
β = chance of failure / chance of success = (2/5)/(3/5) = 2/3.
f(0) = 1/(1 + 2/3)^4 = 0.1296. f(1) = 4(2/3)/(5/3)^5 = 0.20736.
f(2) = {(4)(5)/2!}(2/3)^2/(5/3)^6 = 0.20736. f(3) = {(4)(5)(6)/3!}(2/3)^3/(5/3)^7 = 0.165888.
Prob[at least 4 failures] = 1 - (0.1296 + 0.20736 + 0.20736 + 0.165888) = 0.289792.
Alternately, instead define a “success” as a month in which no accident occurs.
We have a series of independent Bernoulli trials, and we stop upon the fourth success.
The number of failures before the fourth success is Negative Binomial with r = 4 and
β = chance of failure / chance of success = (3/5)/(2/5) = 1.5.
f(0) = 1/(1 + 1.5)^4 = 0.0256. f(1) = (4)(1.5)/2.5^5 = 0.06144.
f(2) = {(4)(5)/2!}(1.5^2)/2.5^6 = 0.09216. f(3) = {(4)(5)(6)/3!}(1.5^3)/2.5^7 = 0.110592.
The event we want will occur if at the time of the fourth success, the fourth month in which no
accidents occur, there have been fewer than four failures, in other words fewer than four months in
which at least one accident occurs.
Prob[fewer than 4 failures] = 0.0256 + 0.06144 + 0.09216 + 0.110592 = 0.289792.

6.28. D. The number of claims in a week is Geometric with β/(1+β) = 1/2. ⇒ β = 1.


The sum of two independent Geometrics is a Negative Binomial with r = 2 and β = 1.

f(7) = {(2)(3)(4)(5)(6)(7)(8)/7!} β^7/(1+β)^9 = 1/64.

6.29. C. For the studentʼs Negative Binomial, r = β: f(0) = 1/(1+β)^r = 1/(1+r)^r = 1/9. ⇒ r = 2.
For the corrected Negative Binomial, r = 2 and: f(0) = 1/(1+β)^r = 1/(1+β)^2 = 4/9. ⇒ β = 0.5.
Variance of the corrected Negative Binomial = rβ(1+β) = (2)(0.5)(1.5) = 1.5.

6.30. E. For a Negative Binomial distribution, as the exposures change we get another Negative
Binomial; the r parameter changes in proportion, while β remains the same.
The new r = (2500/1000)(5) = 12.5. β = 0.5 and r = 12.5.

6.31. B. For a Negative Binomial, a = β/(1 + β) = 5/6,


and b = (r - 1)β/(1 + β) = (1.5)(5/6) = 5/4.
f(x)/f(x-1) = a + b/x = 5/6 + (5/4)/x, x = 1, 2, 3, ...
To find the mode, where the density is largest, find when this ratio is greater than 1.
5/6 + (5/4)/x = 1. ⇒ x/6 = 5/4. x = 7.5.
So f(7)/f(6) > 1 while f(8)/f(7) < 1, and 7 is the mode.
Comment: f(6) = .0556878. f(7) = .0563507. f(8) = .0557637.

6.32. D. Doubling the exposures, multiplies r by 2. For 2000 policies, total claim counts follow a
Negative Binomial distribution with parameters r = 10 and β = 0.2.
Variance = rβ(1+β) = (10)(0.2)(1.2) = 2.4.
Alternately, for 1000 policies, the variance of total claim counts is: (5)(0.2)(1.2) = 1.2.
2000 policies. ⇔ 1000 policies + 1000 policies.

⇒ For 2000 policies, the variance of total claim counts is: 1.2 + 1.2 = 2.4.
Comment: When one adds independent Negative Binomial Distribution with the same β, one gets
another Negative Binomial Distribution with the sum of the r parameters. When one changes the
number of exposures, the r parameter changes in proportion.

6.33. B. rβ = 3. rβ(1+β) = 12. ⇒ 1 + β = 12/3 = 4. ⇒ β = 3. ⇒ r = 1.


f(3) + f(4) + f(5) = 3^3/4^4 + 3^4/4^5 + 3^5/4^6 = 0.244.
Comment: We have fit via Method of Moments. Since r = 1, this is a Geometric Distribution.

6.34. B. 1. False. 2. True.
3. CV = √[rβ(1+β)] / (rβ) = √[(1+β)/(rβ)]. False.
Comment: For the Negative Binomial, P(z) = 1/{1 - β(z-1)}^r.
The p.g.f. of the sum of two independent variables is the product of their p.g.f.s:
1/({1 - β1(z-1)}^r1 {1 - β2(z-1)}^r2).
This has the same form as a Negative Binomial if and only if β1 = β2.

6.35. A. For the Pareto, S(500) = (2/2.5)^2 = 0.64. Thus the number of losses of size greater than
500 is Negative Binomial with r = 3 and β = (0.64)(4) = 2.56.
The variance of the number of large losses is: (3)(2.56)(3.56) = 27.34.

6.36. D. The total number of visits is the sum of 4 independent, identically distributed Geometric
Distributions, which is a Negative Binomial with r = 4 and β = 1.5.
f(0) = 1/2.5^4 = 0.0256. f(1) = (4)(1.5)/2.5^5 = 0.06144. f(2) = {(4)(5)/2}(1.5^2)/2.5^6 = 0.09216.
E[N ∧ 3] = 0f(0) + 1f(1) + 2f(2) + 3{1 - f(0) - f(1) - f(2)} = 2.708.
E[(N-3)+] = E[N] - E[N ∧ 3] = (4)(1.5) - 2.708 = 3.292. 100E[(N-3)+] = 329.2.
Alternately, E[(N-3)+] = E[(3-N)+] + E[N] - 3 = 3f(0) + 2f(1) + f(2) + (4)(1.5) - 3 = 3.292.
Comment: The exam question intends to ask for the expected amount that the insurer will pay
due to claims from this family. This could have been made clearer.
See the section on Limited Expected Values in “Mahlerʼs Guide to Fitting Loss Distributions.”

Section 7, Normal Approximation

This section will go over important information that Loss Models assumes the reader already knows
concerning the Normal Distribution and its use to approximate frequency distributions. These ideas
are important for practical applications of frequency distributions.38

The Binomial Distribution with parameters q and m is the sum of m independent Bernoulli trials, each
with parameter q. The Poisson Distribution with λ integer, is the sum of λ independent Poisson
variables each with mean of one. The Negative Binomial Distribution with parameters β and r, with r
integer, is the sum of r independent Geometric distributions each with parameter β.

Thus by the Central Limit Theorem, each of these distributions can be approximated by a Normal
Distribution with the same mean and variance.
For the Binomial as m → ∞, for the Poisson as λ→ ∞, and for the Negative Binomial as

r → ∞, the distribution approaches a Normal39. The approximation is quite good for large values of
the relevant parameter, but not very good for extremely small values.

For example, here is the graph of a Binomial Distribution with q = 0.4 and m = 30.
It has mean (30)(0.4) = 12 and variance (30)(0.4)(0.6) = 7.2.
Also shown is a Normal Distribution with µ = 12 and σ = √7.2 = 2.683.
[Graph: the Binomial probabilities and the approximating Normal density, x from 0 to 30.]
38
These ideas also underlay Classical Credibility.
39
In fact as discussed in a subsequent section, the Binomial and the Negative Binomial each approach a Poisson
which in turn approaches a Normal.

Here is the graph of a Poisson Distribution with λ = 10, and the approximating Normal Distribution
with µ = 10 and σ = √10 = 3.162:

[Graph: the Poisson probabilities and the approximating Normal density, x from 0 to 30.]

Here is the graph of a Negative Binomial Distribution with β = 0.5 and r = 20, with
mean (20)(0.5) = 10 and variance (20)(0.5)(1.5) = 15, and the approximating Normal Distribution
with µ = 10 and σ = √15 = 3.873:

[Graph: the Negative Binomial probabilities and the approximating Normal density, x from 0 to 30.]

A typical use of the Normal Approximation would be to find the probability of observing a certain
range of claims. For example, given a certain distribution, what is the probability of at least 10 and no
more than 20 claims.

Exercise: Given a Binomial with parameters q = 0.3 and m = 10, what is the chance of observing 1
or 2 claims?
[Solution: 10(0.3^1)(0.7^9) + 45(0.3^2)(0.7^8) = 0.1211 + 0.2335 = 0.3546.]

In this case one could compute the exact answer as the sum of only two terms.
Nevertheless, let us illustrate how the Normal Approximation could be used in this case.
The Binomial distribution with q = 0.3 and m = 10 has a mean of: (0.3)(10) = 3, and a variance of:
(10)(0.3)(0.7) = 2.1. This Binomial Distribution can be approximated by a Normal Distribution with
mean of 3 and variance of 2.1, as shown below:

[Graph: the Binomial probabilities and the approximating Normal density with mean 3 and variance 2.1, x from 0 to 10.]

Prob[1 claim] = the area of a rectangle of width one and height f(1) = 0.1211.
Prob[2 claims] = the area of a rectangle of width one and height f(2) = 0.2335.
The chance of either one or two claims is the sum of these two rectangles; this is approximated by
the area under this Normal Distribution, with mean 3 and variance 2.1, from 1 - .5 = .5 to 2 + .5 = 2.5.
Prob[1 or 2 claims] ≅ Φ[(2.5 - 3)/√2.1] - Φ[(0.5 - 3)/√2.1] = Φ[-0.345] - Φ[-1.725]
= 0.365 - 0.042 = 0.323.

Note that in order to get the probability for two values on the discrete Binomial Distribution, one has
to cover an interval of length two on the real line for the continuous Normal Distribution. We
subtracted 1/2 from the lower end of 1 and added 1/2 to the upper end of 2.
This is called the “continuity correction”.

Below, I have zoomed in on the relevant part of the previous diagram:

[Diagram: close-up of the Normal density and the rectangles for 1 and 2 claims, x from 0.5 to 2.5, with regions labeled A, B, C, and D.]

It should make it clear why the continuity correction is needed. In this case the chance of having 1 or 2
claims is equal to the area under the two rectangles, which is not close to the area under the Normal
from 1 to 2, but is approximated by the area under the Normal from 0.5 to 2.5.

In order to use the Normal Approximation, one must translate to the so called “Standard” Normal
Distribution40. In this case, we therefore need to standardize the variables by subtracting the mean
of 3 and dividing by the standard deviation of √2.1 = 1.449.
Then 0.5 ↔ (0.5 - 3) / 1.449 = -1.725, while 2.5 ↔ (2.5 - 3) / 1.449 = -0.345. Thus, the chance of
observing either 1 or 2 claims is approximately: Φ[-0.345] - Φ[-1.725] = 0.365 - 0.042 = 0.323.

This compares to the exact result of .3546 calculated above. The diagram above shows why the
approximation was too small in this particular case41. Area A is within the first rectangle, but not under
the Normal Distribution. Area B is not within the first rectangle, but is under the Normal Distribution.
Area C is within the second rectangle, but not under the Normal Distribution. Area D is not within the
second rectangle, but is under the Normal Distribution.
Normal Approximation minus Exact Result = (Area B - Area A) + (Area D - Area C).

While there was no advantage to using the Normal approximation in this example, it saves a lot of
time when trying to deal with many terms.
40
Attached to the exam and shown below.
41
The approximation gets better as the mean of the Binomial gets larger. The error can be either positive or
negative.

In general, let µ be the mean of the frequency distribution, while σ is the standard
deviation of the frequency distribution; then the chance of observing at least i claims
and not more than j claims is approximately: Φ[{(j + 0.5) - µ}/σ] - Φ[{(i - 0.5) - µ}/σ].

Exercise: Use the Normal Approximation in order to estimate the probability


of observing at least 10 claims but no more than 18 claims from a Negative Binomial Distribution
with parameters β = 2/3 and r = 20.
[Solution: Mean = rβ = 13.33 and variance = rβ(1+β) = 22.22.

Prob[at least 10 claims but no more than 18 claims] ≅

Φ[(18.5 - 13.33) / √22.22] − Φ[(9.5 - 13.33) / √22.22] = Φ[1.097] − Φ[-0.813] =


0.864 - 0.208 = 0.656.
Comment: The exact answer is 0.648.]
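Here is a short Python sketch (illustrative only) comparing the exact Negative Binomial probability in this exercise to the Normal Approximation with the continuity correction:

from math import erf, sqrt, lgamma, log, exp

def nb_pmf(x, r, beta):
    return exp(lgamma(x + r) - lgamma(r) - lgamma(x + 1) + x * log(beta) - (x + r) * log(1 + beta))

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

r, beta = 20, 2.0 / 3.0
mu, sigma = r * beta, sqrt(r * beta * (1 + beta))
exact = sum(nb_pmf(x, r, beta) for x in range(10, 19))
approx = std_normal_cdf((18.5 - mu) / sigma) - std_normal_cdf((9.5 - mu) / sigma)
print(exact, approx)   # about 0.648 and 0.656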

Here is a graph of the Normal Approximation used in this exercise:

[Graph: the approximating Normal density, x from 5 to 30.]

The continuity correction in this case: at least 10 claims but no more than 18 claims
↔ 10 - 1/2 = 9.5 to 18 + 1/2 = 18.5 on the Normal Distribution.

Note that Prob[10 ≤ # claims ≤ 18] = Prob[9 < # claims < 19]. Thus one must carefully check the
wording, to distinguish between open and closed intervals.
Prob[9 < # claims < 19] = Prob[10 ≤ # claims ≤ 18] ≅ Φ[{18.5 - µ}/σ] − Φ[{9.5 - µ}/σ].

One should use the continuity correction whenever one is using the Normal Distribution
in order to approximate the probability associated with a discrete distribution.

Do not use the continuity correction when one is using the Normal Distribution in order to
approximate continuous distributions, such as aggregate distributions42 or the Gamma Distribution.

Exercise: Use the Normal Approximation in order to estimate the probability


of observing more than 15 claims from a Poisson Distribution with λ = 10.

[Solution: Mean = variance = 10. Prob[# claims > 15] = 1 - Prob[# claims ≤ 15] ≅
1 - Φ[(15.5 - 10)/√10] = 1 - Φ[1.739] = 1 - 0.9590 = 4.10%.


Comment: The exact answer is 4.87%.]

The area under the Normal Distribution and to the right of the vertical line at 15.5 is the approximation
used in this exercise:

[Graph: the approximating Normal density with a vertical line at 15.5, x from 0 to 20.]

42
See “Mahlerʼs Guide to Aggregate Distributions.”

Diagrams:

Some of you will find the following simple diagrams useful when applying the Normal
Approximation to discrete distributions.

More than 15 claims ⇔ At least 16 claims ⇔ 16 claims or more

15 15.5 16

|→
Prob[More than 15 claims] ≅ 1 - Φ[(15.5 - µ)/σ].

Exercise: For a frequency distribution with mean 14 and standard deviation 2, using the Normal
Approximation, what is the probability of at least 16 claims?
[Solution: Prob[At least 16 claims] = Prob[More than 15 claims] ≅ 1 - Φ[(15.5 - µ)/σ] =
1 - Φ[(15.5 - 14)/2] = 1 - Φ[0.75] = 1 - 0.7734 = 22.66%.]

Less than 12 claims ⇔ At most 11 claims ⇔ 11 claims or less

11 11.5 12

←|
Prob[Less than 12 claims] ≅ Φ[(11.5 - µ)/σ].

Exercise: For a frequency distribution with mean 10 and standard deviation 4, using the Normal
Approximation, what is the probability of at most 11 claims?
[Solution: Prob[At most 11 claims] = Prob[Less than 12 claims] ≅ Φ[(11.5 - µ)/σ] =
Φ[(11.5 - 10)/4] = Φ[0.375] = 64.6%.]
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 130

At least 10 claims and at most 13 claims ⇔ More than 9 claims and less than 14 claims

9 9.5 10 11 12 13 13.5 14

Prob[At least 10 claims and at most 13 claims ] ≅ Φ[(13.5 - µ)/σ] - Φ[(9.5 - µ)/σ].

Exercise: For a frequency distribution with mean 10 and standard deviation 4, using the Normal
Approximation, what is the probability of more than 9 claims and less than 14 claims?
[Solution: Prob[more than 9 claims and less than 14 claims] =
Prob[At least 10 claims and at most 13 claims] ≅ Φ[(13.5 - µ)/σ] − Φ[(9.5 - µ)/σ] =
Φ[(13.5 - 10)/4] - Φ[(9.5 - 10)/4] = Φ[0.875] - Φ[-0.125] = 0.809 - 0.450 = 35.9%.]
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 131

Confidence Intervals:

One can use the lower portion of the Normal Distribution table in order to get confidence intervals.

For example, in order to get a 95% confidence interval, one allows 2.5% probability on either tail.
Φ(1.96) = (1 + 95%)/2 = 97.5%.
Thus 95% of the probability on the Standard Normal Distribution is between -1.96 and 1.96:

[Figure: Standard Normal density, with 2.5% of probability in each tail, beyond -1.96 and 1.96.]

Thus a 95% confidence interval for a Normal would be: mean ± 1.960 standard deviations.

Similarly, since Φ(1.645) = (1 + 90%)/2 = 95%, a 90% confidence interval is:


mean ± 1.645 standard deviations.

[Figure: Standard Normal density, with 5% of probability in each tail, beyond -1.645 and 1.645.]
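
For those who want to reproduce these z-values, here is a small Python sketch (added for illustration; not from the original text). It relies on Python's statistics.NormalDist, which supplies the inverse of the Standard Normal distribution function.

from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1

for p in (0.90, 0.95, 0.99):
    z = std_normal.inv_cdf((1 + p) / 2)     # allow (1 - p)/2 probability in each tail
    print(f"{p:.0%} confidence interval: mean +/- {z:.3f} standard deviations")
# 90% -> 1.645, 95% -> 1.960, 99% -> 2.576, matching the bottom of the Normal table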
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 132

Normal Distribution:

The Normal Distribution is a bell-shaped symmetric distribution. Its two parameters are
its mean µ and its standard deviation σ.
f(x) = exp[-(x - µ)² / (2σ²)] / (σ√(2π)), -∞ < x < ∞.

The sum of two independent Normal Distributions is also a Normal Distribution, with the
sum of the means and variances. If X is normally distributed, then so is aX + b, but with mean
aµ+b and standard deviation aσ. If one standardizes a normally distributed variable by subtracting µ
and dividing by σ, then one obtains a Standard Normal with mean 0 and standard deviation of 1.

A Normal Distribution with µ = 10 and σ = 5:

[Graph: density (vertical axis, up to about 0.08) versus x (horizontal axis, -10 to 30).]

The density of the Standard Normal is denoted by φ(x) = exp[-x²/2] / √(2π), -∞ < x < ∞.43

The corresponding distribution function is denoted by Φ(x).

Φ(x) ≅ 1 - φ(x){0.4361836t - 0.1201676t² + 0.9372980t³}, where t = 1/(1 + 0.33267x).44

43
As shown near the bottom of the first page of the Tables for Exam C.
44
See pages 103-104 of Simulation by Ross or 26.2.16 in Handbook of Mathematical Functions.
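
Here is a short Python sketch (added for illustration; not part of the original text) of the approximation formula just quoted, compared against an erf-based evaluation of Φ at x = 4. The function names are my own labels.

from math import erf, exp, pi, sqrt

def phi_density(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def Phi_approx(x):
    # the polynomial approximation quoted above, for x >= 0
    t = 1.0 / (1.0 + 0.33267 * x)
    return 1.0 - phi_density(x) * (0.4361836*t - 0.1201676*t**2 + 0.9372980*t**3)

def Phi_exact(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

x = 4.0
print(1 - Phi_approx(x))   # about 3.20e-05
print(1 - Phi_exact(x))    # about 3.17e-05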
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 133

Normal Distribution

Support: ∞ > x > -∞ Parameters: ∞ > µ > -∞ (location parameter)


σ > 0 (scale parameter)

D. f. : F(x) = Φ[(x−µ)/σ]

P. d. f. : f(x) = φ[(x - µ)/σ] / σ = exp[-(x - µ)² / (2σ²)] / (σ√(2π)). φ(x) = exp[-x²/2] / √(2π).

Central Moments: E[(X−µ)^n] = σ^n n! / {2^(n/2) (n/2)!}, n even, n ≥ 2
E[(X−µ)^n] = 0, n odd, n ≥ 1

Mean = µ Variance = σ²

Coefficient of Variation = Standard Deviation / Mean = σ/µ

Skewness = 0 (distribution is symmetric) Kurtosis = 3

Mode = µ Median = µ

Limited Expected Value Function:

E[X ∧ x] = µΦ[(x−µ)/σ] − σ exp[-(x - µ)² / (2σ²)] / √(2π) + x {1 - Φ[(x−µ)/σ]}

Excess Ratio: R(x) = {1 - x/µ}{1 − Φ[(x−µ)/σ]} + (σ/µ) exp[-(x - µ)² / (2σ²)] / √(2π)

Mean Residual Life: e(x) = µ - x + σ exp[-(x - µ)² / (2σ²)] / {√(2π) (1 - Φ[(x−µ)/σ])}

Derivatives of d.f. : ∂F(x)/∂µ = -φ[(x−µ)/σ] / σ ∂F(x)/∂σ = -φ[(x−µ)/σ] (x−µ) / σ²

Method of Moments: µ = µ1′, σ = (µ2′ - µ1′²)^0.5

Percentile Matching: Set gi = Φ−1(pi), then σ = (x1 -x2 )/(g1 -g2 ), µ = x1 - σg1

Method of Maximum Likelihood: Same as Method of Moments.


2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 134

Using the Normal Table:

When using the normal distribution, choose the nearest z-value to find the probability, or
if the probability is given, choose the nearest z-value. No interpolation should be used.
Example: If the given z-value is 0.759, and you need to find Pr(Z < 0.759) from the normal
distribution table, then choose the probability value for z-value = 0.76; Pr(Z < 0.76) = 0.7764.
When using the Normal Approximation to a discrete distribution, use the continuity correction.45

When using the top portion of the table, use the symmetry of the Standard Normal Distribution
around zero: Φ[-x] = 1 - Φ[x].

For example, Φ[-0.4] = 1 - Φ[0.4] = 1 - 0.6554 = 0.3446.

The bottom portion of the table can be used to get confidence intervals.
To cover a confidence interval of probability P, find y such that Φ[y] = (1 + P)/2.

For example, in order to get a 95% confidence interval, find y such that Φ[y] = 97.5%.
Thus, y = 1.960.
[-1.960, 1.960] covers 95% probability on a Standard Normal Distribution.

45
The instructions for Exam C from the SOA.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 135

Normal Distribution Table

Entries represent the area under the standardized normal distribution from -∞ to z, Pr(Z < z). The value of z
to the first decimal place is given in the left column. The second decimal is given in the top row.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.7 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.8 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

Values of z for selected values of Pr(Z < z)


z 0.842 1.036 1.282 1.645 1.960 2.326 2.576
Pr(Z < z) 0.800 0.850 0.900 0.950 0.975 0.990 0.995
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 136

Problems:

7.1 (2 points) You roll 1000 6-sided dice.


What is the chance of observing exactly 167 sixes?
(Use the Normal Approximation.)
A. less than 2.5%
B. at least 2.5% but less than 3.0%
C. at least 3.0% but less than 3.5%
D. at least 3.5% but less than 4.0%
E. at least 4.0%

7.2 (2 points) You roll 1000 6-sided dice.


What is the chance of observing 150 or more sixes but less than or equal to 180 sixes?
(Use the Normal Approximation.)
A. less than 78%
B. at least 78% but less than 79%
C. at least 79% but less than 80%
D. at least 80% but less than 81%
E. at least 81%

7.3 (2 points) You conduct 100 independent Bernoulli Trials, each with chance of success 1/4.
What is the chance of observing a total of at least 16 but not more than 20 successes?
(Use the Normal Approximation.)
A. less than 11%
B. at least 11% but less than 12%
C. at least 12% but less than 13%
D. at least 13% but less than 14%
E. at least 14%

7.4 (2 points) One observes 10,000 independent lives, each of which has a 2% chance of death
over the coming year. What is the chance of observing 205 or more deaths?
(Use the Normal Approximation.)
A. less than 36%
B. at least 36% but less than 37%
C. at least 37% but less than 38%
D. at least 38% but less than 39%
E. at least 39%
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 137

7.5 (2 points) The number of claims in a year is given by a Poisson distribution with parameter
λ = 400. What is the probability of observing at least 420 but no more than 440 claims over the
next year? (Use the Normal Approximation.)
A. less than 11%
B. at least 11% but less than 12%
C. at least 12% but less than 13%
D. at least 13% but less than 14%
E. at least 14%

Use the following information in the next three questions:


The Few States Insurance Company writes insurance in the states of Taxachusetts, Florgia and
Calizonia. Claims frequency for Few States Insurance in each state is Poisson, with expected claims
per year of 400 in Taxachusetts, 500 in Florgia and 1000 in Calizonia. The claim frequencies in the
three states are independent.

7.6 (2 points) What is the chance of Few States Insurance having a total of more than 1950 claims
next year? (Use the Normal Approximation.)
A. less than 10%
B. at least 10% but less than 11%
C. at least 11% but less than 12%
D. at least 12% but less than 13%
E. at least 13%

7.7 (3 points) What is the chance that Few States Insurance has more claims next year from
Taxachusetts and Florgia combined than from Calizonia?
(Use the Normal Approximation.)
A. less than 1.0%
B. at least 1.0% but less than 1.2%
C. at least 1.2% but less than 1.4%
D. at least 1.4% but less than 1.6%
E. at least 1.6%

7.8 (3 points) Define a large claim as one larger than $10,000. Assume that 30% of claims are large
in Taxachusetts, 25% in Florgia and 20% in Calizonia. Which of the following is an approximate 90%
confidence interval for the number of large claims observed by Few States Insurance over the next
year? Frequency and severity are independent.
(Use the Normal Approximation.)
A. [390, 500] B. [395, 495] C. [400, 490] D. [405, 485] E. [410, 480]
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 138

7.9 (2 points) A six-sided die is rolled five times. Using the Central Limit Theorem, what is the
estimated probability of obtaining a total of 20 on the five rolls?
A. less than 9.0%
B. at least 9% but less than 9.5%
C. at least 9.5% but less than 10%
D. at least 10% but less than 10.5%
E. at least 10.5%

7.10 (2 points) The number of claims in a year is given by the negative binomial distribution:
P[X=x] = [(9999 + x)! / {x! 9999!}] (0.6^10000) (0.4^x), x = 0, 1, 2, 3, ...

Using the Central Limit Theorem, what is the estimated probability of having 6800 or more claims in
a year?
A. less than 10.5%
B. at least 10.5% but less than 11%
C. at least 11% but less than 11.5%
D. at least 11.5% but less than 12%
E. at least 12%

7.11 (2 points) In order to estimate 1 - Φ(4), use the formula:

Φ(x) ≅ 1 - φ(x){0.4361836t - 0.1201676t² + 0.9372980t³}, where t = 1/(1 + 0.33267x),


A. less than .0020%
B. at least .0020% but less than .0025%
C. at least .0025% but less than .0030%
D. at least .0030% but less than .0035%
E. at least .0035%

7.12 (2 points) You are given the following:


• The New York Yankees baseball team plays 162 games.

• Assume the Yankees have an a priori chance of winning each game of 65%.

• Assume the results of the games are independent of each other.


What is the chance of the Yankees winning 114 or more games?
(Use the Normal Approximation.)
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 139

7.13 (2 points) You are given the following:


• Sue takes an actuarial exam with 40 multiple choice questions, each of equal value.
• Sue knows the answers to 13 questions and answers them correctly.
• Sue guesses at random on the remaining 27 questions, with a 1/5 chance of
getting each such question correct, with each question independent of the others.
If 22 correct answers are needed to pass the exam, what is the probability that Sue passed her
exam?
Use the Normal Approximation.
A. 4% B. 5% C. 6% D. 7% E. 8%

7.14 (3 points) You are given the following:


• The New York Yankees baseball team plays 162 games, 81 at home and 81 on
the road.
• The Yankees have an a priori chance of winning each home game of 80%.

• The Yankees have an a priori chance of winning each road game of 50%.

• Assume the results of the games are independent of each other.


What is the chance of the Yankees winning 114 or more games?
(Use the Normal Approximation.)
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%

7.15 (2 points) You are given the following:


• Lucky Tom takes an actuarial exam with 40 multiple choice questions, each of equal value.
• Lucky Tom knows absolutely nothing about the material being tested.
• Lucky Tom guesses at random on each question, with a 40% chance of
getting each question correct, independent of the others.
If 24 correct answers are needed to pass the exam, what is the probability that Lucky Tom passed
his exam? Use the Normal Approximation.
A. 0.4% B. 0.5% C. 0.6% D. 0.7% E. 0.8%

7.16 (4 points) X has a Normal Distribution with mean µ and standard deviation σ.
Determine the expected value of |x|.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 140

7.17 (4, 5/86, Q.48) (2 points) Assume an insurer has 400 claims drawn independently from a
distribution with mean 500 and variance 10,000.
Assuming that the Central Limit Theorem applies, find M such that the probability of the sum of
these claims being less than or equal to M is approximately 99%.
In which of the following intervals is M?
A. Less than 202,000
B. At least 202,000, but less than 203,000
C. At least 203,000, but less than 204,000
D. At least 204,000, but less than 205,000
E. 205,000 or more

7.18 (4, 5/86, Q.51) (1 point) Suppose X has a Poisson distribution with mean q.
Let Φ be the (Cumulative) Standard Normal Distribution.
Which of the following is an approximation for Prob(1 ≤ x ≤ 4) for sufficiently large q?
A. Φ[(4 - q) / √q ] - Φ[(1 - q) / √q ]

B. Φ[(4.5 - q) / √q ] - Φ[(0.5 - q) / √q ]

C. Φ[(1.5 - q) / √q ] - Φ[(3.5 - q) / √q ]

D. Φ[(3.5 - q) / √q ] - Φ[(1.5 - q) / √q ]

E. Φ[(4 - q) / q] - Φ[(1 - q) / q]

7.19 (4, 5/87, Q.51) (2 points) Suppose that the number of claims for an individual policy during a
year has a Poisson distribution with mean 0.01. What is the probability that there will be 5, 6, or 7
claims from 400 identical policies in one year, assuming a normal approximation?
A. Less than 0.30
B. At least 0.30, but less than 0.35
C. At least 0.35, but less than 0.40
D. At least 0.40, but less than 0.45
E. 0.45 or more.

7.20 (4, 5/88, Q.46) (1 point) A random variable X is normally distributed with mean 4.8 and
variance 4. The probability that X lies between 3.6 and 7.2 is Φ(b) - Φ(a) where Φ is the distribution
function of the unit normal variable. What are a and b, respectively?
A. 0.6, 1.2 B. 0.6, -0.3 C. -0.3, 0.6 D. -0.6, 1.2 E. None A, B, C, or D.

7.21 (4, 5/88, Q.49) (1 point) An unbiased coin is tossed 20 times. Using the normal
approximation, what is the probability of obtaining at least 8 heads?
The cumulative unit normal distribution is denoted by Φ(x).
A. Φ(-1.118) B. Φ(-0.671) C. 1 - Φ(-0.447) D. Φ(0.671) E. Φ(1.118)
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 141

7.22 (4, 5/90, Q.25) (1 point) Suppose the distribution of claim amounts is normal with a mean of
$1,500. If the probability that a claim exceeds $5,000 is .015, in what range is the standard
deviation, σ, of the distribution?
A. σ < 1,600
B. 1,600 ≤ σ < 1,625
C. 1,625 ≤ σ < 1,650
D. 1,650 ≤ σ < 1,675
E. σ ≥ 1,675

7.23 (4, 5/90, Q.36) (2 points) The number of claims for each insured written by the
Homogeneous Insurance Company follows a Poisson process with a mean of .16.
The company has 100 independent insureds.
Let p be the probability that the company has more than 12 claims and less than 20 claims.
In what range does p fall? You may use the normal approximation.
A. p < 0.61
B. 0.61 < p < 0.63
C. 0.63 < p < 0.65
D. 0.65 < p < 0.67
E. 0.67 < p

7.24 (4, 5/91, Q.29) (2 points) A sample of 1,000 policies yields an estimated claim frequency of
0.210. Assuming the number of claims for each policy has a Poisson distribution, use the Normal
Approximation to find a 95% confidence interval for this estimate.
A. (0.198, 0.225) B. (0.191, 0.232) C. (0.183, 0.240)
D. (0.173, 0.251) E. (0.161, 0.264)

7.25 (4B, 5/92, Q.5) (2 points) You are given the following information:
• Number of large claims follows a Poisson distribution.
• Exposures are constant and there are no inflationary effects.
• In the past 5 years, the following number of large claims has occurred: 12, 15, 19, 11, 18
Estimate the probability that more than 25 large claims occur in one year.
(The Poisson distribution should be approximated by the normal distribution.)
A. Less than .002
B. At least .002 but less than .003
C. At least .003 but less than .004
D. At least .004 but less than .005
E. At least .005
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 142

7.26 (4B, 11/92, Q.13) (2 points) You are given the following information:
• The occurrence of hurricanes in a given year has a Poisson distribution.
• For the last 10 years, the following number of hurricanes has occurred:
2, 4, 3, 8, 2, 7, 6, 3, 5, 2
Using the normal approximation to the Poisson, determine the probability of more than 10
hurricanes occurring in a single year.
A. Less than 0.0005
B. At least 0.0005 but less than 0.0025
C. At least 0.0025 but less than 0.0045
D. At least 0.0045 but less than 0.0065
E. At least 0.0065

7.27 (4B, 5/94, Q.20) (2 points) The occurrence of tornadoes in a given year is assumed to follow
a binomial distribution with parameters m = 50 and q = 0.60.
Using the Normal approximation to the binomial, determine the probability that at least 25 and at
most 40 tornadoes occur in a given year.
A. Less than 0.80
B. At least 0.80, but less than 0.85
C. At least 0.85, but less than 0.90
D. At least 0.90, but less than 0.95
E. At least 0.95

7.28 (5A, 11/94, Q.35) (1.5 points)


An insurance contract was priced with the following assumptions:
Claim frequency is Poisson with mean 0.01.
All claims are of size $5000.
Premiums are 110% of expected losses.
The company requires a 99% probability of not having losses exceed premiums.
(3/4 point) a. What is the minimum number of policies that the company must write given
the above surplus requirement?
(3/4 point) b. After the rate has been established, it was discovered that the claim severity
assumption was incorrect and that the claim severity should be 5% greater than
originally assumed. Now, what is the minimum number of policies that
the company must write given the above surplus requirement?
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 143

7.29 (4B, 11/96, Q.31) (2 points) You are given the following:
• A portfolio consists of 1,600 independent risks.
• For each risk the probability of at least one claim is 0.5.
Using the Central Limit Theorem, determine the approximate probability that the number of risks in
the portfolio with at least one claim will be greater than 850.
A. Less than 0.01
B. At least 0.01, but less than 0.05
C. At least 0.05, but less than 0.10
D. At least 0.10, but less than 0.20
E. At least 0.20

7.30 (4B, 11/97, Q.1) (2 points) You are given the following:
• A portfolio consists of 10,000 identical and independent risks.
• The number of claims per year for each risk follows a Poisson distribution with mean λ.
• During the latest year, 1000 claims have been observed for the entire portfolio.
Determine the lower bound of a symmetric 95% confidence interval for λ.
A. Less than 0.0825
B. At least 0.0825, but less than 0.0875
C. At least 0.0875, but less than 0.0925
D. At least 0.0925, but less than 0.0975
E. At least 0.0975

7.31 (IOA 101, 9/00, Q.3) (1.5 points) The number of claims arising in a period of one month from
a group of policies can be modeled by a Poisson distribution with mean 24.
Using the Normal Approximation, determine the probability that fewer than 20 claims arise in a
particular month.

7.32 (IOA 101, 4/01, Q.4) (1.5 points) For a certain type of policy the probability that a
policyholder will make a claim in a year is 0.001. If a random sample of 10,000 policyholders is
selected, using the Normal Approximation, calculate an approximate value for the probability that
not more than 5 will make a claim next year.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 144

Solutions to Problems:

7.1. C. This is Binomial with q = 1/6 and m =1000. Mean = 1000 /6 = 166.66.
Standard Deviation = √[(1000)(5/6)(1/6)] = 11.785.
Φ[(167.5 - 166.66)/11.785] - Φ[(166.5 - 166.66)/11.785] = Φ[0.07] - Φ[-0.01] = 0.5279 - 0.4960 = 0.0319.

7.2. D. This is Binomial with q = 1/6 and m =1000. Mean = 1000 /6 = 166.66.
Standard Deviation = √[(1000)(5/6)(1/6)] = 11.785. The interval from 150 to 180 corresponds on
the Standard Normal to the interval from {(149.5-166.66)/11.785}
to {(180.5-166.66)/11.785}. Therefore the desired probability is:
Φ((180.5-166.66)/11.785) - Φ((149.5-166.66)/11.785) = Φ(1.17) - Φ( -1.46) =
.8790 - .0721 = 0.8069.
Comment: The exact answer is 0.8080, so the Normal Approximation is quite good.

7.3. D. This is the Binomial Distribution with q =.25 and m = 100. Therefore the mean is (100)(.25)
= 25. The Variance is: (100)(.25)(.75) = 18.75 and the Standard Deviation is: √18.75 = 4.330.
Therefore the desired probability is:
Φ[(20.5-25)/4.330] - Φ[(15.5-25)/4.330] = Φ(-1.04) - Φ(-2.19) = .1492 - .0143 = 0.1349.
Comment: The exact answer is .1377, so the Normal Approximation is okay.

7.4. C. Binomial Distribution with mean = 200 and variance = (10,000)(.02)(1-.02) = 196.
Standard deviation = 14. Chance of 205 or more claims = 1 - chance of 204 claims or less ≅

1 - Φ((204.5-200)/14) = 1 - Φ(.32) =1 - .6255 = 0.3745.

7.5. E. Mean = 400 = variance. Standard deviation = 20.


Φ((440.5-400)/20) - Φ((419.5-400)/20) = Φ(2.03) - Φ(0.98) =.9788 - .8365 = 0.1423.

7.6. D. The total claims follow a Poisson Distribution with mean 400 + 500 + 1000 = 1900, since
independent Poisson variables add. This has a variance equal to the mean of 1900 and therefore a
standard deviation of √1900 = 43.59.
Prob[more than 1950 claims] ≅ 1 - Φ((1950.5-1900)/43.59) = 1 - Φ(1.16) = 1 - 0.8770 = 0.123.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 145

7.7. B. The number of claims in Taxachusetts and Florgia is given by a Poisson with mean
400 + 500 = 900. (Since the sum of independent Poisson variables is a Poisson.) This is
approximated by a Normal distribution with a mean of 900 and variance of 900. The number of
claims in Calizonia is approximated by the Normal distribution with mean 1000 and variance of
1000. The difference between the number of claims in Calizonia and the sum of the claims in
Taxachusetts and Florgia is therefore approximately a Normal Distribution with
mean = 1000 - 900 = 100 and variance = 1000 + 900 = 1900.
More claims next year from Taxachusetts and Florgia combined than from Calizonia ⇔

(# in Calizonia) - (# in Taxachusetts + # in Florgia) < 0 ⇔


(# in Calizonia) - (# in Taxachusetts + # in Florgia) ≤ -1.
The probability of this is approximately: Φ[{(0 - 0.5) - 100} / √1900 ] = Φ(-2.31) = .0104.
Comment: The sum of independent Normal variables is a Normal.
If X is Normal, then so is -X, so the difference of Normal variables is also Normal.
Also E[X - Y] = E[X] - E[Y]. For X and Y independent variables Var[X - Y] = Var[X] + Var[Y].

7.8. E. The number of large claims in Taxachusetts is Poisson with mean (30%)(400) = 120. (This
is the concept of “thinning” a Poisson.) Similarly the number of large claims in Florgia and Calizonia
are Poisson with means of 125 and 200 respectively. Thus the large claims from all three states is
Poisson with mean = 120 + 125 + 200 = 445. (The sum of independent Poisson variables is a
Poisson.) This Poisson is approximated by a Normal with a mean of 445 and a variance of 445.
The standard deviation is √445 = 21.10. Φ(1.645) = .95 and thus a 90% confidence interval ≅
the mean ± 1.645 standard deviations, which in this case is about 445 ± (1.645)(21.10) =
410.3 to 479.7. Thus [410, 480] covers a little more than 90% of the probability.

7.9. A. For a six-sided die the mean is 3.5 and the variance is 35/12. For 5 such dice the mean is:
(5)(3.5) = 17.5 and the variance is: (5)(35/12) = 175/12. The standard deviation = 3.819. Thus the
interval from 19.5 to 20.5 corresponds to (19.5-17.5)/3.819 = .524 to
(20.5-17.5)/3.819 = .786 on the standard unit normal distribution. Using the Standard Normal Table,
this has a probability of Φ(.79) - Φ(.52) = .7852 - .6985 = 0.0867.

7.10. A. A Negative Binomial distribution with β = 2/3 and r =10000.


Mean = rβ= (10000)(2/3) = 6666.66. Variance = mean(1+ β) = 11111.11
Standard Deviation = 105.4. 1 - Φ((6799.5-6666.66)/105.4) = 1 - Φ(1.26) = 1- .8962 = 10.38%
Comment: You have to recognize that this is an alternate way of writing the Negative Binomial
Distribution. In the tables attached to the exam, f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r). The factor
β/(1 + β) is taken to the power x. Thus for the form of the distribution in this question, β/ (1 + β) =
0.4, and solving β = 2/3. Then line up with the formula in Appendix B and note that r = 10,000.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 146

7.11. D. t = 1/(1+(.33267)(4)) = .42906.


φ(4) = exp(-4²/2) / √(2π) = 0.00033546/2.5066 = 0.00013383.
1 - Φ(4) ≅ (0.00013383){0.4361836(0.42906) - 0.1201676(0.42906)² + 0.9372980(0.42906)³} =
(0.00013383)(0.23906) = 0.00003199.
Comment: The exact answer is 1- Φ(4) = .000031672.

7.12. D. The number of games won is a binomial with m =162 and q = .65.
The mean is: (162)(.65) = 105.3. The variance is: (162)(.65)(1-.65) = 36.855.
The standard deviation is √36.855 = 6.07. Thus the chance of 114 or more wins is about:
1 - Φ((113.5-105.3)/6.07) = 1 - Φ(1.35) = 1 - .9115 = 8.85%.
Comment: The exact probability is 8.72%.

7.13. D. The number of correct guesses is Binomial with parameters m = 27 and


q = 1/5, with mean: (1/5)(27) = 5.4 and variance: (1/5)(4/5)(27) = 4.32.
Therefore, Prob(# correct guesses ≥ 9) ≅ 1 - Φ[(8.5 - 5.4)/√4.32 ] = 1 - Φ(1.49) = 6.81%.
Comment: Any resemblance between the situation in this question and actual exams is coincidental.
The exact answer is in terms of an incomplete Beta Function: 1 - β(19, 9, 0.8) = 7.4%.
If Sue knows c questions, c ≤ 22, then her chance of passing is: 1 - β(19, 22-c, 0.8),
as displayed below:

[Graph: probability of passing (vertical axis, 0 to 1) versus c, the number of questions known (horizontal axis, 5 to 20).]
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 147

7.14. C. The number of home games won is a binomial with m = 81 and q = .8. The mean is:
(81)(.8) = 64.8 and variance is: (81)(.8)(1-.8) = 12.96. The number of road games won is a
binomial with m = 81 and q = .5. The mean is: (81)(.5) = 40.5 and variance is: (81)(.5)(1-.5) =
20.25. The number of home and road wins are independent random variables, thus the variance of
their sum is the sum of their variances: 12.96 + 20.25 = 33.21.
The standard deviation is √33.21 = 5.76. The mean number of wins is: 64.8 + 40.5 = 105.3.
Thus the chance of 114 or more wins is about:
1 - Φ((113.5-105.3)/5.76) = 1 - Φ(1.42) = 1 - .9228 = 7.78%.
Comment: The exact probability is 7.62%, obtained by convoluting two binomial distributions.

7.15. E. Number of questions Lucky Tom guesses correctly is Binomial with mean
(0.4)(40) = 16, and variance (40)(0.4)(0.6) = 9.6. The probability he guesses 24 or more correctly
is approximately: 1 - Φ[(23.5 - 16)/√9.6 ] = 1 - Φ(2.42) = 1 - .9922 = 0.78%.
Comment: The exact answer is 0.834177%. An ordinary person would only have a 20% chance of
randomly guessing correctly on each question. Therefore, their chance of passing would be
approximately: 1 - Φ[(23.5 - 8)/√6.4 ] = 1 - Φ(6.13) = 4.5 x 10^-10.

7.16. Let y = (x - µ)/σ. Then y follows a Standard Normal Distribution with mean 0
and standard deviation 1. f(y) = exp[-y²/2]/√(2π). x = σy + µ.
Expected value of |x| = Expected value of |σy + µ| =
∫ from -∞ to ∞ of |σy + µ| exp[-y²/2]/√(2π) dy
= -∫ from -∞ to -µ/σ of (σy + µ) exp[-y²/2]/√(2π) dy + ∫ from -µ/σ to ∞ of (σy + µ) exp[-y²/2]/√(2π) dy
= -σ ∫ from -∞ to -µ/σ of y exp[-y²/2]/√(2π) dy - µΦ(-µ/σ) + σ ∫ from -µ/σ to ∞ of y exp[-y²/2]/√(2π) dy + µ{1 - Φ(-µ/σ)}.
Since an antiderivative of y exp[-y²/2] is -exp[-y²/2], each of the two remaining integrals contributes
(σ/√(2π)) exp[-µ²/(2σ²)], so the expected value of |x| is:
µ{1 - 2Φ[-µ/σ]} + σ √(2/π) exp[-µ²/(2σ²)].

Comment: For a Standard Normal, with µ = 0 and σ = 1, E[|X|] = √(2/π).
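
As a check of this result, here is a short Python sketch (an added illustration, not part of the original solution) comparing the closed form with a direct numerical integration of |x| f(x); the values µ = 2 and σ = 3 are arbitrary choices for the check.

from math import erf, exp, pi, sqrt

def Phi(x):
    # Standard Normal distribution function via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_density(x, mu, sigma):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 2.0, 3.0   # arbitrary values chosen for this check
closed_form = mu * (1 - 2 * Phi(-mu / sigma)) + sigma * sqrt(2 / pi) * exp(-mu**2 / (2 * sigma**2))

# midpoint-rule integration of |x| f(x) over mu +/- 10 standard deviations
step = 0.001
n_steps = int(20 * sigma / step)
numeric = sum(abs(mu - 10*sigma + (i + 0.5)*step)
              * normal_density(mu - 10*sigma + (i + 0.5)*step, mu, sigma) * step
              for i in range(n_steps))

print(round(closed_form, 4), round(numeric, 4))   # both about 2.9067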
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 148

7.17. D. The sum of 400 claims has a mean of (400)(500) = 200,000 and a variance of
(400)(10000). Thus the standard deviation of the sum is (20)(100) = 2000.
In order to standardize the variables one subtracts the mean and divides by the standard deviation,
thus standardizing M gives: (M - 200,000)/2000.
We wish the probability of the sum of the claims being less than or equal to M to be 99%.
For the standard Normal Distribution, Φ(2.327) = 0.99.
Setting 2.327 = (M - 200,000)/2000, we get M = 200000 + (2.327)(2000) = 204,654.

7.18. B. Φ[(4.5 - q) / √q ] - Φ[(0.5 - q) / √q ].

7.19. C. The portfolio has a mean of (400)(.01) = 4. Since each policy has a variance of .01 and
they should be assumed to be independent, then the variance of the portfolio is (400)(.01) = 4.
Thus the probability of 5, 6 or 7 claims is approximately:
Φ[(7.5-4)/2] - Φ[(4.5-4)/2] = Φ[1.75] - Φ[.25] = 0.9599 - 0.5987 = 0.3612.

7.20. D. The standard deviation of the Normal is √4 = 2.


Thus 3.6 corresponds to (3.6-4.8)/2 = -0.6, while 7.2 corresponds to (7.2-4.8)/2 = 1.2.

7.21. E. The distribution is Binomial with q =.5 and m = 20. That has mean (20)(.5) = 10 and
variance (20)(.5)(1-.5) = 5. The chance of obtaining 8 or more heads is approximately:
1 - Φ[(7.5-10)/√5 ] = 1 - Φ(-1.118) = 1 - {1 - Φ(1.118)} = Φ(1.118).

7.22. B. The chance that a claim exceeds 5000 is 1 - Φ((5000-1500) / σ) = .015.


Thus Φ(3500 / σ) = .985. Consulting the Standard Normal Distribution, Φ(2.17) = .985,
therefore 3500 / σ = 2.17. σ = 3500 / 2.17 = 1613.

7.23. B. The sum of independent Poisson variables is a Poisson. The mean number of claims is
(100)(.16) = 16. Since for a Poisson the mean and variance are equal, the variance is also 16.
The standard deviation is 4. The probability is:
Φ((19.5-16) / 4) - Φ((12.5-16) / 4) = Φ(0.87) - Φ(-0.88) = 0.8078 - 0.1894 = 0.6184.
Comment: More than 12 claims (greater than or equal to 13 claims) corresponds to 12.5 due to the
“continuity correction”.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 149

7.24. C. The observed number of claims is (1000)(.210) = 210. Since for the Poisson Distribution
the variance is equal to the mean, the estimated variance for the sum is also 210. The standard
deviation is √210 = 14.49. Using the Normal Approximation, an approximate 95% confidence
interval is ± 1.96 standard deviations. Φ(1.96) = 0.975. Therefore a 95% confidence interval for the
number of claims from 1000 policies is 210 ± (1.96)(14.49) =
210 ± 28.4. A 95% confidence interval for the claim frequency is: 0.210 ± 0.028.
Alternately, the standard deviation for the estimated frequency declines as the square root of the
number of policies used to estimate it: √0.210 / √1000 = 0.458 / 31.62 = 0.01449. Thus a 95%
confidence interval for the claim frequency is: 0.210 ± (1.96)(0.01449) = 0.210 ± 0.028.
Alternately, one can be a little more “precise” and let λ be the Poisson frequency. Then the standard
deviation is: √(λ / 1000), and the 95% confidence interval has λ within 1.96 standard deviations of
0.210: -1.96 √(λ / 1000) ≤ (0.210 - λ) ≤ 1.96 √(λ / 1000). We can solve for the boundaries of this
interval: 1.96² λ / 1000 = (0.210 - λ)². ⇒ λ² - 0.4238416λ + 0.0441 = 0.
λ = {0.4238416 ± √[0.4238416² - (4)(1)(0.0441)]} / {(2)(1)} = 0.2119 ± 0.0285.


Thus the boundaries are 0.2119 - 0.0285 = 0.183 and 0.2119 + 0.0285 = 0.240.
Comment: One needs to assume that the policies have independent claim frequencies.
The sum of independent Poisson variables is again a Poisson.

7.25. C. The average number of large claims observed per year is: (12+15+19+11+18)/5 = 15.
Thus we estimate that the Poisson has a mean of 15 and thus a variance of 15.
Thus Prob(N > 25) ≅ 1 - Φ[(25.5 - 15)/√15 ] = 1 - Φ(2.71) ≅ 1 - 0.9966 = 0.0034.

7.26. B. The observed mean is 42 / 10 = 4.2. Assume a Poisson with mean of 4.2 and therefore
variance of 4.2. Using the “continuity correction”, more than 10 on the discrete Poisson,
(11, 12, 13, ...) will correspond to more than 10.5 on the continuous Normal Distribution.
With a mean of 4.2 and a standard deviation of √4.2 = 2.05, 10.5 is “standardized” to:
(10.5 - 4.2) / 2.05 = 3.07. Thus P(N > 10) ≅ 1 - Φ(3.07) = 1 - .9989 = 0.0011.

7.27. D. The mean of the Binomial is mq = (50)(.6) = 30.


The variance of the Binomial is mq(1-q) = (50)(.6)(1-.6) = 12.
Thus the standard deviation is √12 = 3.464.
Φ[(40.5 - 30) / 3.464] - Φ[(24.5 - 30) / 3.464] = Φ(3.03) - Φ(-1.59) = 0.9988 - 0.0559 = 0.9429.
2016-C-1, Frequency Distributions, §7 Normal Approximation HCM 10/21/15, Page 150

7.28. (a) With N policies, the mean aggregate loss = N(0.01)(5000) and the variance of aggregate
losses = N(0.01)(5000²). Thus premiums are 1.1N(0.01)(5000).
The 99th percentile of the Standard Normal Distribution is 2.326. Thus we want:
Premiums - Expected Losses = 2.326(standard deviation of aggregate losses).
0.1N(0.01)(5000) = 2.326 √[N(0.01)(5000²)]. Therefore, N = (2.326/0.1)² / 0.01 = 54,103.
(b) With a severity of (1.05)(5000) = 5250 and N policies, the mean aggregate loss =
N(0.01)(5250), and the variance of aggregate losses = N(0.01)(5250²).
Premiums are still: 1.1N(0.01)(5000). Therefore, we want:
1.1N(0.01)(5000) - N(0.01)(5250) = 2.326 √[N(0.01)(5250²)].
2.5N = 1221.15 √N. N = 238,593.

7.29. A. The mean is 800 while the variance is: (0.5)(1 - 0.5)(1600) = 400.
Thus the standard deviation is 20.
Using the continuity correction, more than 850 corresponds on the continuous Normal Distribution to:
1 - Φ[(850.5-800)/20] = 1 - Φ(2.53) = 0.0057.

7.30. D. The observed frequency is 1000/10000 = .1, which is the point estimate for λ.
Since for a Poisson the variance is equal to the mean, the estimated variance for a single insured of
its observed frequency is .1. For the sum of 10,000 identical insureds, the variance is divided by
10000; thus the variance of the observed frequency is: 0.1/10,000 = 1/100,000. The standard
deviation is √(1/100,000) = 0.00316. Using the Normal Approximation, ±1.96 standard deviations
would produce an approximate 95% confidence interval:
0.1 ± (1.96)(0.00316) = 0.1 ± 0.0062 = [0.0938, 0.1062].

7.31. Prob[N < 20] ≅ Φ[(19.5 - 24)/√24 ] = Φ[-0.92] = 17.88%.

7.32. The Binomial has mean: (0.001)(10,000) = 10, and variance: (0.001)(.999)(10,000) = 9.99.
Prob[N ≤ 5] ≅ Φ[(5.5 - 10)/√9.99 ] = Φ[-1.42] = 7.78%.
Comment: The exact answer is 0.06699.
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 151

Section 8, Skewness

Skewness is one measure of the shape of a distribution.46

For example, take the following frequency distribution:

Probability x Probability x
Number Probability Probability x Square of Cube of
of Claims Density Function # of Claims # of Claims # of Claims
0 0.1 0 0 0.0
1 0.2 0.2 0.2 0.2
2 0 0 0 0.0
3 0.1 0.3 0.9 2.7
4 0 0 0 0.0
5 0 0 0 0.0
6 0.1 0.6 3.6 21.6
7 0 0 0 0.0
8 0 0 0 0.0
9 0.1 0.9 8.1 72.9
10 0.3 3 30 300.0
11 0.1 1.1 12.1 133.1
Sum 1 6.1 54.9 530.5

E[X] = 1st moment about the origin = 6.1


E[X2 ] = 2nd moment about the origin = 54.9
E[X3 ] = 3rd moment about the origin = 530.5

Variance ≡ 2nd Central Moment = E[X²] - E[X]² = 54.9 - 6.1² = 17.69.

Standard Deviation = √17.69 = 4.206.

3rd Central Moment ≡ E[(X - E[X])³] = E[X³ - 3X²E[X] + 3XE[X]² - E[X]³]
= E[X³] - 3E[X]E[X²] + 2E[X]³ = 530.5 - (3)(6.1)(54.9) + (2)(6.1³) = -20.2.
(Coefficient of) Skewness ≡ Third Central Moment / STDDEV³ = -20.2/4.206³ = -0.27.

The third central moment: µ3 ≡ E[(X - E[X])³] = E[X³] - 3E[X] E[X²] + 2E[X]³.

The (coefficient of) skewness is defined as the 3rd central moment divided by the cube
of the standard deviation = E[(X - E[X])³] / STDDEV³.
46
The coefficient of variation and kurtosis are others. See “Mahlerʼs Guide to Loss Distributions.”
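
For those who like to verify such calculations with a computer, here is a brief Python sketch (added here as an illustration, not part of the original text). The dictionary density simply encodes the table above, and the function name moment is my own label.

density = {0: 0.1, 1: 0.2, 3: 0.1, 6: 0.1, 9: 0.1, 10: 0.3, 11: 0.1}   # omitted values have probability 0

def moment(k):
    # k-th moment about the origin of the discrete distribution
    return sum(p * n**k for n, p in density.items())

mean = moment(1)                                              # 6.1
variance = moment(2) - mean**2                                # 17.69
third_central = moment(3) - 3*mean*moment(2) + 2*mean**3      # about -20.2
skewness = third_central / variance**1.5                      # about -0.27

print(round(mean, 3), round(variance, 3), round(third_central, 3), round(skewness, 3))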
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 152

In the above example, the skewness is -0.27.


A negative skewness indicates a curve skewed to the left.

The Binomial Distribution for q > 0.5 is skewed to the left.


For example, here is a Binomial Distribution with m = 10 and q = 0.7:

[Bar chart: density (vertical axis, up to about 0.25) versus x (horizontal axis, 0 to 10).]

In contrast, the Binomial Distribution for q < 0.5 has positive skewness and is skewed to the right.
For example, here is a Binomial Distribution with m = 10 and q = 0.2:

[Bar chart: density (vertical axis, up to about 0.30) versus x (horizontal axis, 0 to 10).]
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 153

The Poisson Distribution, the Negative Binomial Distribution (including the special case of the
Geometric Distribution), as well as most size of loss distributions, are skewed to the right; they have
a small but significant probability of very large values.

For example, here is a Geometric Distribution with β = 2:47

[Bar chart: density (vertical axis, up to about 0.30) versus x (horizontal axis, 0 to 10).]

47
Even through only densities up to 10 are shown, the Geometric Distribution has support from zero to infinity.
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 154

As another example of a distribution skewed to the right, here is a Poisson Distribution with λ = 3:48

[Bar chart: density (vertical axis, up to about 0.20) versus x (horizontal axis, 0 to 10).]

For the Poisson distribution the skewness is positive and therefore the distribution is skewed to the
right. However, as λ gets very large, the skewness of a Poisson approaches zero; in fact the
Poisson approaches a Normal Distribution.49
For example, here is a Poisson Distribution with λ = 30:

[Bar chart: density (vertical axis, up to about 0.07) versus x (horizontal axis, 10 to 60).]
48
Even through only densities up to 10 are shown, the Poisson Distribution has support from zero to infinity.
49
This follows from the Central Limit Theorem and the fact that for integral N, a Poisson with parameter N is the sum
of N independent variables each with a Poisson distribution with a parameter of unity. The Normal Distribution is
symmetric and therefore has zero skewness.
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 155

A symmetric distribution has zero skewness.


Therefore, the Binomial Distribution for q = 0.5 and the Normal Distribution each have zero
skewness. For example, here is a Binomial Distribution with m = 10 and q = 0.5:

[Bar chart: density (vertical axis, up to about 0.25) versus x (horizontal axis, 0 to 10).]
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 156

Binomial Distribution:

For a Binomial Distribution with m = 5 and q = 0.1:


Probability x Probability x
Number Probability Probability x Square of Cube of
of Claims Density Function # of Claims # of Claims # of Claims
0 59.049% 0.00000 0.00000 0.00000
1 32.805% 0.32805 0.32805 0.32805
2 7.290% 0.14580 0.29160 0.58320
3 0.810% 0.02430 0.07290 0.21870
4 0.045% 0.00180 0.00720 0.02880
5 0.001% 0.00005 0.00025 0.00125
Sum 1 0.50000 0.70000 1.16000

The mean is: 0.5 = (5)(0.1) = mq.
The variance is: E[X²] - E[X]² = 0.7 - 0.5² = 0.45 = (5)(0.1)(0.9) = mq(1-q).
The skewness is: {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / σ³ = {1.16 - (3)(0.7)(0.5) + 2(0.5³)} / 0.45^(3/2)
= 1.1925 = 0.8 / √0.45 = (1 - 2q) / √[m q (1 - q)].

For a Binomial Distribution, the skewness is: (1 - 2q) / √[m q (1 - q)].

Binomial Distribution with q < 1/2 ⇔ positive skewness ⇔ skewed to the right.

Binomial Distribution q = 1/2 ⇔ symmetric ⇒ zero skewness.

Binomial Distribution q > 1/2 ⇔ negative skewness ⇔ skewed to the left.


2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 157

Poisson Distribution:

For a Poisson distribution with λ = 2.5:

Probability x Probability x
Number Probability Probability x Square of Cube of Distribution
of Claims Density Function # of Claims # of Claims # of Claims Function
0 0.08208500 0.00000000 0.00000000 0.00000000 0.08208500
1 0.20521250 0.20521250 0.20521250 0.20521250 0.28729750
2 0.25651562 0.51303124 1.02606248 2.05212497 0.54381312
3 0.21376302 0.64128905 1.92386716 5.77160147 0.75757613
4 0.13360189 0.53440754 2.13763017 8.55052069 0.89117802
5 0.06680094 0.33400471 1.67002357 8.35011786 0.95797896
6 0.02783373 0.16700236 1.00201414 6.01208486 0.98581269
7 0.00994062 0.06958432 0.48709021 3.40963146 0.99575330
8 0.00310644 0.02485154 0.19881233 1.59049864 0.99885975
9 0.00086290 0.00776611 0.06989496 0.62905464 0.99972265
10 0.00021573 0.00215725 0.02157252 0.21572518 0.99993837
11 0.00004903 0.00053931 0.00593244 0.06525687 0.99998740
12 0.00001021 0.00012257 0.00147085 0.01765024 0.99999762
13 0.00000196 0.00002554 0.00033196 0.00431553 0.99999958
14 0.00000035 0.00000491 0.00006875 0.00096250 0.99999993
15 0.00000006 0.00000088 0.00001315 0.00019731 0.99999999
16 0.00000001 0.00000015 0.00000234 0.00003741 1.00000000
17 0.00000000 0.00000002 0.00000039 0.00000660 1.00000000
18 0.00000000 0.00000000 0.00000006 0.00000109 1.00000000
19 0.00000000 0.00000000 0.00000001 0.00000017 1.00000000
20 0.00000000 0.00000000 0.00000000 0.00000002 1.00000000
Sum 1.00000000 2.50000000 8.75000000 36.87500000

The mean is: 2.5 = λ. The variance is: E[X²] - E[X]² = 8.75 - 2.5² = 2.5 = λ.
The coefficient of variation = √variance / mean = √λ / λ = 1/√λ.
The skewness is: {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / σ³ =
{36.875 - (3)(2.5)(8.75) + 2(2.5³)} / 2.5^(3/2) = 1/√2.5 = 1/√λ.

For the Poisson Distribution, the skewness is: 1/√λ. For the Poisson, Skewness = CV.

Poisson Distribution ⇔ positive skewness ⇔ skewed to the right.


2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 158

Negative Binomial Distribution:

For a Negative Binomial distribution with r = 3 and β = 0.4:


Probability x Probability x
Number Probability Probability x Square of Cube of Distribution
of Claims Density Function # of Claims # of Claims # of Claims Function
0 0.36443149 0.00000000 0.00000000 0.00000000 0.36443149
1 0.31236985 0.31236985 0.31236985 0.31236985 0.67680133
2 0.17849705 0.35699411 0.71398822 1.42797644 0.85529839
3 0.08499860 0.25499579 0.76498738 2.29496213 0.94029699
4 0.03642797 0.14571188 0.58284753 2.33139010 0.97672496
5 0.01457119 0.07285594 0.36427970 1.82139852 0.99129614
6 0.00555093 0.03330557 0.19983344 1.19900062 0.99684707
7 0.00203912 0.01427382 0.09991672 0.69941703 0.99888619
8 0.00072826 0.00582605 0.04660838 0.37286706 0.99961445
9 0.00025431 0.00228880 0.02059924 0.18539316 0.99986876
10 0.00008719 0.00087193 0.00871926 0.08719255 0.99995595
11 0.00002944 0.00032386 0.00356244 0.03918682 0.99998539
12 0.00000981 0.00011777 0.00141320 0.01695839 0.99999520
13 0.00000324 0.00004206 0.00054677 0.00710805 0.99999844
14 0.00000106 0.00001479 0.00020706 0.00289887 0.99999950
15 0.00000034 0.00000513 0.00007697 0.00115454 0.99999984
16 0.00000011 0.00000176 0.00002815 0.00045038 0.99999995
17 0.00000004 0.00000060 0.00001015 0.00017251 0.99999998
18 0.00000001 0.00000020 0.00000361 0.00006501 0.99999999
19 0.00000000 0.00000007 0.00000127 0.00002414 1.00000000
20 0.00000000 0.00000002 0.00000044 0.00000885 1.00000000
Sum 1.00000000 1.19999999 3.11999977 10.79999502

The mean is: 1.2 = (3)(0.4) = rβ.
The variance is: E[X²] - E[X]² = 3.12 - 1.2² = 1.68 = (3)(0.4)(1.4) = rβ(1+β).
The third central moment is: E[X³] - 3 E[X] E[X²] + 2 E[X]³ = 10.8 - (3)(1.2)(3.12) + 2(1.2³) =
3.024 = (1.8)(3)(0.4)(1.4) = (1 + 2β)rβ(1 + β).50
The skewness is: 3.024 / 1.68^(3/2) = 1.389 = 1.8 / √[(3)(0.4)(1.4)] = (1 + 2β) / √[rβ(1 + β)].

For the Negative Binomial Distribution, the skewness is: (1 + 2β) / √[rβ(1 + β)].

Negative Binomial Distribution ⇔ positive skewness ⇔ skewed to the right.


50
For the Negative Binomial Distribution, 3σ2 - 2µ + 2(σ2 - µ)2 /µ = 3rβ(1+β) - 2rβ + 2{rβ(1+β) − rβ}2 /(rβ) =
rβ + 3rβ2 + 2rβ3 = rβ(1 + β)(1 + 2β) = third central moment.
This property of the Negative Binomial is discussed in Section 7.2 of Loss Models, not on the syllabus.
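
Here is a brief Python sketch (an added illustration, not from the original text) that checks the three closed-form skewness formulas above, for the Binomial, Poisson, and Negative Binomial, against direct computation from each distribution's densities; the Poisson and Negative Binomial sums are truncated far out in the tail, and the function name skewness_from_density is my own label.

from math import comb, exp, factorial, sqrt

def skewness_from_density(density):
    # density: dict mapping n -> probability (probabilities summing to ~1)
    m1 = sum(p * n for n, p in density.items())
    m2 = sum(p * n**2 for n, p in density.items())
    m3 = sum(p * n**3 for n, p in density.items())
    var = m2 - m1**2
    return (m3 - 3*m1*m2 + 2*m1**3) / var**1.5

# Binomial m = 5, q = 0.1
m, q = 5, 0.1
binom = {n: comb(m, n) * q**n * (1-q)**(m-n) for n in range(m + 1)}
print(skewness_from_density(binom), (1 - 2*q) / sqrt(m*q*(1-q)))        # both about 1.1925

# Poisson lambda = 2.5
lam = 2.5
poisson = {n: exp(-lam) * lam**n / factorial(n) for n in range(60)}
print(skewness_from_density(poisson), 1 / sqrt(lam))                    # both about 0.6325

# Negative Binomial r = 3, beta = 0.4
r, beta = 3, 0.4
nb = {n: comb(r + n - 1, n) * (beta/(1+beta))**n / (1+beta)**r for n in range(200)}
print(skewness_from_density(nb), (1 + 2*beta) / sqrt(r*beta*(1+beta)))  # both about 1.389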
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 159

Problems:

8.1 (3 points) What is the skewness of the following frequency distribution?


Number of Claims Probability
0 0.02
1 0.04
2 0.14
3 0.31
4 0.36
5 0.13
A. less than -1.0
B. at least -1.0 but less than -0.5
C. at least -0.5 but less than 0
D. at least 0 but less than 0.5
E. at least 0.5

8.2 (2 points) A distribution has first moment = m, second moment about the origin = m + m², and
third moment about the origin = m + 3m² + m³.
Which of the following is the skewness of this distribution?
A. m B. m^0.5 C. 1 D. m^-0.5 E. m^-1

8.3 (3 points) The number of claims filed by a commercial auto insured as the result of
at-fault accidents caused by its drivers is shown below:
Year Claims
2002 7
2001 3
2000 5
1999 10
1998 5
Calculate the skewness of the empirical distribution of the number of claims per year.
A. Less than 0.50
B. At least 0.50, but less than 0.75
C. At least 0.75, but less than 1.00
D. At least 1.00, but less than 1.25
E. At least 1.25
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 160

8.4 (4 points) You are given the following distribution of the number of claims on 100,000 motor
vehicle comprehensive policies:
Number of claims Observed number of policies
0 88,585
1 10,577
2 779
3 54
4 4
5 1
6 or more 0
Calculate the skewness of this distribution.
A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0

8.5 (4, 5/87, Q.33) (1 point) There are 1000 insurance policies in force for one year.
The results are as follows:
Number of Claims Policies
0 800
1 130
2 50
3 20
1000
Which of the following statements are true?
1. The mean of this distribution is 0.29.
2. The variance of this distribution is at least 0.45.
3. The skewness of this distribution is negative.
A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 161

8.6 (CAS3, 5/04, Q.28) (2.5 points) A pizza delivery company has purchased an automobile
liability policy for its delivery drivers from the same insurance company for the past five years.
The number of claims filed by the pizza delivery company as the result of at-fault accidents caused
by its drivers is shown below:
Year Claims
2002 4
2001 1
2000 3
1999 2
1998 15
Calculate the skewness of the empirical distribution of the number of claims per year.
A. Less than 0.50
B. At least 0.50, but less than 0.75
C. At least 0.75, but less than 1.00
D. At least 1.00, but less than 1.25
E. At least 1.25
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 162

Solutions to Problems:

8.1. B. Variance = E[X²] - E[X]² = 12.4 - 3.34² = 1.244.
Standard Deviation = √1.244 = 1.116.
skewness = {E[X³] - (3 E[X] E[X²]) + (2 E[X]³)} / STDDEV³ =
{48.82 - (3)(3.34)(12.4) + (2)(3.34³)} / (1.116³) = -0.65.
Probability x Probability x
Number Probability Probability x Square of Cube of
of Claims Density Function # of Claims # of Claims # of Claims
0 2% 0 0 0.0
1 4% 0.04 0.04 0.0
2 14% 0.28 0.56 1.1
3 31% 0.93 2.79 8.4
4 36% 1.44 5.76 23.0
5 13% 0.65 3.25 16.2
Sum 1 3.34 12.4 48.82

8.2. D. σ² = µ2′ - µ1′² = (m + m²) - m² = m.
skewness = {µ3′ - (3 µ1′ µ2′) + (2 µ1′³)} / σ³ =
{(m + 3m² + m³) - 3(m + m²)m + 2m³} / m^(3/2) = m^(-0.5).


Comment: The moments are those of the Poisson Distribution with mean m.

8.3. B. E[X] = (7 + 3 + 5 + 10 + 5)/5 = 6. E[X²] = (7² + 3² + 5² + 10² + 5²)/5 = 41.6.
Var[X] = 41.6 - 6² = 5.6. E[X³] = (7³ + 3³ + 5³ + 10³ + 5³)/5 = 324.
Skewness = {E[X³] - 3 E[X²]E[X] + 2E[X]³} / Var[X]^1.5
= {324 - (3)(41.6)(6) + (2)(6³)}/5.6^1.5 = 7.2/13.25 = 0.54.
Comment: Similar to CAS3, 5/04, Q.28. E[(X - X̄)³] = (1³ + (-3)³ + (-1)³ + 4³ + (-1)³)/5 = 7.2.
2016-C-1, Frequency Distributions, §8 Skewness HCM 10/21/15, Page 163

8.4. E. E[X] = 12318/100000 = 0.12318. E[X2 ] = 14268/100000 = 0.14268.


Number Number of Contribution to Contribution to Contribution to
of Claims Policies First Moment Second Moment Third Moment
0 88,585 0 0 0
1 10,577 10577 10577 10577
2 779 1558 3116 6232
3 54 162 486 1458
4 4 16 64 256
5 1 5 25 125
Total 100,000 12,318 14,268 18,648
Var[X] = 0.14268 - 0.12318² = 0.12751.
E[X³] = 18648/100000 = 0.18648.
Third Central Moment = E[X³] - 3 E[X]E[X²] + 2E[X]³
= 0.18648 - (3)(0.12318)(0.14268) + (2)(0.12318³) = 0.13749.
Skewness = (Third Central Moment) / Var[X]^1.5 = 0.13749 / 0.12751^1.5 = 3.02.
Comment: Data taken from Table 5.9.1 in Introductory Statistics with Applications in General
Insurance by Hossack, Pollard and Zehnwirth.

8.5. A. 1. True. The mean is {(0)(800) + (1)(130) + (2)(50) + (3)(20)} / 1000 = 0.290.
2. False. The second moment is {(0²)(800) + (1²)(130) + (2²)(50) + (3²)(20)} / 1000 = 0.510.
Thus the variance = 0.510 - 0.29² = 0.4259.
3. False. The distribution is skewed to the right and thus of positive skewness. The third moment is:
{(0³)(800) + (1³)(130) + (2³)(50) + (3³)(20)} / 1000 = 1.070.
Therefore, skewness = {µ3′ - (3 µ1′ µ2′) + (2 µ1′³)} / STDDEV³ =
{1.070 - (3)(0.29)(0.51) + (2)(0.29³)} / 0.278 = 2.4 > 0.
{1.070 - (3)(.29)(.51) +(2)(.293 )} / .278 = 2.4 > 0.

8.6. E. E[X] = (4 + 1 + 3 + 2 + 15)/5 = 5. E[X²] = (4² + 1² + 3² + 2² + 15²)/5 = 51.
Var[X] = 51 - 5² = 26. E[X³] = (4³ + 1³ + 3³ + 2³ + 15³)/5 = 695.
Skewness = {E[X³] - 3 E[X²]E[X] + 2E[X]³} / Var[X]^1.5
= {695 - (3)(51)(5) + (2)(5³)}/26^1.5 = 180/132.57 = 1.358.
Alternately, the third central moment is:
{(4 - 5)³ + (1 - 5)³ + (3 - 5)³ + (2 - 5)³ + (15 - 5)³}/5 = 180. Skewness = 180/26^1.5 = 1.358.
2016-C-1, Frequency Distributions, §9 Prob. Generating Func. HCM 10/21/15, Page 164

Section 9, Probability Generating Functions51

The Probability Generating Function, p.g.f., is useful for working with frequency distributions.52

P(z) = Expected Value of z^n = E[z^n] = Σ from n=0 to ∞ of f(n) z^n.

Note that as with other generating functions, there is a dummy variable, in this case z.

Exercise: Assume a distribution with 1-q chance of 0 claims and q chance of 1 claim. (This a Bernoulli
distribution with parameter q.) What is the probability generating function?
[Solution: P(z) = E[zn ] = (1-q)(z0 ) + q(z1 ) = 1 + q(z-1).]

The Probability Generating Function of the sum of independent frequencies is the


product of the individual Probability Generating Functions.

Specifically, if X and Y are independent random variables, then


P_{X+Y}(z) = E[z^(X+Y)] = E[z^X z^Y] = E[z^X] E[z^Y] = P_X(z) P_Y(z).

Exercise: What is the probability generating function of the sum of two independent Bernoulli
distributions each with parameter q?
[Solution: It is the product of the probability generating functions of each Bernoulli:
{1 + q(z-1)}². Alternately, one can compute that for the sum of the two Bernoullis there is:
a (1-q)² chance of zero claims, a 2q(1-q) chance of 1 claim, and a q² chance of 2 claims.
Thus P(z) = (1-q)² z^0 + 2q(1-q) z^1 + q² z² = 1 - 2q + q² + 2qz - 2q²z + q²z² =
1 + 2q(z-1) + (z² - 2z + 1)q² = {1 + q(z-1)}².]

As discussed, a Binomial distribution with parameters q and m is the sum of m independent


Bernoulli distributions each with parameter q. Therefore the probability generating function of a
Binomial distribution is that of the Bernoulli, to the power m: {1 + q(z-1)}^m.

The probability generating functions, as well as much other useful information


on each frequency distribution, are given in the tables attached to the exam.

51
See Definition 3.8 in Loss Models.
52
The Probability Generating Function is similar to the Moment Generating Function: M(z) = E[ezn].
See “Mahlerʼs Guide to Aggregate Distributions.” They are related via P(z) = M(ln(z)). They share many properties.
Loss Models uses the Probability Generating Function when dealing with frequency distributions.
2016-C-1, Frequency Distributions, §9 Prob. Generating Func. HCM 10/21/15, Page 165

Densities:

The distribution determines the probability generating function and vice versa.

Given a p.g.f., one can obtain the probabilities by repeated differentiation as follows:
f(n) = (d^n P(z) / dz^n)_{z=0} / n!.

f(0) = P(0).53 f(1) = Pʼ(0). f(2) = Pʼʼ(0)/2. f(3) = Pʼʼʼ(0)/6. f(4) = Pʼʼʼʼ(0)/24.

Exercise: Given the probability generating function: e^(λ(z-1)), what is the probability of three claims?
[Solution: P(z) = e^(λ(z-1)) = e^(λz) e^(-λ). Pʼ(z) = λ e^(λz) e^(-λ). Pʼʼ(z) = λ² e^(λz) e^(-λ). Pʼʼʼ(z) = λ³ e^(λz) e^(-λ).
f(3) = (d³ P(z) / dz³)_{z=0} / 3! = (d³ e^(λ(z-1)) / dz³)_{z=0} / 3! = λ³ e^(0λ) e^(-λ) / 3! = λ³ e^(-λ) / 3!.
Note that this is the p.g.f. of a Poisson Distribution with parameter lambda, and this is indeed the
probability of 3 claims for a Poisson. ]
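
For readers who want to see the repeated differentiation done by a computer, here is a short sketch (added for illustration, not part of the original text); it assumes the Python package sympy is available.

import sympy

z, lam = sympy.symbols('z lam', positive=True)
P = sympy.exp(lam * (z - 1))          # p.g.f. of a Poisson with mean lam

for n in range(4):
    # f(n) = P^(n)(0) / n!, the coefficient extraction described above
    f_n = sympy.diff(P, z, n).subs(z, 0) / sympy.factorial(n)
    print(n, sympy.simplify(f_n))
# prints exp(-lam), lam*exp(-lam), lam**2*exp(-lam)/2, lam**3*exp(-lam)/6: the Poisson densities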

Alternately, the probability of n claims is the coefficient of z^n in the p.g.f.

So for example, given the probability generating function: e^(λ(z-1)) = e^(λz) e^(-λ) =
e^(-λ) Σ from i=0 to ∞ of (λz)^i / i! = Σ from i=0 to ∞ of {e^(-λ) λ^i / i!} z^i. Thus for this p.g.f., f(i) = e^(-λ) λ^i / i!,
which is the density function of a Poisson distribution.

Mean:

P(z) = Σ from n=0 to ∞ of f(n) z^n. ⇒ P(1) = Σ from n=0 to ∞ of f(n) = 1.

Pʼ(z) = Σ from n=1 to ∞ of f(n) n z^(n-1). ⇒ Pʼ(1) = Σ from n=1 to ∞ of n f(n) = Σ from n=0 to ∞ of n f(n) = Mean.

Pʼ(1) = Mean.54
53
As z → 0, zn → 0 for n > 0. Therefore, P(z) = Σ f(n) zn → f(0) as z → 0.
54
This is a special case of a result discussed subsequently in the section on factorial moments.
2016-C-1, Frequency Distributions, §9 Prob. Generating Func. HCM 10/21/15, Page 166

Proof of Results for Adding Distributions:

One can use the probability generating function, in order to determine the results of adding Poisson,
Binomial, or Negative Binomial Distributions.

Assume one has two independent Poisson Distributions with means λ1 and λ2.

The p.g.f. of a Poisson is P(z) = eλ(z-1).


The p.g.f. of the sum of these two Poisson Distributions is the product of the p.g.f.s of the two
Poisson Distributions: exp[λ1(z-1)]exp[λ2(z-1)] = exp[(λ1 + λ2)(z-1)].

This is the p.g.f. of a Poisson Distribution with mean λ1 + λ 2.

In general, the sum of two independent Poisson Distributions is also Poisson with mean equal to the
sum of the means.

Similarly, assume we are summing two independent Binomial Distributions with parameters m1 and
q, and m2 and q. The p.g.f. of a Binomial is P(z) = {1 + q(z-1)}m.

The p.g.f. of the sum is: {1 + q(z-1)}m1 {1 + q(z-1)}m2 = {1 + q(z-1)}m1 + m2.


This is the p.g.f. of a Binomial Distribution with parameters m1 + m2 and q.
In general, the sum of two independent Binomial Distributions with the same q parameter is also
Binomial with parameters m1 + m2 and q.

Assume we are summing two independent Negative Binomial Distributions with parameters r1 and
β, and r2 and β. The p.g.f. of the Negative Binomial is P(z) = {1 - β(z-1)}-r.

The p.g.f. of the sum is: {1 - β(z-1)}-r1 {1 - β(z-1)}-r2 = {1 - β(z-1)}-(r1 + r2).


This is the p.g.f. of a Negative Binomial Distribution with parameters r1 + r2 and β.

In general, the sum of two independent Negative Binomial Distributions with the same β parameter
is also Negative Binomial with parameters r1 + r2 and β.
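
These addition results are easy to confirm numerically; the following sketch (an added illustration, with arbitrary means 1.2 and 0.8) convolves two Poisson densities and compares the result to a single Poisson with the summed mean.

from math import exp, factorial

def poisson_pmf(lam, kmax):
    return [exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)]

lam1, lam2, kmax = 1.2, 0.8, 30                  # arbitrary illustrative values
f1, f2 = poisson_pmf(lam1, kmax), poisson_pmf(lam2, kmax)

# density of the sum of the two independent counts, by convolution
f_sum = [sum(f1[j] * f2[k - j] for j in range(k + 1)) for k in range(kmax + 1)]
f_direct = poisson_pmf(lam1 + lam2, kmax)

assert all(abs(a - b) < 1e-12 for a, b in zip(f_sum, f_direct))
print("Convolution of the two Poissons matches a Poisson with mean", lam1 + lam2)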

Infinite Divisibility:55

If a distribution is infinitely divisible, then if one takes the probability generating function to any
positive power, one gets the probability generating function of another member of the same family
of distributions.56

For example, for the Poisson P(z) = eλ(z-1). If we take this p.g.f. to the power ρ > 0,

P(z)ρ = eρλ(z-1), which is the p.g.f. of a Poisson with mean ρλ.

The p.g.f. of a sum of r independent identically distributed variables is the individual p.g.f. raised to the
power r. Since for the Geometric P(z) = 1/{1 - β(z-1)}, and a Negative Binomial with integer r is the sum of
r independent Geometrics, for the Negative Binomial distribution: P(z) = {1 - β(z-1)}-r, for r > 0, β > 0.

Exercise: P(z) = {1- β(z-1)}-r, for r > 0, β > 0.


Is the corresponding distribution infinitely divisible?
[Solution: P(z)ρ = {1 - β(z-1)}-ρr, which is of the same form, but with ρr rather than r. Thus the
corresponding Negative Binomial Distribution is infinitely divisible.]

Infinitely divisible distributions include: Poisson, Negative Binomial, Compound Poisson,
Compound Negative Binomial, Normal, Gamma, and Inverse Gaussian.57

Exercise: P(z) = {1 + q(z-1)}m, for m a positive integer and 0 < q < 1.


Is the corresponding distribution infinitely divisible?
[Solution: P(z)ρ = {1 + q(z-1)}ρm, which is of the same form, but with ρm rather than m. However,
ρm need not be an integer, so this is not in general the p.g.f. of a Binomial. Thus the corresponding
distribution is not infinitely divisible. This is a Binomial Distribution. While Binomials can be added up,
they cannot be divided into pieces smaller than a Bernoulli Distribution.]

If a distribution is infinitely divisible, and one adds up independent identically distributed random
variables, then one gets a member of the same family. As has been discussed this is the case for
the Poisson and for the Negative Binomial.

55 See Definition 7.6 of Loss Models not on the syllabus, and in Section 9.2 of Loss Models on the syllabus.
56 One can work with either the probability generating function, the moment generating function, or the characteristic function.
57 Compound Distributions will be discussed in a subsequent section.

In particular infinitely divisible distributions are preserved under a change of exposure.58 One can find
a distribution of the same type such that when one adds up independent identical copies they add
up to the original distribution.

Exercise: Find a Poisson Distribution, such that the sum of 5 independent identical copies will be a
Poisson Distribution with λ = 3.5.
[Solution: A Poisson Distribution with λ = 3.5/5 = 0.7.]

Exercise: Find a Negative Binomial Distribution, such that the sum of 8 independent identical copies
will be a Negative Binomial Distribution with β = 0.37 and r = 1.2.
[Solution: A Negative Binomial Distribution with β = 0.37 and r = 1.2/8 = 0.15.]
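
This exercise can be checked numerically. The sketch below (an added illustration using numpy and scipy; note that scipy parameterizes the Negative Binomial by n = r and success probability p = 1/(1+β)) convolves 8 independent copies with r = 0.15 and compares the result to the original Negative Binomial with r = 1.2.

import numpy as np
from scipy.stats import nbinom

beta, r, copies = 0.37, 1.2, 8
p = 1 / (1 + beta)                       # scipy's success probability: p = 1/(1 + beta)
k = np.arange(40)

piece = nbinom.pmf(k, r / copies, p)     # one of the 8 identical pieces: r = 0.15
total = piece.copy()
for _ in range(copies - 1):              # convolve the 8 independent copies
    total = np.convolve(total, piece)[:len(k)]

target = nbinom.pmf(k, r, p)             # the original Negative Binomial: r = 1.2
print(np.max(np.abs(total - target)))    # maximum discrepancy; should be negligible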

Distribution            Probability Generating Function, P(z)59        Infinitely Divisible

Binomial                {1 + q(z-1)}m                                   No60

Poisson                 eλ(z-1)                                         Yes

Negative Binomial       {1 - β(z-1)}-r, z < 1 + 1/β                     Yes

58 See Section 7.4 of Loss Models, not on the syllabus.
59 As shown in Appendix B, attached to the exam.
60 Since m is an integer.

A(z):61

Let an = Prob[x > n]. Define A(z) = ∑_{n=0}^∞ an z^n.

(1 - z) A(z) = ∑_{n=0}^∞ an z^n - ∑_{n=0}^∞ an z^(n+1) = a0 - ∑_{n=1}^∞ (an-1 - an) z^n = 1 - p0 - ∑_{n=1}^∞ pn z^n = 1 - P(z).

Thus A(z) = {1 - P(z)} / (1 - z).

P(1) = 1. Therefore, as z → 1, A(z) = {P(z) - 1} / (z - 1) → Pʼ(1) = E[X].
Thus ∑_{n=0}^∞ an = A(1) = E[X].

This is analogous to the result that the mean is the integral of the survival function from 0 to ∞.

For example, for a Geometric Distribution, P(z) = 1 / {1 - β(z-1)}.

Thus A(z) = {1 - P(z)} / (1 - z) = [-β(z-1) / {1 - β(z-1)}] [1 / (1 - z)] = β / (1 + β - βz). ⇒ A(1) = β = mean.

Now in general, 1 / (1 - x/c) = 1 + (x/c) + (x/c)^2 + (x/c)^3 + (x/c)^4 + ....

Thus A(z) = β / (1 + β - βz) = {β / (1+β)} / {1 - zβ/(1+β)}
= {β/(1+β)} {1 + zβ/(1+β) + {zβ/(1+β)}^2 + {zβ/(1+β)}^3 + {zβ/(1+β)}^4 + ....}
= β/(1+β) + z {β/(1+β)}^2 + z^2 {β/(1+β)}^3 + z^3 {β/(1+β)}^4 + z^4 {β/(1+β)}^5 + ...

Thus matching up coefficients of z^n, we have: an = {β/(1+β)}^(n+1).

Thus for the Geometric, Prob[x > n] = {β/(1+β)}^(n+1), a result that has been discussed previously.

61 See Exercise 6.34 in the third Edition of Loss Models, not on the syllabus.
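
A quick numerical check of this Geometric tail-probability result (an added illustration; β = 1.5 is an arbitrary value): summing the density beyond n reproduces {β/(1+β)}^(n+1), and summing the tail probabilities reproduces the mean β.

beta = 1.5                                                  # arbitrary illustrative value
f = [beta**n / (1 + beta)**(n + 1) for n in range(2000)]    # Geometric densities

for n in range(5):
    tail = sum(f[n + 1:])                                   # Prob[X > n], by brute force
    closed_form = (beta / (1 + beta))**(n + 1)              # a_n from the p.g.f. argument
    print(n, round(tail, 10), round(closed_form, 10))

a = [(beta / (1 + beta))**(n + 1) for n in range(2000)]
print("sum of tail probabilities:", round(sum(a), 6), " mean beta:", beta)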

Problems:

9.1 (2 points) The number of claims, N, made on an insurance portfolio follows the following
distribution:
n Pr(N=n)
0 0.35
1 0.25
2 0.20
3 0.15
4 0.05
What is the Probability Generating Function, P(z)?
A. 1 + 0.65z + 0.4z2 + 0.2z3 + 0.05z4
B. 0.35 + 0.6z + 0.8z2 + 0.95z3 + z4
C. 0.35 + 0.25z + 0.2z2 + 0.15z3 + 0.05z4
D. 0.65 + 0.75z + 0.8z2 + 0.85z3 + 0.95z4
E. None of A, B, C, or D.

9.2 (1 point) For a Poisson Distribution with λ = 0.3,


what is the Probability Generating Function at 5?
A. less than 3
B. at least 3 but less than 4
C. at least 4 but less than 5
D. at least 5 but less than 6
E. at least 6

9.3 (1 point) Which of the following distributions is not infinitely divisible?


A. Binomial B. Poisson C. Negative Binomial
D. Normal E. Gamma

9.4 (3 points) Given the Probability Generating Function, P(z) = (e^(0.4z) - 1) / (e^(0.4) - 1),
what is the density at 3 for the corresponding frequency distribution?
A. 1/2% B. 1% C. 2% D. 3% E. 4%

9.5 (5 points) You are given the following data on the number of runs scored during half innings of
major league baseball games from 1980 to 1998:
Runs Number of Occurrences
0 518,228
1 105,070
2 47,936
3 21,673
4 9736
5 4033
6 1689
7 639
8 274
9 107
10 36
11 25
12 5
13 7
14 1
15 0
16 1

Total 709,460
With the aid of a computer, from z = -2.5 to z = 2.5, graph P(z), the probability generating function of
the empirical model corresponding to this data.

9.6 (2 points) A variable B has probability generating function P(z) = 0.8z2 + 0.2z4 .
A variable C has probability generating function P(z) = 0.7z + 0.3z5 .
B and C are independent.
What is the probability generating function of B + C.
A. 1.5z3 + 0.9z5 + 1.1z7 + 0.5z9
B. 0.25z3 + 0.25z5 + 0.25z7 + 0.25z9
C. 0.06z3 + 0.24z5 + 0.14z7 + 0.56z9
D. 0.56z3 + 0.14z5 + 0.24z7 + 0.06z9
E. None of A, B, C, or D.

9.7 (1 point) Given the Probability Generating Function, P(z) = 0.5z + 0.3z2 + 0.2z4 , what is the
density at 2 for the corresponding frequency distribution?
A. 0.2 B. 0.3 C. 0.4 D. 0.5 E. 0.6

9.8 (1 point) For a Binomial Distribution with m = 4 and q = 0.7, what is the Probability Generating
Function at 10?
A. less than 1000
B. at least 1000 but less than 1500
C. at least 1500 but less than 2000
D. at least 2000 but less than 2500
E. at least 2500

9.9 (1 point) N follows a Poisson Distribution with λ = 5.6. Determine E[3N].


A. 10,000 B. 25,000 C. 50,000 D. 75,000 E. 100,000

9.10 (7 points) A frequency distribution has P(z) = 1 - (1-z)^-r, where r is a parameter between -1 and 0.
(a) (3 points) Determine the density at 0, 1, 2, 3, etc.
(b) (1 point) Determine the mean.
(c) (2 points) Let an = Prob[x > n]. Define A(z) = ∑_{n=0}^∞ an z^n.
Show that in general, A(z) = {1 - P(z)} / (1 - z).
(d) (2 points) Using the result in part (c), show that for this distribution,
an = C(n+r, n) = (r+1) (r+2) .... (r+n) / n!.

9.11 (3 points) For a Binomial Distribution with m = 2 and q = 0.3:


(a) From the definition, determine the form of the probability generating function, P(z).
(b) Confirm that the result in (a) matches the form given in Appendix A of Loss Models.
(c) Using P(z), recover the densities.

9.12 (2 points) X1 , X2 , and X3 are independent, identically distributed variables.


X1 + X2 + X3 is Poisson. Prove that X1 , X2 , and X3 are each Poisson.

9.13 (3 points) Given the Probability Generating Function, P(z) = {0.25 / (1 - 0.75z)}^5,
what is the density at 4 for the corresponding frequency distribution?
A. 1/2% B. 1% C. 2% D. 3% E. 4%

9.14 (3 points) Given the Probability Generating Function, P(z) = (2z^3 / 9) / {(1 - z/3)(1 - 2z/3)},
what is the mean of the corresponding frequency distribution?


A. 5.5 B. 6.0 C. 6.5 D. 7.0 E. 7.5

9.15 (2, 2/96, Q.15) (1.7 points) Let X1,..., Xn be independent Poisson random variables with
expectations λ1, . . . , λn, respectively. Let Y = ∑_{i=1}^n c Xi, where c is a constant.
Determine the probability generating function of Y.
A. exp[(zc + z^2 c^2 / 2) ∑_{i=1}^n λi]
B. exp[(zc - 1) ∑_{i=1}^n λi]
C. exp[zc ∑_{i=1}^n λi + (z^2 c^2 / 2) ∑_{i=1}^n λi^2]
D. exp[(z^c - 1) ∑_{i=1}^n λi]
E. (z^c - 1)^n ∏_{i=1}^n λi

9.16 (IOA 101, 4/00, Q.10) (4.5 points) Under a particular model for the evolution of the size of
a population over time, the probability generating function of Xt, the size at time t, is given by:
P(z) = {z + λt(1 - z)} / {1 + λt(1 - z)}, λ > 0.
If the population dies out, it remains in this extinct state for ever.
(i) (2.25 points) Determine the expected size of the population at time t.
(ii) (1.5 points) Determine the probability that the population has become extinct by time t.
(iii) (0.75 points) Comment briefly on the future prospects for the population.

9.17 (IOA 101, 9/01, Q.2) (1.5 points) Let X1 and X2 be independent Poisson random variables
with respective means µ1 and µ2. Determine the probability generating function of X1 + X2 and
hence state the distribution of X1 + X2 .

Solutions to Problems:

9.1. C. P(z) = E[zn ] = (0.35)(z0 ) + (0.25)(z1 ) + (0.20)(z2 ) + (0.15)(z3 ) + (0.05)(z4 ) =


0.35 + 0.25z + 0.2z2 + 0.15z3 + 0.05z4 .

9.2. B. As shown in the Appendix B attached to the exam, for a Poisson Distribution
P(z) = eλ(z-1). P(5) = e4λ = e1.2 = 3.32.

9.3. A. The Binomial is not infinitely divisible.


Comment: In the Binomial m is an integer. For m =1 one has a Bernoulli. One can not divide a
Bernoulli into smaller pieces.

9.4. C. P(z) = (e0.4z - 1)/(e0.4 - 1). Pʼ(z) = 0.4e0.4z/(e0.4 - 1). Pʼʼ(z) = 0.16e0.4z/(e0.4 - 1).
Pʼʼʼ(z) = 0.064e0.4z/(e0.4 - 1). f(3) = (d3 P(z) / dz3 )z=0 / 3! = (0.064/(e0.4 - 1))/6 = 2.17%.

Comment: This is a zero-truncated Poisson Distribution with λ = 0.4.

9.5. P(z) = {518,228 + 105,070 z + 47,936 z2 + ... + z16} / 709,460.


Here is a graph of P(z):
[Graph of the p.g.f. P(z), for z from -2.5 to 2.5.]
Comment: For example, P(-2) = 0.599825, and P(2) = 2.73582.

9.6. D. The probability generating function of a sum of independent variables is the product of the
probability generating functions.
PB+C(z) = PB(z)PC(z) = (0.8z2 + 0.2z4 )(0.7z + 0.3z5 ) = 0.56z3 + 0.14z5 + 0.24z7 + 0.06z9 .
Alternately, B has 80% probability of being 2 and 20% probability of being 4.
C has 70% probability of being 1 and 30% probability of being 5.
Therefore, B + C has: (80%)(70%) = 56% chance of being 1+ 2 = 3,
(20%)(70%) = 14% chance of being 4 + 1 = 5, (80%)(30%) = 24% chance of being 2 + 5 = 7,
and (20%)(30%) = 6% chance of being 4 + 5 = 9.
⇒ PB+C(z) = (0.8z2 + 0.2z4 )(0.7z + 0.3z5 ) = 0.56z3 + 0.14z5 + 0.24z7 + 0.06z9 .
Comment: An example of a convolution.

9.7. B. P(z) = Expected Value of zn = Σ f(n) zn . Thus f(2) = 0.3.

Alternately, P(z) = 0.5z + 0.3z2 + 0.2z4 . Pʼ(z) = 0.5 + 0.6z + 0.8z3 . Pʼʼ(z) = 0.6 + 2.4z2 .
f(2) = (d2 P(z) / dz2 )z=0 / 2! = 0.6/2 = 0.3.

9.8. E. As shown in the Appendix B attached to the exam, for a Binomial Distribution
P(z) = {1 + q(z-1)}m = {1 + (.7)(z-1)}4 . P(10) = {1 + (.7)(9)}4 = 2840.

9.9. D. The p.g.f. of the Poisson Distribution is: P(z) = eλ(z-1) = e5.6(z-1).
E[3N] = P(3) = e5.6(3-1) = e11.2 = 73,130.

9.10. (a) f(0) = P(0) = 0.
Pʼ(z) = -r (1-z)^-(r+1). f(1) = Pʼ(0) = -r.
Pʼʼ(z) = -r (r+1) (1-z)^-(r+2). f(2) = Pʼʼ(0)/2 = -r(r+1)/2.
Pʼʼʼ(z) = -r (r+1)(r+2) (1-z)^-(r+3). f(3) = Pʼʼʼ(0)/3! = -r(r+1)(r+2)/6.
f(x) = -r(r+1) ... (r+x-1) / x! = -Γ[x+r] / {Γ[x+1] Γ[r]}, x = 1, 2, 3, ...
(b) Mean = Pʼ(1) = infinity. The densities go to zero too slowly; thus there is no finite mean.
(c) (1 - z) A(z) = ∑_{n=0}^∞ an z^n - ∑_{n=0}^∞ an z^(n+1) = a0 - ∑_{n=1}^∞ (an-1 - an) z^n = 1 - p0 - ∑_{n=1}^∞ pn z^n = 1 - P(z).
Thus A(z) = {1 - P(z)} / (1 - z).
(d) Thus for this distribution A(z) = (1-z)^-r / (1-z) = (1-z)^-(r+1) = ∑_{n=0}^∞ C(n+r, n) z^n, from the Taylor series.
Thus since A(z) = ∑_{n=0}^∞ an z^n, an = C(n+r, n) = (r+1) (r+2) .... (r+n) / n!.
Comment: This is called a Sibuya frequency distribution.
It is the limit of an Extended Zero-Truncated Negative Binomial Distribution, as β → ∞.
See Exercises 6.7 and 8.32 in Loss Models.
For r = -0.7, the densities at 1 through 10 are: 0.7, 0.105, 0.0455, 0.0261625, 0.0172672, 0.0123749,
0.00936954, 0.00737851, 0.00598479, 0.00496738.
For r = -0.7, Prob[n > 10] = (0.3)(1.3) ... (9.3) / 10! = 0.065995.
P(1) = 1. Therefore, as z → 1, A(z) = {1 - P(z)} / (1 - z) → Pʼ(1) = E[X]. Thus ∑_{n=0}^∞ an = A(1) = E[X].
This is analogous to the result that the mean is the integral of the survival function from 0 to ∞.
For this distribution, Mean = A(1) = (1 - 1)^-(r+1) = ∞.

9.11. (a) f(0) = 0.72 = 0.49. f(1) = (2)(0.3)(0.7) = 0.42. f(2) = 0.32 = 0.09.
P(z) = E[zn ] = f(0) z0 + f(1) z1 + f(2) z2 = 0.49 + 0.42z + 0.09z2 .
(b) P(z) = {1 + q(z-1)}m = {1 + (0.3)(z-1)}2 = (0.7 + 0.3z)2 = 0.49 + 0.42z + 0.09z2 .
(c) f(n) = (d^n P(z) / dz^n)_{z=0} / n!.
f(0) = P(0) = 0.49.
Pʼ(z) = 0.42 + 0.18z. f(1) = Pʼ(0) = 0.42.
Pʼʼ(z) = 0.18. f(2) = Pʼʼ(0)/2 = 0.09.
Pʼʼʼ(z) = 0. f(3) = Pʼʼʼ(0)/6 = 0.
Comment: Since the Binomial has finite support, P(z) has a finite number of terms.

9.12. Since X1 , X2 , and X3 are identically distributed variables, they have the same p.g.f.: PX(z).
Let Y = X1 + X2 + X3 .
Then since X1 , X2 , and X3 are independent, PY(z) = PX(z) PX(z) PX(z) = PX(z)3 .

The probability generating function of Y is that of a Poisson: eλ(z-1) = PX(z)3 .

⇒ PX(z) = e(λ/3)(z-1). ⇒ X is Poisson with mean 1/3 of that of Y.

9.13. C. For the Negative Binomial, P(z) = {1 / (1 - β(z-1))}^r = {(1/(1+β)) / (1 - zβ/(1+β))}^r.
Matching this to the given p.g.f. of {0.25 / (1 - 0.75z)}^5,
this is a Negative Binomial Distribution with r = 5 and β = 3.
Thus f(4) = {(5)(6)(7)(8) / 4!} 3^4 / (1+3)^(4+5) = 0.02163.
Alternately, Pʼ(z) = (-5)(-0.75) 0.25^5 / (1 - 0.75z)^6 = 0.0036621 / (1 - 0.75z)^6 .
Pʼʼ(z) = (-6)(-0.75) (0.0036621) / (1 - 0.75z)^7 = 0.0164795 / (1 - 0.75z)^7 .
Pʼʼʼ(z) = (-7)(-0.75) (0.0164795) / (1 - 0.75z)^8 = 0.0865173 / (1 - 0.75z)^8 .
Pʼʼʼʼ(z) = (-8)(-0.75) (0.0865173) / (1 - 0.75z)^9 = 0.5191040 / (1 - 0.75z)^9 .
f(4) = Pʼʼʼʼ(0) / 4! = 0.5191040 / 24 = 0.02163.

9.14. A. Pʼ(z) = (2/9) {3z^2 (1 - z/3)(1 - 2z/3) - z^3 (-1/3)(1 - 2z/3) - z^3 (-2/3)(1 - z/3)} / {(1 - z/3)(1 - 2z/3)}^2 .
mean = Pʼ(1) = (2/9) {3 (1 - 1/3)(1 - 2/3) - (-1/3)(1 - 2/3) - (-2/3)(1 - 1/3)} / {(1 - 1/3)(1 - 2/3)}^2 = 5.5.
Comment: Assume we have three different coupons equally likely to be on a given bag of chips.
Then this is the probability generating function for the number of bags one has to buy in order to get
a complete collection of coupons.
This p.g.f. is the product of the probability generating functions for three zero-truncated Geometric
Distributions, with β = 0, β = 1/2, and β = 2.
The mean of the sum of these three independent zero-truncated Geometrics is:
(1 + 0) + (1 + 1/2) + (1 + 2) = 5.5.

9.15. D. For each Poisson, the probability generating function is: P(z) = exp[λi(z-1)].

The definition of the probability generating function is PN(z) = E[zn ]. Here we take n = cx.

Multiplying a variable by a constant c: PcX[z] = E[z^(cx)] = E[(z^c)^x] = PX[z^c].
Thus for each Poisson times c, the p.g.f. is: exp[λi(z^c - 1)].
The p.g.f. of the sum of the variables is the product of the p.g.f.s: PY(z) = exp[(z^c - 1) ∑_{i=1}^n λi].
Comment: Multiplying a Poisson variable by a constant does not result in another Poisson; rather it
results in what is called an Over-Dispersed Poisson Distribution.
Since Var[cX]/E[cX] = c Var[X]/E[X], for a constant c > 1, the Over-Dispersed Poisson Distribution
has a variance greater than its mean. See for example “A Primer on the Exponential Family of
Distributions”, by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program.

9.16. (i) Pʼ(z) = ({1-λt}{1 + λt(1-z)} + λt{z + λt(1-z)}) / {1 + λt(1-z)}2 = 1 / {1 + λt(1- z)}2 .
E[X] = Pʼ(1) = 1. The expected size of the population is 1 regardless of time.
(ii) f(0) = P(0) = λt/(1 + λt). This is the probability of extinction by time t.
The probability of survival to time t is: 1 - λt/(1 + λt) = 1/(1 + λt) = (1/λ)/{(1/λ) + t),
the survival function of a Pareto Distribution with α = 1 and θ = 1/λ.
(iii) As t approaches infinity, the probability of survival approaches zero.
Comment: Pʼʼ(z) = 2λt/{1 + λt(1- z)}3 . E[X(X -1)] = Pʼʼ(1) = 2λt.

⇒ E[X2 ] = E[X] + 2λt = 1 + 2λt. ⇒ Var[X] = 1 + 2λt - 12 = 2λt.



9.17. P1 (z) = exp[µ1(z-1)]. P2 (z) = exp[µ2(z-1)].


Since X1 and X2 are independent, the probability generating function of X1 + X2 is:
P1 (z)P2 (z) = exp[µ1(z-1) + µ2(z-1)] = exp[(µ1 + µ2)(z-1)].

This is the probability generating function of a Poisson with mean µ1 + µ2, which must therefore be
the distribution of X1 + X2 .

Section 10, Factorial Moments

When working with frequency distributions, in addition to moments around the origin and central
moments, one sometimes uses factorial moments. The nth factorial moment is the expected value
of the product of the n factors: X(X-1) ... (X+1-n).

µ( n ) = E[X(X-1) ... (X+1-n)].62

So for example, µ(1) = E[X], µ(2) = E[X(X-1)], µ(3) = E[X(X-1)(X-2)].

Exercise: What is the second factorial moment of a Binomial Distribution with parameters
m = 4 and q = 0.3?
[Solution: The density function is:
f(0) = 0.74 , f(1) = (4)(0.73 )(0.3), f(2) = (6)(0.72 )(0.32 ), f(3) = (4)(0.7)(0.3)3 , f(4) = 0.34 .
E[X(X-1)] = (0)(-1)f(0) + (1)(0)f(1) + (2)(1)f(2) + (3)(2)f(3) + (4)(3)f(4) =
(12)(0.72 ) (0.32 ) + (24)(0.7)(0.33 ) + 12(0.34 ) = 12(0.32 ){(0.72 ) + (2)(0.7)(0.3) + (0.32 )} =
(12)(0.32 )(0.7 + 0.3)2 = (12)(0.32 ) = 1.08.]

The factorial moments are related to the moments about the origin as follows:63
µ(1) = µ1 ′ = µ

µ(2) = µ2 ′ - µ1 ′

µ(3) = µ3 ′ - 3µ2 ′ + 2µ1 ′

µ(4) = µ4 ′ - 6µ3 ′ + 11µ2 ′ - 6µ1 ′

The moments about the origin are related to the factorial moments as follows:
µ1 ′ = µ(1) = µ

µ2 ′ = µ(2) + µ(1)

µ3 ′ = µ(3) + 3µ(2) + µ(1)

µ4 ′ = µ(4) + 6 µ(3) + 7µ(2) + µ(1)

Note that one can use the factorial moments to compute the variance, etc.

62 See the first page of Appendix B of Loss Models.
63 Moments about the origin are sometimes referred to as “raw moments.”

For example for a Binomial Distribution with m = 4 and q = 0.3, the mean is mq = 1.2, while the
second factorial moment was computed to be 1.08. Thus the second moment around the origin is
µ2 ′ = µ(2) + µ (1) = µ(2) + µ = 1.08 + 1.2 = 2.28. Thus the variance is 2.28 - 1.22 = 0.84.

This in fact equals mq(1-q) = (4)(0.3)(0.7) = 0.84.

In general the variance (the second central moment) is related to the factorial moments as follows:
variance = µ2 = µ2 ′ - µ1 ′2 = µ(2) + µ (1) - µ(1)2.
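
Here is a short numerical check (an added illustration) of these relationships for the Binomial with m = 4 and q = 0.3 used above: the first two factorial moments computed from the density give back the variance mq(1-q) = 0.84.

from math import comb

m, q = 4, 0.3
f = [comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]

mu1 = sum(x * f[x] for x in range(m + 1))             # first factorial moment = mean
mu2 = sum(x * (x - 1) * f[x] for x in range(m + 1))   # second factorial moment
variance = mu2 + mu1 - mu1**2                          # mu(2) + mu(1) - mu(1)^2
print(mu1, mu2, variance, m * q * (1 - q))             # 1.2 1.08 0.84 0.84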

Using the Probability Generating Function to Get Factorial Moments:

One can use the Probability Generating Function to get the factorial moments.
To get the nth factorial moment, one differentiates the p.g.f. n times and sets z =1:

µ(n) = (d^n P(z) / dz^n)_{z=1} = P^(n)(1).

So for example, µ(1) = E[X] = Pʼ(1), and µ(2) = E[X(X-1)] = Pʼʼ(1).64

Exercise: Given that the p.g.f. of a Poisson Distribution is eλ(z-1), determine its first four factorial
moments.
[Solution: P(z) = eλ(z-1) = eλze−λ. Pʼ(z) = λeλze−λ. Pʼʼ(z) = λ2ezλe−λ. Pʼʼʼ(z) = λ3ezλe−λ

Pʼʼʼʼ(z) = λ4ezλe−λ. µ(1) = Pʼ(1) = λe1λe−λ = λ. µ(2) = Pʼʼ(1) = λ2eλe−λ = λ2.

µ(3) = Pʼʼʼ(1) = λ3e1λe−λ = λ3. µ(4) = Pʼʼʼʼ(1) = λ4.

Comment: For the Poisson Distribution, µ(n) = λn .]

Exercise: Using the first four factorial moments of a Poisson Distribution, determine the first four
moments of a Poisson Distribution.
[Solution: µ1 ′ = µ(1) = λ.

µ2 ′ = µ(2) + µ(1) = λ2 + λ.

µ3 ′ = µ(3) + 3µ(2) + µ (1) = λ3 + 3λ2 + λ.

µ4 ′ = µ(4) + 6µ(3) + 7µ(2) + µ(1) = λ4 + 6λ3 + 7λ2 + λ.]

64 Exercise 6.1 in Loss Models.

Exercise: Using the first four moments of a Poisson Distribution, determine its coefficient of variation,
skewness, and kurtosis.
[Solution: variance = µ2 ′ - µ1 ′2 = λ 2 + λ - λ2 = λ.

Coefficient of variation = standard deviation / mean = √λ / λ = 1/√λ.

third central moment = µ3 ′ - 3µ1 ′µ2 ′ + 2µ1 ′3 = λ3 + 3λ2 + λ - 3λ(λ2 + λ) + 2λ3 = λ.

skewness = third central moment / variance^1.5 = λ / λ^1.5 = 1/√λ.

fourth central moment = µ4 ′ - 4µ1 ′µ3 ′ + 6µ1 ′2µ2 ′ - 3µ1 ′4 =

λ 4 + 6λ3 + 7λ2 + λ - 4λ(λ3 + 3λ2 + λ) + 6λ2 (λ2 + λ) - 3λ4 = 3λ2 + λ.

kurtosis = fourth central moment / variance2 = (3λ2 + λ)/λ2 = 3 + 1/λ.


Comment: While there is a possibility you might use the skewness of the Poisson Distribution, you
are extremely unlikely to ever use the kurtosis of the Poisson Distribution!
Kurtosis is discussed in “Mahlerʼs Guide to Loss Distributions.”
As lambda approaches infinity, the kurtosis of a Poisson approaches 3, that of a Normal Distribution.
As lambda approaches infinity, the Poisson approaches a Normal Distribution.]

Exercise: Derive the p.g.f. of the Geometric Distribution and use it to determine the variance.
[Solution: P(z) = Expected Value of z^n = ∑_{n=0}^∞ f(n) z^n = ∑_{n=0}^∞ {β^n / (1+β)^(n+1)} z^n = {1/(1+β)} ∑_{n=0}^∞ {βz / (1+β)}^n
= {1/(1+β)} / {1 - zβ/(1+β)} = 1 / {1 - β(z-1)}, z < 1 + 1/β.
Pʼ(z) = β / {1 - β(z-1)}^2. Pʼʼ(z) = 2β^2 / {1 - β(z-1)}^3. µ(1) = Pʼ(1) = β. µ(2) = Pʼʼ(1) = 2β^2.
Thus the variance of the Geometric distribution is: µ(2) + µ(1) - µ(1)^2 = 2β^2 + β - β^2 = β(1+β).]
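
The same derivation can be confirmed symbolically; the sketch below (an added illustration using sympy) differentiates the Geometric p.g.f. at z = 1 and recovers the mean β and the variance β(1+β).

import sympy as sp

z, beta = sp.symbols('z beta', positive=True)
P = 1 / (1 - beta * (z - 1))                  # Geometric p.g.f.

mu1 = sp.diff(P, z, 1).subs(z, 1)             # first factorial moment = beta
mu2 = sp.diff(P, z, 2).subs(z, 1)             # second factorial moment = 2 beta^2
variance = sp.factor(mu2 + mu1 - mu1**2)      # beta*(beta + 1)
print(mu1, sp.simplify(mu2), variance)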

Formulas for the (a, b, 0) class:65

One can use iteration to calculate the factorial moments of a member of the (a,b,0) class.66
µ(1) = (a + b)/(1-a).   µ(n) = (an + b) µ(n-1) / (1-a).

Exercise: Use the above formulas to compute the first three factorial moments about the origin of a
Negative Binomial Distribution.
[Solution: For a Negative Binomial Distribution: a = β/(1+β) and b = (r-1)β/(1+β).
µ(1) = (a + b)/(1 - a) = rβ/(1+β) / {1/(1+β)} = rβ.

µ(2) = (2a + b)µ(1)/(1 - a) = {(r+1)β/(1+β)} rβ / {1/(1+β)} = r(r+1)β2.

µ(3) = (3a + b)µ(2)/(1 - a) = {(r+2)β/(1+β)} r(r+1)β2 / {1/(1+β)} = r(r+1)(r+2)β3.]

In general, the nth factorial moment of a Negative Binomial Distribution is:


µ(n) = r(r+1)...(r+n-1)βn .
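
The iteration formula and this closed form agree, as the following sketch verifies numerically (an added illustration; r = 2.5 and β = 0.8 are arbitrary values).

r, beta = 2.5, 0.8                       # arbitrary illustrative parameters
a = beta / (1 + beta)
b = (r - 1) * beta / (1 + beta)

mu = (a + b) / (1 - a)                   # mu(1) = r * beta
for n in range(1, 6):
    closed_form = beta**n
    for i in range(n):
        closed_form *= r + i             # r(r+1)...(r+n-1) beta^n
    print(n, round(mu, 10), round(closed_form, 10))
    mu *= (a * (n + 1) + b) / (1 - a)    # recursion: mu(n+1) = {a(n+1) + b} mu(n) / (1 - a)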

Exercise: Use the first three factorial moments to compute the first three moments about the origin of
a Negative Binomial Distribution.
[Solution: µ1 ′ = µ(1) = rβ. µ2 ′ = µ(2) + µ (1) = r(r+1)β2 + rβ.

µ3 ′ = µ(3) + 3µ(2) + µ(1) = r(r+1)(r+2)β3 + 3r(r+1)β2 + rβ.]

Exercise: Use the first two moments about the origin of a Negative Binomial Distribution to compute
its variance.
[Solution: The variance of the Negative Binomial is µ2 ′ - µ1 ′2 = r(r+1)β2 + rβ - (rβ)2 = rβ(1+β).]

Exercise: Use the first three moments about the origin of a Negative Binomial Distribution to
compute its skewness.
[Solution: Third central moment = µ3′ - (3 µ1′ µ2′) + (2 µ1′^3) =
r(r+1)(r+2)β^3 + 3r(r+1)β^2 + rβ - (3)(rβ)(r(r+1)β^2 + rβ) + 2(rβ)^3 = 2rβ^3 + 3rβ^2 + rβ.
Variance = rβ(1+β).
Therefore, skewness = (2rβ^3 + 3rβ^2 + rβ) / {rβ(1+β)}^1.5 = (1 + 2β) / √(rβ(1+β)).]

65 The (a, b, 0) class will be discussed subsequently.
66 See Appendix B.2 of Loss Models.

One can derive that for any member of the (a,b,0) class, the variance = (a+b) / (1-a)2 .67

For example for the Negative Binomial Distribution, a = β/(1+β) and b = (r-1)β/(1+β)
variance = (a+b)/(1-a)2 = {rβ/(1+β)} / {1/(1+β)2 } = rβ(1+β).

The derivation is as follows:


µ(1) = (a+b)/(1-a) . µ(2) = (2a +b)µ(1)/(1-a) = (2a +b)(a+b)/(1-a)2 .

µ2 ′ = µ(2) + µ(1) = (2a +b)(a+b)/(1-a)2 + (a+b)/(1-a) = (a +b+1)(a+b)/(1-a)2

variance = µ2 ′ - µ1 ′2 = (a +b+1)(a+b)/(1-a)2 - {(a+b)/(1-a)}2 = (a+b)/(1-a)2 .

Exercise: Use the above formula for the variance of a member of the (a,b,0) class to compute the
variance of a Binomial Distribution.
[Solution: For the Binomial, a = -q/(1-q) and b = (m+1)q/(1-q).
variance = (a+b)/(1-a)2 = mq/(1-q) / {1/(1-q) }2 = mq(1-q).]

Distribution nth Factorial Moment

Binomial m(m-1)...(m+1-n)qn for n ≤ m, 0 for n > m

Poisson λn

Negative Binomial r(r+1)...(r+n-1)βn

67 See Appendix B.2 of Loss Models.

Problems:

10.1 (2 points) The number of claims, N, made on an insurance portfolio follows the following
distribution:
n Pr(N=n)
0 0.3
1 0.3
2 0.2
3 0.1
4 0.1
What is the second factorial moment of N?
A. 1.6 B. 1.8 C. 2.0 D. 2.2 E. 2.4

10.2 (3 points) Determine the third moment of a Poisson Distribution with λ = 5.


A. less than 140
B. at least 140 but less than 160
C. at least 160 but less than 180
D. at least 180 but less than 200
E. at least 200

10.3 (2 points) The random variable X has a Binomial distribution with parameters q and m = 8.
Determine the expected value of X(X -1)(X - 2).
A. 512 B. 512q(q-1)(q-2) C. q(q-1)(q-2) D. q3 E. None of A, B, C, or D

10.4 (2 points) You are given the following information about the probability generating function for
a discrete distribution:
• P'(1) = 10
• P"(1) = 98
Calculate the variance of the distribution.
A. 8 B. 10 C. 12 D. 14 E. 16

10.5 (3 points) The random variable X has a Negative Binomial distribution with parameters
β = 7/3 and r = 9. Determine the expected value of X(X -1)(X - 2)(X-3).
A. less than 200,000
B. at least 200,000 but less than 300,000
C. at least 300,000 but less than 400,000
D. at least 400,000 but less than 500,000
E. at least 500,000

10.6 (3 points) Determine the third moment of a Binomial Distribution with m = 10 and q = 0.3.
A. less than 40
B. at least 40 but less than 50
C. at least 50 but less than 60
D. at least 60 but less than 70
E. at least 70

10.7 (3 points) Determine the third moment of a Negative Binomial Distribution with r = 10 and
β = 3.
A. less than 36,000
B. at least 36,000 but less than 38,000
C. at least 38,000 but less than 40,000
D. at least 40,000 but less than 42,000
E. at least 42,000

10.8 (4B, 11/97, Q.21) (2 points) The random variable X has a Poisson distribution with mean λ.
Determine the expected value of X(X -1)...(X - 9).
A. 1 B. λ C. λ(λ−1)...(λ−9) D. λ10 E. λ(λ+1)...(λ+9)

10.9 (CAS3, 11/06, Q.25) (2.5 points) You are given the following information about the
probability generating function for a discrete distribution:
• P'(1) = 2
• P"(1) = 6
Calculate the variance of the distribution.
A. Less than 1.5
B. At least 1.5, but less than 2.5
C. At least 2.5, but less than 3.5
D. At least 3.5, but less than 4.5
E. At least 4.5

Solutions to Problems:

10.1. D. The 2nd factorial moment is:


E[N(N-1)] = (.3)(0)(-1) + (.3)(1)(0) + (.2)(2)(1) + (.1)(3)(2) + (.1)(4)(3) = 2.2.

10.2. E. The factorial moments for a Poisson are: λn . mean = first factorial moment = λ = 5.

Second factorial moment = 52 = 25 = E[X(X-1)] = E[X2 ] - E[X]. ⇒ E[X2 ] = 25 + 5 = 30.


Third factorial moment = 53 = 125 = E[X(X-1)(X-2)] = E[X3 ] - 3E[X2 ] + 2E[X].
⇒ E[X3 ] = 125 + (3)(30) - (2)(5) = 205.
Alternately, for the Poisson P(z) = eλ(z-1).

P(z) = eλ(z-1). Pʼ(z) = λeλ(z-1). Pʼʼ(z) = λ2eλ(z-1). Pʼʼʼ(z) = λ3eλ(z-1).

mean = first factorial moment = Pʼ(1) = λ. Second factorial moment = Pʼʼ(1) = λ2.

Third factorial moment = Pʼʼʼ(1) = λ3. Proceed as before.

Alternately, the skewness of a Poisson is 1/√λ.
Since the variance is λ, the third central moment is: λ^1.5 / √λ = λ.

λ = E[(X - λ)3 ] = E[X3 ] - 3λE[X2 ] + 3λ2E[X] - λ3.

⇒ E[X3 ] = λ + 3λE[X2 ] - 3λ2E[X] + λ3 = λ + 3λ(λ + λ2) - 3λ2λ + λ3 = λ3 + 3λ2 + λ


= 125 + 75 + 5 = 205.
Comment: The third moment of a Poisson Distribution is: λ3 + 3λ2 +λ.
One could compute enough of the densities and then calculate the third moment:
Number of Claims    Probability Density Function    Probability x # of Claims    Probability x Square of # of Claims    Probability x Cube of # of Claims
0 0.674% 0.00000 0.00000 0.00000
1 3.369% 0.03369 0.03369 0.03369
2 8.422% 0.16845 0.33690 0.67379
3 14.037% 0.42112 1.26337 3.79010
4 17.547% 0.70187 2.80748 11.22991
5 17.547% 0.87734 4.38668 21.93342
6 14.622% 0.87734 5.26402 31.58413
7 10.444% 0.73111 5.11780 35.82459
8 6.528% 0.52222 4.17779 33.42236
9 3.627% 0.32639 2.93751 26.43761
10 1.813% 0.18133 1.81328 18.13279
11 0.824% 0.09066 0.99730 10.97034
12 0.343% 0.04121 0.49453 5.93437
13 0.132% 0.01717 0.22323 2.90193
14 0.047% 0.00660 0.09246 1.29444
15 0.016% 0.00236 0.03538 0.53070
16 0.005% 0.00079 0.01258 0.20127
17 0.001% 0.00025 0.00418 0.07101
Sum 0.99999458366 4.99990 29.99818 204.96644

10.3. E. f(x) = {8! / (x! (8-x)!)} q^x (1-q)^(8-x), for x = 0 to 8.
E[X(X-1)(X-2)] = ∑_{x=0}^{8} x(x-1)(x-2) f(x) = ∑_{x=3}^{8} x(x-1)(x-2) {8! / (x! (8-x)!)} q^x (1-q)^(8-x)
= (8)(7)(6) q^3 ∑_{x=3}^{8} {5! / ((x-3)! (8-x)!)} q^(x-3) (1-q)^(8-x) = 336 q^3 ∑_{y=0}^{5} {5! / (y! (5-y)!)} q^y (1-q)^(5-y) = 336q^3 .

Alternately, the 3rd factorial moment is the 3rd derivative of the p.g.f. at z = 1.
For the Binomial: P(z) = {1+ q(z-1)}m. dP/dz = mq{1+ q(z-1)}(m-1).
Pʼʼ(z) = m(m-1)q2 {1+ q(z-1)}(m-2). Pʼʼʼ(z) = m(m-1)(m-2)q3 {1+ q(z-1)}(m-3) .
Pʼʼʼ(1) = m(m-1)(m-2)q3 = (8)(7)(6)q3 = 336q3 .
Comment: Note that the product x(x -1)(x-2) is zero for x = 0,1 and 2, so only terms for x ≥3
contribute to the sum. Then a change of variables is made: y = x-3. Then the resulting sum is the
sum of Binomial terms from y = 0 to 5, which sum is one, since the Binomial is a Distribution, with a
support in this case 0 to 5. The expected value of: X(X -1)(X - 2), is an example of what is referred
to as a factorial moment.
In the case of the Binomial, the kth factorial moment for k ≤ m is:
q^k (m!)/(m-k)! = q^k (m)(m-1)...(m-(k-1)). In our case we have the 3rd factorial moment (involving the
product of 3 terms) equal to: q^3 (m)(m-1)(m-2).

10.4. A. 10 = P'(1) = E[N]. 98 = P"(1) = E[N(N-1)] = E[N2 ] - E[N]. E[N2 ] = 98 + 10 = 108.


Var[N] = E[N2 ] - E[N]2 = 108 - 102 = 8.
Comment: Similar to CAS3, 11/06, Q.25.

10.5. C. f(x) = {(x+8)! / (x! 8!)} (7/3)^x / (1 + 7/3)^(x+9).
E[X(X-1)(X-2)(X-3)] = ∑_{x=0}^∞ x(x-1)(x-2)(x-3) f(x) = ∑_{x=4}^∞ x(x-1)(x-2)(x-3) {(x+8)! / (x! 8!)} (7/3)^x / (1 + 7/3)^(x+9)
= (12)(11)(10)(9) (7/3)^4 ∑_{x=4}^∞ {(x+8)! / ((x-4)! 12!)} (7/3)^(x-4) / (1 + 7/3)^(x+9)
= 352,147 ∑_{y=0}^∞ {(y+12)! / (y! 12!)} (7/3)^y / (1 + 7/3)^(y+13) = 352,147.

Alternately, the 4th factorial moment is the 4th derivative of the p.g.f. at z = 1.
For the Negative Binomial: P(z) = {1- β(z-1)}-r. dP/dz = rβ{1- β(z-1)}-(r+1).

Pʼʼ(z) = r(r+1)β2{1- β(z-1)}-(r+2). Pʼʼʼ(z) = r(r+1)(r+2)β3{1- β(z-1)}-(r+3).

Pʼʼʼʼ(z) = r(r+1)(r+2)(r+3)β4{1- β(z-1)}-(r+4).

Pʼʼʼʼ(1) = r(r+1)(r+2)(r+3)β4 = (9)(10)(11)(12)(7/3)4 = 352,147.


Comments: Note that the product x(x -1)(x-2)(x-3) is zero for x = 0,1,2 and 3, so only terms for x ≥4
contribute to the sum. Then a change of variables is made: y = x-4. Then the resulting sum is the
sum of Negative Binomial terms, with β = 7/3 and r = 13, from y = 0 to infinity, which sum is one,
since the Negative Binomial is a Distribution with support from 0 to ∞.
The expected value of X(X -1)(X - 2)(X-3), is an example of a factorial moment.
In the case of the Negative Binomial, the mth factorial moment is: βm (r)(r+1)...(r+m-1).
In our case we have the 4th factorial moment (involving the product of 4 terms) equal to:
β 4 (r)(r+1)(r+2)(r+3), with β = 7/3 and r = 9.

10.6. B. P(z) = {1 + q(z-1)}m = {1 + 0.3(z-1)}10 = {0.7 + 0.3z}10.


Pʼ(z) = (10)(0.3){0.7 + 0.3z}9 . Pʼʼ(z) = (3)(2.7){0.7 + 0.3z}8 . Pʼʼʼ(z) = (3)(2.7)(2.4){0.7 + 0.3z}7 .
mean = first factorial moment = Pʼ(1) = 3.
Second factorial moment = Pʼʼ(1) = (3)(2.7) = 8.1.
Third factorial moment = Pʼʼʼ(1) = (3)(2.7)(2.4) = 19.44.
Second factorial moment = 8.1 = E[X(X-1)] = E[X2 ] - E[X]. ⇒ E[X2 ] = 8.1 + 3 = 11.1.
Third factorial moment = 19.44 = E[X(X-1)(X-2)] = E[X3 ] - 3E[X2 ] + 2E[X].
⇒ E[X3 ] = 19.44 + (3)(11.1) - (2)(3) = 46.74.
Comment: E[X2 ] = variance + mean2 = 2.1 + 32 = 11.1.
One could compute all of the densities and then calculate the third moment:
Number of Claims    Probability Density Function    Probability x # of Claims    Probability x Square of # of Claims    Probability x Cube of # of Claims
0 2.825% 0.00000 0.00000 0.00000
1 12.106% 0.12106 0.12106 0.12106
2 23.347% 0.46695 0.93390 1.86780
3 26.683% 0.80048 2.40145 7.20435
4 20.012% 0.80048 3.20194 12.80774
5 10.292% 0.51460 2.57298 12.86492
6 3.676% 0.22054 1.32325 7.93949
7 0.900% 0.06301 0.44108 3.08758
8 0.145% 0.01157 0.09259 0.74071
9 0.014% 0.00124 0.01116 0.10044
10 0.001% 0.00006 0.00059 0.00590
Sum 1 3.00000 11.10000 46.74000

10.7. C. P(z) = {1 - β(z-1)}-r = {1 - 3(z-1)}-10 = (4 - 3z)-10.


Pʼ(z) = (-10)(-3)(4 - 3z)-11. Pʼʼ(z) = (30)(33)(4 - 3z)-12. Pʼʼʼ(z) = (30)(33)(36)(4 - 3z)-13.
mean = first factorial moment = Pʼ(1) = 30.
Second factorial moment = Pʼʼ(1) = (30)(33) = 990.
Third factorial moment = Pʼʼʼ(1) = (30)(33)(36) = 35,640.
Second factorial moment = 990 = E[X(X-1)] = E[X2 ] - E[X]. ⇒ E[X2 ] = 990 + 30 = 1020.
Third factorial moment = 35,640 = E[X(X-1)(X-2)] = E[X3 ] - 3E[X2 ] + 2E[X].
⇒ E[X3 ] = 35,640 + (3)(1020) - (2)(30) = 38,640.
Comment: E[X2 ] = variance + mean2 = (10)(3)(4) + 302 = 1020.

10.8. D. For a discrete distribution, the expected value of a quantity is determined by taking the
sum of its product with the probability density function. In this case, the density of the Poisson is:
e^-λ λ^x / x!, x = 0, 1, 2... Thus E[X(X-1)...(X-9)] =
∑_{x=0}^∞ x(x-1)(x-2)(x-3)(x-4)(x-5)(x-6)(x-7)(x-8)(x-9) e^-λ λ^x / x!
= e^-λ λ^10 ∑_{x=10}^∞ λ^(x-10) / (x-10)! = e^-λ λ^10 ∑_{y=0}^∞ λ^y / y! = e^-λ λ^10 e^λ = λ^10.

Alternately, the 10th factorial moment is the 10th derivative of the p.g.f. at z = 1.
For the Poisson: P(z) = exp(λ(z-1)). dP/dz = λ exp(λ(z-1)). Pʼʼ(z) = λ2 exp(λ(z-1)).

Pʼʼʼ(z) = λ3 exp(λ(z-1)). P10(z) = λ10 exp(λ(z-1)). P10(1) = λ10.


Comment: Note that the product x(x -1)...(x - 9) is zero for x = 0,1...,9, so only terms for x ≥10
contribute to the sum. The expected value of X(X -1)...(X - 9), is an example of a factorial moment.
In the case of the Poisson, the nth factorial moment is λ to the nth power. In this case we have the

10th factorial moment (involving the product of 10 terms) equal to λ10.

10.9. D. 2 = P'(1) = E[N]. 6 = P"(1) = E[N(N-1)] = E[N2 ] - E[N]. E[N2 ] = 6 + 2 = 8.


Var[N] = E[N2 ] - E[N]2 = 8 - 22 = 4.
Comment: P(z) = E[zn ] = Σ f(n)zn . Pʼ(z) = Σ nf(n)zn-1. Pʼ(1) = Σ nf(n) = E[N].
Pʼʼ(z) = Σ n(n-1)f(n)zn-2. Pʼʼ(1) = Σ n(n-1)f(n) = E[N(N-1)].

Section 11, (a, b, 0) Class of Distributions

The “(a,b,0) class of frequency distributions” consists of the three common


distributions: Binomial, Poisson, and Negative Binomial.

Distribution Mean Variance Variance / Mean

Binomial mq mq(1-q) 1-q <1 Variance < Mean

Poisson λ λ 1 Variance = Mean

Negative Binomial rβ rβ(1+β) 1+β > 1 Variance > Mean

Distribution            Skewness

Binomial                (1 - 2q) / √(mq(1-q))          If q < 0.5 skewed right, if q > 0.5 skewed left

Poisson                 1/√λ                           Skewed to the right

Negative Binomial       (1 + 2β) / √(rβ(1+β))          Skewed to the right

Distribution            f(x)        f(x+1)        f(x+1) / f(x)

Binomial                f(x) = {m! / (x! (m-x)!)} q^x (1-q)^(m-x).    f(x+1) = {m! / ((x+1)! (m-x-1)!)} q^(x+1) (1-q)^(m-x-1).
                        f(x+1)/f(x) = {q/(1-q)} {(m-x)/(x+1)}.

Poisson                 f(x) = λ^x e^-λ / x!.    f(x+1) = λ^(x+1) e^-λ / (x+1)!.    f(x+1)/f(x) = λ/(x+1).

Negative Binomial       f(x) = {r(r+1)...(r+x-1) / x!} β^x / (1+β)^(x+r).    f(x+1) = {r(r+1)...(r+x) / (x+1)!} β^(x+1) / (1+β)^(x+r+1).
                        f(x+1)/f(x) = {β/(1+β)} {(x+r)/(x+1)}.

(a,b,0) relationship:

For each of these three frequency distributions: f(x+1) / f(x) = a + b/(x+1), x = 0, 1, 2, ...,
where a and b depend on the parameters of the distribution:68

Distribution a b f(0)
Binomial -q/(1-q) (m+1)q/(1-q) (1-q)m
Poisson 0 λ e−λ
Negative Binomial β/(1+β) (r-1)β/(1+β) 1/(1+β)r

Loss Models writes this recursion formula equivalently as: pk/pk-1 = a + b/k, k = 1, 2, 3 ... 69

This relationship defines the (a,b,0) class of frequency distributions.70 The (a, b ,0) class of
frequency distributions consists of the three common distributions: Binomial, Poisson, and Negative
Binomial.71 Therefore, it also includes the Bernoulli, which is a special case of the Binomial, and the
Geometric, which is a special case of the Negative Binomial.

Note that a is positive for the Negative Binomial, zero for the Poisson, and negative for the Binomial.

These formula can be useful when programming these frequency distributions into spreadsheets.
One calculates f(0) and then one gets additional values of the density function via iteration:
f(x+1) = f(x){a + b / (x+1)}.

f(1) = f(0) (a + b). f(2) = f(1) (a + b/2). f(3) = f(2) (a + b/3). f(4) = f(3) (a + b/4), etc.
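
As a concrete illustration (added here, not from the text), the recursion is applied below to a Poisson with λ = 2 and checked against the usual formula e^-λ λ^x / x!.

from math import exp, factorial

lam = 2.0
a, b = 0.0, lam                          # (a, b) values for the Poisson
f = [exp(-lam)]                          # f(0) = e^-lambda
for x in range(14):
    f.append(f[x] * (a + b / (x + 1)))   # f(x+1) = f(x) {a + b/(x+1)}

for x in range(6):
    print(x, round(f[x], 10), round(exp(-lam) * lam**x / factorial(x), 10))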

68
These a and b values are shown in the tables attached to the exam. This relationship is used in the Panjer
Algorithm (recursive formula), a manner of computing either the aggregate loss distribution or a compound
frequency distribution. For a member of the (a,b,0) class, the values of a and b determine everything about the
distribution. Given the density at zero, all of the densities would follow; however, the sum of all of the densities must
be one.
69
See Definition 6.4 in Loss Models.
70
See Table 6.1 and Appendix B.2 in Loss Models. The (a, b, 0) class is distinguished from the (a, b, 1) class, to be
discussed in a subsequent section, by the fact that the relationship holds starting with the density at zero, rather
than possibly only starting with the density at one.
71
As stated in Loss Models, these are the only members of the (a, b, 0) class. This is proved in Lemma 6.6.1 of
Insurance Risk Models, by Panjer and Willmot. Only certain combinations of a and b are acceptable. Each of the
densities must be nonnegative and they must sum to one, a finite amount.

Thinning and Adding:

Distribution Thinning by factor of t Adding n independent, identical copies

Binomial q → tq m → nm

Poisson λ → tλ λ → nλ

Negative Binomial β → tβ r → nr

If for example, we assume 1/4 of all claims are large:

If All Claims Then Large Claims

Binomial m = 5, q = 0.04 Binomial m = 5, q = 0.01

Poisson λ = 0.20 Poisson λ = 0.05

Negative Binomial r = 2, β = 0.10 Negative Binomial r = 2, β = 0.025

In the Poisson case, small and large claims are independent Poisson Distributions.72
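
Thinning is easy to see in a simulation; the sketch below (an added illustration using numpy, with 100,000 simulated exposures) thins a Poisson with λ = 0.20 by the factor 1/4 and confirms that the large-claim counts have mean and variance near 0.05, and that the large and small counts are (nearly) uncorrelated.

import numpy as np

rng = np.random.default_rng(0)
n_sims, lam, t = 100_000, 0.20, 0.25

total = rng.poisson(lam, size=n_sims)            # all claims: Poisson(0.20)
large = rng.binomial(total, t)                   # each claim is large with probability 1/4
small = total - large

print(large.mean(), large.var())                 # both should be close to 0.05
print(np.corrcoef(large, small)[0, 1])           # close to 0: independent Poissons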

For X and Y independent:

X Y X+Y
Binomial(m1 , q) Binomial(m2 , q) Binomial(m1 + m2 , q)
Poisson(λ 1) Poisson(λ 2) Poisson(λ 1 + λ2)

Negative Binomial(r1 , β) Negative Bin.(r2 , β) Negative Bin.(r1 + r2 , β)

If Claims each Year Then Claims for 6 Independent Years

Binomial m = 5, q = 0.04 Binomial m = 30, q = 0.04

Poisson λ = 0.20 Poisson λ = 1.20

Negative Binomial r = 2, β = 0.10 Negative Binomial r = 12, β = 0.10

72
As discussed in the section on the Gamma-Poisson Frequency Process, in the Negative Binomial case, the
number of large and small claims are positively correlated. In the Binomial case, the number of large and small claims
are negatively correlated.

Probability Generating Functions:

Recall that the probability generating function for a given distribution is P(z) = E[zN].

Distribution Probability Generating Function

Binomial P(z) = {1 + q(z-1)}m

Poisson P(z) = eλ(z-1)

Negative Binomial P(z) = {1 - β(z-1)}-r, z < 1 + 1/β

Parametric Models:

Some advantages of parametric models:


1. They summarize the information in terms of the form of the distribution and the parameter values.
2. They serve to smooth the empirical data.
3. They greatly reduce the dimensionality of the information.

In addition one can use parametric models to extrapolate beyond the largest observation.
As will be discussed in a subsequent section, the behavior in the righthand tail is an important
feature of any frequency distribution.

Some advantages of working with separate distributions of frequency and severity:73

1. Can obtain a deeper understanding of a variety of issues surrounding insurance.


2. Allows one to address issues of modification of an insurance contract (for example, deductibles.)
3. Frequency distributions are easy to obtain and do a good job of modeling the empirical
situations.

73
See Section 6.1 of Loss Models.

Limits:

Since, the probability generating function determines the distribution, one can take limits of a
distribution by instead taking limits of the Probability Generating Function.

Assume one takes a limit of the probability generating function of a Binomial distribution for qm = λ

as m → ∞ and q → 0 :

P(z) = {1 + q(z-1)}m = {1 + q(z-1)}λ/q = [{1 + q(z-1)}1/q]λ → {e(z-1)}λ = eλ(z-1).

Where we have used the fact that as x → 0, (1+ax)1/x → ea.

Thus the limit of the Binomial Probability Generating Function is the Poisson Probability Generating
Function. Therefore, as we let q get very small in a Binomial but keep the mean constant, in the limit
one approaches a Poisson with the same mean.74

For example, a Poisson (triangles) with mean 10 is compared to a Binomial (squares) with q = 1/3
and m = 30 (mean = 10, variance = 20/3):

[Graph of the two densities, for 0 to 20 claims.]

While the Binomial is shorter-tailed than the Poisson, they are not that dissimilar.

74
The limit of the probability generating function is the probability generating function of the limit of the distributions
if it exists.

Assume one takes a limit of the probability generating function of a Negative Binomial distribution for
rβ = λ as r → ∞ and β → 0:

P(z) = {1- β(z-1)}-r = {1- β(z-1)}−λ/β = [{1- β(z-1)} 1/β ]-λ → {e−(z-1)}-λ = eλ(z-1).
Thus the limit of the Negative Binomial Probability Generating Function is the Poisson Probability
Generating Function. Therefore, as we let β get very close to zero in a Negative Binomial but keep
the mean constant, in the limit one approaches a Poisson with the same mean.75

A Poisson (triangles) with mean 10 is compared to a Negative Binomial Distribution (squares) with
r = 20 and β = 0.5 (mean = 10, variance = 15):

[Graph of the two densities, for 0 to 20 claims.]

For the three distributions graphed here and previously, while the means are the same, the
variances are significantly different; thus the Binomial is more concentrated around the mean while the
Negative Binomial is more dispersed from the mean. Nevertheless, one can see how the three
distributions are starting to resemble each other.76

75
The limit of the probability generating function is the probability generating function of the limit of the distributions
if it exists.
76
They are each approximated by a Normal Distribution. While these three Normal Distributions have the same mean,
they have different variances.

If the Binomial q were smaller and m larger such that the mean remained 10, for example q = 1/30
and m = 300, then the Binomial would have been much closer to the Poisson. Similarly, if on the
Negative Binomial one had β closer to zero with r larger such that the mean remained 10, for
example β = 1/9 and r = 90, then the Negative Binomial would have been much closer to the
Poisson.

Thus the Poisson is the limit of either a series of Binomial or Negative Binomial Distributions as they
“come from different sides.”77 The Binomial has q go to zero; one adds up very many Bernoulli
Trials each with a very small chance of success. This approaches a constant chance of success per
very small unit of time, which is a Poisson. Note that for each Binomial the mean is greater than the
variance, but as q goes to zero the variance approaches the mean.

For the Negative Binomial one lets β go to zero; one adds up very many Geometric distributions
each with very small chance of a claim.78 Again this limit is a Poisson, but in this case for each
Negative Binomial the variance is greater than the mean. As β goes to zero, the variance
approaches the mean.

As mentioned previously the Distribution Function of the Binomial Distribution is a form of the
Incomplete Beta Function, while that of the Poisson is in the form of an Incomplete Gamma
Function. As q → 0 and the Binomial approaches a Poisson, the Distribution Function of the
Binomial approaches that of the Poisson. An Incomplete Gamma Function can thus be
obtained as a limit of Incomplete Beta Distributions. Similarly, the Distribution Function of the
Negative Binomial is a somewhat different form of the Incomplete Beta Distribution.

As β → 0 and the Negative Binomial approaches a Poisson, the Distribution Function of the
Negative Binomial approaches that of the Poisson. Again, an Incomplete Gamma Function can
be obtained as a limit of Incomplete Beta Distributions.

77
One can also show this via the use of Sterlingʼs formula to directly calculate the limits rather than via the use of
Probability Generating Functions.
78
The mean of a Geometric is β, thus as β → 0, the chance of a claim becomes very small.
For the Negative Binomial, r = mean/β, so that as β → 0 for a fixed mean, r → ∞.

Modes:

The mode, where the density is largest, can be located by observing where f(x+1)/f(x) switches
from being greater than 1 to being less than 1.79

Exercise: For a member of the (a, b, 0) frequency class, when is f(x+1)/f(x) greater than one, equal
to one, and less than one?
[Solution: f(x+1)/f(x) = 1 when a + b/(x+1) = 1. This occurs when x = b/(1-a) - 1.
For x < b/(1-a) - 1, f(x+1)/f(x) > 1. For x > b/(1-a) - 1, f(x+1)/f(x) < 1.]

For example, for a Binomial Distribution with m = 10 and q = .23, a = - q/(1-q) = -.2987 and
b = (m+1)q/(1-q) = 3.2857. For x > b/(1-a) - 1 = 1.53, f(x+1)/f(x) < 1.
Thus f(3) < f(2). For x < 1.53, f(x+1)/f(x) > 1. Thus f(2) > f(1). Therefore, the mode is 2.

In general, since for x < b/(1-a) - 1, f(x+1) > f(x), if c is the largest integer in b/(1-a), f(c) > f(c-1).
Since for x > b/(1-a) - 1, f(x+1) < f(x), f(c+1) < f(c). Thus c is the mode.

For a member of the (a, b, 0) class, the mode is the largest integer in b/(1-a).
If b/(1-a) is an integer, then f(b/(1-a) - 1) = f(b/(1-a)), and there are two modes.

For the Binomial Distribution, a = - q/(1-q) and b = (m+1)q/(1-q), so b/(1-a) = (m+1)q.


Thus the mode is the largest integer in (m+1)q.
If (m+1)q is an integer, there are two modes at: (m+1)q and (m+1)q - 1.

For the Poisson Distribution, a = 0 and b = λ, so b/(1-a) = λ. Thus the mode is the largest integer in
λ. If λ is an integer, there are two modes at: λ and λ - 1.

For the Negative Binomial Distribution, a = β/(1+β) and b = (r-1)β/(1+β), so b/(1-a) =


(r-1)β . Thus the mode is the largest integer in (r-1)β.
If (r-1)β is an integer, there are two modes at: (r-1)β and (r-1)β - 1.

Note that in each case the mode is close to the mean.80


So one could usefully start a numerical search for the mode at the mean.
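
For the Binomial example above (m = 10, q = 0.23), the following sketch (an added illustration) compares the largest integer in b/(1-a) with the argmax of the density; both give a mode of 2.

from math import comb, floor

m, q = 10, 0.23
a = -q / (1 - q)
b = (m + 1) * q / (1 - q)

mode_formula = floor(b / (1 - a))                     # largest integer in b/(1-a)
f = [comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]
mode_direct = max(range(m + 1), key=lambda x: f[x])   # where the density is largest
print(mode_formula, mode_direct)                      # 2 2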

79
In general this is only a local maximum, but members of the (a,b, 0) class do not have local maxima other than the
mode.
80
The means are mq, λ, and rβ.

Moments:

Formulas for the Factorial Moments of the (a, b, 0) class have been discussed in a previous section.

It can be derived from those formulas that for a member of the (a, b, 0) class:

Mean (a + b)/(1 - a)

Second Moment (a + b)(a + b + 1)/(1 - a)2

Variance (a + b)/(1 - a)2

Third Moment (a + b){(a + b + 1)(a + b + 2) + a - 1}/(1 - a)3

Skewness (a + 1) / √(a + b)
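
These expressions can be checked numerically against moments computed directly from a density; the sketch below (an added illustration) does so for a Negative Binomial with r = 3 and β = 0.5, arbitrary values.

from math import sqrt

r, beta = 3.0, 0.5                       # arbitrary illustrative parameters
a = beta / (1 + beta)
b = (r - 1) * beta / (1 + beta)

f = [1 / (1 + beta)**r]                  # f(0) for the Negative Binomial
for x in range(200):
    f.append(f[x] * (a + b / (x + 1)))   # densities via the (a, b, 0) recursion

mean = sum(x * p for x, p in enumerate(f))
second = sum(x * x * p for x, p in enumerate(f))
third = sum(x**3 * p for x, p in enumerate(f))
variance = second - mean**2
skewness = (third - 3 * mean * second + 2 * mean**3) / variance**1.5

print(round(mean, 6), (a + b) / (1 - a))
print(round(variance, 6), (a + b) / (1 - a)**2)
print(round(skewness, 6), (a + 1) / sqrt(a + b))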

A Generalization of the (a, b, 0) Class:

The (a, b, 0) relationship is: f(x+1) / f(x) = a + {b / (x+1)}, x = 0, 1, 2, ...


or equivalently: pk / pk-1 = a + b/k, k = 1, 2, 3, ...

A more general relationship is: pk / pk-1 = (ak + b) / (k + c), k = 1, 2, 3, ...

If c = 0, then this would reduce to the (a, b, 0) relationship.



Contagion Model:81

Assume one has a claim intensity of λ. Then the chance of having a claim over an extremely small
period of time Δt is approximately λ (Δt).82 We assume there is (virtually) no chance of having more
than one claim over extremely small time period Δt.

As mentioned previously, if the claim intensity is a constant over time, then the number of claims
observed over a period time t is given by a Poisson Distribution, with mean λt. If the claim intensity
depends on the number of claims that have occurred so far then the frequency distribution is other
than Poisson.

Given one has had k-1 claims so far, let λk Δt be the chance of having the kth claim in small time

period Δt. Then the times between claims are independent Exponentials; the mean time between
claim k -1 and k is 1/λk. Assume that this claims intensity is linear in k: λk = c + d k, c > 0.

Then for d > 0, it turns out that one gets a Negative Binomial Distribution. As one observes more
claims the chance of observing another claim goes up. This is referred to as positive contagion;
examples might be claims due to a contagious disease or from a very large fire. Over time period
(0, t), the parameters of the Negative Binomial are: r = c/d, and β = edt - 1.

For d < 0, it turns out that one gets a Binomial distribution. As one observes more claims, the chance
of future claims goes down. This is referred to as negative contagion. Over time period (0, t), the
parameters of the Binomial are: m = -c/d, and q = 1 - edt.

For d = 0 one gets the Poisson. There is no contagion and the claim intensity is constant. Thus the
contagion model is another mathematical connection between these three common frequency
distributions. We expect as d → 0 in either the Binomial or Negative Binomial that we approach a
Poisson. This is indeed the case as discussed previously.

As discussed in “Mahlerʼs Guide to Simulation,” in Section 20.2.3 of Loss Models this model is
used to simulate members of the (a, b, 0) class.
There we simulate the number of claims from time 0 to time 1. Thus β = ed - 1, and q = 1 - ed .
Thus for the Negative Binomial: d = ln[1 + β], and c = r ln[1 + β].
For the Binomial: d = ln[1 - q], and c = -m ln[1 - q].
81
Not on the syllabus of your exam. See pages 52-53 of Mathematical Methods of Risk Theory by Buhlmann.
82
The claim intensity is analogous to the force of mortality in Life Contingencies.

As used in the Heckman-Meyers algorithm to calculate aggregate losses, the frequency distributions
are parameterized in a related but somewhat different manner via their mean λ and
a “contagion parameter” c:83

λ c
Binomial mq -1/m
Poisson λ 0
Negative Binomial rβ 1/r

HyperGeometric Distribution:

For the HyperGeometric Distribution with parameters m, n, and N, the density is:84
f(x) = C(m, x) C(N-m, n-x) / C(N, n), x = 0, 1, ..., n.

f(x+1) / f(x) = {C(m, x+1) C(N-m, n-x-1)} / {C(m, x) C(N-m, n-x)}
= {(m-x)! x! (N-m-n+x)! (n-x)!} / {(m-x-1)! (x+1)! (N-m-n+x+1)! (n-x-1)!}
= {(m-x)/(x+1)} {(n-x)/(N-m-n+x+1)}.

Thus the HyperGeometric Distribution is not a member of the (a, b, 0) family.

Mean = nm/N. Variance = nm (N-m) (N-n) / {(N-1) N^2}.
83
Not on the syllabus of your exam. See PCAS 1983 p. 35-36, “The Calculation of Aggregate Loss Distributions
from Claim Severity and Claim Count Distributions,” by Phil Heckman and Glenn Meyers.
84
Not on the syllabus of your exam. See for example, A First Course in Probability, by Sheldon Ross.
If we had an urn with N balls,of which m were white, and we took a sample of size n, then f(x) is the probability that x of
the balls in the sample were white.
For example, tests with 35 questions will be selected at random from a bank of 500 questions.
Treat the 35 questions on the first randomly selected test as white balls.
Then the number of white balls in a sample of size n from the 500 balls is HyperGeometric with m = 35 and N = 500.
Thus the number of questions a second test of 35 questions has in common with the first test is HyperGeometric
with m = 35, n = 35, and N = 500. The densities from 0 to 10 are: 0.0717862, 0.204033, 0.272988, 0.228856,
0.134993, 0.0596454, 0.0205202, 0.00564155, 0.00126226, 0.000232901, 0.000035782.

Problems:

11.1 (1 point) Which of the following statements are true?


1. The variance of the Negative Binomial Distribution is less than the mean.
2. The variance of the Poisson Distribution only exists for λ > 2.
3. The variance of the Binomial Distribution is greater than the mean.
A. 1 B. 2 C. 3 D. 1, 2, and 3 E. None of A, B, C, or D

11.2 (1 point) A member of the (a, b, 0) class of frequency distributions has a = -2.
Which of the following types of Distributions is it?
A. Binomial B. Poisson C. Negative Binomial D. Logarithmic E. None A, B, C, or D.

11.3 (1 point) A member of the (a, b, 0) class of frequency distributions has a = 0.4 and b = 2.
Given f(4) = 0.1505, what is f(7)?
A. Less than 0.06
B. At least 0.06, but less than 0.07
C. At least 0.07, but less than 0.08
D. At least 0.08, but less than 0.09
E. At least 0.09

11.4 (2 points) X is a discrete random variable with a probability function which is a member of the
(a,b,0) class of distributions. P(X = 1) = 0.0064. P(X = 2) = 0.0512. P(X = 3) = 0.2048.
Calculate P(X = 4).
(A) 0.37 (B) 0.38 (C) 0.39 (D) 0.40 (E) 0.41

11.5 (2 points) For a discrete probability distribution, you are given the recursion relation:
f(x+1) = {1/3 + 0.6/(x+1)} f(x), x = 0, 1, 2,….
Determine f(3).
(A) 0.09 (B) 0.10 (C) 0.11 (D) 0.12 (E) 0.13

11.6 (2 points) A member of the (a, b, 0) class of frequency distributions has a = 0.4, and
b = 2.8. What is the mode?
A. 0 or 1 B. 2 C. 3 D. 4 E. 5 or more

11.7 (2 points) For a discrete probability distribution, you are given the recursion relation:
p(x) = (-2/3 + 4/x)p(x-1), x = 1, 2,….
Determine p(3).
(A) 0.19 (B) 0.20 (C) 0.21 (D) 0.22 (E) 0.23

11.8 (3 points) X is a discrete random variable with a probability function which is a member of the
(a, b, 0) class of distributions.
P(X = 100) = 0.0350252. P(X = 101) = 0.0329445. P(X = 102) = 0.0306836.
Calculate P(X = 105).
(A) .022 (B) .023 (C) .024 (D) .025 (E) .026

11.9 (2 points) X is a discrete random variable with a probability function which is a member of the
(a, b, 0) class of distributions.
P(X = 10) = 0.1074. P(X = 11) = 0.
Calculate P(X = 6).
(A) 6% (B) 7% (C) 8% (D) 9% (E) 10%

11.10 (3 points) Show that the (a, b, 0) relationship with a = -2 and b = 6 leads to a legitimate
distribution while a = -2 and b = 5 does not.

11.11 (2 points) A discrete probability distribution has the following properties:


(i) pk = c(-1 + 4/k)pk-1 for k = 1, 2,…
(ii) p0 = 0.7.
Calculate c.
(A) 0.06 (B) 0.13 (C) 0.29 (D) 0.35 (E) 0.40

11.12 (3 points) Show that the (a, b, 0) relationship with a = 1 and b = -1/2 does not lead to a
legitimate distribution.

11.13 (3 points) A member of the (a, b, 0) class of frequency distributions has been fit via
maximum likelihood to the number of claims observed on 10,000 policies.
Number of claims Number of Policies Fitted Model
0 6587 6590.79
1 2598 2586.27
2 647 656.41
3 136 136.28
4 25 25.14
5 7 4.29
6 or more 0 0.80
Determine what type of distribution has been fit and the value of the fitted parameters.

11.14 (4, 5/86, Q.50) (1 point) Which of the following statements are true?
1. For a Poisson distribution the mean and variance are equal.
2. For a binomial distribution the mean is less than the variance.
3. The negative binomial distribution is a useful model of the distribution of claim
frequencies of a heterogeneous group of risks.
A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

11.15 (4B, 11/92, Q.21) (1 point) A portfolio of 10,000 risks yields the following:
Number of Claims Number of Insureds
0 6,070
1 3,022
2 764
3 126
4 18
Based on the portfolio's sample moments, which of the following distributions provides the best fit
to the portfolio's number of claims?
A. Binomial B. Poisson C. Negative Binomial D. Lognormal E. Pareto

11.16 (5A, 11/94, Q.24) (1 point) Let X and Y be random variables representing the number of
claims for two separate portfolios of insurance risks. You are asked to model the distributions of the
number of claims using either the Poisson or Negative Binomial distributions. Given the following
information about the moments of X and Y, which distribution would be the best choice for each?
E[X] = 2.40 E[Y] = 3.50
E[X^2] = 8.16 E[Y^2] = 20.25
A. X is Poisson and Y is Negative Binomial
B. X is Poisson and Y is Poisson
C. X is Negative Binomial and Y is Negative Binomial
D. X is Negative Binomial and Y is Poisson
E. Neither distribution is appropriate for modeling numbers of claims.

11.17 (5A, 11/99, Q.39) (2 points) You are given the following information concerning the
distribution, S, of the aggregate claims of a particular line of business:
E[S] = $500,000 and Var[S] = 7.5 x 10^9.
The claim severity follows a Normal Distribution with both mean and standard deviation equal to
$5,000.
What conclusion can be drawn regarding the individual claim propensity of the insureds in this line of
business?

11.18 (3, 5/01, Q.25 & 2009 Sample Q.108) (2.5 points) For a discrete probability distribution,
you are given the recursion relation
p(k) = (2/k) p(k-1), k = 1, 2,….
Determine p(4).
(A) 0.07 (B) 0.08 (C) 0.09 (D) 0.10 (E) 0.11

11.19 (3, 11/02, Q.28 & 2009 Sample Q.94) (2.5 points) X is a discrete random variable with a
probability function which is a member of the (a,b,0) class of distributions.
You are given:
(i) P(X = 0) = P(X = 1) = 0.25
(ii) P(X = 2) = 0.1875
Calculate P(X = 3).
(A) 0.120 (B) 0.125 (C) 0.130 (D) 0.135 (E) 0.140

11.20 (CAS3, 5/04, Q.32) (2.5 points) Which of the following statements are true about the sums
of discrete, independent random variables?
1. The sum of two Poisson random variables is always a Poisson random variable.
2. The sum of two negative binomial random variables with parameters (r, β) and (r', β') is a
negative binomial random variable if r = r'.
3. The sum of two binomial random variables with parameters (m, q) and (m', q') is a binomial
random variable if q = q'.
A. None of 1, 2, or 3 is true. B. 1 and 2 only C. 1 and 3 only D. 2 and 3 only E. 1, 2, and 3

11.21 (CAS3, 5/05, Q.16) (2.5 points)


Which of the following are true regarding sums of random variables?
1. The sum of two independent negative binomial distributions with parameters (r1 , β1) and

(r2 , β2) is negative binomial if and only if r1 = r2 .


2. The sum of two independent binomial distributions with parameters (q1 , m1 ) and (q2 , m2 )
is binomial if and only if m1 = m2 .
3. The sum of two independent Poisson distributions with parameters λ1 and λ2 is Poisson if

and only if λ1 = λ2.


A. None are true B. 1 only C. 2 only D. 3 only E. 1 and 3 only

11.22 (SOA M, 5/05, Q.19 & 2009 Sample Q.166) (2.5 points)
A discrete probability distribution has the following properties:
(i) pk = c (1 + 1/k) pk-1 for k = 1, 2,…
(ii) p0 = 0.5.
Calculate c.
(A) 0.06 (B) 0.13 (C) 0.29 (D) 0.35 (E) 0.40

11.23 (CAS3, 5/06, Q.31) (2.5 points)


N is a discrete random variable from the (a, b, 0) class of distributions.
The following information is known about the distribution:
• Pr(N = 0) = 0.327680
• Pr(N = 1) = 0.327680
• Pr(N = 2) = 0.196608
• E(N) = 1.25
Based on this information, which of the following are true statements?
I. Pr(N = 3) = 0.107965
II. N is from a Binomial distribution.
III. N is from a Negative Binomial distribution.
A. I only B. II only C. III only D. I and II E. I and III

Solutions to Problems:

11.1. E. 1. The variance of the Negative Binomial Distribution is greater than the mean. Thus
Statement #1 is false. 2. The variance of the Poisson always exists (and is equal to the mean.)
Thus Statement #2 is false.
3. The variance of the Binomial Distribution is less than the mean. Thus Statement #3 is false.

11.2. A. For a < 0, one has a Binomial Distribution.


Comment: Since a = -q/(1-q), q = a/(a-1) = -2/(-3) = 2/3.
a = 0 is a Poisson, 1> a > 0 is a Negative Binomial. The Logarithmic Distribution is not a member of
the (a,b,0) class. The Logarithmic Distribution is a member of the (a,b,1) class.

11.3. B. f(x+1) = f(x) {a + b/(x+1)} = f(x){.4 + 2/(x+1)} = f(x)(.4)(x+6)/(x+1).


Then proceed iteratively. For example f(5) = f(4)(.4)(10)/5 = (.1505)(.8) = .1204.
n 0 1 2 3 4 5 6 7
f(n) 0.0467 0.1120 0.1568 0.1672 0.1505 0.1204 0.0883 0.0605

Comment: Since 0 < a < 1 we have a Negative Binomial Distribution. r = 1 + b/a = 1+ (2/.4) = 6.
β = a/(1-a) = .4/.6 = 2/3. Thus once a and b are given in fact f(4) is determined. Normally one would
compute f(0) = (1+β)^-r = 0.6^6 = 0.0467, and proceed iteratively from there.

11.4. E. For a member of the (a,b,0) class of distributions, f(x+1) / f(x) = a + {b / (x+1)}.
f(2)/f(1) = a + b/2. ⇒ 0.0512/0.0064 = 8 = a + b/2.

f(3)/f(2) = a + b/3. ⇒ 0.2048/0.0512 = 4 = a + b/3.


Therefore, a = -4 and b = 24. f(4) = f(3)(a + b/4) = (.2048)(-4 + 24/4) = 0.4096.
Alternately, once one solves for a and b, a < 0 ⇒ a Binomial Distribution.

-4 = a = -q/(1-q) ⇒ q = 0.8. 24 = b = (m+1)q/(1-q). ⇒ m + 1 = 6. ⇒ m = 5.


f(4) = (5)(0.8^4)(0.2) = 0.4096.
Comment: Similar to 3, 11/02, Q.28.

11.5. B. This is a member of the (a, b, 0) class of frequency distributions with a = 1/3 and
b = 0.6. Since a > 0, this is a Negative Binomial, with a = β/(1+β) = 1/3, and

b = (r - 1)β/(1 + β) = 0.6. Therefore, r - 1 = 0.6/(1/3) = 1.8. ⇒ r = 2.8. β = 0.5.


f(3) = {(2.8)(3.8)(4.8)/3!} 0.5^3 / 1.5^(2.8+3) = 0.1013.
Comment: Similar to 3, 5/01, Q.25. f(x+1) = f(x) {a + b/(x+1)}, x = 0, 1, 2, ...

11.6. D. For a member of the (a, b, 0) class, the mode is the largest integer in b/(1-a) =
2.8/(1-.4) = 4.667. Therefore, the mode is 4.
Alternately, f(x+1)/f(x) = a + b/(x+1) = .4 + 2.8/(x+1).
x 0 1 2 3 4 5 6
f(x+1)/f(x) 3.200 1.800 1.333 1.100 0.960 0.867 0.800
Therefore, f(4) = 1.1 f(3) > f(3), but f(5) = .96f(4) < f(4). Therefore, the mode is 4.
Alternately, since a > 0, this a Negative Binomial Distribution with a = β/(1+β) and
b = (r-1)β/(1+β). Therefore, β = a/(1-a) = 0.4/0.6 = 2/3 and r = b/a + 1 = 2.8/0.4 + 1 = 8.
The mode of a Negative Binomial is the largest integer in: (r-1)β = (7)(2/3) = 4.6667.
Therefore, the mode is 4.

11.7. E. This is a member of the (a, b, 0) class of frequency distributions with a = -2/3 and
b = 4. Since a < 0, this is a Binomial, with a = -q/(1-q) = -2/3, and b = (m+1)q/(1-q) = 4.
Therefore, m + 1 = 4/(2/3) = 6; m = 5. q = 0.4. f(3) = {5!/(3! 2!)} 0.4^3 0.6^2 = 0.2304.
Comment: Similar to 3, 5/01, Q.25. f(x) = f(x-1) {a + b/x}, x = 1, 2, 3, ...

11.8. B. For a member of the (a,b,0) class of distributions, f(x+1) / f(x) = a + {b / (x+1)}.
f(101)/f(100) = a + b/101. ⇒ 0.0329445/0.0350252 = .940594 = a + b/101.

f(102)/f(101) = a + b/102. ⇒ 0.0306836 /0.0329445 = .931372 = a + b/102.


Therefore, a = 0 and b = 95.0. f(105) = f(102)(a + b/103)(a + b/104)(a + b/105) =
(0.0306836)(95/103)(95/104)(95/105) = 0.0233893.
Comment: Alternately, once one solves for a and b, a = 0 ⇒ a Poisson Distribution.
λ = b = 95. f(105) = e^-95 95^105 / 105! = 0.0233893, difficult to calculate using most calculators.

11.9. D. P(X = 11) = 0. ⇒ finite support.


The only member of the (a, b, 0) class with finite support is the Binomial Distribution.
P(X = 11) = 0 and P(X = 10) > 0 ⇒ m = 10. 0.1074 = P(X = 10) = q^10. ⇒ q = 0.800.
P(X = 6) = {10!/(6! 4!)} (1-q)^4 q^6 = (210)(0.2^4)(0.8^6) = 0.088.

11.10. f(1) = f(0) (a + b). f(2) = f(1) (a + b/2). f(3) = f(2) (a + b/3). f(4) = f(3) (a + b/4), etc.
For a = -2 and b = 6:
f(1) = f(0) (-2 + 6) = 4 f(0). f(2) = f(1) (-2 + 6/2) = f(1). f(3) = f(2) (-2 + 6/3) = 0. f(4) = 0, etc.
This is a Binomial with m = 2 and q = a/(a-1) = 2/3.
f(0) = 1/9. f(1) = 4/9. f(2) = 4/9.
For a = -2 and b = 5:
f(1) = f(0) (-2 + 5) = 3 f(0). f(2) = f(1) (-2 + 5/2) = 1.5f(1). f(3) = f(2) (-2 + 5/3) < 0. No good!
Comment: Similar to Exercise 6.3 in Loss Models.
For a < 0, we require that b/a be a negative integer.

11.11. B. This is the (a, b, 0) relationship, with a = -c and b = 4c.


For the Binomial, a < 0. For the Poisson a = 0. For the Negative Binomial, a > 0.
c must be positive, since the densities are positive, therefore, a < 0 and this is a
Binomial. For the Binomial, a = -q/(1-q) and b = (m+1)q/(1-q).
b = -4a. ⇒ m + 1 = 4. ⇒ m = 3.

0.7 = p0 = (1 - q)^m = (1 - q)^3. ⇒ q = 0.1121.


c = -a = q/(1-q) = .1121/.8879 = 0.126.
Comment: Similar to SOA M, 5/05, Q.19.

11.12. f(1) = f(0) (1 - 1/2) = (1/2) f(0). f(2) = f(1) (1 - 1/4) = (3/4)f(1).
f(3) = f(2) (1 - 1/6) = (5/6)f(2). f(4) = f(3) (1 - 1/8) = (7/8)f(3). f(5) = f(4) (1 - 1/10) = (9/10)f(4).
The sum of these densities is:
f(0){1 + 1/2 + (3/4)(1/2) + (5/6)(3/4)(1/2) + (7/8)(5/6)(3/4)(1/2) + (9/10)(7/8)(5/6)(3/4)(1/2) + ...}
f(0){1 + 1/2 + 3/8 + 5/16 + 35/128 + 315/1280 + ...} > f(0){1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + ...}.
However, the sum 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + ..., diverges.
Therefore, these densities would sum to infinity.
Comment: We require that a < 1. a is positive for a Negative Binomial; a = β/(1 + β) < 1.

11.13. For a member of the (a, b, 0) class f(1)/f(0) = a + b, and f(2)/f(1) = a + b/2.
a + b = 2586.27/6590.79 = 0.39241.
a + b/2 = 656.41/2586.27 = 0.25381.
⇒ b = 0.27720. ⇒ a = 0.11521.
Looking in Appendix B in the tables attached to the exam, a is positive for the Negative Binomial.
Therefore, we have a Negative Binomial.
0.11521 = a = β/(1+β).

⇒ 1/β = 1/0.11521 - 1 = 7.6798. ⇒ β = 0.1302.


0.27720 = b = (r-1) β/(1+β).
⇒ r - 1 = 0.27720/0.11521 = 2.4060. ⇒ r = 3.406.
Comment: Similar to Exercise 16.21b in Loss Models.

11.14. C. 1. True.
2. False. The variance = nq(1-q) is less than the mean = nq, since q < 1.
3. True. Statement 3 is referring to the mixture of Poissons via a Gamma, which results in a Negative
Binomial frequency distribution for the entire portfolio.

11.15. B. The mean frequency is 0.5 and the variance is: 0.75 - 0.5^2 = 0.5.
Number Number Square of
of Insureds of Claims Number of Claims
6070 0 0
3022 1 1
764 2 4
126 3 9
18 4 16
Average 0.5000 0.7500
Since estimated mean = estimated variance, we expect the Poisson to provide the best fit.
Comment: If the estimated mean is approximately equal to the estimated variance, then the
Poisson is likely to provide a good fit. The Pareto and the LogNormal are continuous distributions
not used to fit discrete frequency distributions.

11.16. A. Var[X] = E[X^2] - E[X]^2 = 8.16 - 2.40^2 = 2.4 = E[X], so a Poisson Distribution is a good
choice for X. Var[Y] = E[Y^2] - E[Y]^2 = 20.25 - 3.50^2 = 8 > 3.5 = E[Y], so a Negative Binomial
Distribution is a good choice for Y.

11.17. Mean frequency = $500,000/$5000 = 100. Assuming frequency and severity are
independent: Var[S] = 7.5 x 10^9 = (100)(5000^2) + (5000^2)(Variance of the frequency).
Variance of the frequency = 200. Thus if each insured has the same frequency distribution, then it has
variance > mean, so it might be a Negative Binomial. Alternately, each insured could have a Poisson
frequency, but with the means varying across the portfolio. In that case, the mean of the mixing
distribution = 100. When mixing Poissons, Variance of the mixed distribution
= Mean of the mixing Distribution + Variance of the mixing distribution,
so the variance of the mixing distribution = 200 - 100 = 100.
Comment: There are many possible other answers.

11.18. C. f(x+1)/f(x) = 2/(x+1), x = 0, 1, 2,...


This is a member of the (a, b , 0) class of frequency distributions:
with f(x+1)/f(x) = a + b/(x+1), for a = 0 and b = 2.
Since a = 0, this is a Poisson with λ = b = 2. f(4) = e^-2 2^4 / 4! = 0.090.
Alternately, let f(0) = c. Then f(1) = 2c, f(2) = 2^2 c/2!, f(3) = 2^3 c/3!, f(4) = 2^4 c/4!, ....
1 = Σ f(i) = Σ 2^i c/i! = c Σ 2^i/i! = c e^2. Therefore, c = e^-2. f(4) = e^-2 2^4 / 4! = 0.090.

11.19. B. For a member of the (a, b, 0) class of distributions, f(x+1) / f(x) = a + {b / (x+1)}.
f(1)/f(0) = a + b. ⇒ 0.25/0.25 = 1 = a + b.

f(2)/f(1) = a + b/2. ⇒ 0.1875/0.25 = 0.75 = a + b/2.


Therefore, a = 0.5 and b = 0.5.
f(3) = f(2)(a + b/3) = (0.1875)(0.5 + .5/3) = 0.125.
Alternately, once one solves for a and b, a > 0 ⇒ a Negative Binomial Distribution.

1/2 = a = β/(1 + β). ⇒ β = 1. 1/2 = b = (r-1)β/(1 + β). ⇒ r - 1 = 1. ⇒ r = 2.

f(3) = r(r+1)(r+2) β^3 / {(1+β)^(r+3) 3!} = (2)(3)(4)/{(2^5)(6)} = 0.125.

11.20. C. 1. True. 2. False. Would be true if β = β', in which case the sum would have the sum of
the r parameters. 3. True. The sum would have the sum of the m parameters.
Comment: Note the requirement that the variables be independent.

11.21. A. The sum of two independent negative binomial distributions with parameters
(r1 , β1) and (r2 , β2) is negative binomial if and only if β1 = β2. Statement 1 is false.
The sum of two independent binomial distributions with parameters (q1 , m1 ) and (q2 , m2 ) is
binomial if and only if q1 = q2 . Statement 2 is false.
The sum of two independent Poisson distributions with parameters λ1 and λ2 is Poisson, regardless
of the values of lambda. Statement 3 is false.

11.22. C. This is the (a, b, 0) relationship, with a = c and b = c.


For the Binomial, a < 0. For the Poisson a = 0. For the Negative Binomial, a > 0.
c must be positive, since the densities are positive, therefore, a > 0 and this is a
Negative Binomial. For the Negative Binomial, a = β/(1+β) and b = (r-1)β/(1+β).

a = b. ⇒ r - 1 = 1. ⇒ r = 2.

0.5 = p0 = 1/(1+β)^r = 1/(1+β)^2. ⇒ (1+β)^2 = 2. ⇒ β = √2 - 1 = 0.4142.

c = a = β/(1+β) = 0.4142/1.4142 = 0.293.

11.23. C. For a member of the (a, b, 0) class, f(1)/f(0) = a + b, and f(2)/f(1) = a + b/2.
Therefore, a + b = 1, and a + b/2 = 0.196608/0.327680 = 0.6. ⇒ a = 0.2 and b = 0.8.
Since a is positive, we have a Negative Binomial Distribution. Statement III is true.
f(3) = f(2)(a + b/3) = (0.196608)(0.2 + 0.8/3) = 0.0917504. Statement I is false.
Comment: 0.2 = a = β/(1+β) and 0.8 = b = (r-1)β/(1+β). ⇒ r = 5 and β = 0.25.
E[N] = rβ = (5)(0.25) = 1.25, as given.

f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(5)(6)(7)/6} 0.25^3 / 1.25^8 = 0.0917504.



Section 12, Accident Profiles85

Constructing an “Accident Profile” is a technique in Loss Models that can be used to decide whether
data was generated by a member of the (a, b, 0) class of frequency distributions and if so which
member.

As discussed previously, the (a,b,0) class of frequency distributions consists of the three common
distributions: Binomial, Poisson, and Negative Binomial. Therefore, it also includes the Bernoulli,
which is a special case of the Binomial, and the Geometric, which is a special case of the Negative
Binomial. As discussed previously, for members of the (a, b, 0) class:
f(x+1) / f(x) = a + b / (x+1),
where a and b depend on the parameters of the distribution:86

Distribution a b f(0)
Binomial             -q/(1-q)       (m+1)q/(1-q)     (1-q)^m
Poisson              0              λ                e^-λ
Negative Binomial    β/(1+β)        (r-1)β/(1+β)     1/(1+β)^r

Note that a < 0 is a Binomial, a = 0 is a Poisson, and 1 > a > 0 is a Negative Binomial.

For the Binomial: q = a/(a-1) = |a| / ( |a| +1).

For the Negative Binomial: β = a/(1-a).

For the Binomial: m = -(a+b)/a = (a + b)/ |a|.87 The Bernoulli has m =1 and b = -2a.

For the Poisson: λ = b.

For the Negative Binomial: r = 1 + b/a. The Geometric has r =1 and b = 0.

Thus given values of a and b, one can determine which member of the (a,b,0) class one has and its
parameters.
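As a small illustration (my own sketch, not part of Loss Models; the function name is hypothetical), these identification rules can be written out in Python:

def identify_ab0(a, b):
    # Given a and b from f(x+1)/f(x) = a + b/(x+1), name the (a, b, 0) member and recover its parameters.
    if a < 0:
        return "Binomial", {"q": a / (a - 1), "m": -(a + b) / a}
    if a == 0:
        return "Poisson", {"lambda": b}
    if 0 < a < 1:
        return "Negative Binomial", {"beta": a / (1 - a), "r": 1 + b / a}
    raise ValueError("a must be less than 1 for a member of the (a, b, 0) class")

print(identify_ab0(-4, 24))    # Binomial with m = 5, q = 0.8
print(identify_ab0(0.5, 0.5))  # Negative Binomial with r = 2, beta = 1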

85
See Example 6.2 in Loss Models.
86
See Appendix B of Loss Models.
87
Since for the Binomial m is an integer, we require that b/|a| be an integer.

Accident Profile:

Also note that for a member of the (a, b, 0) class, (x+1)f(x+1)/f(x) = (x+1)a + b, so that
(x+1)f(x+1)/f(x) is linear in x. It is a straight line with slope a and intercept a + b.

Thus graphing (x+1)f(x+1)/f(x) can be a useful method of determining whether one of these three
distributions fits the given data.88 If a straight line does seem to fit this “accident profile”, then one
should use a member of the (a, b, 0) class.

The slope determines which of the three distributions is likely to fit: if the slope is close to zero then a
Poisson, if significantly negative then a Binomial, and if significantly positive then a Negative
Binomial.

For example, here is the accident profile for some data:


Number of Observed Observed
Claims Density Function (x+1)f(x+1)/f(x)
0 17,649 0.73932 0.27361
1 4,829 0.20229 0.45807
2 1,106 0.04633 0.62116
3 229 0.00959 0.76856
4 44 0.00184 1.02273
5 9 0.00038 2.66667
6 4 0.00017 1.75000
7 1 0.00004 8.00000
8 1 0.00004
9&+ 0

Prior to the tail where the data thins out, (x+1)f(x+1)/f(x) approximately follows a straight line with a
positive slope of about 0.2, which indicates a Negative Binomial with β/(1+β) ≅ 0.2.89 90

The intercept is rβ/(1+β), so that r ≅ 0.27 / 0.2 ≅ 1.4.91
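A brief Python sketch of how this accident profile is computed from the observed counts above (illustrative only; you would not write code on the exam):

counts = [17649, 4829, 1106, 229, 44, 9, 4, 1, 1]   # policies observed with 0, 1, 2, ... claims

# Accident profile: (x+1) f(x+1) / f(x), computed from the empirical densities.
profile = [(x + 1) * counts[x + 1] / counts[x] for x in range(len(counts) - 1)]
print([round(p, 5) for p in profile])
# 0.27361, 0.45807, 0.62116, 0.76856, ... roughly linear with slope of about 0.2, as described above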

In general, an accident profile is used to see whether data is likely to have come from a member of
the (a, b, 0) class. One would do this test prior to attempting to fit a Negative Binomial, Poisson, or
Binomial Distribution to the data. One starts with the hypothesis that the data was drawn from a
member of the (a, b, 0) class, without specifying which one. If this hypothesis is true the accident
profile should be approximately linear.92

88
This computation is performed using the empirical densities.
89
One should not significantly rely on those ratios involving few observations.
90
Slope is: a = β/(1+β).
91
Intercept is: a + b = β/(1+β) + (r-1)β/(1+β) = rβ/(1+β).
92
Approximate, because any finite sample of data is subject to random fluctuations.

If the accident profile is “approximately” linear, then we do not reject the hypothesis and decide
which member of the (a, b, 0) to fit based on the slope of this line.93

Comparing the Mean to the Variance:

Another way to decide which of the members of the (a,b,0) class is most likely to fit a given set of
data is to compare the sample mean and sample variance.

Binomial Mean > Variance

Poisson Mean = Variance94

Negative Binomial Mean < Variance
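A minimal sketch of this check in Python, applied to the same observed counts as the accident profile example above:

counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}

n = sum(counts.values())
mean = sum(x * c for x, c in counts.items()) / n
second_moment = sum(x * x * c for x, c in counts.items()) / n
variance = second_moment - mean**2    # empirical (maximum likelihood) variance

print(mean, variance)
# Here the variance exceeds the mean, pointing to a Negative Binomial,
# consistent with the positive slope of the accident profile above.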

93
There is not a numerical statistical test to perform, such as with the Chi-Square Test.
94
For data from a Poisson Distribution, the sample mean and sample variance will be approximately equal rather than
equal, because any finite sample of data is subject to random fluctuations.

Problems:

12.1 (2 points) You are given the following accident data:


Number of accidents Number of policies
0 91,304
1 7,586
2 955
3 133
4 18
5 3
6 1
7+ 0

Total 100,000
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1
(D) Negative Binomial, r > 1 (E) None of A, B, C, or D

12.2 (3 points) You are given the following accident data:


Number of accidents Number of policies
0 860
1 2057
2 2506
3 2231
4 1279
5 643
6 276
7 101
8 41
9 4
10 2
11&+ 0

Total 10,000
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial
(B) Poisson
(C) Negative Binomial, r ≤ 1
(D) Negative Binomial, r > 1
(E) None of the above

12.3 (3 points) You are given the following data on the number of runs scored during half innings of
major league baseball games from 1980 to 1998:
Runs Number of Occurrences
0 518,288
1 105,070
2 47,936
3 21,673
4 9736
5 4033
6 1689
7 639
8 274
9 107
10 36
11 25
12 5
13 7
14 1
15 0
16 1
Total 709,460
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial (B) Poisson
(C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1
(E) None of the above

12.4 (3 points) You are given the following accident data:


Number of accidents Number of policies
0 820
1 1375
2 2231
3 1919
4 1397
5 1002
6 681
7 330
8 172
9 56
10 14
11 3
12&+ 0
Total 10,000
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial (B) Poisson
(C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1
(E) None of the above

12.5 (2 points) You are given the following distribution of the number of claims per policy during a
one-year period for 20,000 policies.
Number of claims per policy Number of Policies
0 6503
1 8199
2 4094
3 1073
4 128
5 3
6+ 0
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial (B) Poisson
(C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1
(E) None of the above

12.6 (2 points) You are given the following distribution of the number of claims on motor vehicle
policies:
Number of claims in a year Observed frequency
0 565,664
1 68,714
2 5,177
3 365
4 24
5 6
6 0
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial (B) Poisson
(C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1
(E) None of the above

12.7 (4, 5/00, Q.40) (2.5 points)


You are given the following accident data from 1000 insurance policies:
Number of accidents Number of policies
0 100
1 267
2 311
3 208
4 87
5 23
6 4
7+ 0

Total 1000
Which of the following distributions would be the most appropriate model for this data?
(A) Binomial
(B) Poisson
(C) Negative Binomial
(D) Normal
(E) Gamma

12.8 (4, 11/03, Q.32 & 2009 Sample Q.25) (2.5 points)
The distribution of accidents for 84 randomly selected policies is as follows:
Number of Accidents Number of Policies
0 32
1 26
2 12
3 7
4 4
5 2
6 1
Total 84
Which of the following models best represents these data?
(A) Negative binomial
(B) Discrete uniform
(C) Poisson
(D) Binomial
(E) Either Poisson or Binomial

Solutions to Problems:

12.1. C. Calculate (x+1)f(x+1)/f(x). Since it is approximately linear, we seem to have a member of


the (a, b, 0) class. f(x+1)/f(x) = a + b/(x+1), so (x+1)f(x+1)/f(x) = a(x+1) + b =
ax + a + b. The slope is positive, so a > 0 and we have a Negative Binomial.
The slope, a ≅ 0.17. The intercept is about 0.08. Thus a + b ≅ 0.08.

Therefore, b ≅ 0.08 - 0.17 = -0.09 < 0.


For the Negative Binomial b = (r-1)β/(1+β). Thus b < 0, implies r < 1.
Number of Accidents    Observed    Observed Density Function    (x+1)f(x+1)/f(x)    Differences
0 91,304 0.91304 0.083
1 7,586 0.07586 0.252 0.169
2 955 0.00955 0.418 0.166
3 133 0.00133 0.541 0.124
4 18 0.00018 0.833 0.292
5 3 0.00003 2.000
6 1 0.00001
7+ 0 0.00000
Comment: Similar to 4, 5/00, Q.40. Do not put much weight on the values of (x+1)f(x+1)/f(x) in the
righthand tail, which can be greatly affected by random fluctuation.
The first moment is 0.09988, and the second moment is 0.13002.
The variance is: 0.13002 - 0.09988^2 = 0.12004, significantly greater than the mean.
Thus, if we have a member of the (a, b, 0) class, it is a Negative Binomial.
Fitting regressions is not on the syllabus of this exam; thus you should not be asked to do so on
your exam. If one fit a regression to all of the points:
{0, 0.083}, {1, 0.252}, {2, 0.418}, {3, 0.541}, {4, 0.833}, {5, 2.000},
one gets a slope of 0.32 and an intercept of -0.13.
If one fits a regression to all but the last point, which is based on very little data:
{0, 0.083}, {1, 0.252}, {2, 0.418}, {3, 0.541}, {4, 0.833},
one gets a slope of 0.179 and an intercept of -0.068.
This is similar to what I discuss in my solution.
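For the curious, the regression described in this comment can be reproduced with numpy; this is only a sketch, since fitting such lines is outside the syllabus:

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0.083, 0.252, 0.418, 0.541, 0.833, 2.000])   # (x+1)f(x+1)/f(x) from the table above

slope, intercept = np.polyfit(x, y, 1)            # fit to all six points
print(slope, intercept)                           # slope roughly 0.3, intercept roughly -0.13

slope5, intercept5 = np.polyfit(x[:-1], y[:-1], 1)   # drop the thinly supported last point
print(slope5)                                     # slope roughly 0.18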

12.2. B. Calculate (x+1)f(x+1)/f(x). Since it is approximately linear, we seem to have a member of


the (a, b, 0) class. f(x+1)/f(x) = a + b/(x+1), so (x+1)f(x+1)/f(x) = a(x+1) + b =
ax + a + b. The slope seems close to zero, until the data starts to get thin, so a ≅ 0 and therefore we
assume this data probably came from a Poisson.
Number of Accidents    Observed    Observed Density Function    (x+1)f(x+1)/f(x)
0 860 0.0860 2.392
1 2,057 0.2057 2.437
2 2,506 0.2506 2.671
3 2,231 0.2231 2.293
4 1,279 0.1279 2.514
5 643 0.0643 2.575
6 276 0.0276 2.562
7 101 0.0101 3.248
8 41 0.0041 0.878
9 4 0.0004 5.000
10 2 0.0002
Comment: Any actual data set is subject to random fluctuation, and therefore the observed slope of
the accident profile will never be exactly zero. One can never distinguish between the possibility
that the model was a Binomial with q small, a Poisson, or a Negative Binomial with β small.
This data was simulated as 10,000 independent random draws from a Poisson with λ = 2.5.
On the exam they should give you something that is either obviously linear or obviously not linear.
On the exam you will not be required to fit a straight line or perform a statistical test to see how good
a linear fit is. In contrast this question is a realistic situation. One either graphs or just eyeballs the
numbers in the last column not worrying too much about the last few numbers which are based on
very little data. Here, the values look approximately linear, in fact they seem approximately flat.
[Graph of (x+1)f(x+1)/f(x) versus x, for x = 0 to 7: the points are roughly level, staying between about 2.4 and 3.2.]

12.3. E. Calculate (x+1)f(x+1)/f(x).


Note that f(x+1)/f(x) = (number with x + 1)/(number with x).
Since (x+1)f(x+1)/f(x) is not linear, we do not have a member of the (a, b, 0) class.
Number of Runs    Observed    (x+1)f(x+1)/f(x)    Differences
0 518,228 0.203
1 105,070 0.912 0.710
2 47,936 1.356 0.444
3 21,673 1.797 0.441
4 9,736 2.071 0.274
5 4,033 2.513 0.442
6 1,689 2.648 0.136
7 639 3.430 0.782
8 274 3.515 0.084
9 107 3.364 -0.150
10 36 7.639 4.274
11 25

[Graph of the accident profile, (x+1)f(x+1)/f(x) versus x, for x = 0 to 10: the points are clearly not on a straight line.]
Comment: At high numbers of runs, where the data starts to thin out, one would not put much
reliance on the values of (x+1)f(x+1)/f(x). The data is taken from “An Analytic Model for Per-inning
Scoring Distributions,” by Keith Woolner.

12.4. E. Calculate (x+1)f(x+1)/f(x). Since it does not appear to be linear, we do not seem to
have a member of the (a, b, 0) class.
Number of Accidents    Observed    Observed Density Function    (x+1)f(x+1)/f(x)
0 820 0.0820 1.677
1 1,375 0.1375 3.245
2 2,231 0.2232 2.580
3 1,919 0.1920 2.912
4 1,397 0.1397 3.586
5 1,002 0.1002 4.078
6 681 0.0681 3.392
7 330 0.0330 4.170
8 172 0.0172 2.930
9 56 0.0056 2.500
10 14 0.0014 2.357
11 3 0.0003

12.5. A. Calculate (x+1)f(x+1)/f(x) = (x+1)(number with x + 1)/(number with x).


Number of Claims    Observed    (x+1)f(x+1)/f(x)    Differences
0 6,503 1.261
1 8,199 0.999 -0.262
2 4,094 0.786 -0.212
3 1,073 0.477 -0.309
4 128 0.117 -0.360
5 3
Since (x+1)f(x+1)/f(x) is approximately linear, we probably have a member of the (a, b, 0) class.
a = slope < 0. ⇒ Binomial Distribution.
Comment: The data was simulated from a Binomial Distribution with m = 5 and q = 0.2.
The sample mean is: 20,133/20,000 = 1.00665.
The second moment is: 36,355/20,000 = 1.81775.
The sample variance is: (20,000/19,999)(1.81775 - 1.00665^2) = 0.804.
Since the sample mean is greater than the sample variance by a significant amount, if this is a
member of the (a, b, 0) class then it is a Binomial Distribution.

12.6. E. Calculate (x+1)f(x+1)/f(x).


Note that f(x+1)/f(x) = (number with x + 1)/(number with x).
Number of claims Observed (x+1)f(x+1)/f(x) Differences
0 565,664 0.121
1 68,714 0.151 0.029
2 5,177 0.212 0.061
3 365 0.263 0.052
4 24 1.250 0.987
5 6
Even ignoring the final value, (x+1)f(x+1)/f(x) is not linear.
Therefore, we do not have a member of the (a, b, 0) class.
Comment: Data taken from Table 6.6.2 in Introductory Statistics with Applications in General
Insurance by Hossack, Pollard and Zehnwirth. See also Table 7.1 in Loss Models.

12.7. A. Calculate (x+1)f(x+1)/f(x). Since it seems to be decreasing linearly, we seem to have a


member of the (a, b, 0) class, with a < 0, which is a Binomial Distribution.
Number of Observed Observed
Accident Density Function (x+1)f(x+1)/f(x)
0 100 0.10000 2.67
1 267 0.26700 2.33
2 311 0.31100 2.01
3 208 0.20800 1.67
4 87 0.08700 1.32
5 23 0.02300 1.04
6 4 0.00400
7+ 0 0.00000
Alternately, the mean is 2, and the second moment is 5.494. Therefore, the sample variance is
(1000/999)(5.494 - 2^2) = 1.495. Since the variance is significantly less than the mean, this indicates
a Binomial Distribution.
Comment: One would not use a continuous distribution such as the Normal or the Gamma to model
a frequency distribution. (x+1)f(x+1)/f(x) = a(x+1) + b. In this case, a ≅ -0.33.

For the Binomial, a = -q/ (1-q), so q ≅ 0.25. In this case, b ≅ 2.67+ 0.33 = 3.00.

For the Binomial, b = (m+1)q/ (1-q), so m ≅ (3/.33) -1 = 8.



12.8. A. Calculate (x+1)f(x+1)/f(x). For example, (3)(7/84)/(12/84) = (3)(7)/12 = 1.75.


Number of Accidents    Observed    (x+1)f(x+1)/f(x)
0 32 0.81
1 26 0.92
2 12 1.75
3 7 2.29
4 4 2.50
5 2 3.00
6 1
Since this quantity seems to be increasing roughly linearly, we seem to have a member of the
(a, b, 0) class, with a = slope > 0, which is a Negative Binomial Distribution.
Alternately, the mean is: 103/84 = 1.226, and the second moment is: 287/84 = 3.417.
The sample variance is: (84/83)(3.417 - 1.226^2) = 1.937. Since the sample variance is significantly
more than the sample mean, this indicates a Negative Binomial.
Comment: If (x+1)f(x+1)/f(x) had been approximately linear with a slope that was close to zero,
then one could not distinguish between the possibility that the model was a Binomial with q small, a
Poisson, or a Negative Binomial with β small. If the correct model were the discrete uniform, then we
would expect the observed number of policies to be similar for each number of accidents.

Section 13, Zero-Truncated Distributions95

Frequency distributions can be constructed that have support on the positive integers or
equivalently have a density at zero of 0.
For example, let f(x) = (e^-3 3^x / x!) / (1 - e^-3), for x = 1, 2, 3, ...
x 1 2 3 4 5 6 7
f(x) 15.719% 23.578% 23.578% 17.684% 10.610% 5.305% 2.274%
F(x) 15.719% 39.297% 62.875% 80.558% 91.169% 96.4736% 98.74718%

Exercise: Verify that the sum of f(x) = (e^-3 3^x / x!) / (1 - e^-3) from x = 1 to ∞ is unity.
[Solution: The sum of the Poisson Distribution from 0 to ∞ is 1: Σ_{x=0}^∞ e^-3 3^x / x! = 1.
Therefore, Σ_{x=1}^∞ e^-3 3^x / x! = 1 - e^-3. ⇒ Σ_{x=1}^∞ f(x) = 1.]

This is an example of a Poisson Distribution Truncated from Below at Zero, with λ = 3.

In general, if f is a distribution on 0, 1, 2, 3, ...,
then p_k^T = f(k) / {1 - f(0)} is a distribution on 1, 2, 3, ...

This is a special case of truncation from below. The general concept of truncation of a distribution is
covered in a “Mahlerʼs Guide to Loss Distributions.”

We have the following three examples, shown in Appendix B.3.1 of Loss Models:

Distribution          Density of the Zero-Truncated Distribution

Binomial              {m! / [x! (m-x)!]} q^x (1-q)^(m-x) / {1 - (1-q)^m},   x = 1, 2, 3, ..., m

Poisson               (e^-λ λ^x / x!) / (1 - e^-λ),   x = 1, 2, 3, ...

Negative Binomial     {r(r+1)...(r+x-1) / x!} β^x (1+β)^-(x+r) / {1 - (1+β)^-r},   x = 1, 2, 3, ...
95
See Section 6.6 in Loss Models.
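The generic construction, dividing each untruncated density by 1 - f(0), is easy to check numerically. Here is a short Python sketch for the Zero-Truncated Poisson with λ = 3 (illustrative only; the function names are my own):

from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

def zero_truncated_poisson_pmf(x, lam):
    # p_k^T = p_k / (1 - p_0), supported on the positive integers.
    return poisson_pmf(x, lam) / (1 - poisson_pmf(0, lam))

print([round(zero_truncated_poisson_pmf(x, 3), 5) for x in range(1, 8)])
# 0.15719, 0.23578, 0.23578, 0.17684, 0.1061, 0.05305, 0.02274, matching the table above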

Moments:

Exercise: For a Zero-Truncated Poisson with λ = 3, what is the mean?
[Solution: p_k^T = f(k) / {1 - f(0)}, k = 1, 2, 3, ....
The mean of the zero-truncated distribution is:
Σ_{k=1}^∞ k p_k^T = Σ_{k=1}^∞ k f(k) / {1 - f(0)} = (mean of f) / {1 - f(0)} = λ / (1 - e^-λ) = 3 / (1 - e^-3) = 3.157.]

In general, the moments of a zero-truncated distribution are given in terms of those of the
corresponding untruncated distribution, f, by: E_Truncated[X^n] = E_f[X^n] / {1 - f(0)}.

For example, for the Zero-Truncated Poisson the mean is: λ / (1 - e^-λ),
while the second moment is: (λ + λ^2) / (1 - e^-λ).

Exercise: For a Zero-Truncated Poisson with λ = 3, what is the second moment?
[Solution: The second moment of the untruncated Poisson is its variance plus the square of its mean:
λ + λ^2. The second moment of the zero-truncated Poisson is:
(the second moment of f) / {1 - f(0)} = (λ + λ^2) / (1 - e^-λ) = (3 + 3^2) / (1 - e^-3) = 12.629.]

Thus a Zero-Truncated Poisson with λ = 3 has a variance of 12.629 - 3.157^2 = 2.66.

This matches the result of using the formula in Appendix B of Loss Models:
λ{1 - (λ+1)e^-λ} / (1 - e^-λ)^2 = (3){1 - 4e^-3} / (1 - e^-3)^2 = (3)(0.8009)/(0.9502)^2 = 2.66.
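The same numbers fall out of a few lines of Python (a sketch, using λ = 3 as in the exercise):

from math import exp

lam = 3
p0 = exp(-lam)

mean_untruncated = lam                      # mean of the untruncated Poisson
second_untruncated = lam + lam**2           # second moment of the untruncated Poisson

mean_t = mean_untruncated / (1 - p0)        # about 3.157
second_t = second_untruncated / (1 - p0)    # about 12.629
print(mean_t, second_t, second_t - mean_t**2)   # variance about 2.66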

It turns out that for the Zero-Truncated Negative Binomial, the parameter r can take on values
between -1 and 0, as well as the usual positive values, r > 0.
This is sometimes referred to as the Extended Zero-Truncated Negative Binomial.
However, provided r ≠ 0, all of the same formulas apply.

As r approaches zero, the Zero-Truncated Negative Binomial approaches


the Logarithmic Distribution, to be discussed next.

Logarithmic Distribution:96

The Logarithmic Distribution with parameter β has support equal to the positive integers:

f(x) = {β/(1+β)}^x / {x ln(1+β)}, for x = 1, 2, 3, ...

with mean: β / ln(1+β), and variance: β {1 + β - β/ln(1+β)} / ln(1+β).

a = β/(1+β).   b = -β/(1+β).   P(z) = 1 - ln[1 - β(z-1)] / ln[1+β],   z < 1 + 1/β.

Exercise: Assume the number of vehicles involved in each automobile accident is given by
f(x) = 0.2^x / {x ln(1.25)}, for x = 1, 2, 3, ...
Then what is the mean number of vehicles involved per automobile accident?
[Solution: This is a Logarithmic Distribution with β = 0.25. Mean = β/ln(1+β) = 0.25 / ln(1.25) = 1.12.
Comment: β / (1 + β) = 0.25/1.25 = 0.2.]

The density function of this Logarithmic Distribution with β = 0.25 is as follows:


x 1 2 3 4 5 6 7
f(x) 89.6284% 8.9628% 1.1950% 0.1793% 0.0287% 0.0048% 0.0008%
F(x) 89.628% 98.591% 99.786% 99.966% 99.994% 99.9990% 99.9998%
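A quick numerical check of these Logarithmic formulas for β = 0.25 (a sketch; the function name is my own):

from math import log

def logarithmic_pmf(x, beta):
    return (beta / (1 + beta))**x / (x * log(1 + beta))

beta = 0.25
print([round(logarithmic_pmf(x, beta), 6) for x in range(1, 5)])
# 0.896284, 0.089628, 0.01195, 0.001793, matching the table above
print(beta / log(1 + beta))    # mean, about 1.12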

Exercise: Show that the densities of a Logarithmic Distribution sum to one.
Hint: ln[1/(1 - y)] = Σ_{k=1}^∞ y^k / k, for |y| < 1.97
[Solution: Σ_{k=1}^∞ f(k) = {1/ln(1+β)} Σ_{k=1}^∞ {β/(1+β)}^k / k.
Let y = β/(1+β). Then, 1/(1 - y) = 1 + β.
Thus Σ_{k=1}^∞ f(k) = {1/ln(1+β)} Σ_{k=1}^∞ y^k / k = ln[1/(1 - y)] / ln(1+β) = ln(1+β) / ln(1+β) = 1.]

96
Sometimes called instead a Log Series Distribution.
97
Not something you need to know for your exam. This result can be derived as a Taylor series.
2016-C-1, Frequency Distributions, §13 Zero-Truncated HCM 10/21/15, Page 232

(a,b,1) Class:98

The (a,b,1) class of frequency distributions in Loss Models is a generalization of the (a,b,0)
class. As with the (a,b,0) class, the recursion formula applies:
(density at x+1) / (density at x) = a + b/(x+1).
However, this relationship need only apply now for x ≥ 1, rather than x ≥ 0.

Members of the (a,b,1) family include: all the members of the (a,b,0) family,99 the zero-truncated
versions of those distributions: Zero-Truncated Binomial, Zero-Truncated Poisson, and
Extended Truncated Negative Binomial,100 and the Logarithmic Distribution.
In addition the (a,b,1) class includes the zero-modified distributions corresponding to these, to be
discussed in the next section.
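As a small sanity check (illustrative only), one can verify numerically that the Logarithmic Distribution satisfies this recursion for x ≥ 1, using the a and b given for it above:

from math import log

def logarithmic_pmf(x, beta):
    return (beta / (1 + beta))**x / (x * log(1 + beta))

beta = 2
a = beta / (1 + beta)
b = -beta / (1 + beta)
for x in range(1, 6):
    ratio = logarithmic_pmf(x + 1, beta) / logarithmic_pmf(x, beta)
    print(x, round(ratio, 6), round(a + b / (x + 1), 6))   # the two columns agree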

Loss Models Notation:

p_k: the density function of the untruncated frequency distribution at k.
p_k^T: the density function of the zero-truncated frequency distribution at k.
p_k^M: the density function of the zero-modified frequency distribution at k.101

Exercise: Give a verbal description of the following terms: p_7, p_4^M, and p_6^T.
[Solution: p_7 is the density of the frequency at 7, f(7).
p_4^M is the density of the zero-modified frequency at 4, f^M(4).
p_6^T is the density of the zero-truncated frequency at 6, f^T(6).]

98
See Table 6.4 and Appendix B.3 in Loss Models.
99
Binomial, Poisson, and the Negative Binomial.
100
The Zero-Truncated Negative Binomial where in addition to r > 0, -1 < r < 0 is also allowed.
101
Zero-modified distributions will be discussed in the next section.

Probability Generating Functions:

The Probability Generating Function, P(z) = E[z^N], for a zero-truncated distribution can be obtained
from that for the untruncated distribution:

P^T(z) = {P(z) - f(0)} / {1 - f(0)},

where P(z) is the p.g.f. for the untruncated distribution, P^T(z) is the p.g.f. for the zero-truncated
distribution, and f(0) is the probability at zero for the untruncated distribution.

Exercise: What is the Probability Generating Function for a Zero-Truncated Poisson Distribution?
[Solution: For the untruncated Poisson, P(z) = e^{λ(z-1)} and f(0) = e^-λ.
P^T(z) = {P(z) - f(0)} / {1 - f(0)} = {e^{λ(z-1)} - e^-λ} / {1 - e^-λ} = {e^{λz} - 1} / {e^λ - 1}.]

One can derive this relationship as follows:


∞ ∞

∞ ∑ zn f(n) ∑ zn f(n) - f(0)


∑ zn pTn n=1 n=0 P(z) - f(0)
PT(z) = = = = .
1 - f(0) 1 - f(0) 1 - f(0)
n=1

In any case, Appendix B of Loss Models displays the Probability Generating Functions for all of the
Zero-Truncated Distributions.

For example, for the zero-truncated Geometric Distribution, in Appendix B it is shown that:

P^T(z) = [{1 - β(z-1)}^-1 - (1+β)^-1] / [1 - (1+β)^-1].

This can be simplified:

[{1 - β(z-1)}^-1 - (1+β)^-1] / [1 - (1+β)^-1] = [(1+β)/(1 + β - βz) - 1] / [(1+β) - 1]
= [(1+β) - (1 + β - βz)] / [β (1 + β - βz)] = z / (1 + β - βz).
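This simplification is easy to sanity-check numerically; a minimal sketch, with β and z being arbitrary test values I chose:

beta, z = 4.0, 0.7

p0 = 1 / (1 + beta)                  # Geometric density at 0
P = 1 / (1 - beta * (z - 1))         # p.g.f. of the untruncated Geometric

print((P - p0) / (1 - p0))           # {P(z) - f(0)} / {1 - f(0)}
print(z / (1 + beta - beta * z))     # the simplified form; the two printed values agree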

Logarithmic Distribution as a Limit of Zero-Truncated Negative Binomial Distributions:

Exercise: Show that the limit as r → 0 of Zero-Truncated Negative Binomial Distributions with the
other parameter β fixed, is a Logarithmic Distribution with parameter β.
[Solution: For the Zero-Truncated Negative Binomial Distribution:

p_k^T = {r(r+1)...(r+k-1) / k!} β^k (1+β)^-(k+r) / {1 - (1+β)^-r} = {r(r+1)...(r+k-1) / k!} {β/(1+β)}^k / {(1+β)^r - 1}.

lim_{r→0} p_k^T = lim_{r→0} (r+1)...(r+k-1) {β/(1+β)}^k / k! × r / {(1+β)^r - 1}
= {(k-1)! / k!} {β/(1+β)}^k × lim_{r→0} r / {(1+β)^r - 1}.

Using LʼHospitalʼs Rule, lim_{r→0} r / {(1+β)^r - 1} = lim_{r→0} 1 / {ln[1+β] (1+β)^r} = 1 / ln[1+β].

Thus, lim_{r→0} p_k^T = {β/(1+β)}^k / {k ln(1+β)}.

This is the density of a Logarithmic Distribution.

Alternately, as shown in Appendix B of Loss Models,
the p.g.f. of a Zero-Truncated Negative Binomial Distribution is:

P^T(z) = [{1 - β(z-1)}^-r - (1+β)^-r] / [1 - (1+β)^-r].

Using LʼHospitalʼs Rule (differentiating the numerator and denominator with respect to r),

lim_{r→0} P^T(z) = lim_{r→0} {-ln[1 - β(z-1)] {1 - β(z-1)}^-r + ln[1+β] (1+β)^-r} / {ln[1+β] (1+β)^-r}

= {ln[1+β] - ln[1 - β(z-1)]} / ln[1+β] = 1 - ln[1 - β(z-1)] / ln[1+β].

As shown in Appendix B of Loss Models, this is the p.g.f. of a Logarithmic Distribution.]
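Numerically, the convergence is easy to see by taking r very small; a Python sketch (the function names are my own, and r = 10^-6 stands in for the limit):

from math import log

def zt_negbin_pmf(k, r, beta):
    # Zero-truncated Negative Binomial density, valid for r > -1 and r != 0.
    coef = 1.0
    for i in range(k):
        coef *= (r + i) / (i + 1)    # r(r+1)...(r+k-1) / k!
    return coef * (beta / (1 + beta))**k / ((1 + beta)**r - 1)

def logarithmic_pmf(k, beta):
    return (beta / (1 + beta))**k / (k * log(1 + beta))

beta = 2
for k in range(1, 5):
    print(k, zt_negbin_pmf(k, 1e-6, beta), logarithmic_pmf(k, beta))
# the two columns agree to several decimal places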

Problems:

13.1 (1 point) The number of persons injured in an accident is assumed to follow a


Zero-Truncated Poisson Distribution with parameter λ = 0.3.
Given an accident, what is the probability that exactly 3 persons were injured in it?
A. Less than 1.0%
B. At least 1.0% but less than 1.5%
C. At least 1.5% but less than 2.0%
D. At least 2.0% but less than 2.5%
E. At least 2.5%

Use the following information for the next four questions:

The number of vehicles involved in an automobile accident is given by a Zero-Truncated Binomial


Distribution with parameters q = 0.3 and m = 5.

13.2 (1 point) What is the mean number of vehicles involved in an accident?


A. less than 1.8
B. at least 1.8 but less than 1.9
C. at least 1.9 but less than 2.0
D. at least 2.0 but less than 2.1
E. at least 2.1

13.3 (2 points) What is the variance of the number of vehicles involved in an accident?
A. less than 0.5
B. at least 0.5 but less than 0.6
C. at least 0.6 but less than 0.7
D. at least 0.7 but less than 0.8
E. at least 0.8

13.4 (1 point) What is the chance of observing exactly 3 vehicles involved in an accident?
A. less than 11%
B. at least 11% but less than 13%
C. at least 13% but less than 15%
D. at least 15% but less than 17%
E. at least 17%

13.5 (2 points) What is the median number of vehicles involved in an accident?


A. 1 B. 2 C. 3 D. 4 E. 5

Use the following information for the next five questions:


The number of family members is given by a Zero-Truncated Negative Binomial Distribution with
parameters r = 4 and β = 0.5.

13.6 (1 point) What is the mean number of family members?


A. less than 2.0
B. at least 2.0 but less than 2.1
C. at least 2.1 but less than 2.2
D. at least 2.2 but less than 2.3
E. at least 2.3

13.7 (2 points) What is the variance of the number of family members?


A. less than 2.0
B. at least 2.0 but less than 2.2
C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6
E. at least 2.6

13.8 (2 points) What is the chance of a family having 7 members?


A. less than 1.1%
B. at least 1.1% but less than 1.3%
C. at least 1.3% but less than 1.5%
D. at least 1.5% but less than 1.7%
E. at least 1.7%

13.9 (3 points) What is the probability of a family having more than 5 members?
A. less than 1%
B. at least 1%, but less than 3%
C. at least 3%, but less than 5%
D. at least 5%, but less than 7%
E. at least 7%

13.10 (1 point) What is the probability generating function?



Use the following information for the next three questions:


A Logarithmic Distribution with parameter β = 2.

13.11 (1 point) What is the mean?


A. less than 2.0
B. at least 2.0 but less than 2.1
C. at least 2.1 but less than 2.2
D. at least 2.2 but less than 2.3
E. at least 2.3

13.12 (2 points) What is the variance?


A. less than 2.0
B. at least 2.0 but less than 2.2
C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6
E. at least 2.6

13.13 (1 point) What is the density function at 6?


A. less than 1.1%
B. at least 1.1% but less than 1.3%
C. at least 1.3% but less than 1.5%
D. at least 1.5% but less than 1.7%
E. at least 1.7%

13.14 (1 point) For a Zero-Truncated Negative Binomial Distribution with parameters r = -0.6
and β = 3, what is the density function at 5?
A. less than 1.1%
B. at least 1.1% but less than 1.3%
C. at least 1.3% but less than 1.5%
D. at least 1.5% but less than 1.7%
E. at least 1.7%

Use the following information for the next four questions:


Shoeless Joe is a baseball player.
The number of games until Joe goes hitless is a zero-truncated Geometric Distribution with
parameter β = 4.

13.15 (1 point) What is the mean number of games until Joe goes hitless?
A. 3 B. 4 C. 5 D. 6 E. 7

13.16 (1 point) What is the variance of the number of games until Joe goes hitless?
A. 8 B. 12 C. 16 D. 20 E. 24

13.17 (1 point) What is the probability that it is exactly 6 games until Joe goes hitless?
A. less than 3%
B. at least 3% but less than 4%
C. at least 4% but less than 5%
D. at least 5% but less than 6%
E. at least 6%

13.18 (2 points) What is the probability that it is more than 6 games until Joe goes hitless?
A. 26% B. 27% C. 28% D. 29% E. 30%

13.19 (3 points) You are given the following information from a survey of the number of persons
per household in the United States:
Number of Persons Number of Households
1 29,181
2 35,569
3 17,314
4 15,828
5 7,003
6 2,552
7 or more 1,425

Determine whether or not this data seems to be drawn from a member of the (a, b, 1) class.

13.20 (3 points) If N follows a zero-truncated Poisson Distribution, demonstrate that:


E[1/(N+1)] = 1/λ - 1/(e^λ - 1).

Use the following information for the next five questions:


The number of days per hospital stay is given by a Zero-Truncated Poisson Distribution with
parameter λ = 2.5.

13.21 (1 point) What is the mean number of days per hospital stay?
A. less than 2.5
B. at least 2.5 but less than 2.6
C. at least 2.6 but less than 2.7
D. at least 2.7 but less than 2.8
E. at least 2.8

13.22 (2 points) What is the variance of the number of days per hospital stay?
A. less than 2.2
B. at least 2.2 but less than 2.3
C. at least 2.3 but less than 2.4
D. at least 2.4 but less than 2.5
E. at least 2.5

13.23 (1 point) What is the chance that a hospital stay is 6 days?


A. less than 3%
B. at least 3% but less than 4%
C. at least 4% but less than 5%
D. at least 5% but less than 6%
E. at least 6%

13.24 (2 points) What is the chance that a hospital stay is fewer than 4 days?
A. less than 50%
B. at least 50% but less than 60%
C. at least 60% but less than 70%
D. at least 70% but less than 80%
E. at least 80%

13.25 (2 points) What is the mode of this frequency distribution?


A. 1 B. 2 C. 3 D. 4 E. 5

13.26 (4 points) Let X follow an Exponential with mean θ.


Let Y be the minimum of a random sample from X of size k.
However, K in turn follows a Logarithmic Distribution with parameter β.
What is the distribution function of Y?

Use the following information for the next 2 questions:


•  Harvey Wallbanker, the Automatic Teller Machine, works 24 hours a day, seven days a week,
without a vacation or even an occasional day off.
•  Harvey services on average one customer every 10 minutes.
•  60% of Harveyʼs customers are male and 40% are female.
•  The gender of a customer is independent of the gender of the previous customers.
•  Harveyʼs hobby is to observe patterns of customers. For example, FMF denotes a female
customer, followed by a male customer, followed by a female customer.
Harvey starts looking at customers who arrive after serving Pat, his most recent customer.
How long does it take on average until he sees the following patterns?

13.27 (2 points) How long on average until Harvey sees “M”?

13.28 (2 points) How long on average until Harvey sees “F”?

13.29 (1 point) X and Y are independently, identically distributed


Zero-Truncated Poisson Distributions, each with λ = 3.
What is the probability generating function of their sum?

13.30 (3 points) Let X follow an Exponential with mean θ.


Let Y be the minimum of a random sample from X of size k.
However, K in turn follows a Zero-Truncated Geometric Distribution with parameter β.
What is the mean of Y?
Hint: The densities of a Logarithmic Distribution sum to one.

A. θ / (1 + β) B. (θ/β) ln[1 + β] C. θ / (1 + ln[1 + β]) D. θ (1 + β)

E. None of A, B, C, or D.

13.31 (5 points) At the Hyperion Hotel, the number of days a guest stays is distributed via
a zero-truncated Poisson with λ = 4.
On the day they check out, each guest leaves a tip for the maid equal to $3 per day of their stay.
The guest in room 666 is checking out today. What is the expected value of the tip?

13.32 (5 points) The Krusty Burger Restaurant has started a new sales promotion.
With the purchase of each meal they give the customer a coupon.
There are ten different coupons, each with the face of a different famous resident of Springfield.
A customer is equally likely to get each type of coupon, independent of the other coupons he has
gotten in the past.
Once you get one coupon of each type, you can turn in your 10 different coupons for a free meal.
(a) Assuming a customer saves his coupons, and does not trade with anyone else, what is the
mean number of meals he must buy until he gets a free meal?
(b) What is the variance of the number of meals until he gets a free meal?

13.33 (2 points) For a member of the (a, b, 1) class, you are given:
p21 = 0.04532.
p22 = 0.02987.
p23 = 0.01818.
Determine p24.
A. 0.9% B. 1.0% C. 1.1% D. 1.2% E. 1.3%

13.34 (3 points) You are given the following information from a survey of the number of rooms per
year-round housing units in the United States:
Number of Rooms Number of Housing Units
1 556
2 1,292
3 10,319
4 21,599
5 27,687
6 24,810
7 or more 34,269

Determine whether or not this data seems to be drawn from a member of the (a, b, 1) class.

13.35 (Course 151 Sample Exam #1, Q.12) (1.7 points)


A new business has initial capital 700 and will have annual net earnings of 1000.
It faces the risk of a one time loss with the following characteristics:
• The loss occurs at the end of the year.
• The year of the loss is one plus a Geometric distribution with β = 0.538.
(So the loss may either occur at the end of the first year, second year, etc.)
• The size of the loss is uniformly distributed on the ten integers:
500,1000,1500, ..., 5000.
Determine the probability of ruin.
(A) 0.00 (B) 0.41 (C) 0.46 (D) 0.60 (E) 0.65

Solutions to Problems:

13.1. B. Let f(x) be the density of a Poisson Distribution; then the distribution truncated from below
at zero is: g(x) = f(x) / {1 - f(0)}. Thus for λ = 0.3, g(x) = {0.3^x e^-0.3 / x!} / {1 - e^-0.3}.
g(3) = {0.3^3 e^-0.3 / 3!} / {1 - e^-0.3} = 0.00333 / 0.259 = 1.3%.

13.2. B. Mean is that of the non-truncated binomial, divided by 1 - f(0): (0.3)(5) / (1 - 0.7^5) = 1.803.

13.3. D. The second moment is that of the non-truncated binomial, divided by 1 - f(0):
(1.05 + 1.5^2) / (1 - 0.7^5) = 3.967. Variance = 3.967 - 1.803^2 = 0.716.
Comment: Using the formula in Appendix B of Loss Models:
Variance = mq{(1-q) - (1 - q + mq)(1-q)^m} / {1 - (1-q)^m}^2
= (5)(0.3){0.7 - (0.7 + 1.5)(0.7)^5} / {1 - (0.7)^5}^2 = (1.5)(0.3303)/0.8319^2 = 0.716.

13.4. D. For a non-truncated binomial, f(3) = 5!/{(3!)(2!)} 0.3^3 0.7^2 = 0.1323. For the zero-truncated
distribution one gets the density by dividing by 1 - f(0): (0.1323) / (1 - 0.7^5) = 15.9%.

13.5. B. For a discrete distribution such as we have here, employ the convention that the median is
the first value at which the distribution function is greater than or equal to 0.5.
F(1) = 0.433 < 50%, F(2) = .804 > 50%, and therefore the median is 2.
Number of Vehicles    Untruncated Binomial    Zero-Truncated Binomial    Cumulative Zero-Truncated Binomial
0    16.81%
1 36.02% 43.29% 43.29%
2 30.87% 37.11% 80.40%
3 13.23% 15.90% 96.30%
4 2.83% 3.41% 99.71%
5 0.24% 0.29% 100.00%

13.6. E. Mean is that of the non-truncated negative binomial, divided by 1-f(0):


(4)(0.5) / (1 - 1.5^-4) = 2 / 0.8025 = 2.49.

13.7. D. The second moment is that of the non-truncated negative binomial, divided by 1 - f(0):
(3 + 2^2) / (1 - 1.5^-4) = 8.723. Variance = 8.723 - 2.492^2 = 2.51.
Comment: Using the formula in Appendix B of Loss Models:
Variance = rβ{(1+β) - (1 + β + rβ)(1+β)^-r} / {1 - (1+β)^-r}^2
= (4)(0.5){1.5 - (1 + 0.5 + 2)(1.5^-4)} / (1 - 1.5^-4)^2 = (2)(0.8086)/0.8025^2 = 2.51.
The non-truncated negative binomial has mean = rβ = 2, and variance = rβ(1+β) = 3,
and thus a second moment of: 3 + 2^2 = 7.

13.8. C. For the non-truncated negative binomial,


f(7) = (4)(5)(6)(7)(8)(9)(10) 0.5^7 / {(7!)(1.5)^11} = 1.08%. For the zero-truncated distribution one gets
the density by dividing by 1 - f(0): (1.08%) / (1 - 1.5^-4) = 1.35%.

13.9. D. The chance of more than 5 is: 1 - .9471 = 5.29%.


Number of Members    Untruncated Neg. Binomial    Zero-Truncated Neg. Binomial    Cumulative Zero-Truncated Neg. Binomial
0    19.75%
1 26.34% 32.82% 32.82%
2 21.95% 27.35% 60.17%
3 14.63% 18.23% 78.40%
4 8.54% 10.64% 89.04%
5 4.55% 5.67% 94.71%
6 2.28% 2.84% 97.55%
7 1.08% 1.35% 98.90%
8 0.50% 0.62% 99.52%
9 0.22% 0.28% 99.79%

13.10. As shown in Appendix B of Loss Models,
P(z) = [{1 - β(z-1)}^-r - (1+β)^-r] / [1 - (1+β)^-r] = [(1.5 - 0.5z)^-4 - 1/1.5^4] / [1 - 1/1.5^4]
= [1.5^4 / (1.5 - 0.5z)^4 - 1] / (1.5^4 - 1).

Alternately, for the Negative Binomial, f(0) = 1/(1+β)^r = 1/1.5^4,
and P(z) = {1 - β(z-1)}^-r = {1 - (0.5)(z - 1)}^-4 = (1.5 - 0.5z)^-4.
P^T(z) = {P(z) - f(0)} / {1 - f(0)} = [(1.5 - 0.5z)^-4 - 1/1.5^4] / [1 - 1/1.5^4] = [1.5^4 / (1.5 - 0.5z)^4 - 1] / (1.5^4 - 1).

Comment: This probability generating function only exists for z < 1 + 1/β = 1 + 1/0.5 = 3.

13.11. A. Mean of the logarithmic distribution is: β/ln(1+β) = 2 / ln(3) = 1.82.



13.12. B. Variance of the logarithmic distribution is: β{1 + β − β/ln(1+β)}/ln(1+β) =


2{3 - 1.82}/ ln(3) = 2.15.

13.13. C. For the logarithmic distribution, f(x) = {β/(1+β)}^x / {x ln(1+β)}.
f(6) = (2/3)^6 / {6 ln(3)} = 1.33%.

13.14. A. For the zero-truncated Negative Binomial Distribution,


f(5) = r(r+1)(r+2)(r+3)(r+4) {β/(1+β)}^5 / {(5!)((1+β)^r - 1)} =
(-0.6)(0.4)(1.4)(2.4)(3.4)(3/4)^5 / {(120)(4^-0.6 - 1)} = (-2.742)(0.2373) / {(120)(-0.5647)} = 0.96%.
Comment: Note this is an extended zero-truncated negative binomial distribution, with
0 > r > -1. The same formulas apply as when r > 0. (As r approaches zero one gets a logarithmic
distribution.) For the untruncated negative binomial distribution we must have r > 0. So in this case
there is no corresponding untruncated distribution.

13.15. C. mean = 1 + β = 5.

13.16. D. variance = β(1 + β) = (4)(5) = 20.

13.17. E. p_6^T = β^5 / (1+β)^6 = 4^5 / 5^6 = 6.55%.
Comment: Due to the memoryless property of the Geometric, if one were to truncate and shift at 1,
in other words get rid of the zeros and subtract one from the number of claims, one gets the
same Geometric. Therefore, if N follows a zero-truncated Geometric with parameter β,
then N - 1 follows a Geometric Distribution with mean β.
In this case, the number of games in a row in which Joe gets a hit is Geometric with β = 4.
Prob[6 games until Joe goes hitless] = Prob[Joe hits in exactly 5 games in a row] =
density at 5 of a Geometric with β equal to 4 = 4^5 / 5^6 = 6.55%.

13.18. A. p_7^T + p_8^T + p_9^T + ... = 4^6/5^7 + 4^7/5^8 + 4^8/5^9 + ... = (4^6/5^7) / (1 - 4/5) = 26.2%.
Alternately, due to the memoryless property of the Geometric, the number of games in a row in
which Joe gets a hit is Geometric with β = 4.
The probability that Joe gets a hit in at least the next 6 games is:
S(5) = {β/(1+β)}^6 = (4/5)^6 = 26.2%.

13.19. The result of the accident profile is definitely not linear.


Number of    Number of Households    Observed Density    (x+1)f(x+1)/f(x)
Persons      Observed                Function
1            29,181                  0.26803             2.438
2            35,569                  0.32670             1.460
3            17,314                  0.15903             3.657
4            15,828                  0.14538             2.212
5             7,003                  0.06432             2.186
6             2,552                  0.02344
7+            1,425                  0.01309
Thus I conclude that this data is not drawn from a member of the (a, b, 1) class.
Comment: Data is for 2005, taken from Table A-59 of “32 Years of Housing Data”
prepared for U.S. Department of Housing and Urban Development Office of Policy Development
and Research, by Frederick J. Eggers and Alexander Thackeray of Econometrica, Inc.
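The accident profile test above is easy to reproduce; a short Python sketch (an illustration, not from the original text):

# For an (a, b, 1) member, (x+1) f(x+1)/f(x) = a x + (a + b) should be linear in x for x >= 1.
households = {1: 29181, 2: 35569, 3: 17314, 4: 15828, 5: 7003, 6: 2552}

for x in range(1, 6):
    ratio = (x + 1) * households[x + 1] / households[x]
    print(x, round(ratio, 3))
# Output: 2.438, 1.46, 3.657, 2.212, 2.186 -- clearly not on a straight line,
# so the data is not from a member of the (a, b, 1) class.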

13.20. E[1/(N+1)] = Σ_{n=1}^∞ f(n)/(n+1) = Σ_{n=1}^∞ {e^-λ λ^n / n!} / {(1 - e^-λ)(n+1)}
= {1/(1 - e^-λ)} (1/λ) Σ_{n=1}^∞ e^-λ λ^(n+1) / (n+1)!
= {1/(e^λ - 1)} (1/λ) Σ_{j=2}^∞ λ^j / j! = {1/(e^λ - 1)} (1/λ) {Σ_{j=0}^∞ λ^j / j! - 1 - λ}
= {1/(e^λ - 1)} (1/λ) (e^λ - 1 - λ) = 1/λ - 1/(e^λ - 1).
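Since the closed form 1/λ - 1/(e^λ - 1) holds for any λ, it is easy to check numerically; a small Python sketch (not part of the original solution; λ = 2 is an arbitrary illustrative value):

from math import exp, factorial

lam = 2.0
series = sum(exp(-lam) * lam**n / factorial(n) / (1 - exp(-lam)) / (n + 1)
             for n in range(1, 120))
closed = 1/lam - 1/(exp(lam) - 1)
print(round(series, 6), round(closed, 6))   # both about 0.343482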

13.21. D. Mean is that of the non-truncated Poisson, divided by 1 - f(0):
(2.5) / (1 - e^-2.5) = 2.5/0.9179 = 2.724.
Comment: Note that since the probability at zero has been distributed over the positive integers,
the mean is larger for the zero-truncated distribution than for the corresponding untruncated
distribution.

13.22. A. The second moment is that of the non-truncated Poisson, divided by 1 - f(0):
(2.5 + 2.5^2) / (1 - e^-2.5) = 9.533. Variance = 9.533 - 2.724^2 = 2.11.
Comment: Using the formula in Appendix B of Loss Models:
Variance = λ{1 - (λ+1)e^-λ} / (1 - e^-λ)^2 = (2.5){1 - 3.5e^-2.5}/(1 - e^-2.5)^2 = (2.5)(0.7127)/0.9179^2 = 2.11.

13.23. B. For an untruncated Poisson, f(6) = (2.5^6)e^-2.5/6! = 0.0278. For the zero-truncated
distribution one gets the density by dividing by 1 - f(0): (0.0278) / (1 - e^-2.5) = 3.03%.

13.24. D. One adds up the chances of 1, 2 and 3 days, and gets 73.59%.

Number of    Untruncated    Zero-Truncated    Cumulative Zero-Truncated
Days         Poisson        Poisson           Poisson
0             8.21%
1            20.52%         22.36%            22.36%
2            25.65%         27.95%            50.30%
3            21.38%         23.29%            73.59%
4            13.36%         14.55%            88.14%
5             6.68%          7.28%            95.42%
6             2.78%          3.03%            98.45%
7             0.99%          1.08%            99.54%
8             0.31%          0.34%            99.88%
Comment: By definition, there is no probability of zero items for a zero-truncated distribution.

13.25. B. The mode is where the density function is greatest, 2.

Number of    Untruncated    Zero-Truncated
Days         Poisson        Poisson
0             8.21%
1            20.52%         22.36%
2            25.65%         27.95%
3            21.38%         23.29%
4            13.36%         14.55%
Comment: Unless the mode of the untruncated distribution is 0, the mode of the zero-truncated
distribution is the same as that of the untruncated distribution. For example, in this case all the
densities on the positive integers are increased by the same factor 1/(1 - 0.0821). Thus since the
density at 2 was largest prior to truncation, it remains the largest after truncation at zero.

13.26. Assuming a sample of size k, then
Prob[Min > y | k] = Prob[all elements of the sample > y] = (e^(-y/θ))^k = e^(-yk/θ).
Let p_k be the Logarithmic density.
Prob[Min > y] = Σ_{k=1}^∞ Prob[Min > y | k] p_k = Σ_{k=1}^∞ (e^(-y/θ))^k p_k = E_K[(e^(-y/θ))^k].

However, the P.G.F. of a frequency distribution is defined as E[z^k].
For the Logarithmic Distribution, P(z) = 1 - ln[1 - β(z - 1)] / ln(1+β).

Therefore, taking z = e^(-y/θ),
Prob[Min > y] = 1 - ln[1 - β(e^(-y/θ) - 1)] / ln(1+β).

Thus Prob[Min ≤ y], in other words the distribution function, is:
F(y) = ln[1 - β(e^(-y/θ) - 1)] / ln(1+β), y > 0.

Comment: The distribution of Y is called an Exponential-Logarithmic Distribution.
If one lets p = 1/(1+β), then one can show that F(y) = 1 - ln[1 - (1-p)e^(-y/θ)] / ln(p).
As β approaches 0, in other words as p approaches 1, the distribution of Y approaches an
Exponential Distribution.
The Exponential-Logarithmic Distribution has a declining hazard rate.
In general, if S(x) is the survival function of severity, Y is the minimum of a random sample from X of
size k, and K in turn follows a frequency distribution with support k ≥ 1 and Probability Generating
Function P(z), then F(y) = 1 - P(S(y)).
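The identity Prob[Min > y] = P(S(y)) is easy to verify numerically; a short Python sketch (a check under the illustrative choices β = 2, θ = 1, y = 0.8, not part of the original solution):

from math import exp, log

beta, theta, y = 2.0, 1.0, 0.8
s = exp(-y / theta)                                        # S(y) for the Exponential severity

# sum_k p_k S(y)^k, with Logarithmic p_k = {beta/(1+beta)}^k / {k ln(1+beta)}
series = sum((beta / (1 + beta))**k * s**k / (k * log(1 + beta)) for k in range(1, 400))
pgf = 1 - log(1 - beta * (s - 1)) / log(1 + beta)          # P(z) at z = S(y)
print(round(series, 6), round(pgf, 6))                     # the two agree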

13.27. The number of customers he has to wait is a Zero-Truncated Geometric Distribution with
β = chance of failure / chance of success = (1 - 0.6)/0.6 = 1/0.6 - 1.

So the mean number of customers is 1/0.6 = 1.67. ⇒ 16.7 minutes on average.


Comment: The mean of the Zero-Truncated Geometric Distribution is: β / {1 - 1/(1+β)} = 1 + β.

13.28. The number of customers he has to wait is a Zero-Truncated Geometric Distribution with
β = chance of failure / chance of success = (1 - 0.4)/0.4 = 1/0.4 - 1.

So the mean number of customers is 1/0.4 = 2.5. ⇒ 25 minutes on average.


Comment: Longer patterns can be handled via Markov Chain ideas not on the syllabus.
See Example 4.20 in Probability Models by Ross.

13.29. As shown in Appendix B of Loss Models, for the zero-truncated Poisson:
P(z) = (e^(λz) - 1) / (e^λ - 1) = (e^(3z) - 1) / (e^3 - 1).

The p.g.f. for the sum of two independent, identically distributed variables is: P(z) P(z) = P(z)^2:
{(e^(3z) - 1) / (e^3 - 1)}^2.

Comment: The sum of two zero-truncated distributions has a minimum of two events.
Therefore, the sum of two zero-truncated Poissons is not a zero-truncated Poisson.

13.30. B. Prob[Min > y | k] = Prob[all elements of the sample > y] = (e^(-y/θ))^k = e^(-yk/θ).
Thus the minimum from a sample of size k follows an Exponential Distribution with mean θ/k.
Therefore, E[Y] = E[E[Y|k]] = E[θ/k] = θ E[1/k].

For a Zero-Truncated Geometric, p_k = β^(k-1) / (1+β)^k, for k = 1, 2, 3,...
Thus E[1/k] = (1/β) Σ_{k=1}^∞ {β/(1+β)}^k / k.

However, for the Logarithmic: p_k = {β/(1+β)}^k / {k ln(1+β)}, for k = 1, 2, 3,...
Therefore, since these Logarithmic densities sum to one: Σ_{k=1}^∞ {β/(1+β)}^k / k = ln(1+β).

Thus E[1/k] = (1/β) ln[1+β]. Thus E[Y] = θ E[1/k] = (θ/β) ln[1+β].

13.31. The probability of a stay of length k is p^T_k.
If a stay is of length k, the probability that today is the last day is 1/k.
Therefore, for an occupied room picked at random, the probability that its guest is checking out
today is: Σ_{k=1}^∞ p^T_k / k.

The tip for a stay of length k is 3k.
Thus, the expected tip left by the guest checking out of room 666 is:
[Σ_{k=1}^∞ 3k p^T_k / k] / [Σ_{k=1}^∞ p^T_k / k] = [3 Σ_{k=1}^∞ p^T_k] / [Σ_{k=1}^∞ p^T_k / k] = 3 / [Σ_{k=1}^∞ p^T_k / k].

For the zero-truncated Poisson, Σ_{k=1}^∞ p^T_k / k =
{e^-λ/(1 - e^-λ)} (λ + λ^2/4 + λ^3/18 + λ^4/96 + λ^5/600 + λ^6/4320 + λ^7/35,280 + λ^8/322,560
+ λ^9/3,265,920 + λ^10/36,288,000 + λ^11/439,084,800 + ...) = 0.330.

Thus, the expected tip left by the guest checking out of room 666 is: 3 / 0.330 = 9.09.
Alternately, the (average) tip per day is 3.
3 = (0)(probability not last day) + (average tip if last day)(probability last day).
3 = (average tip if last day)(0.330).
Therefore, the average tip if it is the last day is: 3 / 0.330 = 9.09.

13.32. (a) After the customer gets his first coupon, there is 9/10 probability that his next coupon is
different. Therefore, the number of meals it takes him to get his next unique coupon after his first is
a zero-truncated Geometric Distribution,
with β = (probability of failure) / (probability of success) = (1/10)/(9/10) = 1/9.
(Alternately, it is one plus a Geometric Distribution with β = 1/9.)
Thus the mean number of meals from the first to the second unique coupon is: 1 + 1/9 = 10/9.
After the customer gets his second unique coupon, there is 8/10 probability his next coupon is
different than those he already has. Therefore, the number of meals it takes him to get his third
unique coupon after his second is a zero-truncated Geometric Distribution,
with β = (probability of failure) / (probability of success) = (2/10)/(8/10) = 2/8.
Thus the mean number of meals from the second to the third unique coupon is: 1 + 2/8 = 10/8.
Similarly, the number of meals it takes him to get his fourth unique coupon after his third is a
zero-truncated Geometric Distribution, with β = 3/7, and mean 10/7.
Proceeding in a similar manner, the means to get the remaining coupons are: 10/6 + ... + 10/1.
Including one meal to get the first coupon, the mean total number of meals is:
(10) (1/10 + 1/9 + 1/8 + 1/7 + 1/6 + 1/5 + 1/4 + 1/3 + 1/2 + 1/1) = 29.29.
(b) It takes one meal to get the first coupon; variance is zero.
The number of additional meals to get the second unique coupon is a zero-truncated Geometric
Distribution, with β = 1/9 and variance: (1/9)(10/9).
Similarly, the variance of the number of meals from the second to the third unique coupon is:
(2/8)(10/8).
The number of meals in intervals between unique coupons are independent, so their variances add.
Thus, the variance of the total number of meals is:
(10)(1/9^2 + 2/8^2 + 3/7^2 + 4/6^2 + 5/5^2 + 6/4^2 + 7/3^2 + 8/2^2 + 9/1^2) = 125.69.
Comment: The coupon collectorʼs problem.
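A minimal Python sketch (not part of the original solution) reproducing both results from the zero-truncated Geometric steps:

n = 10
mean, variance = 1.0, 0.0           # one meal (with no variance) for the first coupon
for have in range(1, n):            # currently hold `have` distinct coupons
    beta = have / (n - have)        # (probability of failure)/(probability of success)
    mean += 1 + beta                # mean of the zero-truncated Geometric
    variance += beta * (1 + beta)   # variance of the zero-truncated Geometric
print(round(mean, 2), round(variance, 2))   # 29.29 and 125.69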

13.33. B. p_22 / p_21 = a + b/22. ⇒ 0.02987 / 0.04532 = 0.65909 = a + b/22.
p_23 / p_22 = a + b/23. ⇒ 0.01818 / 0.02987 = 0.60864 = a + b/23.
⇒ 0.05045 = b(1/22 - 1/23). ⇒ b = 25.53. ⇒ a = -0.5014.
p_24 = (a + b/24) p_23 = (-0.5014 + 25.53/24)(0.01818) = 0.01022.

13.34. The result of the accident profile is definitely not linear.


Number of    Number of Housing Units    Observed Density    (x+1)f(x+1)/f(x)
Rooms        Observed                   Function
1               556                     0.00461              4.647
2             1,292                     0.01072             23.961
3            10,319                     0.08561              8.373
4            21,599                     0.17920              6.409
5            27,687                     0.22971              5.377
6            24,810                     0.20584
7+           34,269                     0.28431
Thus I conclude that this data is not drawn from a member of the (a, b, 1) class.
Comment: Data is for 2005, taken from Table A-19 of “32 Years of Housing Data”
prepared for U.S. Department of Housing and Urban Development Office of Policy Development
and Research, by Frederick J. Eggers and Alexander Thackeray of Econometrica, Inc.

13.35. D. At the end of year one the business has 1700. Thus, if the loss occurs at the end of year
one, there is ruin if the size of loss is > 1700, a 70% chance. Similarly, at the end of year 2, if the loss
did not occur in year 1, the business has 2700. Thus, if the loss occurs at the end of year two there
is ruin if the size of loss is > 2700, a 50% chance.
If the loss occurs at the end of year three there is ruin if the size of loss is > 3700, a 30% chance.
If the loss occurs at the end of year four there is ruin if the size of loss is > 4700, a 10% chance.
If the loss occurs in year 5 or later there is no chance of ruin.
The probability of the loss being in year n is: {1/(1+β)}{β/(1+β)}^(n-1) = 0.65(0.35^(n-1)).

A        B                      C                             D
Year     Probability of Loss    Probability of Ruin if        Column B
         in this year           Loss Occurs in this year      times Column C
1        0.6500                 0.7                           0.4550
2        0.2275                 0.5                           0.1138
3        0.0796                 0.3                           0.0239
4        0.0279                 0.1                           0.0028
5        0.0098                 0                             0.0000
                                                              0.5954
Alternately, if the loss is of size 500, 1000, or 1500 there is not ruin. If the loss is of size 2000 or
2500, then there is ruin if the loss occurs in year 1. If the loss is of size 3000 or 3500, then there is
ruin if the loss occurs by year 2. If the loss is of size 4000 or 4500, then there is ruin if the loss occurs
by year 3. If the loss is of size 5000, then there is ruin if the loss occurs by year 4.
A                  B                    C                       D                   E
Size               Probability of a     Year by which a loss    Probability that    Column B
of Loss            Loss of this Size    occurs for Ruin         Loss Occurs by      times Column D
                                                                this year
500, 1000, 1500    0.3                  none                    0.000               0.000
2000 or 2500       0.2                  1                       0.650               0.130
3000 or 3500       0.2                  2                       0.877               0.175
4000 or 4500       0.2                  3                       0.957               0.191
5000               0.1                  4                       0.985               0.099
                                                                                    0.595

Section 14, Zero-Modified Distributions102

Frequency distributions can be constructed whose densities on the positive integers are
proportional to those of a well-known distribution, but with f(0) having any value between zero and
one.

For example, let g(x) = (e^-3 3^x / x!) {(1 - 0.25)/(1 - e^-3)}, for x = 1, 2, 3, ..., and g(0) = 0.25.

Exercise: Verify that this density in fact sums to unity.
[Solution: The sum of the Poisson Distribution from 0 to ∞ is 1: Σ_{x=0}^∞ e^-3 3^x / x! = 1.
Therefore, Σ_{x=1}^∞ e^-3 3^x / x! = 1 - e^-3. ⇒ Σ_{x=1}^∞ g(x) = 1 - 0.25. ⇒ Σ_{x=0}^∞ g(x) = (1 - 0.25) + 0.25 = 1.]

This is just an example of a Poisson Distribution Modified at Zero, with λ = 3 and 25% probability
placed at zero.

For a Zero-Modified distribution, an arbitrary amount of probability has been placed at zero. In the
example above it is 25%. Loss Models uses p^M_0 to denote this probability at zero.
The remaining probability is spread out proportional to some well-known distribution such as the
Poisson. In general, if f is a distribution on 0, 1, 2, 3,..., and 0 < p^M_0 < 1, then the probability at zero is p^M_0,
and p^M_k = f(k) (1 - p^M_0) / {1 - f(0)} = (1 - p^M_0) p^T_k, k = 1, 2, 3,..., is a distribution on 0, 1, 2, 3, ....

Exercise: For a Poisson Distribution Modified at Zero, with λ = 3 and 25% probability placed at
zero, what are the densities at 0, 1, 2, 3, and 4?
[Solution: For example, the density at 4 is: (3^4 e^-3 / 4!) {(1 - 0.25)/(1 - e^-3)} = 0.133.
x       0       1       2       3       4
f(x)    0.250   0.118   0.177   0.177   0.133]

In the case of a Zero-Modified Distribution, there is no relationship assumed between the density at
zero and the other densities, other than the fact that all of the densities sum to one.
102
See Section 6.6 and Appendix B.3.2 in Loss Models.

We have the following four cases, each with density p^M_0 at zero:

Distribution            Zero-Modified density at x

Binomial                (1 - p^M_0) [m! / {x! (m-x)!}] q^x (1-q)^(m-x) / {1 - (1-q)^m},   x = 1, 2, 3,..., m

Poisson                 (1 - p^M_0) (e^-λ λ^x / x!) / (1 - e^-λ),   x = 1, 2, 3,...

Negative Binomial103    (1 - p^M_0) [r(r+1)...(r+x-1) / x!] β^x / (1+β)^(x+r) / {1 - 1/(1+β)^r},   x = 1, 2, 3,...

Logarithmic             (1 - p^M_0) {β/(1+β)}^x / {x ln(1+β)},   x = 1, 2, 3,...

These four zero-modified distributions complete the (a, b, 1) class of frequency distributions.104
They each follow the formula: (density at x+1) / (density at x) = a + b/(x+1), for x ≥ 1.

Note that if p^M_0 = 0, the zero-modified distribution reduces to a zero-truncated distribution.
However, even though it might be useful to think of the zero-truncated distributions as a special case
of the zero-modified distributions, Loss Models restricts the term zero-modified to those cases
where the density at zero is positive.
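A small Python illustration of this recipe (not from the text), using the running example of a Poisson with λ = 3 and 25% probability placed at zero: keep p^M_0 at zero and scale the untruncated densities on x ≥ 1 by (1 - p^M_0)/(1 - f(0)).

from math import exp, factorial

lam, p0 = 3.0, 0.25

def poisson(x):
    return exp(-lam) * lam**x / factorial(x)

scale = (1 - p0) / (1 - poisson(0))
zm = {0: p0, **{x: scale * poisson(x) for x in range(1, 5)}}
print({x: round(p, 3) for x, p in zm.items()})
# {0: 0.25, 1: 0.118, 2: 0.177, 3: 0.177, 4: 0.133}, matching the densities found earlier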

Moments:

The moments of a zero-modified distribution are given in terms of those of the unmodified f by:

E_Modified[X^n] = (1 - p^M_0) E_f[X^n] / {1 - f(0)} = (1 - p^M_0) E_Truncated[X^n].

For example, for the Zero-Modified Poisson the mean is: (1 - p^M_0) λ / (1 - e^-λ),
while the second moment is: (1 - p^M_0) (λ + λ^2) / (1 - e^-λ).
103
The zero-modified version of the Negative Binomial is referred to by Loss Models as the Zero-Modified Extended
Truncated Negative Binomial.
104
See Table 6.4 and Appendix B.3 in Loss Models.

Exercise: For a Zero-modified Poisson with λ = 3 and 25% chance of zero claims, what is the mean?
[Solution: E_Modified[X] = (1 - p^M_0) E_f[X] / {1 - f(0)} = (1 - p^M_0) λ / (1 - e^-λ) = (1 - 0.25) 3 / (1 - e^-3) = 2.3679.
Alternately, the mean of the zero-truncated Poisson is: λ / (1 - e^-λ) = 3 / (1 - e^-3) = 3.1572.
Mean of zero-modified Poisson is: (1 - p^M_0)(mean of zero-truncated) = (1 - 0.25)(3.1572) = 2.3679.
Alternately, we can do the calculation from first principles.
Let f(k) be the untruncated Poisson, and p^M_k be the zero-modified distribution.
Then p^M_k = 0.75 f(k) / {1 - f(0)}, k > 0. The mean of the zero-modified distribution is:
Σ_{k=0}^∞ k p^M_k = Σ_{k=1}^∞ k p^M_k = 0.75 Σ_{k=1}^∞ k f(k) / {1 - f(0)} = 0.75 Σ_{k=0}^∞ k f(k) / {1 - f(0)}
= 0.75 (mean of f) / {1 - f(0)} = 0.75 λ / (1 - e^-λ) = (0.75) 3 / (1 - e^-3) = (0.75)(3.1572) = 2.3679.
Comment: In the summation, the term involving k = 0 would contribute nothing to the mean.]

Exercise: For a Zero-modified Poisson with λ = 3 and 25% chance of zero claims,
what is the variance?
[Solution: The second moment of the zero-modified Poisson is:
(1 - p^M_0)(second moment of Poisson) / {1 - f(0)} = (1 - 0.25)(3 + 3^2) / (1 - e^-3) = 9.4716.
Thus the variance of the zero-modified Poisson is: 9.4716 - 2.3679^2 = 3.8646.
Alternately, for the zero-truncated Poisson:
mean = λ / (1 - e^-λ) = 3 / (1 - e^-3) = 3.1572, and
variance = λ{1 - (λ + 1)e^-λ} / (1 - e^-λ)^2 = (3)(1 - 4e^-3) / (1 - e^-3)^2 = 2.6609.
Then for the zero-modified Poisson, as shown in the Tables attached to the exam:
variance = (1 - p^M_0)(variance of zero-truncated) + p^M_0 (1 - p^M_0)(mean of zero-truncated)^2 =
(1 - 0.25)(2.6609) + (0.25)(1 - 0.25)(3.1572^2) = 3.8647.]
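A brute-force cross-check of these two moments (a Python sketch, not part of the original exercise):

from math import exp, factorial

lam, p0 = 3.0, 0.25
scale = (1 - p0) / (1 - exp(-lam))
dens = {0: p0, **{x: scale * exp(-lam) * lam**x / factorial(x) for x in range(1, 150)}}

mean = sum(x * p for x, p in dens.items())
second = sum(x * x * p for x, p in dens.items())
print(round(mean, 4), round(second - mean**2, 4))
# 2.3679 and 3.8647, consistent with the 3.8646/3.8647 above up to rounding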



Exercise: For a Negative Binomial with r = 0.7 and β = 3, what is the second moment?
[Solution: The mean is (0.7)(3) = 2.1, the variance is (0.7)(3)(1+3) = 8.4, so the second moment is:
8.4 + 2.1^2 = 12.81.]

Exercise: For a Zero-Truncated Negative Binomial with r = 0.7 and β = 3, what is the second
moment?
[Solution: For a Negative Binomial with r = 0.7 and β = 3,
the density at zero is: 1/(1+β)^r = 4^-0.7 = 0.3789, and the second moment is 12.81.
Thus the second moment of the zero-truncated distribution is: 12.81 / (1 - 0.3789) = 20.625.]

Exercise: For a Zero-Modified Negative Binomial with r = 0.7 and β = 3, with a 15% chance of zero
claims, what is the second moment?
[Solution: For a Zero-Truncated Negative Binomial with r = 0.7 and β = 3, the second moment is
20.625. Thus the second moment of the zero-modified distribution with a 15% chance of zero
claims is: (20.625)(1 - 0.15) = 17.531.]

Exercise: For a Zero-Modified Negative Binomial with r = 0.7 and β = 3, with a 15% chance of zero
claims, what is the variance?
[Solution: mean of the zero-truncated Negative Binomial is:
(mean of Negative Binomial) / {1 - f(0)} = (0.7)(3) / (1 - 0.3789) = 3.3811.
Mean of the zero-modified Negative Binomial is:
(1 - p^M_0)(mean of zero-truncated) = (1 - 0.15)(3.3811) = 2.8739.
Thus, the variance of the zero-modified Negative Binomial is: 17.531 - 2.8739^2 = 9.272.
Alternately, for the zero-truncated Negative Binomial:
mean = rβ / {1 - (1+β)^-r} = (0.7)(3) / (1 - 4^-0.7) = 3.3813, and
variance = rβ{(1+β) - (1 + β + rβ)(1+β)^-r} / {1 - (1+β)^-r}^2 = (0.7)(3){4 - (4 + 2.1)(4^-0.7)} / {1 - 4^-0.7}^2 = 9.1928.
Then for the zero-modified Negative Binomial, as shown in the Tables attached to the exam:
variance = (1 - p^M_0)(variance of zero-truncated) + p^M_0 (1 - p^M_0)(mean of zero-truncated)^2 =
(1 - 0.15)(9.1928) + (0.15)(1 - 0.15)(3.3813^2) = 9.272.]



Probability Generating Functions:

The zero-modified distribution can be thought of as a mixture of a point mass of probability at zero
and a zero-truncated distribution. The probability generating function of a mixture is the mixture of the
probability generating functions. A point mass of probability at zero has probability generating
function E[z^N] = E[z^0] = 1. Therefore, the probability generating function, P(z) = E[z^N], for a
zero-modified distribution can be obtained from that of the zero-truncated distribution:105

P^M(z) = p^M_0 + (1 - p^M_0) P^T(z),

where P^M(z) is the p.g.f. for the zero-modified distribution, P^T(z) is the p.g.f. for the
zero-truncated distribution, and p^M_0 is the probability at zero for the zero-modified distribution.

Exercise: What is the Probability Generating Function for a Zero-Modified Poisson Distribution,
with 30% probability placed at zero?
[Solution: For the zero-truncated Poisson, P^T(z) = (e^(λz) - 1) / (e^λ - 1).
P^M(z) = p^M_0 + (1 - p^M_0) P^T(z) = 0.3 + 0.7 (e^(λz) - 1) / (e^λ - 1).]

One can derive this relationship as follows:
p^M_k = p^T_k (1 - p^M_0) for k > 0.
P^M(z) = Σ_{n=0}^∞ z^n p^M_n = p^M_0 + Σ_{n=1}^∞ z^n (1 - p^M_0) p^T_n = p^M_0 + (1 - p^M_0) Σ_{n=1}^∞ z^n p^T_n = p^M_0 + (1 - p^M_0) P^T(z).

105
The probability generating functions of the zero-modified distributions are shown in Appendix B.

Thinning:106

If we take at random a fraction of the events, then we get a distribution of the same family.
One parameter is altered by the thinning as per the non-zero-modified case.
In addition, the probability at zero, p^M_0, is altered by thinning.

Distribution                          Result of thinning by a factor of t

Zero-Modified Binomial                q → tq, m remains the same
                                      p^M_0 → {p^M_0 - (1-q)^m + (1-tq)^m - p^M_0 (1-tq)^m} / {1 - (1-q)^m}

Zero-Modified Poisson                 λ → tλ
                                      p^M_0 → {p^M_0 - e^-λ + e^-tλ - p^M_0 e^-tλ} / (1 - e^-λ)

Zero-Modified Negative Binomial107    β → tβ, r remains the same
                                      p^M_0 → {p^M_0 - (1+β)^-r + (1+tβ)^-r - p^M_0 (1+tβ)^-r} / {1 - (1+β)^-r}

Zero-Modified Logarithmic             β → tβ
                                      p^M_0 → 1 - (1 - p^M_0) ln[1+tβ] / ln[1+β]

In each case, the new probability of zero claims is the probability generating function for the original
zero-modified distribution at 1 - t, where t is the thinning factor.

For example, for the Zero-Modified Binomial, P^M(z) = p^M_0 + (1 - p^M_0) P^T(z) =
p^M_0 + (1 - p^M_0) [{1 + q(z-1)}^m - (1-q)^m] / [1 - (1-q)^m].

P^M(1 - t) = p^M_0 + (1 - p^M_0) [(1 - qt)^m - (1-q)^m] / [1 - (1-q)^m] = {p^M_0 - (1-q)^m + (1-tq)^m - p^M_0 (1-tq)^m} / {1 - (1-q)^m}.
106
See Table 8.3 in Loss Models.
107
Including the special case of the zero-modified geometric.

For example, let us assume we look at only large claims, which are t of all claims.
Then if we have n claims, the probability of zero large claims is: (1-t)^n.
Thus the probability of zero large claims is:
Prob[zero claims](1-t)^0 + Prob[1 claim](1-t)^1 + Prob[2 claims](1-t)^2 + Prob[3 claims](1-t)^3 + ...
= E[(1-t)^N] = P(1 - t) = p.g.f. of the unthinned distribution at 1 - t.

Exercise: Show that the p.g.f. for the original zero-modified Logarithmic distribution at 1 - t matches
the above result for the density at zero for the thinned distribution.
[Solution: For the Zero-Modified Logarithmic, P(z) = p^M_0 + (1 - p^M_0)(p.g.f. of Logarithmic) =
p^M_0 + (1 - p^M_0){1 - ln[1 - β(z - 1)] / ln[1+β]}.
P(1 - t) = p^M_0 + (1 - p^M_0){1 - ln[1 + βt] / ln[1+β]} = 1 - (1 - p^M_0) ln[1+tβ] / ln[1+β].]

Exercise: The number of losses follows a zero-modified Poisson with λ = 2 and p^M_0 = 10%.
30% of losses are large. What is the distribution of the large losses?
[Solution: Large losses follow a zero-modified Poisson with λ = (30%)(2) = 0.6 and
p^M_0 = {0.1 - e^-2 + e^-0.6 - (0.1)e^-0.6} / (1 - e^-2) = 0.5304.]
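A quick numerical check (not in the original) that this thinned probability at zero equals the original zero-modified p.g.f. evaluated at 1 - t:

from math import exp

lam, p0, t = 2.0, 0.10, 0.30

def pgf_zm_poisson(z):
    # P_M(z) = p0 + (1 - p0)(e^(lam z) - 1)/(e^lam - 1)
    return p0 + (1 - p0) * (exp(lam * z) - 1) / (exp(lam) - 1)

table_result = (p0 - exp(-lam) + exp(-t * lam) - p0 * exp(-t * lam)) / (1 - exp(-lam))
print(round(pgf_zm_poisson(1 - t), 4), round(table_result, 4))   # both 0.5304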
1 - e- 2

Exercise: The number of members per family follows a zero-truncated Negative Binomial
with r = 0.5 and β = 4.
It is assumed that 60% of people have first names that begin with the letters A through M,
and that size of family is independent of the letters of the first names of its members.
What is the distribution of the number of family members with first names that begin with the letters
A through M?
[Solution: The zero-truncated distribution is mathematically the same as a zero-modified distribution
with p^M_0 = 0.
Thus the thinned distribution is a zero-modified Negative Binomial with r = 0.5, β = (60%)(4) = 2.4,
and p^M_0 = {0 - 5^-0.5 + 3.4^-0.5 - (0)(3.4^-0.5)} / (1 - 5^-0.5) = 0.1721.
Comment: While prior to thinning there is no probability of zero members, after thinning there is a
probability of zero members with first names that begin with the letters A through M.
Thus the thinned distribution is zero-modified rather than zero-truncated.]

Problems:

Use the following information for the next six questions:


The number of claims per year is given by a Zero-Modified Binomial Distribution with parameters
q = 0.3 and m = 5, and with 15% probability of zero claims.

14.1 (1 point) What is the mean number of claims over the coming year?
A. less than 1.4
B. at least 1.4 but less than 1.5
C. at least 1.5 but less than 1.6
D. at least 1.6 but less than 1.7
E. at least 1.7

14.2 (2 points) What is the variance of the number of claims per year?
A. less than 0.98
B. at least 0.98 but less than 1.00
C. at least 1.00 but less than 1.02
D. at least 1.02 but less than 1.04
E. at least 1.04

14.3 (1 point) What is the chance of observing 3 claims over the coming year?
A. less than 13.0%
B. at least 13.0% but less than 13.4%
C. at least 13.4% but less than 13.8%
D. at least 13.8% but less than 14.2%
E. at least 14.2%

14.4 (2 points) What is the 95th percentile of the distribution of the number of claims per year?
A. 1 B. 2 C. 3 D. 4 E. 5

14.5 (2 points) What is the probability generating function at 3?


A. less than 9
B. at least 9 but less than 10
C. at least 10 but less than 11
D. at least 11 but less than 12
E. at least 12

14.6 (2 points) Small claims are 70% of all claims.


What is the chance of observing exactly 2 small claims over the coming year?
A. 20% B. 22% C. 24% D. 26% E. 28%

Use the following information for the next five questions:


The number of claims per year is given by a Zero-Modified Negative Binomial Distribution with
parameters r = 4 and β = 0.5, and with 35% chance of zero claims.

14.7 (1 point) What is the mean number of claims over the coming year?
A. less than 1.7
B. at least 1.7 but less than 1.8
C. at least 1.8 but less than 1.9
D. at least 1.9 but less than 2.0
E. at least 2.0

14.8 (2 points) What is the variance of the number of claims per year?


A. less than 2.0
B. at least 2.0 but less than 2.2
C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6
E. at least 2.6

14.9 (1 point) What is the chance of observing 7 claims over the coming year?
A. less than 0.8%
B. at least 0.8% but less than 1.0%
C. at least 1.0% but less than 1.2%
D. at least 1.2% but less than 1.4%
E. at least 1.4%

14.10 (3 points) What is the probability of more than 5 claims in the coming year?
A. less than 1%
B. at least 1%, but less than 3%
C. at least 3%, but less than 5%
D. at least 5%, but less than 7%
E. at least 7%

14.11 (3 points) Large claims are 40% of all claims.


What is the chance of observing more than 1 large claim over the coming year?
A. 10% B. 12% C. 14% D. 16% E. 18%

Use the following information for the next four questions:

The number of claims per year is given by a Zero-Modified Logarithmic Distribution with parameter
β = 2, and a 25% chance of zero claims.

14.12 (1 point) What is the mean number of claims over the coming year?
A. less than 1.0
B. at least 1.0 but less than 1.1
C. at least 1.1 but less than 1.2
D. at least 1.2 but less than 1.3
E. at least 1.3

14.13 (2 points) What is the variance of the number of claims per year?
A. less than 2.0
B. at least 2.0 but less than 2.2
C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6
E. at least 2.6

14.14 (1 point) What is the chance of observing 6 claims over the coming year?
A. less than 1.1%
B. at least 1.1% but less than 1.3%
C. at least 1.3% but less than 1.5%
D. at least 1.5% but less than 1.7%
E. at least 1.7%

14.15 (2 points) Medium sized claims are 60% of all claims.


What is the chance of observing exactly one medium sized claim over the coming year?
A. 31% B. 33% C. 35% D. 37% E. 39%

14.16 (1 point) The number of claims per year is given by a Zero-Modified Negative Binomial
Distribution with parameters r = -0.6 and β = 3, and with a 20% chance of zero claims.
What is the chance of observing 5 claims over the coming year?
A. less than 0.8%
B. at least 0.8% but less than 1.0%
C. at least 1.0% but less than 1.2%
D. at least 1.2% but less than 1.4%
E. at least 1.4%

Use the following information for the next seven questions:

The number of claims per year is given by a Zero-Modified Poisson Distribution with parameter
λ = 2.5, and with 30% chance of zero claims.

14.17 (1 point) What is the mean number of claims over the coming year?
A. 1.9 B. 2.0 C. 2.1 D. 2.2 E. 2.3

14.18 (2 points) What is the variance of the number of claims per year?
A. less than 2.7
B. at least 2.7 but less than 2.8
C. at least 2.8 but less than 2.9
D. at least 2.9 but less than 3.0
E. at least 3.0

14.19 (1 point) What is the chance of observing 6 claims over the coming year?
A. less than 2%
B. at least 2% but less than 3%
C. at least 3% but less than 4%
D. at least 4% but less than 5%
E. at least 5%

14.20 (1 point) What is the chance of observing 2 claims over the coming year?
A. 18% B. 20% C. 22% D. 24% E. 26%

14.21 (2 points) What is the chance of observing fewer than 4 claims over the coming year?
A. less than 70%
B. at least 70% but less than 75%
C. at least 75% but less than 80%
D. at least 80% but less than 85%
E. at least 85%

14.22 (2 points) What is the mode of this frequency distribution?


A. 0 B. 1 C. 2 D. 3 E. 4

14.23 (2 points) Large claims are 20% of all claims.


What is the chance of observing exactly one large claim over the coming year?
A. 15% B. 17% C. 19% D. 21% E. 23%

14.24 (3 points) Let p_k denote the probability that the number of claims equals k for k = 0, 1, ...
If p_n / p_m = 2.4^(n-m) m! / n!, for m ≥ 0, n ≥ 0, then using the corresponding zero-modified claim count
distribution with p^M_0 = 0.31, calculate p^M_3.
(A) 16% (B) 18% (C) 20% (D) 22% (E) 24%

14.25 (3 points) The number of losses follows a zero-modified Poisson Distribution with λ and p^M_0.
Small losses are 70% of all losses.
From first principles determine the probability of zero small losses.

14.26 (3 points) The following data is the number of sick days taken at a large company during the
previous year.
Number of days: 0 1 2 3 4 5 6 7 8+
Number of employees: 50,122 9190 5509 3258 1944 1160 693 418 621
Is it likely that this data was drawn from a member of the (a, b, 0) class?
Is it likely that this data was drawn from a member of the (a, b, 1) class?

14.27 (3 points) For a zero-modified Poisson, p^M_2 = 27.3%, and p^M_3 = 12.7%.
Determine p^M_0.
(A) 11% (B) 12% (C) 13% (D) 14% (E) 15%

14.28 (3 points) X is a discrete random variable with a probability function which is a member of the
(a, b, 1) class of distributions.
p_k denotes the probability that X = k.
p_1 = 0.1637, p_2 = 0.1754, and p_3 = 0.1503.
Calculate p_5.
(A) 7.5% (B) 7.7% (C) 7.9% (D) 8.1% (E) 8.3%

14.29 (2 points) The number of claims per year is given by a Zero-Modified Poisson Distribution
with λ = 2, and with 25% chance of zero claims.
Where N is the number of claims, determine E[N ∧ 3].
A. 1.50 B. 1.55 C. 1.60 D. 1.65 E. 1.70

14.30 (1 point) What is the probability generating function for a Zero-Modified Poisson Distribution
with λ = 0.1 and p^M_0 = 60%?

14.31 (7 points) For a zero-truncated Geometric Distribution:


P^T(z) = [{1 - β(z-1)}^-1 - (1+β)^-1] / [1 - (1+β)^-1] = z / (1 + β - βz).

(a) (1 point) X follows a zero-modified Geometric Distribution with β = 0.25 and p^M_0 = 40%.
Determine the probability generating function for X.
(b) (3 points) Let Y be the sum of two independent, identically distributed such variables X.
With the aid of a computer, using its probability generating function, determine for Y the densities at:
0, 1, 2, 3, 4, 5, and 6.
(c) (3 points) Let Z be the zero-modified Negative Binomial Distribution with r = 2, β = 0.25, and the
same probability of zero claims as Y.
With the aid of a computer, determine for Z the densities at: 0, 1, 2, 3, 4, 5, and 6.

14.32 (3, 5/00, Q.37) (2.5 points) Given:


(i) p_k denotes the probability that the number of claims equals k for k = 0, 1, 2, ...
(ii) p_n / p_m = m! / n!, for m ≥ 0, n ≥ 0

Using the corresponding zero-modified claim count distribution with p^M_0 = 0.1, calculate p^M_1.
(A) 0.1 (B) 0.3 (C) 0.5 (D) 0.7 (E) 0.9

Solutions to Problems:

14.1. C. Mean is that of the unmodified Binomial, multiplied by (1 - 0.15) and divided by 1 - f(0):
(0.3)(5)(0.85) / (1 - 0.7^5) = 1.533.

14.2. D. The second moment is that of the unmodified Binomial,
multiplied by (1 - 0.15) and divided by 1 - f(0):
(1.05 + 1.5^2)(0.85) / (1 - 0.7^5) = 3.372. Variance = 3.372 - 1.533^2 = 1.022.

14.3. C. For an unmodified binomial, f(3) = {5!/(3! 2!)} 0.3^3 0.7^2 = 0.1323.
For the zero-modified distribution one gets the density by multiplying by (1 - 0.15)
and dividing by 1 - f(0):
(0.1323)(0.85) / (1 - 0.7^5) = 13.5%.

14.4. C. The 95th percentile is that value corresponding to the distribution function being 95%.
For a discrete distribution such as we have here, employ the convention that the 95th percentile is
the first value at which the distribution function is greater than or equal to 0.95. F(2) = 0.8334 < 95%,
F(3) = 0.9686 ≥ 95%, and therefore the 95th percentile is 3.
Number of    Unmodified    Zero-Modified    Cumulative Zero-Modified
Claims       Binomial      Binomial         Binomial
0            16.81%        15.00%
1            36.02%        36.80%            51.80%
2            30.87%        31.54%            83.34%
3            13.23%        13.52%            96.86%
4             2.83%         2.90%            99.75%
5             0.24%         0.25%           100.00%

14.5. C. As shown in Appendix B of Loss Models, for the zero-truncated Binomial Distribution:
P^T(z) = [{1 + q(z-1)}^m - (1-q)^m] / [1 - (1-q)^m].
⇒ P^T(3) = [{1 + (0.3)(3-1)}^5 - (1-0.3)^5] / [1 - (1-0.3)^5] = 12.402.
The p.g.f. for the zero-modified distribution is:
P^M(z) = p^M_0 + (1 - p^M_0) P^T(z). ⇒ P^M(3) = 0.15 + (0.85)(12.402) = 10.69.
Comment: The densities of the zero-modified distribution:

Number of    Unmodified    Zero-Modified
Claims       Binomial      Binomial
0            16.807%       15.000%
1            36.015%       36.797%
2            30.870%       31.541%
3            13.230%       13.517%
4             2.835%        2.897%
5             0.243%        0.248%

P^M(3) is the expected value of 3^N:
(15%)(3^0) + (36.797%)(3^1) + (31.541%)(3^2) + (13.517%)(3^3) + (2.897%)(3^4) + (0.248%)(3^5) =
10.69.

14.6. B. After thinning we get another zero-modified Binomial, with m = 5,
but q = (0.7)(0.3) = 0.21, and
p^M_0 → {p^M_0 - (1-q)^m + (1-tq)^m - p^M_0 (1-tq)^m} / {1 - (1-q)^m}
= {0.15 - 0.7^5 + 0.79^5 - (0.15)(0.79^5)} / (1 - 0.7^5) = 0.2927.
The density at two of the new zero-modified Binomial is:
{(1 - 0.2927) / (1 - 0.79^5)} (10)(0.21^2)(0.79^3) = 22.21%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original
zero-modified distribution at 1 - t, where t is the thinning factor.

14.7. A. Mean is that of the unmodified negative binomial, multiplied by (1 - 0.35) and divided by
1 - f(0): (4)(0.5)(0.65) / (1 - 1.5^-4) = 1.3 / 0.8025 = 1.62.

14.8. E. The second moment is that of the unmodified negative binomial, multiplied by (1 - 0.35) and
divided by 1 - f(0): (3 + 2^2)(0.65) / (1 - 1.5^-4) = 5.67. Variance = 5.67 - 1.62^2 = 3.05.

14.9. B. For the unmodified negative binomial,
f(7) = (4)(5)(6)(7)(8)(9)(10) 0.5^7 / {(7!)(1.5^11)} = 1.08%. For the zero-modified distribution one gets
the density by multiplying by (1 - 0.35) and dividing by 1 - f(0): (1.08%)(0.65) / (1 - 1.5^-4) = 0.87%.

14.10. C. The chance of more than 5 claims is: 1 - 0.9656 = 3.44%.

Number of    Unmodified       Zero-Modified    Cumulative Zero-Modified
Claims       Neg. Binomial    Neg. Binomial    Neg. Binomial
0            19.75%           35.00%
1            26.34%           21.33%           56.33%
2            21.95%           17.78%           74.11%
3            14.63%           11.85%           85.96%
4             8.54%            6.91%           92.88%
5             4.55%            3.69%           96.56%
6             2.28%            1.84%           98.41%
7             1.08%            0.88%           99.29%
8             0.50%            0.40%           99.69%
9             0.22%            0.18%           99.87%

14.11. D. After thinning we get another zero-modified Negative Binomial, with r = 4,
but β = (40%)(0.5) = 0.2, and
p^M_0 → {p^M_0 - (1+β)^-r + (1+tβ)^-r - p^M_0 (1+tβ)^-r} / {1 - (1+β)^-r}
= {0.35 - 1.5^-4 + 1.2^-4 - (0.35)(1.2^-4)} / (1 - 1.5^-4) = 0.5806.
The density at one of the new zero-modified Negative Binomial is:
{(1 - 0.5806) / (1 - 1/1.2^4)} (4)(0.2)/1.2^5 = 0.2604.
Probability of more than one large claim is: 1 - 0.5806 - 0.2604 = 15.90%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original
zero-modified distribution at 1 - t, where t is the thinning factor.
Similar to Example 8.9 in Loss Models.

14.12. E. Mean of the logarithmic distribution is: β/ln(1+β) = 2 / ln(3) = 1.82.


For the zero-modified distribution, the mean is multiplied by 1 - 0.25: (0.75)(1.82) = 1.37.
Comment: Note the unmodified logarithmic distribution has no chance of zero claims.
Therefore, we need not divide by 1 - f(0) to get to the zero-modified distribution (or alternately we
are dividing by 1 - 0 = 1.)

14.13. C. Variance of the unmodified logarithmic distribution is:
β{1 + β − β/ln(1+β)}/ln(1+β) = 2{3 - 1.82}/ln(3) = 2.15.
Thus the unmodified logarithmic has a second moment of: 2.15 + 1.82^2 = 5.46.
For the zero-modified distribution, the second moment is multiplied by 1 - 0.25:
(0.75)(5.46) = 4.10.
Thus the variance of the zero-modified distribution is: 4.10 - 1.37^2 = 2.22.

14.14. A. For the unmodified logarithmic distribution, f(x) = {β/(1+β)}^x / {x ln(1+β)}.
f(6) = (2/3)^6 / {6 ln(3)} = 1.33%.
For the zero-modified distribution, the density at 6 is multiplied by 1 - 0.25:
(0.75)(1.33%) = 1.00%.

14.15. D. After thinning we get another zero-modified Logarithmic, with β = (60%)(2) = 1.2, and
p^M_0 → 1 - (1 - p^M_0) ln[1+tβ] / ln[1+β] = 1 - (0.75) ln[2.2] / ln[3] = 0.4617.
The density at one of the new zero-modified Logarithmic is:
(1 - 0.4617)(1.2/2.2) / ln[2.2] = 37.23%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original
zero-modified distribution at 1 - t, where t is the thinning factor.

14.16. A. For the zero-truncated Negative Binomial Distribution,
f(5) = r(r+1)(r+2)(r+3)(r+4) {β/(1+β)}^5 / {(5!)((1+β)^r - 1)} =
(-0.6)(0.4)(1.4)(2.4)(3.4)(3/4)^5 / {(120)(4^-0.6 - 1)} = (-2.742)(0.2373) / {(120)(-0.5647)} = 0.96%.
For the zero-modified distribution, multiply by 1 - 0.2:
p^M_5 = (1 - p^M_0) p^T_5 = (0.8)(0.96%) = 0.77%.
Comment: Note this is an extended zero-truncated negative binomial distribution, with
0 > r > -1. The same formulas apply as when r > 0. (As r approaches zero one gets a logarithmic
distribution.) For the unmodified negative binomial distribution we must have r > 0. So in this case
there is no corresponding unmodified distribution.

14.17. A. The mean is that of the non-modified Poisson, multiplied by (1 - 0.3) and divided by
1 - f(0): (2.5)(0.7) / (1 - e^-2.5) = 1.907.

14.18. E. The second moment is that of the unmodified Poisson, multiplied by (1 - 0.3) and divided
by 1 - f(0): (2.5 + 2.5^2)(0.7) / (1 - e^-2.5) = 6.673. Variance = 6.673 - 1.907^2 = 3.04.

14.19. B. For an unmodified Poisson, f(6) = (2.5^6)e^-2.5/6! = 0.0278.
For the zero-modified distribution one gets the density by multiplying by (1 - 0.3) and dividing by
1 - f(0): (0.0278)(0.7) / (1 - e^-2.5) = 2.12%.

14.20. B. For the unmodified Poisson f(0) = e^-2.5 = 8.208%, and f(2) = 2.5^2 e^-2.5/2 = 25.652%.
The zero-modified Poisson has a density at 2 of: (25.652%)(1 - 30%)/(1 - 8.208%) = 19.56%.

14.21. D. One adds up the chances of 0, 1, 2 and 3 claims, and gets 81.5%.

Number of    Unmodified    Zero-Modified    Cumulative Zero-Modified
Claims       Poisson       Poisson          Poisson
0             8.21%        30.00%
1            20.52%        15.65%           45.65%
2            25.65%        19.56%           65.21%
3            21.38%        16.30%           81.51%
4            13.36%        10.19%           91.70%
5             6.68%         5.09%           96.80%
6             2.78%         2.12%           98.92%
7             0.99%         0.76%           99.68%
8             0.31%         0.24%           99.91%
Comment: We are given a 30% chance of zero claims.
The remaining 70% is spread in proportion to the unmodified Poisson. For example,
(70%)(20.52%)/(1 - 0.0821) = 15.65%, and (70%)(25.65%)/(1 - 0.0821) = 19.56%
Unlike the zero-truncated distribution, the zero-modified distribution has a probability of zero events.

14.22. A. The mode is where the density function is greatest, 0.

Number of    Unmodified    Zero-Modified
Claims       Poisson       Poisson
0             8.21%        30.00%
1            20.52%        15.65%
2            25.65%        19.56%
3            21.38%        16.30%
4            13.36%        10.19%
5             6.68%         5.09%
6             2.78%         2.12%
7             0.99%         0.76%
8             0.31%         0.24%
Comment: If the mode of the zero-modified and unmodified distribution are ≠ 0, then the
zero-modified distribution has the same mode as the unmodified distribution, since all the densities
on the positive integers are multiplied by the same factor.

14.23. E. After thinning we get another zero-modified Poisson, with λ = (20%)(2.5) = 0.5, and
p^M_0 → {p^M_0 - e^-λ + e^-tλ - p^M_0 e^-tλ} / (1 - e^-λ) = {0.3 - e^-2.5 + e^-0.5 - (0.3)(e^-0.5)} / (1 - e^-2.5) = 0.6999.
The density at one of the new zero-modified Poisson is:
{(1 - 0.6999) / (1 - e^-0.5)} (0.5 e^-0.5) = 23.13%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original
zero-modified distribution at 1 - t, where t is the thinning factor.

14.24. A. f(x+1)/f(x) = 2.4{x!/(x+1)!} = 2.4/(x+1). Thus this is a member of the (a, b, 0) subclass,
f(x+1)/f(x) = a + b/(x+1), with a = 0 and b = 2.4. This is a Poisson Distribution, with λ = 2.4.
For the unmodified Poisson, the probability of more than zero claims is: 1 - e^-2.4.
After zero-modification, this probability is: 1 - 0.31 = 0.69. Thus the zero-modified distribution is
f^M(x) = {0.69/(1 - e^-2.4)} f(x) = {0.69/(1 - e^-2.4)} e^-2.4 2.4^x / x! = 2.4^x (0.69)/{(e^2.4 - 1) x!}, x ≥ 1.
f^M(3) = 2.4^3 {0.69/((e^2.4 - 1) 3!)} = 0.159.

# claims                 0      1       2       3       4       5       6       7
zero-modified density    0.31   0.1652  0.1983  0.1586  0.0952  0.0457  0.0183  0.0063

Comment: For a Poisson with λ = 2.4, f(n)/f(m) = (e^-2.4 2.4^n / n!)/(e^-2.4 2.4^m / m!) = 2.4^(n-m) m! / n!.
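A short Python sketch (not part of the original solution) reproducing this zero-modified Poisson from the identified λ = 2.4 and p^M_0 = 0.31:

from math import exp, factorial

lam, p0 = 2.4, 0.31          # a = 0, b = 2.4 identifies a Poisson with lambda = 2.4

def zm_density(x):
    if x == 0:
        return p0
    f = exp(-lam) * lam**x / factorial(x)            # unmodified Poisson
    return f * (1 - p0) / (1 - exp(-lam))            # rescale onto x >= 1

print(round(zm_density(3), 3))                       # 0.159
print([round(zm_density(x), 4) for x in range(8)])   # matches the table above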

14.25. If there are n losses, then the probability that zero of them are small is 0.3^n.
Prob[0 small losses] =
Prob[0 losses] + Prob[1 loss] Prob[loss is big] + Prob[2 losses] Prob[both losses are big] + ... =
p^M_0 + {(1 - p^M_0)/(1 - e^-λ)} λe^-λ (0.3) + {(1 - p^M_0)/(1 - e^-λ)} (λ^2 e^-λ/2!)(0.3^2) + {(1 - p^M_0)/(1 - e^-λ)} (λ^3 e^-λ/3!)(0.3^3) + ... =
p^M_0 + {(1 - p^M_0)/(1 - e^-λ)} e^-λ {0.3λ + (0.3λ)^2/2! + (0.3λ)^3/3! + ...} =
p^M_0 + {(1 - p^M_0)/(1 - e^-λ)} e^-λ {e^(0.3λ) - 1} = p^M_0 + {(1 - p^M_0)/(1 - e^-λ)} {e^(-0.7λ) - e^-λ} =
{(1 - e^-λ) p^M_0 + (1 - p^M_0)(e^(-0.7λ) - e^-λ)} / (1 - e^-λ) = {p^M_0 - e^-λ + e^(-0.7λ) - p^M_0 e^(-0.7λ)} / (1 - e^-λ).

Comment: Matches the general formula with t = 0.7: p^M_0 → {p^M_0 - e^-λ + e^-tλ - p^M_0 e^-tλ} / (1 - e^-λ).
The thinned distribution is also a zero-modified Poisson, with λ* = 0.7λ.
The probability of zero claims for the thinned distribution is the P.G.F. for the original
zero-modified distribution at 1 - t, where t is the thinning factor.

14.26. Calculate (x+1)f(x+1)/f(x) = (x+1)(number with x+1) / (number with x).

Number of    Observed    (x+1)f(x+1)/f(x)    Differences
Days
0            50,122      0.183
1             9,190      1.199               1.016
2             5,509      1.774               0.575
3             3,258      2.387               0.613
4             1,944      2.984               0.597
5             1,160      3.584               0.601
6               693      4.222               0.638
7               418
8+              621
The accident profile is not approximately linear starting at zero.
Thus, this is probably not from a member of the (a, b, 0) class.
The accident profile is approximately linear starting at one.
Thus, this is probably from a member of the (a, b, 1) class.
Comment: f(x+1)/f(x) = a + b/(x+1), so (x+1)f(x+1)/f(x) = a(x+1) + b = ax + a + b.
The slope is positive, so a > 0 and we have a Negative Binomial.
The slope, a ≅ 0.6. The intercept is about 0.6. Thus a + b ≅ 0.6. Therefore, b ≅ 0.

For the Negative Binomial b = (r-1)β/(1+β). Thus b = 0, implies r ≅ 1.

Thus the data may have been drawn from a Zero-Modified Geometric, with β ≅ 0.6.

14.27. E. p^M_2 = f(2)(1 - p^M_0) / {1 - f(0)}. p^M_3 = f(3)(1 - p^M_0) / {1 - f(0)}.
Thus p^M_3 / p^M_2 = f(3)/f(2) = (λ^3 e^-λ / 6) / (λ^2 e^-λ / 2) = λ/3. ⇒ λ/3 = 12.7%/27.3%. ⇒ λ = 1.396.
⇒ 27.3% = {(1.396^2) e^-1.396 / 2} (1 - p^M_0) / (1 - e^-1.396). ⇒ p^M_0 = 14.86%.

Comment: p^M_1 = 39.11%.

14.28. B. Since we have a member of the (a, b, 1) family:
p_2/p_1 = a + b/2. ⇒ 2a + b = (2)(0.1754)/0.1637 = 2.1429.
p_3/p_2 = a + b/3. ⇒ 3a + b = (3)(0.1503)/0.1754 = 2.5707.
⇒ a = 0.4278. ⇒ b = 1.2873.
p_4 = (a + b/4) p_3 = (0.4278 + 1.2873/4)(0.1503) = 0.1127.
p_5 = (a + b/5) p_4 = (0.4278 + 1.2873/5)(0.1127) = 0.0772.
Comment: Based on a zero-modified Negative Binomial, with r = 4, β = 0.75, and p^M_0 = 20%.

14.29. B. E[N ∧ 3] = 0 f(0) + 1 f(1) + 2 f(2) + 3{1 - f(0) - f(1) - f(2)} =
0.2348 + (2)(0.2348) + (3)(1 - 0.7196) = 1.55.

Number of    Unmodified    Zero-Modified    Cumulative Zero-Modified
Claims       Poisson       Poisson          Poisson
0            13.53%        25.00%
1            27.07%        23.48%           48.48%
2            27.07%        23.48%           71.96%
3            18.04%        15.65%           87.61%

14.30. From Appendix B of Loss Models, for the zero-truncated Poisson:
P^T(z) = (e^(0.1z) - 1) / (e^0.1 - 1).
Therefore, for the zero-modified Poisson:
P^M(z) = 0.6 + 0.4 (e^(0.1z) - 1) / (e^0.1 - 1).

14.31. (a) P^T(z) = z / (1 + β - βz) = z / (1.25 - 0.25z) = 4z / (5 - z).
P^M(z) = 0.4 + 0.6 {4z / (5 - z)} = (2 + 2z) / (5 - z).
(b) The probability generating function of Y is the square of that of the zero-modified Geometric
Distribution: (4 + 8z + 4z^2) / (5 - z)^2.
f(n) = P^(n)(0) / n!. Using a computer, the densities at 0 to 6 are:
0.16, 0.384, 0.3072, 0.10752, 0.03072, 0.0079872, 0.00196608.
Note that for Y to be zero, each of the zero-modified Geometric Distributions has to be zero:
0.4^2 = 0.16.
For Y to be one, one of the zero-modified Geometric Distributions has to be zero, while the other
one is 1: (2)(0.4){(0.25/1.25^2)(0.6)/(1 - 1/1.25)} = (2)(0.4)(0.48) = 0.384.
(c) The densities from 0 to 6 for a Negative Binomial Distribution with r = 2, β = 0.25:
0.64, 0.256, 0.0768, 0.02048, 0.00512, 0.0012288, 0.00028672.
To get the densities of the zero-modified distribution for x > 0, we multiply by: 0.84/(1 - 0.64).
Thus the densities from 0 to 6 for the zero-modified Negative Binomial Distribution are:
0.16, 0.597333, 0.1792, 0.0477867, 0.0119467, 0.0028672, 0.000669013.
Comment: The sum of two zero-modified Geometric Distributions is not a zero-modified Negative
Binomial Distribution.
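One way to carry out the "with the aid of a computer" step (a Python sketch of one possible approach, not the only one): convolve two zero-modified Geometric densities directly rather than differentiating the p.g.f.

p0, beta = 0.40, 0.25
N = 7

geo = [beta**x / (1 + beta)**(x + 1) for x in range(N)]          # untruncated Geometric
zm = [p0] + [g * (1 - p0) / (1 - geo[0]) for g in geo[1:]]       # zero-modified version

y = [sum(zm[j] * zm[k - j] for j in range(k + 1)) for k in range(N)]
print([round(p, 8) for p in y])
# [0.16, 0.384, 0.3072, 0.10752, 0.03072, 0.0079872, 0.00196608], as in part (b)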

14.32. C. f(x+1)/f(x) = x!/(x+1)! = 1/(x+1). Thus this is a member of the (a, b, 0) subclass,
f(x+1)/f(x) = a + b/(x+1), with a = 0 and b = 1. This is a Poisson Distribution, with λ = 1.
For the unmodified Poisson, the probability of more than zero claims is: 1 - e^-1.
After zero-modification, this probability is: 1 - 0.1 = 0.9. Thus the zero-modified distribution is
f^M(x) = {0.9/(1 - e^-1)} f(x) = {0.9/(1 - e^-1)} e^-1 1^x / x! = 0.9/{(e - 1) x!}, x ≥ 1.
f^M(1) = 0.9/(e - 1) = 0.524.

# claims                 0     1       2       3       4       5       6
zero-modified density    0.1   0.5238  0.2619  0.0873  0.0218  0.0044  0.0007

Comment: For a Poisson with λ = 1, f(n)/f(m) = (e^-1 1^n / n!)/(e^-1 1^m / m!) = m! / n!.

Section 15, Compound Frequency Distributions108

A compound frequency distribution has a primary and secondary distribution, each of which is a
frequency distribution. The primary distribution determines how many independent random draws
from the secondary distribution we sum.

For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is
Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel
by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by
each taxicab is independent of the number of taxicabs that arrive and is independent of the number
of passengers dropped off by any other taxicab.

Then the aggregate number of passengers dropped off per minute at the Heartbreak Hotel is an
example of a compound frequency distribution. It is a compound Poisson-Binomial distribution, with
parameters λ = 1.3, q = 0.4, m = 5.109

The distribution function of the primary Poisson is as follows:


Number of    Probability Density    Cumulative
Claims       Function               Distribution Function
0 27.253% 0.27253
1 35.429% 0.62682
2 23.029% 0.85711
3 9.979% 0.95690
4 3.243% 0.98934
5 0.843% 0.99777
6 0.183% 0.99960

So for example, there is a 3.243% chance that 4 taxicabs arrive; in which case the number of
passengers dropped off is the sum of 4 independent identically distributed Binomials110, given by
the secondary Binomial Distribution. There is a 27.253% chance there are no taxicabs, a 35.429%
chance we take one Binomial, a 23.029% chance we sum the result of 2 independent identically
distributed Binomials, etc.

108
See Section 7.1 of Loss Models, not on the syllabus. However, compound distributions are mathematically the
same as aggregate distributions. See “Mahlerʼs Guide to Aggregate Distributions.” Some of you may better
understand the idea of compound distributions by seeing how they are simulated in “Mahlerʼs Guide to Simulation.”
109
In the name of a compound distribution, the primary distribution is listed first and the secondary distribution is
listed second.
110
While we happen to know that the sum of 4 independent Binomials each with q = 0.4, m = 5 is another Binomial
with parameters q = 0.4, m = 20, that fact is not essential to the general concept of a compound distribution.

The secondary Binomial Distribution with q = 0.4, m = 5 is as follows:


Number of    Probability Density    Cumulative
Claims       Function               Distribution Function
0 7.776% 0.07776
1 25.920% 0.33696
2 34.560% 0.68256
3 23.040% 0.91296
4 7.680% 0.98976
5 1.024% 1.00000
Thus assuming a taxicab arrives, there is a 34.560% chance that 2 passengers are dropped off.

In this example, the primary distribution determines how many taxicabs arrive, while the secondary
distribution determines the number of passengers departing per taxicab. Instead, the primary
distribution could be the number of envelopes arriving and the secondary distribution could be the
number of claims in each envelope.111

Actuaries often use compound distributions when the primary distribution determines
how many accidents there are, while for each accident the number of persons injured or
number of claimants is determined by the secondary distribution.112 This particular model,
while useful for comprehension, may or may not apply to any particular use of the mathematical
concept of compound frequency distributions.

There are a number of methods of computing the density of compound distributions, among them the
use of convolutions and the use of the Recursive Method (Panjer Algorithm).113

Probability Generating Function of Compound Distributions:

One can get the Probability Generating Function of a compound distribution in terms of those of its
primary and secondary distributions:
p.g.f. of compound distribution = p.g.f. of primary distribution[p.g.f. of secondary distribution]

P(z) = P1 [P2 (z)].

111
See 3, 11/01, Q.30.
112
See 3, 5/01, Q.36.
113
Both discussed in “Mahlerʼs Guide to Aggregate Distributions,” where they are applied to both compound and
aggregate distributions.

Exercise: What is the Probability Generating Function of a Compound
Geometric-Binomial Distribution, with β = 3, q = 0.1, and m = 2?

[Solution: The p.g.f. of the primary Geometric is: 1 / {1 - 3(z-1)} = 1 / (4 - 3z), z < 1 + 1/β = 4/3.
The p.g.f. of the secondary Binomial is: {1 + (0.1)(z-1)}^2 = (0.9 + 0.1z)^2 = 0.01z^2 + 0.18z + 0.81.
P(z) = P1[P2(z)] = 1 / {4 - 3(0.01z^2 + 0.18z + 0.81)} = -1/(0.03z^2 + 0.54z - 1.57), z < 4/3.]

Recall that for any frequency distribution, f(0) = P(0). Therefore, for a compound distribution,
c(0) = Pc(0) = P1[P2(0)] = P1[s(0)].

compound density at 0 = p.g.f. of the primary at the density at 0 of the secondary.114

For example, in the previous exercise, the density of the compound distribution at zero is its p.g.f. at
z = 0: 1/1.57 = 0.637. The density at 0 of the secondary Binomial Distribution is:
0.9^2 = 0.81. The p.g.f. of the primary distribution at 0.81 is: 1 / {4 - (3)(0.81)} = 1/1.57 = 0.637.
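A quick numerical illustration of c(0) = P1[s(0)] for the Geometric-Binomial example above (a Python sketch, not from the text):

beta, q, m = 3.0, 0.1, 2

def pgf_geometric(z):
    return 1.0 / (1.0 - beta * (z - 1.0))

s0 = (1 - q) ** m                        # secondary Binomial density at 0
print(round(pgf_geometric(s0), 3))       # 0.637, the compound density at 0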

If one takes the p.g.f. of a compound distribution to a power ρ > 0, P(z)^ρ = P1^ρ[P2(z)].
Thus if the primary distribution is infinitely divisible, i.e., P1^ρ has the same form as P1, then P^ρ
has the same form as P. If the primary distribution is infinitely divisible, then so is the
compound distribution.

Since the Poisson and the Negative Binomial are each infinitely divisible, so are compound
distributions with a primary distribution which is either a Poisson or a Negative Binomial
(including a Geometric.)

Adding Compound Distributions:

For example, let us assume that taxi cabs arrive at a hotel (primary distribution) and drop people off
(secondary distribution.) Assume two independent Compound Poisson Distributions with the same
secondary distribution. The first compound distribution represents those cabs whose drivers were
born in January through June and has λ = 11, while the second compound distribution represents
those cabs whose drivers were born in July through December and has λ = 9.
Then the sum of the two distributions represents the passengers from all of the cabs, and is a
Compound Poisson Distribution with λ = 11 + 9 = 20, and the same secondary distribution as each
of the individual Compound Distributions.
Note that the parameter of the primary rather than secondary distribution was affected.

114
This is the first step of the Panjer Algorithm, discussed in “Mahlerʼs Guide to Aggregate Distributions.”

Exercise: Let X be a Poisson-Binomial Distribution compound frequency distribution with


λ = 4.3, q = 0.2, and m = 5. Let Y be a Poisson-Binomial Distribution compound frequency
distribution with λ = 2.4, q = 0.2, and m = 5. What is the distribution of X + Y?
[Solution: A Poisson-Binomial Distribution with λ = 4.3 + 2.4 = 6.7, q = 0.2, and m = 5.]

The sum of two independent identically distributed Compound Poisson variables


has the same form. The sum of two independent identically distributed Compound Negative
Binomial variables has the same form.

Exercise: Let X be a Negative Binomial-Poisson compound frequency distribution with


β = 0.7, r = 2.5, and λ = 3.
What is the distributional form of the sum of two independent random draws from X?
[Solution: A Negative Binomial-Poisson with β = 0.7, r = (2)(2.5) = 5, and λ = 3.]

Exercise: Let X be a Poisson-Geometric compound frequency distribution with λ = 0.3 and


β = 1.5.
What is the distributional form of the sum of twenty independent random draws from X?
[Solution: The sum of 20 independent identically distributed variables is of the same form.
However, λ = (20)(0.3) = 6. We get a Poisson-Geometric compound frequency distribution with
λ = 6 and β = 1.5.]

If one adds independent identically distributed Compound Binomial variables one gets the same
form.

Exercise: Let X be a Binomial-Geometric compound frequency distribution with q = 0.2, m = 3, and


β = 1.5.
What is the distributional form of the sum of twenty independent random draws from X?
[Solution: The sum of 20 independent identically distributed binomial variables is of the same form,
with m = (20)(3) = 60. We get a Binomial-Geometric compound frequency distribution with q = 0.2,
m = 60, and β = 1.5.]

Thinning Compound Distributions:

Thinning compound distributions can be done in two different manners, one manner affects
the primary distribution, and the other manner affects the secondary distribution.

For example, assume that taxi cabs arrive at a hotel (primary distribution) and drop people off
(secondary distribution.) Then we can either select certain types of cabs or certain types of people.
Depending on which we select, we affect the primary or secondary distribution.

Assume we select only those cabs that are less than one year old (and assume age of cab is
independent of the number of people dropped off and the frequency of arrival of cabs.)
Then this would affect the primary distribution, the number of cabs.

Exercise: Cabs arrive via a Poisson with mean 1.3. The number of people dropped off by each
cab is Binomial with q = 0.2 and m = 5. The number of people dropped off per cab is independent
of the number of cabs that arrive. 30% of cabs are less than a year old.
The age of cabs is independent of the number of people dropped off and the frequency of arrival
of cabs.
What is the distribution of the number of people dropped off by cabs less than one year old?
[Solution: Cabs less than a year old arrive via a Poisson with λ = (30%)(1.3) = 0.39.
There is no effect on the number of people per cab (secondary distribution.)
We get a Poisson-Binomial Distribution compound frequency distribution with λ = 0.39, q = 0.2, and
m = 5.]

This first manner of thinning affects the primary distribution. For example, it might occur if the primary
distribution represents the number of accidents and the secondary distribution represents the
number of claims.

For example, assume that the number of accidents is Negative Binomial with β = 2 and r = 30, and
the number of claims per accident is Binomial with q = 0.3 and m = 7. Then the total number of claims
is Compound Negative Binomial-Binomial with parameters β = 2, r = 30, q = 0.3 and m = 7.

Exercise: Accidents are assigned at random to one of four claims adjusters:


Jerry, George, Elaine, or Cosmo.
What is the distribution of the number claims adjusted by George?
[Solution: We are selecting at random 1/4 of the accidents. We are thinning the Negative Binomial
Distribution of the number of accidents. Therefore, the number of accidents assigned to George is
Negative Binomial with β = 2/4 = 0.5 and r = 30.
The number claims adjusted by George is Compound Negative Binomial-Binomial with
parameters β = 0.5, r = 30, q = 0.3 and m = 7.]

Returning to the cab example, assume we select only female passengers, (and gender of
passenger is independent of the number of people dropped off and the frequency of arrival of
cabs.). Then this would affect the secondary distribution, the number of passengers.

Exercise: Cabs arrive via a Poisson with mean 1.3. The number of people dropped off by each
cab is Binomial with q = 0.2 and m = 5. The number of people dropped off per cab is independent
of the number of cabs that arrive. 40% of the passengers are female.
The gender of passengers is independent of the number of people dropped off and the frequency
of arrival of cabs.
What is the distribution of the number of females dropped off by cabs?
[Solution: The distribution of female passengers per cab is Binomial with q = (0.4)(0.2) = 0.08 and
m = 5. There is no effect on the number of cabs (primary distribution.)
We get a Poisson-Binomial Distribution compound frequency distribution with λ = 1.3,
q = 0.08, and m = 5.]

This second manner of thinning a compound distribution affects the secondary distribution.
It is mathematically the same as what happens when one takes only the large claims in a frequency
and severity situation, when the frequency distribution itself is compound.115

For example, suppose frequency is Poisson-Binomial with λ = 1.3, q = 0.2, and m = 5, and 40% of the
claims are large. The number of large claims would be simulated by first getting a random draw from
the Poisson, then simulating the appropriate number of random Binomials, and then for each claim
from the Binomial there is a 40% chance of selecting it at random independent of any other claims.
This is mathematically the same as thinning the Binomial. Therefore, large claims have a Poisson-
Binomial Distribution compound frequency distribution with λ = 1.3, q = (0.4)(0.2) = 0.08 and m = 5.
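Here is a minimal simulation sketch of this second manner of thinning (Python with numpy; the parameters λ = 1.3, q = 0.2, m = 5 and the 40% selection probability are those of the example just above). Each passenger is kept with probability 0.4, and the resulting counts are compared to the moments of a Poisson-Binomial compound distribution with λ = 1.3, q = 0.08, and m = 5.

import numpy as np

rng = np.random.default_rng(seed=1)
trials = 200000
kept = np.empty(trials)

for i in range(trials):
    cabs = rng.poisson(1.3)                    # primary: number of cabs
    people = rng.binomial(5, 0.2, cabs).sum()  # secondary: passengers per cab
    kept[i] = rng.binomial(people, 0.4)        # keep each passenger with probability 40%

# Theoretical moments of a compound Poisson-Binomial with lambda = 1.3, q = 0.08, m = 5:
mean_sec, var_sec = 5 * 0.08, 5 * 0.08 * 0.92
print(kept.mean(), 1.3 * mean_sec)                  # both near 0.52
print(kept.var(), 1.3 * (var_sec + mean_sec**2))    # both near 0.6864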

115
This is what is considered in Section 8.6 of Loss Models.

Exercise: Let frequency be given by a Geometric-Binomial compound frequency distribution with


β = 1.5, q = 0.2, and m = 3. Severity follows an Exponential Distribution with mean 1000.
Frequency and severity are independent.
What is the frequency distribution of losses of size between 500 and 2000?
[Solution: The fraction of losses that are of size between 500 and 2000 is:
F(2000) - F(500) = (1 - e^(-2000/1000)) - (1 - e^(-500/1000)) = e^(-0.5) - e^(-2) = 0.4712. Thus the losses of size
between 500 and 2000 follow a Geometric-Binomial compound frequency distribution with
β = 1.5, q = (0.4712)(0.2) = 0.0942, and m = 3.]

Proof of Some Thinning Results:116

One can use the result for the probability generating function for a compound distribution,
p.g.f. of compound distribution = p.g.f. of primary distribution[p.g.f. of secondary distribution],
in order to determine the results of thinning a Poisson, Binomial, or Negative Binomial Distribution.

Assume one has a Poisson Distribution with mean λ.


Assume one selects at random 30% of the claims.
This is mathematically the same as a compound distribution with a primary distribution that is Poisson
with mean λ and a secondary distribution that is Bernoulli with q = 0.3.

The p.g.f. of the Poisson is P(z) = exp[λ(z-1)].


The p.g.f. of the Bernoulli is P(z) = 1 + 0.3(z-1).
The p.g.f. of the compound distribution is obtained by replacing z in the p.g.f. of the primary
Poisson with the p.g.f. of the secondary Bernoulli:
P(z) = exp[λ{1 + 0.3(z-1) - 1}] = exp[(0.3λ)(z - 1)].

This is the p.g.f. of a Poisson Distribution with mean 0.3λ.


Thus the thinned distribution is also Poisson, with mean 0.3λ.

In general, when thinning a Poisson by a factor of t, the thinned distribution is also Poisson with mean
tλ.

116
See Section 8.6 of Loss Models.

Similarly, assume we are thinning a Binomial Distribution with parameters q and m.


The p.g.f. of the Binomial is P(z) = {1 + q(z-1)}^m.
This is mathematically the same as a compound distribution with secondary distribution a Bernoulli
with mean t.
The p.g.f. of this compound distribution is: {1 + q(1 + t(z-1) - 1)}^m = {1 + tq(z-1)}^m.
This is the p.g.f. of a Binomial Distribution with parameters tq and m.

In general, when thinning a Binomial by a factor of t, the thinned distribution is also Binomial with
parameters tq and m.

Assume we are thinning a Negative Binomial Distribution with parameters β and r.


The p.g.f. of the Negative Binomial is P(z) = {1 - β(z-1)}^-r.
This is mathematically the same as a compound distribution with secondary distribution a Bernoulli
with mean t.
The p.g.f. of this compound distribution is: {1 - β(1 + t(z-1) - 1)}^-r = {1 - tβ(z-1)}^-r.
This is the p.g.f. of a Negative Binomial Distribution with parameters r and tβ.

In general, when thinning a Negative Binomial by a factor of t, the thinned distribution is also
Negative Binomial with parameters tβ and r.117

Since thinning is mathematically the same as a compound distribution with secondary distribution a
Bernoulli with mean t, and the p.g.f. of the Bernoulli is 1 - t + tz,
the p.g.f. of the thinned distribution is P(1 - t + tz),
where P(z) is the p.g.f. of the original distribution. In general, P(0) = f(0).
Thus the density at zero for the thinned distribution is: P(1 - t + t0) = P(1 - t).
The density of the thinned distribution at zero is the p.g.f. of the original distribution at 1 - t.118
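The substitution algebra behind these three results can be confirmed symbolically in a few lines (a sketch using sympy; it just checks that replacing z by the Bernoulli p.g.f. 1 - t + tz multiplies the relevant parameter by t in the exponent or base of each p.g.f.).

import sympy as sp

z, t, lam, q, beta = sp.symbols('z t lambda q beta', positive=True)

sub = 1 - t + t*z  # p.g.f. of the Bernoulli(t) secondary, substituted for z

# Poisson: the exponent lambda(z-1) becomes (t lambda)(z-1).
print(sp.expand(lam*(sub - 1) - (t*lam)*(z - 1)))      # 0

# Binomial: the base 1 + q(z-1) becomes 1 + (t q)(z-1), so {..}^m is Binomial(tq, m).
print(sp.expand(q*(sub - 1) - (t*q)*(z - 1)))          # 0

# Negative Binomial: the base 1 - beta(z-1) becomes 1 - (t beta)(z-1), so {..}^-r is Negative Binomial(tbeta, r).
print(sp.expand(beta*(sub - 1) - (t*beta)*(z - 1)))    # 0

Each print returns 0, confirming that the thinned p.g.f.s have the stated forms.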

Let us assume instead we start with a zero-modified distribution.


Let P(z) be the p.g.f. of the original distribution prior to being zero-modified.
Then PZM(z) = p0M + (1 - p0M) PZT(z) = p0M + (1 - p0M) {P(z) - f(0)} / {1 - f(0)},
where p0M denotes the probability at zero of the zero-modified distribution.

Now the density at zero for the thinned version of the original distribution is: P(1 - t).
The density at zero for the thinned version of the zero-modified distribution is:
p0M* = PZM(1 - t) = p0M + (1 - p0M) {P(1 - t) - f(0)} / {1 - f(0)}.
⇒ 1 - p0M* = (1 - p0M) {1 - P(1 - t)} / {1 - f(0)}.

117
Including the special case the Geometric Distribution.
118
This general result was discussed previously with respect to thinning zero-modified distributions.

The p.g.f. of the thinned zero-modified distribution is:

PZM(1 - t + tz) = p0M + (1 - p0M) {P(1 - t + tz) - f(0)} / {1 - f(0)}

= p0M* - (1 - p0M) {P(1 - t) - f(0)} / {1 - f(0)} + (1 - p0M) {P(1 - t + tz) - f(0)} / {1 - f(0)}

= p0M* + (1 - p0M) {P(1 - t + tz) - P(1 - t)} / {1 - f(0)} = p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)}.

Now, {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)} =
{(p.g.f. of thinned non-modified dist.) - (density at zero of thinned non-modified dist.)} /
{1 - (density at zero of thinned non-modified distribution)}.

Therefore, the form of the p.g.f. of the thinned zero-modified distribution:

p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)},

is the usual form of the p.g.f. of a zero-modified distribution,
with the thinned version of the original distribution taking the place of the original distribution.
with the thinned version of the original distribution taking the place of the original distribution.

Therefore, provided thinning preserves the family of the original distribution, the thinned
zero-modified distribution is of the same family, with probability at zero p0M* and with the other parameters as per
thinning of the non-modified distribution. Specifically, as discussed before:

Distribution                              Result of thinning by a factor of t

Zero-Modified Binomial                    q → tq, m remains the same
                                          p0M → {p0M - (1-q)^m + (1-tq)^m - p0M (1-tq)^m} / {1 - (1-q)^m}

Zero-Modified Poisson                     λ → tλ
                                          p0M → {p0M - e^(-λ) + e^(-tλ) - p0M e^(-tλ)} / {1 - e^(-λ)}

Zero-Modified Negative Binomial           β → tβ, r remains the same
                                          p0M → {p0M - (1+β)^(-r) + (1+tβ)^(-r) - p0M (1+tβ)^(-r)} / {1 - (1+β)^(-r)}
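As a numerical check of, say, the Zero-Modified Poisson row, one can simulate directly (a Python sketch with numpy; λ = 2, p0M = 0.4, and t = 0.3 are arbitrary illustrative values, not taken from any exercise above).

import numpy as np

rng = np.random.default_rng(seed=1)
lam, p0m, t, trials = 2.0, 0.4, 0.3, 500000

# Zero-truncated Poisson draws by rejection (redraw any zeros):
zt = rng.poisson(lam, trials)
while (zeros := (zt == 0)).any():
    zt[zeros] = rng.poisson(lam, zeros.sum())

# Zero-modified Poisson: with probability p0m take 0, otherwise a zero-truncated draw.
zm = np.where(rng.random(trials) < p0m, 0, zt)

# Thin: each count is kept independently with probability t.
thinned = rng.binomial(zm, t)

empirical_p0 = (thinned == 0).mean()
formula_p0 = (p0m - np.exp(-lam) + np.exp(-t*lam) - p0m*np.exp(-t*lam)) / (1 - np.exp(-lam))
print(empirical_p0, formula_p0)  # both about 0.687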

As discussed previously, things work similarly for a zero-modified Logarithmic.


Let P(z) be the p.g.f. of the original Logarithmic distribution prior to being zero-modified.
Then PZM(z) = p0M + (1 - p0M) P(z).

Now the density at zero for the thinned version of the original distribution is: P(1 - t).
The density at zero for the thinned version of the zero-modified distribution is:
p0M* = PZM(1 - t) = p0M + (1 - p0M) P(1 - t).

⇒ 1 - p0M* = (1 - p0M) {1 - P(1 - t)}.

As before, since the p.g.f. of the secondary Bernoulli is 1 - t + tz,


the p.g.f. of the thinned zero-modified distribution is:
PZM(1 - t + tz) = p0M + (1 - p0M) P(1 - t + tz) = p0M* - (1 - p0M) P(1 - t) + (1 - p0M) P(1 - t + tz)
= p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)}.

Now, {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)} =
{(p.g.f. of thinned non-modified dist.) - (density at zero of thinned non-modified dist.)} /
{1 - (density at zero of thinned non-modified distribution)}.

Therefore, the form of the p.g.f. of the thinned zero-modified distribution:

p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)},

is the usual form of the p.g.f. of a zero-modified distribution,
with the thinned version of the original distribution taking the place of the original distribution.

Therefore, since thinning results in another Logarithmic, the thinned
zero-modified distribution is of the same family, with probability at zero p0M* and the other parameter as per thinning of
the non-modified distribution. As discussed before:

Zero-Modified Logarithmic                 β → tβ
                                          p0M → 1 - (1 - p0M) ln[1+ tβ] / ln[1+ β]

Problems:

15.1 (2 points) The number of accidents is Geometric with β = 1.7.


The number of claims per accident is Poisson with λ = 3.1.
For the total number of claims, what is the Probability Generating Function, P(z)?
A. exp[3.1(z - 1)] / (2.7 - 1.7z)
B. 1 / {2.7 - 1.7exp[3.1(z - 1)]}
C. exp[3.1(z - 1)] + (2.7 - 1.7z)
D. exp[3.1(z - 1.7) / (2.7 - 1.7z)]
E. None of the above

15.2 (1 point) Frequency is given by a Poisson-Binomial compound frequency distribution, with


λ = 0.18, q = 0.3, and m = 3.
One third of all losses are greater than $10,000. Frequency and severity are independent.
What is frequency distribution of losses of size greater than $10,000?
A. Compound Poisson-Binomial with λ = 0.18, q = 0.3, and m = 3.
B. Compound Poisson-Binomial with λ = 0.18, q = 0.1, and m = 3.
C. Compound Poisson-Binomial with λ = 0.18, q = 0.3, and m = 1.
D. Compound Poisson-Binomial with λ = 0.06, q = 0.3, and m = 3.
E. None of the above.

15.3 (1 point) X is given by a Binomial-Geometric compound frequency distribution, with


q = 0.15, m = 3, and β = 2.3. Y is given by a Binomial-Geometric compound frequency distribution,
with q = 0.15, m = 5, and β = 2.3. X and Y are independent.
What is the distributional form of X + Y?
A. Compound Binomial-Geometric with q = 0.15, m = 4, and β = 2.3
B. Compound Binomial-Geometric with q = 0.15, m = 8, and β = 2.3
C. Compound Binomial-Geometric with q = 0.15, m = 4, and β = 4.6
D. Compound Binomial-Geometric with q = 0.15, m = 8, and β = 4.6
E. None of the above.

15.4 (2 points) A compound claims frequency model has the following properties:
(i) The primary distribution has probability generating function:
P(z) = 0.2z + 0.5z^2 + 0.3z^3.
(ii) The secondary distribution has probability generating function:
P(z) = exp[0.7(z - 1)].
Calculate the probability of no claims from this compound distribution.
(A) 18% (B) 20% (C) 22% (D) 24% (E) 26%

15.5 (1 point) Assume each exposure has a Poisson-Poisson compound frequency distribution, as
per Loss Models, with λ1 = 0.03 and λ2 = 0.07. You insure 20,000 independent exposures. What
is the frequency distribution for your portfolio?
A. Compound Poisson-Poisson with λ1 = 0.03 and λ2 = 0.07

B. Compound Poisson-Poisson with λ1 = 0.03 and λ2 = 1400

C. Compound Poisson-Poisson with λ1 = 600 and λ2 = 0.07

D. Compound Poisson-Poisson with λ1 = 600 and λ2 = 1400


E. None of the above.

15.6 (2 points) Frequency is given by a Poisson-Binomial compound frequency distribution,


with parameters λ = 1.2, q = 0.1, and m = 4.
What is the Probability Generating Function?
A. {1 + 0.1(z - 1)}^4 B. exp(1.2(z - 1))
C. exp[1.2({1 + 0.1(z - 1)}^4 - 1)] D. {1 + 0.1(exp[1.2(z - 1)] - 1)}^4
E. None of the above

15.7 (1 point) The total number of claims from a book of business with 100 exposures has a
Compound Poisson-Geometric Distribution with λ = 4 and β = 0.8.
Next year this book of business will have 75 exposures.
Next year, what is the distribution of the total number of claims from this book of business?
A. Compound Poisson-Geometric with λ = 4 and β = 0.8.
B. Compound Poisson-Geometric with λ = 3 and β = 0.8.
C. Compound Poisson-Geometric with λ = 4 and β = 0.6.
D. Compound Poisson-Geometric with λ = 3 and β = 0.6.
E. None of the above.

15.8 (2 points) A compound claims frequency model has the following properties:
(i) The primary distribution has probability generating function:
P(z) = 1 / (5 - 4z).
(ii) The secondary distribution has probability generating function:
P(z) = (0.8 + 0.2z)^3.
Calculate the probability of no claims from this compound distribution.
(A) 28% (B) 30% (C) 32% (D) 34% (E) 36%

15.9 (1 point) The total number of claims from a group of 50 drivers has a
Compound Negative Binomial-Poisson Distribution with β = 0.4, r = 3, and λ = 0.7.
What is the distribution of the total number of claims from 500 similar drivers?
A. Compound Negative Binomial-Poisson with β = 0.4, r = 30, and λ = 0.7.
B. Compound Negative Binomial-Poisson with β = 4, r = 3, and λ = 0.7.
C. Compound Negative Binomial-Poisson with β = 0.4, r = 3, and λ = 7.
D. Compound Negative Binomial-Poisson with β = 4, r = 30, and λ = 7.
E. None of the above.

15.10 (SOA M, 11/05, Q.27 & 2009 Sample Q.208) (2.5 points)
An actuary has created a compound claims frequency model with the following properties:
(i) The primary distribution is the negative binomial with probability generating function
P(z) = [1 - 3(z - 1)]^-2.
(ii) The secondary distribution is the Poisson with probability generating function
P(z) = exp[λ(z - 1)].
(iii) The probability of no claims equals 0.067.
Calculate λ.
(A) 0.1 (B) 0.4 (C) 1.6 (D) 2.7 (E) 3.1

Solutions to Problems:

15.1. B. P(z) = P1 [P2 (z)].


The p.g.f of the primary Geometric is: 1/{1 - β(z-1)} = 1/{1 - 1.7(z-1)} = 1/(2.7 - 1.7z).
The p.g.f of the secondary Poisson is: exp[λ(z-1)] = exp[3.1(z-1)].
Thus the p.g.f. of the compound distribution is: 1 / {2.7 - 1.7exp[3.1(z-1)]}.
Comment: P(z) only exists for z < 1 + 1/β = 1 + 1/1.7.

15.2. B. We are taking 1/3 of the claims from the secondary Binomial. Thus the secondary
distribution is Binomial with q = 0.3/3 = 0.1 and m = 3. Thus the frequency distribution of losses of
size greater than $10,000 is given by a Poisson-Binomial compound frequency distribution, as per
Loss Models with λ = 0.18, q = 0.1, and m = 3.

15.3. B. Provided the secondary distributions are the same, the primary distributions add as they
usually would. The sum of two independent Binomials with the same q, is another Binomial with the
sum of the m parameters. In this case it is a Binomial with q = 0.15 and
m = 3 + 5 = 8. X + Y is a Binomial-Geometric with q = 0.15, m = 8, and β = 2.3.
Comment: The secondary distributions determine how many claims there are per accident. The
primary distributions determine how many accidents. In this case the Binomial distributions of the
number of accidents add.

15.4. E. P(z) = P1 [P2 (z)].


Density at 0 is: P(0) = P1[P2(0)] = P1[e^-0.7] = 0.2e^-0.7 + 0.5e^-1.4 + 0.3e^-2.1 = 0.259.
Alternately, the primary distribution has 20% probability of 1, 50% probability of 2, and 30%
probability of 3, while the secondary distribution is a Poisson with λ = 0.7.
The density at zero of the secondary distribution is e^-0.7.
Therefore, the probability of zero claims for the compound distribution is:
(0.2)(Prob 0 from secondary) + (0.5)(Prob 0 from secondary)^2 + (0.3)(Prob 0 from secondary)^3
= 0.2e^-0.7 + 0.5(e^-0.7)^2 + 0.3(e^-0.7)^3 = 0.259.
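As a quick check, the 0.259 can be reproduced in a couple of lines (a Python sketch):

import math

p2_at_zero = math.exp(-0.7)  # density at 0 of the secondary Poisson
print(0.2*p2_at_zero + 0.5*p2_at_zero**2 + 0.3*p2_at_zero**3)  # about 0.259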

15.5. C. One adds up 20,000 independent identically distributed variables. In the case of a
Compound Poisson distribution, the primary Poissons add to give another Poisson with
λ 1 = (20000)(0.03) = 600. The secondary distribution stays the same.

The portfolio has a compound Poisson-Poisson with λ1 = 600 and λ2 = 0.07.



15.6. C. The p.g.f of the primary Poisson is exp(λ(z-1)) = exp(1.2(z-1)).


The p.g.f of the secondary Binomial is {1 + q(z-1)}^m = {1 + 0.1(z-1)}^4.
Thus the p.g.f. of the compound distribution is P(z) = P1[P2(z)] = exp[1.2({1 + 0.1(z-1)}^4 - 1)].

15.7. B. Poisson-Geometric with λ = (75/100)(4) = 3 and β = 0.8.


Comment: One adjusts the primary Poisson distribution in a manner similar to that if one just had a
Poisson distribution.

15.8. D. P(z) = P1 [P2 (z)].


Density at 0 is: P(0) = P1[P2(0)] = P1[0.8^3] = 1/{5 - 4(0.8^3)} = 0.339.
Alternately, the secondary distribution is a Binomial with m = 3 and q = 0.2.
The density at zero of the secondary distribution is 0.8^3.
Therefore, the probability of zero claims for the compound distribution is:
P1[0.8^3] = 1/{5 - 4(0.8^3)} = 0.339.

15.9. A. Negative Binomial-Poisson with β = 0.4, r = (500/50)(3) = 30, and λ = 0.7.


Comment: One adjusts the primary Negative Binomial distribution in a manner similar to that if one
just had a Negative Binomial distribution.

15.10. E. The p.g.f. of the compound distribution is the p.g.f. of the primary distribution at the p.g.f.
of the secondary distribution: P(z) = [1 - 3(exp[λ(z - 1)] - 1)]^-2.
0.067 = f(0) = P(0) = [1 - 3(exp[λ(0 - 1)] - 1)]^-2 = [1 - 3(exp[-λ] - 1)]^-2.

⇒ 1 - 3(exp[-λ] - 1) = 3.8633. ⇒ exp[-λ] = 0.04555. ⇒ λ = 3.089.

Alternately, the Poisson secondary distribution at zero is e^-λ.

From the first step of the Panjer Algorithm, c(0) = Pp[s(0)] = [1 - 3(e^-λ - 1)]^-2. Proceed as before.

Comment: P(z) = E[z^n] = Σ f(n) z^n. Therefore, letting z approach zero, P(0) = f(0).
The probability generating function of the Negative Binomial only exists for z < 1 + 1/β = 4/3.

Section 16, Moments of Compound Frequency Distributions119

A compound frequency distribution has a primary and secondary distribution, each of which is a
frequency distribution. The primary distribution determines how many independent random draws
from the secondary distribution we sum.

One may find it helpful to think of the secondary distribution as taking the role of a severity
distribution in the calculation of aggregate losses.120 Since the situations are mathematically
equivalent, many of the techniques and formulas that apply to aggregate losses apply to
compound frequency distributions.

For example, the same formulas for the mean, variance and skewness apply.121

Mean of Compound Dist. =


(Mean of Primary Dist.) (Mean of Secondary Dist.)

Variance of Compound Dist. =


(Mean of Primary Dist.) (Variance of Secondary Dist.) +
(Mean of Secondary Dist.)2 (Variance of Primary Dist.)

Skewness of Compound Dist. =

{(Mean of Primary Dist.) (Variance of Second. Dist.)^(3/2) (Skewness of Secondary Dist.) +
3 (Variance of Primary Dist.) (Mean of Secondary Dist.) (Variance of Second. Dist.) +
(Variance of Primary Dist.)^(3/2) (Skewness of Primary Dist.) (Mean of Second. Dist.)^3} /
(Variance of Compound Dist.)^(3/2)

For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is
Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel
by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by
each taxicab is independent of the number of taxicabs that arrive and is independent of the number
of passengers dropped off by any other taxicab.

119
See Section 7.1 of Loss Models, not on the syllabus. However, since compound distributions are mathematically
the same as aggregate distributions, I believe that a majority of the questions in this section would be legitimate
questions for your exam. Compound frequency distributions used to be on the syllabus.
120
In the case of aggregate losses, the frequency distribution determines how many independent identically
distributed severity variables we will sum.
121
The secondary distribution takes the place of the severity, while the primary distribution takes the place of the
frequency, in the formulas involving aggregate losses. σagg^2 = µF σS^2 + µS^2 σF^2.
See “Mahlerʼs Guide to Aggregate Distributions.”

Then the total number of passengers dropped off in a minute is a compound distribution compound
Poisson-Binomial distribution, with parameters λ = 1.3, m = 5, q = 0.4.

Exercise: What are the mean and variance of this compound distribution?
[Solution: The mean and variance of the primary Poisson Distribution are both 1.3.
The mean and variance of the secondary Binomial Distribution are
(0.4)(5) = 2 and (0.4)(0.6)(5) = 1.2.
Thus the mean of the compound distribution is: (1.3)(2) = 2.6.
The variance of the compound distribution is: (1.3)(1.2) + (2^2)(1.3) = 6.76.
Comment: Mathematically as if the number of claims is Poisson and the size of each claim is
Binomial.]
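The mean and variance formulas are easy to wrap in a small helper, applied here to the Heartbreak Hotel example (a Python sketch; the function name is mine):

def compound_moments(mean_p, var_p, mean_s, var_s):
    # Mean and variance of a compound distribution from the moments of
    # its primary (p) and secondary (s) distributions.
    mean = mean_p * mean_s
    variance = mean_p * var_s + mean_s**2 * var_p
    return mean, variance

# Poisson primary with lambda = 1.3; Binomial secondary with m = 5, q = 0.4:
print(compound_moments(1.3, 1.3, 2.0, 1.2))  # (2.6, 6.76)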

Thus in the case of the Heartbreak Hotel example, on average 2.6 passengers are dropped off per
minute. The variance of the number of passengers dropped off per minute is 6.76.

Exercise: What is the probability of more than 4 passengers being dropped off during the next
minute? Use the Normal Approximation with continuity correction.
[Solution: 1 - Φ[(4.5 - 2.6) / √6.76] = 1 - Φ[0.73] = 23.27%.]

Exercise: Assuming the minutes are independent, what is the probability of more than 40
passengers being dropped off during the next ten minutes?
Use the Normal Approximation with continuity correction.
[Solution: Over ten minutes the mean is (10)(2.6) = 26, and the variance is (10)(6.76) = 67.6.
1 - Φ[(40.5 - 26) / √67.6] = 1 - Φ[1.76] = 3.92%.]
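These Normal Approximation figures can be reproduced with scipy rather than the Normal table (a sketch; it assumes scipy is available):

from math import sqrt
from scipy.stats import norm

mean, var = 2.6, 6.76
# More than 4 passengers in one minute (continuity correction at 4.5):
print(1 - norm.cdf((4.5 - mean) / sqrt(var)))        # about 0.233

# More than 40 passengers in ten independent minutes:
mean10, var10 = 10 * mean, 10 * var
print(1 - norm.cdf((40.5 - mean10) / sqrt(var10)))   # about 0.039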

Poisson Primary Distribution:

In the case of a Poisson primary distribution with mean λ, the variance of the compound distribution
could be rewritten as:
λ(Variance of Secondary Dist.) + (Mean of Secondary Dist.)^2 λ =
λ{Variance of Secondary Dist. + (Mean of Secondary Dist.)^2} =
λ(2nd moment of Secondary Distribution).

It also turns out that the third central moment of a compound Poisson distribution =
λ(3rd moment of Secondary Distribution).

For a Compound Poisson Distribution:

Mean = λ(mean of Secondary Distribution).

Variance = λ(2nd moment of Secondary Distribution).

3rd central moment = λ(3rd moment of Secondary Distribution).

Skewness = λ^-0.5 (3rd moment of Second. Dist.) / (2nd moment of Second. Dist.)^1.5.122

Exercise: The number of accidents follows a Poisson Distribution with λ = 0.04.


Each accident generates 1, 2 or 3 claimants with probabilities 60%, 30%, and 10%.
Determine the mean, variance, and skewness of the total number of claimants.
[Solution: The secondary distribution has mean 1.5, second moment 2.7, and third moment 5.7.
Thus the mean number of claimants is: (0.04)(1.5) = 0.06.
The variance of the number of claimants is: (0.04)(2.7) = 0.108.
The skewness of the number of claimants is: (0.04^-0.5)(5.7)/(2.7)^1.5 = 6.42.]
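For a compound Poisson these bulleted formulas translate directly into code; here they are applied to the exercise just above (a Python sketch):

import numpy as np

lam = 0.04
claimants = np.array([1, 2, 3])
probs = np.array([0.60, 0.30, 0.10])

m1 = probs @ claimants        # 1.5
m2 = probs @ claimants**2     # 2.7
m3 = probs @ claimants**3     # 5.7

mean = lam * m1                        # 0.06
variance = lam * m2                    # 0.108
skewness = lam**-0.5 * m3 / m2**1.5    # about 6.42
print(mean, variance, skewness)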

122
Skewness = (third central moment) / Variance^1.5.

Problems:

16.1 (1 point) For a compound distribution:


Mean of primary distribution = 15.
Standard Deviation of primary distribution = 3.
Mean of secondary distribution = 10.
Standard Deviation of secondary distribution = 4.
What is the standard deviation of the compound distribution?
A. 26 B. 28 C. 30 D. 32 E. 34

16.2 (2 points) The number of accidents follows a Poisson distribution with mean 10 per month.
Each accident generates 1, 2, or 3 claimants with probabilities 40%, 40%, 20%, respectively.
Calculate the variance in the total number of claimants in a year.
A. 250 B. 300 C. 350 D. 400 E. 450

Use the following information for the next 3 questions:


The number of customers per minute is Geometric with β = 1.7.
The number of items sold to each customer is Poisson with λ = 3.1.
The number of items sold per customer is independent of the number of customers.

16.3 (1 point) What is the mean of the total number of items sold per minute?
A. less than 5.0
B. at least 5.0 but less than 5.5
C. at least 5.5 but less than 6.0
D. at least 6.0 but less than 6.5
E. at least 6.5

16.4 (1 point) What is the variance of the total number of items sold per minute?
A. less than 50
B. at least 50 but less than 51
C. at least 51 but less than 52
D. at least 52 but less than 53
E. at least 53

16.5 (2 points) What is the chance that more than 4 items are sold during the next minute?
Use the Normal Approximation.
A. 46% B. 48% C. 50% D. 52% E. 54%

16.6 (3 points) A dam is proposed for a river which is currently used for salmon breeding.
You have modeled:
(i) For each hour the dam is opened the number of female salmon that will pass through and
reach the breeding grounds has a distribution with mean 50 and variance 100.
(ii) The number of eggs released by each female salmon has a distribution
with mean of 3000 and variance of 1 million.
(iii) The number of female salmon going through the dam each hour it is open and the
numbers of eggs released by the female salmon are independent.
Using the normal approximation for the aggregate number of eggs released, determine the least
number of whole hours the dam should be left open so the probability that 2 million eggs will be
released is greater than 99.5%.
(A) 14 (B) 15 (C) 16 (D) 17 (E) 18

16.7 (3 points) The claims department of an insurance company receives envelopes with claims for
insurance coverage at a Poisson rate of λ = 7 envelopes per day. For any period of time, the
number of envelopes and the numbers of claims in the envelopes are independent. The numbers
of claims in the envelopes have the following distribution:
Number of Claims Probability
1 0.60
2 0.30
3 0.10
Using the normal approximation, calculate the 99th percentile of the number of claims
received in 5 days.
(A) 73 (B) 75 (C) 77 (D) 79 (E) 81

16.8 (3 points) The number of persons using an ATM per hour has a Negative Binomial Distribution
with β = 2 and r = 13. Each hour is independent of the others.
The number of transactions per person has the following distribution:
Number of Transactions Probability
1 0.30
2 0.40
3 0.20
4 0.10
Using the normal approximation, calculate the 80th percentile of the number of transactions in 5
hours.
A. 300 B. 305 C. 310 D. 315 E. 320

Use the following information for the next 3 questions:


• The number of automobile accidents follows a Negative Binomial distribution
with β = 0.6 and r = 100.
• For each automobile accident the number of claimants with bodily injury follows
a Binomial Distribution with q = 0.1 and m = 4.
• The number of claimants with bodily injury is independent between accidents.

16.9 (2 points) Calculate the variance in the total number of claimants.


(A) 33 (B) 34 (C) 35 (D) 36 (E) 37

16.10 (1 point) What is probability that there are 20 or fewer claimants in total?
(A) 22% (B) 24% (C) 26% (D) 28% (E) 30%

16.11 (3 points) The amount of the payment to each claimant follows a Gamma Distribution with
α = 3 and θ = 4000. The amount of payments to different claimants are independent of each other
and are independent of the number of claimants.
What is the probability that the aggregate payment exceeds 300,000?
(A) 44% (B) 46% (C) 48% (D) 50% (E) 52%

16.12 (3 points) The number of batters per half-inning of a baseball game is:
3 + a Negative Binomial Distribution with β = 1 and r = 1.4.
The number of pitches thrown per batter is:
1 + a Negative Binomial Distribution with β = 1.5 and r = 1.8.
What is the probability of more than 30 pitches in a half-inning?
Use the normal approximation with continuity correction.
A. 1/2% B. 1% C. 2% D. 3% E. 4%

16.13 (3 points) The number of taxicabs that arrive per minute at the Gotham City Railroad Station
is Poisson with mean 5.6. The number of passengers dropped off at the station by each taxicab is
Binomial with q = 0.3 and m = 4. The number of passengers dropped off by each taxicab is
independent of the number of taxicabs that arrive and is independent of the number of passengers
dropped off by any other taxicab. Using the normal approximation for the aggregate passengers
dropped off, determine the least number of whole minutes one must observe in order that the
probability that at least 1000 passengers will be dropped off is greater than 90%.
A. 155 B. 156 C. 157 D. 158 E. 159

16.14 (4 points) At a storefront legal clinic, the number of lawyers who volunteer to provide legal aid
to the poor on any day is uniformly distributed on the integers 1 through 4. The number of hours
each lawyer volunteers on a given day is Binomial with q = 0.6 and m = 7. The number of clients
that can be served by a given lawyer per hour is a Poisson distribution with mean 5.
Determine the probability that 40 or more clients can be served in a day at this storefront law clinic,
using the normal approximation.
(A) 69% (B) 71% (C) 73% (D) 75% (E) 77%

Use the following information for the next 3 questions:


The number of persons entering a library per minute is Poisson with λ = 1.2.
The number of books returned per person is Binomial with q = 0.1 and m = 4.
The number of books returned per person is independent of the number of persons.

16.15 (1 point) What is the mean number of books returned per minute?
A. less than 0.5
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

16.16 (1 point) What is the variance of the number of books returned per minute?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

16.17 (1 point) What is the probability of observing more than two books returned in the next
minute?
Use the Normal Approximation.
A. less than 0.6%
B. at least 0.6% but less than 0.7%
C. at least 0.7% but less than 0.8%
D. at least 0.8% but less than 0.9%
E. at least 0.9%

16.18 (2 points) Yosemite Sam is panning for gold.


The number of pans with gold nuggets he finds per day is Poisson with mean 3.
The number of nuggets per such pan are: 1, 5, or 25, with probabilities: 90%, 9%, and 1%
respectively.
The number of pans and the number of nuggets per pan are independent.
Using the normal approximation with continuity correction, what is the probability that the number of
nuggets found by Sam over the next ten day is less than 30?
(A) Φ(-1.2) (B) Φ(-1.1) (C) Φ(-1.0) (D) Φ(-0.9) (E) Φ(-0.8)

16.19 (3 points) Frequency and severity are independent.


Frequency and severity are each members of the (a, b, 0) class of distributions.
Mean frequency is 2. Variance of frequency is 4.
Mean aggregate is 3. Variance of aggregate is 13.5.
Determine the probability that the aggregate losses are zero.
A. 34% B. 36% C. 38% D. 40% E. 42%

16.20 (4 points) At a food bank, people volunteer their time on a daily basis.
The number of people who volunteer on any day is a zero-truncated Binomial Distribution with
m = 10 and q = 0.3.
The number of hours that each person helps at the food bank is a zero-truncated Binomial
Distribution with m = 3 and q = 0.4.
The number of volunteers and the number of hours they each help are independent.
Determine the probability that on a day fewer than 4 volunteer hours will be available,
using the normal approximation with continuity correction.
A. 24% B. 26% C. 28% D. 30% E. 32%

16.21 (2 points) A compound claims frequency model has the following properties:
(i) The primary distribution has probability generating function:
P1 (z) = exp[0.4z - 0.4].
(ii) The secondary distribution has probability generating function:
P2 (z) = exp[5z - 5].
Determine the variance of the compound distribution.
A. 2.8 B. 4 C. 5.4 D. 12 E. 16

16.22 (3 points) Frequency is Poisson with mean 3.


Size of loss is Geometric with mean 10.
There is a deductible of 5.
What is the standard deviation of aggregate payments?
A. 17 B. 18 C. 19 D. 20 E. 21

16.23 (3, 11/00, Q.2 & 2009 Sample Q.112) (2.5 points) In a clinic, physicians volunteer their
time on a daily basis to provide care to those who are not eligible to obtain care otherwise. The
number of physicians who volunteer in any day is uniformly distributed on the integers 1 through 5.
The number of patients that can be served by a given physician has a Poisson distribution with
mean 30.
Determine the probability that 120 or more patients can be served in a day at the clinic,
using the normal approximation with continuity correction.
(A) 1 - Φ(0.68) (B) 1 - Φ(0.72) (C) 1 - Φ(0.93) (D) 1 - Φ(3.13) (E) 1 - Φ(3.16)

16.24 (3, 5/01, Q.16 & 2009 Sample Q.106) (2.5 points) A dam is proposed for a river which is
currently used for salmon breeding. You have modeled:
(i) For each hour the dam is opened the number of salmon that will pass through and
reach the breeding grounds has a distribution with mean 100 and variance 900.
(ii) The number of eggs released by each salmon has a distribution with mean of 5
and variance of 5.
(iii) The number of salmon going through the dam each hour it is open and the
numbers of eggs released by the salmon are independent.
Using the normal approximation for the aggregate number of eggs released, determine the least
number of whole hours the dam should be left open so the probability that 10,000 eggs will be
released is greater than 95%.
(A) 20 (B) 23 (C) 26 (D) 29 (E) 32

16.25 (3, 5/01, Q.36 & 2009 Sample Q.111) (2.5 points)
The number of accidents follows a Poisson distribution with mean 12.
Each accident generates 1, 2, or 3 claimants with probabilities 1/2, 1/3, 1/6, respectively.
Calculate the variance in the total number of claimants.
(A) 20 (B) 25 (C) 30 (D) 35 (E) 40

16.26 (3, 11/01, Q.30) (2.5 points) The claims department of an insurance company receives
envelopes with claims for insurance coverage at a Poisson rate of λ = 50 envelopes per week.
For any period of time, the number of envelopes and the numbers of claims in the envelopes are
independent. The numbers of claims in the envelopes have the following distribution:
Number of Claims Probability
1 0.20
2 0.25
3 0.40
4 0.15
Using the normal approximation, calculate the 90th percentile of the number of claims
received in 13 weeks.
(A) 1690 (B) 1710 (C) 1730 (D) 1750 (E) 1770

16.27 (3, 11/02, Q.27 & 2009 Sample Q.93) (2.5 points) At the beginning of each round of a
game of chance the player pays 12.5. The player then rolls one die with outcome N. The player
then rolls N dice and wins an amount equal to the total of the numbers showing on the N dice.
All dice have 6 sides and are fair.
Using the normal approximation, calculate the probability that a player starting with
15,000 will have at least 15,000 after 1000 rounds.
(A) 0.01 (B) 0.04 (C) 0.06 (D) 0.09 (E) 0.12

16.28 (CAS3, 5/04, Q.26) (2.5 points) On Time Shuttle Service has one plane that travels from
Appleton to Zebrashire and back each day.
Flights are delayed at a Poisson rate of two per month.
Each passenger on a delayed flight is compensated $100.
The numbers of passengers on each flight are independent and distributed with mean 30 and
standard deviation 50.
(You may assume that all months are 30 days long and that years are 360 days long.)
Calculate the standard deviation of the annual compensation for delayed flights.
A. Less than $25,000
B. At least $25,000, but less than $50,000
C. At least $50,000, but less than $75,000
D. At least $75,000, but less than $100,000
E. At least $100,000

16.29 (SOA M, 11/05, Q.18 & 2009 Sample Q.205) (2.5 points) In a CCRC, residents start
each month in one of the following three states: Independent Living (State #1), Temporarily in a
Health Center (State #2) or Permanently in a Health Center (State #3). Transitions between states
occur at the end of the month. If a resident receives physical therapy, the number of sessions that
the resident receives in a month has a geometric distribution with a mean which depends on the
state in which the resident begins the month. The numbers of sessions received are independent.
The number in each state at the beginning of a given month, the probability of needing physical
therapy in the month, and the mean number of sessions received for residents receiving therapy are
displayed in the following table:
State# Number in state Probability of needing therapy Mean number of visits
1 400 0.2 2
2 300 0.5 15
3 200 0.3 9
Using the normal approximation for the aggregate distribution, calculate the probability that
more than 3000 physical therapy sessions will be required for the given month.
(A) 0.21 (B) 0.27 (C) 0.34 (D) 0.42 (E) 0.50

16.30 (SOA M, 11/05, Q.39 & 2009 Sample Q.213) (2.5 points) For an insurance portfolio:
(i) The number of claims has the probability distribution
n pn
0 0.1
1 0.4
2 0.3
3 0.2
(ii) Each claim amount has a Poisson distribution with mean 3; and
(iii) The number of claims and claim amounts are mutually independent.
Calculate the variance of aggregate claims.
(A) 4.8 (B) 6.4 (C) 8.0 (D) 10.2 (E) 12.4

16.31 (CAS3, 5/06, Q.35) (2.5 points)


The following information is known about a consumer electronics store:
• The number of people who make some type of purchase follows a Poisson distribution with
a mean of 100 per day.
• The number of televisions bought by a purchasing customer follows a Negative Binomial
distribution with parameters r = 1.1 and β = 1.0.
Using the normal approximation, calculate the minimum number of televisions the store must have in
its inventory at the beginning of each day to ensure that the probability of its inventory being
depleted during that day is no more than 1.0%.
A. Fewer than 138
B. At least 138, but fewer than 143
C. At least 143, but fewer than 148
D. At least 148, but fewer than 153
E. At least 153

16.32 (SOA M, 11/06, Q.30 & 2009 Sample Q.285) (2.5 points)
You are the producer for the television show Actuarial Idol.
Each year, 1000 actuarial clubs audition for the show.
The probability of a club being accepted is 0.20.
The number of members of an accepted club has a distribution with mean 20 and variance 20.
Club acceptances and the numbers of club members are mutually independent.
Your annual budget for persons appearing on the show equals 10 times the expected number
of persons plus 10 times the standard deviation of the number of persons.
Calculate your annual budget for persons appearing on the show.
(A) 42,600 (B) 44,200 (C) 45,800 (D) 47,400 (E) 49,000

Solutions to Problems:

16.1. E. Standard deviation of the compound distribution is:


√[(15)(4^2) + (10^2)(3^2)] = √1140 = 33.8.

16.2. E. The frequency over a year is Poisson with mean: (12)(10) = 120 accidents.
Second moment of the secondary distribution is: (40%)(1^2) + (40%)(2^2) + (20%)(3^2) = 3.8.
Variance of compound distribution is: (120)(3.8) = 456.
Comment: Similar to 3, 5/01, Q.36.

16.3. B. The mean of the primary Geometric Distribution is 1.7. The mean of the secondary
Poisson Distribution is 3.1. Thus the mean of the compound distribution is: (1.7)(3.1) = 5.27.

16.4. A. Geometric acts as frequency. Mean of the Geometric Distribution is 1.7.


Variance of the Geometric is: (1.7) (1 + 1.7) = 4.59.
Poisson acts as severity. Mean of the Poisson Distribution is 3.1.
Variance of the Poisson Distribution is 3.1.
The variance of the compound distribution is: (1.7)(3.1) + (3.1^2)(4.59) = 49.38.
Comment: The variance of the compound distribution is large compared to its mean. A very large
number of items can result if there are a large number of customers from the Geometric combined
with some of those customers buying a large numbers of items from the Poisson.
Compound distributions tend to have relatively heavy tails.

16.5. E. From the previous solutions, the mean of the compound distribution is 5.27, and the
variance of the compound distribution is 49.38. Thus the standard deviation is 7.03.
1 - Φ[(4.5 - 5.27)/7.03] = 1 - Φ(-0.11) = Φ(0.11) = 0.5438.

16.6. C. Over y hours, the number of salmon has mean 50y and variance 100y.
The mean aggregate number of eggs is: (50y)(3000) = 150000y.
The standard deviation of the aggregate number of eggs is:
√[(50y)(1000^2) + (3000^2)(100y)] = 30,822 √y.
Thus the probability that the aggregate number of eggs is < 2 million is approximately:
Φ[(1,999,999.5 - 150,000y) / (30,822 √y)].
Since Φ(2.576) = 0.995, this probability will be 1/2% if:
(1,999,999.5 - 150,000y) / (30,822 √y) = -2.576. ⇒ 150,000y - 79,397 √y - 1,999,999.5 = 0.

√y = {79,397 ± √[79,397^2 + (4)(150,000)(1,999,999.5)]} / {(2)(150,000)} = 0.2647 ± 3.6611.

√y = 3.926. ⇒ y = 15.4. The smallest whole number of hours is therefore 16.
Alternately, try the given choices and stop when (Mean - 2 million)/StdDev > 2.576.
Hours     Mean          Standard Deviation     (Mean - 2 million) / StdDev
14        2,100,000     115,325                0.867
15        2,250,000     119,373                2.094
16        2,400,000     123,288                3.244
17        2,550,000     127,082                4.328
18        2,700,000     130,767                5.353
Comment: Similar to 3, 5/01, Q.16.
Note that since the variance over one hour is 100, the variance of the number of salmon over two
hours is: (2)(100) = 200.
Number of salmon over two hours = number over the first hour + number over the second hour.
⇒ Var[Number over two hours] = Var[number over first hour] + Var[number over second hour]
= 2 Var[number over an hour]. We are adding independent random variables, rather than
multiplying an individual variable by a constant.

16.7. B. The mean frequency over 5 days is: (7)(5) = 35.


Mean number of claims per envelope is: (60%)(1) + (30%)(2) + (10%)(3) = 1.5.
Mean of compound distribution is: (35)(1.5) = 52.5.
Second moment of number of claims per envelope is: (60%)(1^2) + (30%)(2^2) + (10%)(3^2) = 2.7.
Variance of compound distribution is: (35)(2.7) = 94.5.
99th percentile ≅ mean + (2.326)(standard deviation) = 52.5 + (2.326)√94.5 = 75.1.
Comment: Similar to 3, 11/01, Q.30.
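To reproduce this percentile with scipy rather than a Normal table (a sketch):

from math import sqrt
from scipy.stats import norm

lam = 7 * 5                               # envelopes expected in 5 days
m1 = 0.6*1 + 0.3*2 + 0.1*3                # mean claims per envelope, 1.5
m2 = 0.6*1 + 0.3*4 + 0.1*9                # second moment, 2.7
mean, var = lam * m1, lam * m2            # compound Poisson: 52.5 and 94.5
print(mean + norm.ppf(0.99) * sqrt(var))  # about 75.1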

16.8. C. The number of persons has mean: (13)(2) = 26,


and variance: (13)(2)(2 + 1) = 78.
The number of transactions per person has mean:
(30%)(1) + (40%)(2) + (20%)(3) + (10%)(4) = 2.1,
second moment: (30%)(1^2) + (40%)(2^2) + (20%)(3^2) + (10%)(4^2) = 5.3,
and variance: 5.3 - 2.1^2 = 0.89.
The number of transactions in an hour has mean: (26)(2.1) = 54.6,
and variance: (26)(0.89) + (2.1^2)(78) = 367.12.
The number of transactions in 5 hours has mean: (5)(54.6) = 273,
and variance: (5)(367.12) = 1835.6.
Φ(0.842) = 80%. 80th percentile ≅ 273 + (0.842)√1835.6 = 309.1.

16.9. E. Mean of the Primary Negative Binomial = (100)(0.6) = 60.


Variance of the Primary Negative Binomial = (100)(0.6)(1.6) = 96.
Mean of the Secondary Binomial = (4)(0.1) = 0.4.
Variance of the Secondary Binomial = (4)(0.1)(0.9) = .36.
Variance of the Compound Distribution = (60)(0.36) + (0.4^2)(96) = 36.96.

16.10. D. Mean of the Compound Distribution = (60) (0.4) = 24.


Prob[# claimants ≤ 20] ≅ Φ[(20.5 - 24)/√36.96] = Φ(-0.58) = 1 - 0.7190 = 28.1%.

16.11. A. Mean Frequency: 24. Variance of Frequency: 36.96.


Mean Severity: (3)(4000) = 12,000. Variance of Severity: (3)(4000^2) = 48,000,000.
Mean Aggregate Loss = (24)(12000) = 288,000.
Variance of the Aggregate Loss = (24)(48,000,000) + (12,000^2)(36.96) = 6474 million.
Prob[Aggregate loss > 300000] ≅ 1 - Φ[(300,000 - 288,000)/√(6474 million)] =
1 - Φ(0.15) = 1 - 0.5596 = 44%.

16.12. E. The number of batters has mean: 3 + (1.4)(1) = 4.4, and variance: (1.4)(1)(1 + 1) = 2.8.
The number of pitches per batter has mean: 1 + (1.8)(1.5) = 3.7,
and variance: (1.8)(1.5)(1 + 1.5) = 6.75.
The number of pitches per half-inning has mean: (4.4)(3.7) = 16.28,
and variance: (4.4)(6.75) + (3.7^2)(2.8) = 68.032.
Prob[# pitches > 30] ≅ 1 - Φ[(30.5 - 16.28)/√68.032] = 1 - Φ(1.72) = 4.27%.

16.13. D. Over y minutes, the number of taxicabs has mean 5.6y and variance 5.6y.
The passengers per cab has mean: (0.3)(4) = 1.2, and variance: (0.3)(1 - 0.3)(4) = 0.84.
The mean aggregate number of passengers is: (5.6y)(1.2) = 6.72y.
The standard deviation of the aggregate number of passengers is:
√[(5.6y)(0.84) + (1.2^2)(5.6y)] = 3.573 √y.
Thus the probability that the aggregate number of passengers is ≥ 1000 is approximately:
1 - Φ[(999.5 - 6.72y)/(3.573 √y)]. Since Φ(1.282) = 0.90, this probability will be greater than 90% if:

(Mean - 999.5) / StdDev = (6.72y - 999.5) / (3.573 √y) > 1.282.


Try the given choices and stop when (Mean - 999.5) / StdDev. > 1.282.
Minutes Mean Standard Deviation (Mean - 999.5) / StdDev
155 1,041.6 44.48 0.946
156 1,048.3 44.63 1.094
157 1,055.0 44.77 1.241
158 1,061.8 44.91 1.386
159 1,068.5 45.05 1.531
The smallest whole number of minutes is therefore 158.

16.14. A. The mean number of lawyers is: 2.5 and the variance is:
{(1 - 2.5)^2 + (2 - 2.5)^2 + (3 - 2.5)^2 + (4 - 2.5)^2}/4 = 1.25.
The mean number of hours per lawyer is: (7)(0.6) = 4.2 and the variance is: (7)(0.4)(0.6) = 1.68.
Therefore, the total number of hours volunteered per day has mean: (2.5)(4.2) = 10.5 and variance:
(2.5)(1.68) + (4.2^2)(1.25) = 26.25.
The number of clients per hour has mean 5 and variance 5.
Therefore, the total number of clients per day has mean: (5)(10.5) = 52.5,
and variance: (10.5)(5) + (5^2)(26.25) = 708.75.
Prob[# clients ≥ 40] ≅ 1 - Φ[(39.5 - 52.5)/√708.75] = 1 - Φ(-0.49) = 68.79%.
Alternately, the mean number of clients per lawyer is: (4.2)(5) = 21
with variance: (4.2)(5) + (5^2)(1.68) = 63.
Therefore, the total number of clients per day has mean: (2.5)(21) = 52.5 and
variance: (2.5)(63) + (21^2)(1.25) = 708.75. Proceed as before.
Comment: Similar to 3, 11/00, Q.2.

16.15. A. The mean of the primary Poisson Distribution is 1.2.


The mean of the secondary Binomial Distribution is: (4)(.1) = .4.
Thus the mean of the compound distribution is: (1.2)(.4) = 0.48.

16.16. B. The mean of the primary Poisson Distribution is 1.2. The mean of the secondary
Binomial Distribution is: (4)(0.1) = 0.4. The variance of the primary Poisson Distribution is 1.2.
The variance of the secondary Binomial Distribution is: (4)(0.1)(.9) = 0.36.
The variance of the compound distribution is: (1.2)(0.36) + (0.4^2)(1.2) = 0.624.

16.17. A. The compound distribution has mean of .48 and variance of .624.
Prob[# books > 2] ≅ 1 - Φ[(2.5 - 0.48)/√0.624] = 1 - Φ(2.56) = 1 - 0.9948 = 0.0052.

16.18. B. The mean number of nuggets per pan is: (90%)(1) + (9%)(5) + (1%)(25) = 1.6.
2nd moment of the number of nuggets per pan is: (90%)(1^2) + (9%)(5^2) + (1%)(25^2) = 9.4.
Mean aggregate over 10 days is: (10)(3)(1.6) = 48.
Variance of aggregate over 10 days is: (10)(3)(9.4) = 282.
Prob[aggregate < 30] ≅ Φ[(29.5 - 48)/√282] = Φ(-1.10) = 13.57%.

16.19. A. Variance of frequency is greater than the mean, so of the (a, b, 0) class we must have a
Negative Binomial Distribution. rβ = 2, and rβ(1+β) = 4. ⇒ r = 2 and β = 1.

Let X be severity. Then: 2 E[X] = 3. ⇒ E[X] = 1.5.

13.5 = 2 Var[X] + 4 E[X]^2. ⇒ Var[X] = 2.25.

Variance of severity is greater than the mean, so of the (a, b, 0) class we must have a Negative
Binomial Distribution. rβ = 1.5, and rβ(1+β) = 2.25. ⇒ r = 3 and β = 0.5.
Compound density at zero is the p.g.f. of the primary at density at 0 of the secondary.
In other words, compound density at zero is p.g.f. of the frequency at density at 0 of the discrete
severity distribution.
Density at zero of the discrete severity is: 1 / 1.5^3 = 0.2963.
Probability Generating Function of Frequency is: {1 - β(z-1)}^-r = {1 - (1)(z-1)}^-2 = (2 - z)^-2.
This probability generating function at 0.2963 is: (2 - 0.2963)^-2 = 34.45%.
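The last step is easy to check numerically (a Python sketch; the parameters are those recovered by the moment matching above):

# Frequency: Negative Binomial with r = 2, beta = 1.
# Severity: Negative Binomial with r = 3, beta = 0.5, so its density at 0 is 1/1.5^3.
sev_at_zero = 1 / 1.5**3                       # 0.2963
freq_pgf = lambda z: (1 - 1.0*(z - 1))**-2     # {1 - beta(z-1)}^-r = (2 - z)^-2
print(freq_pgf(sev_at_zero))                   # about 0.3445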

16.20. D. The mean of each zero-truncated Binomial is: mq / {1 - (1-q)^m}.

For the number of volunteers: (10)(0.3) / {1 - 0.7^10} = 3.0872.
For the hours per volunteer: (3)(0.4) / {1 - 0.6^3} = 1.5306.
Thus the mean number of hours per day is: (3.0872)(1.5306) = 4.7253.
The variance of each zero-truncated Binomial is: mq {(1-q) - (1 - q + mq)(1-q)^m} / {1 - (1-q)^m}^2.

Variance for the number of volunteers: (10)(0.3){(0.7) - (1 - 0.3 + 3)(0.7^10)} / {1 - 0.7^10}^2 = 1.8918.

Variance for the hours per volunteer: (3)(0.4){(0.6) - (1 - 0.4 + 1.2)(0.6^3)} / {1 - 0.6^3}^2 = 0.4123.

Thus the variance of the number of hours per day is:
(3.0872)(0.4123) + (1.5306^2)(1.8918) = 5.7048.
Prob[fewer than 4 hours] = Φ[(3.5 - 4.7253) / √5.7048] = Φ[-0.51] = 30.5%.
Comment: Similar to 3, 11/00, Q.2 (2009 Sample Q.112).
The number of volunteers acts as frequency, while the number of hours per volunteer acts as
severity.
Using a computer and the Panjer algorithm, discussed in “Mahlerʼs Guide to Aggregate
Distributions”: Prob[1 hour] = 6.86%, Prob[2 hours] = 11.87%, and Prob[3 hours] = 15.34%.
Thus the Prob[fewer than 4 hours] = 6.86% + 11.87% + 15.34% = 34.07%.

16.21. D. The primary distribution is a Poisson with mean 0.4.


The secondary distribution, which acts like severity, is a Poisson with mean 5.
Thus the variance of the compound (or aggregate) distribution is:
(0.4)(5) + (52 )(0.4) = 12.
Alternately, the compound distribution has p.g.f.: P(z) = P1 [P2 (z)] = exp[0.4exp(5z - 5) - 0.4].
Pʼ(z) = (0.4)(5) exp[5z - 5] exp[0.4exp(5z - 5) - 0.4] = 2 exp[0.4exp(5z - 5) + 5z - 5.4].
2 = Pʼ(1) = first factorial moment = mean.
Pʼʼ(z) = 2 exp[0.4exp(5z - 5) + 5z - 5.4] {(0.4)(5) exp[5z - 5] + 5}.
(2)(1)(7) = Pʼʼ(1) = second factorial moment = E[X(X-1)] = second moment - mean.
Therefore, second moment = 14 + 2 = 16.
Thus the variance of the compound distribution is: 16 - 2^2 = 12.
Comment: The compound distribution is a Neyman Type A, as shown in Appendix B.4 of
Loss Models, not on the syllabus.

16.22. D. The probability of a non-zero payment is the survival function of the geometric at 5:
{β/(1+β)}^6 = (10/11)^6 = 0.5645.
Thus the number of non-zero payments is Poisson with mean: (0.5645)(3) = 1.6935.
Due to the memoryless property of the Geometric Distribution, the non-zero payments follow a
zero-truncated Geometric with β = 10, with mean 1 + 10 = 11, and variance (10)(11) = 110.
The total payments are the sum of the non-zero payments.
Thus the variance of the total payments is:
(mean freq.) (var. of sev.) + (mean sev.)2 (var. freq.) =
(1.6935)(10)(11) + (112 )(1.6935) = 391.2.
Standard deviation of aggregate payments is: √391.2 = 19.8.
Alternately, the probability of a non-zero payment is:
f(6) + f(7) + f(8) + ... = 10^6/11^7 + 10^7/11^8 + 10^8/11^9 + ....
= (10^6/11^7) / (1 - 10/11) = (10/11)^6 = 0.5645.
Given there is a non-zero payment, the probability that it is for example 8 - 5 = 3 is:
f(8) / (10/11)^6 = (10^8/11^9) / (10/11)^6 = 10^2/11^3, which is density at 3 of a zero-truncated Geometric
Distribution with β = 10. One can show in a similar manner that the non-zero payments follow a
zero-truncated Geometric Distribution with β = 10. Proceed as before.
Comment: For example, the sizes of loss could be in units of hundreds of dollars.

16.23. A. This is a compound frequency distribution with a primary distribution that is discrete and
uniform on 1 through 5 and with secondary distribution which is Poisson with λ = 30. The primary
distribution has mean of 3 and second moment of:
(1^2 + 2^2 + 3^2 + 4^2 + 5^2)/5 = 11. Thus the primary distribution has variance: 11 - 3^2 = 2.
Mean of the Compound Dist. = (Mean of Primary Dist.)(Mean of Secondary Dist.) = (3)(30) = 90.
Variance of the Compound Distribution = (Mean of Primary Dist.)(Variance of Secondary Dist.) +
(Mean of Secondary Dist.)^2 (Variance of Primary Dist.) = (3)(30) + (30^2)(2) = 1890.
Probability of 120 or more patients ≅ 1 - Φ[(119.5 - 90)/√1890] = 1 - Φ(0.68).

16.24. B. Over y hours, the number of salmon has mean 100y and variance 900y.
The mean aggregate number of eggs is: (100y)(5) = 500y.
The variance of the aggregate number of eggs is: (100y)(5) + (5^2)(900y) = 23,000y.
Thus the probability that the aggregate number of eggs is < 10,000 is approximately:
Φ[(9999.5 - 500y)/√(23,000y)]. Since Φ(1.645) = 0.95, this probability will be 5% if:

(9999.5 - 500y)/√(23,000y) = -1.645. ⇒ 500y - 249.48 √y - 9999.5 = 0.

√y = {249.48 ± √[249.48^2 + (4)(500)(9999.5)]} / {(2)(500)} = 0.24948 ± 4.479.

√y = 4.729. ⇒ y = 22.3. The smallest whole number of hours is therefore 23.
Alternately, calculate the probability for each of the number of hours in the choices.
Hours    Mean      Variance    Probability of at least 10,000 eggs
20       10,000    460,000     1 - Φ[(9999.5 - 10,000)/√460,000] = 1 - Φ(-0.0007) = 50.0%
23       11,500    529,000     1 - Φ[(9999.5 - 11,500)/√529,000] = 1 - Φ(-2.063) = 98.0%
26       13,000    598,000     1 - Φ[(9999.5 - 13,000)/√598,000] = 1 - Φ(-3.880) = 99.995%
Thus 20 hours is not enough and 23 hours is enough so that the probability is greater than 95%.
Comment: The number of salmon acts as the primary distribution, and the number of eggs per
salmon as the secondary distribution. This exam question should have been worded better. They
intended to say “so the probability that at least 10,000 eggs will be released is greater than 95%.”
The probability of exactly 10,000 eggs being released is very small.

16.25. E. The second moment of the number of claimants per accident is:
(1/2)(1^2) + (1/3)(2^2) + (1/6)(3^2) = 3.333. The variance of a Compound Poisson Distribution is:
λ(2nd moment of the secondary distribution) = (12)(3.333) = 40.
Alternately, thinning the original Poisson, those accidents with 1, 2, or 3 claimants are independent
Poissons. Their means are: (1/2)(12) = 6, (1/3)(12) = 4, and (1/6)(12) = 2.
Number of accidents with 3 claimants is Poisson with mean 2 ⇒
The variance of the number of accidents with 3 claimants is 2.
Number of claimants for those accidents with 3 claimants = (3)(# of accidents with 3 claimants) ⇒
The variance of the # of claimants for those accidents with 3 claimants is: (3^2)(2).
Due to independence, the variances of the three processes add: (1^2)(6) + (2^2)(4) + (3^2)(2) = 40.

16.26. B. Mean # claims / envelope = (1)(0.2) + (2)(0.25) + (3)(0.4) + (4)(0.15) = 2.5.
2nd moment # claims / envelope = (1²)(0.2) + (2²)(0.25) + (3²)(0.4) + (4²)(0.15) = 7.2.
Over 13 weeks, the number of envelopes is Poisson with mean: (13)(50) = 650.
Mean of the compound distribution = (650)(2.5) = 1625.
Variance of the aggregate number of claims = Variance of a compound Poisson distribution =
(mean primary Poisson distribution)(2nd moment of the secondary distribution) = (650)(7.2) =
4680. Φ(1.282) = 0.90. Estimated 90th percentile = 1625 + 1.282√4680 = 1713.

16.27. E. The amount won per round of the game is a compound frequency distribution.
Primary distribution (determining how many dice are rolled) is a six-sided die, uniform and discrete on
1 through 6, with mean 3.5, second moment (1² + 2² + 3² + 4² + 5² + 6²)/6 = 91/6,
and variance 91/6 - 3.5² = 35/12.
Secondary distribution is also a six-sided die, with mean 3.5 and variance 35/12.
Mean of the compound distribution is: (3.5)(3.5) = 12.25.
Variance of the compound distribution is: (3.5)(35/12) + (3.5²)(35/12) = 45.94.
Therefore, the net result of a round has mean 12.25 - 12.5 = -0.25, and variance 45.94.
1000 rounds have a net result with mean -250 and variance 45,940.
Prob[net result ≥ 0] ≅ 1 - Φ((-0.5 + 250)/√45,940) = 1 - Φ(1.16) = 1 - 0.8770 = 0.123.

16.28. B. The total number of delayed passengers is a compound frequency distribution, with
primary distribution the number of delayed flights, and the secondary distribution the number of
passengers on a flight.
The number of flights delayed per year is Poisson with mean: (2)(12) = 24.
The second moment of the secondary distribution is: 50² + 30² = 3400.
The variance of the number of passengers delayed per year is: (24)(3400) = 81,600.
The standard deviation of the number of passengers delayed per year is: √81,600 = 285.66.
The standard deviation of the annual compensation is: (100)(285.66) = 28,566.

16.29. D. The mean number of sessions is:
(400)(0.2)(2) + (300)(0.5)(15) + (200)(0.3)(9) = 2950.
For a single resident we have a Bernoulli primary (whether the resident needs therapy) and a
geometric secondary (how many visits).
This has variance: (mean of primary)(variance of second.) + (mean second.)²(var. of primary)
= qβ(1 + β) + β²q(1 - q).
For a resident in state 1, the variance of the number of visits is:
(0.2)(2)(3) + (2²)(0.2)(1 - 0.2) = 1.84.
For state 2, the variance of the number of visits is: (0.5)(15)(16) + (15²)(0.5)(1 - 0.5) = 176.25.
For state 3, the variance of the number of visits is: (0.3)(9)(10) + (9²)(0.3)(1 - 0.3) = 44.01.
The sum of the visits from 400 residents in state 1, 300 in state 2, and 200 in state 3, has variance:
(400)(1.84) + (300)(176.25) + (200)(44.01) = 62,413.
Prob[sessions > 3000] ≅ 1 - Φ[(3000.5 - 2950)/√62,413] = 1 - Φ[0.20] = 0.4207.

16.30. E. Primary distribution has mean: (0)(0.1) + (1)(0.4) + (2)(0.3) + (3)(0.2) = 1.6,
second moment: (0²)(0.1) + (1²)(0.4) + (2²)(0.3) + (3²)(0.2) = 3.4, and variance: 3.4 - 1.6² = 0.84.
The secondary distribution has mean 3 and variance 3.
The compound distribution has variance: (1.6)(3) + (3²)(0.84) = 12.36.

16.31. E. Mean = (mean primary)(mean secondary) = (100)(1.1)(1.0) = 110.
Variance = (mean primary)(variance of secondary) + (mean secondary)²(variance of primary) =
(100)(1.1)(1)(1 + 1) + {(1.1)(1.0)}²(100) = 341. Φ(2.326) = 0.99.
99th percentile: 110 + 2.326√341 = 152.95. Need at least 153 televisions.

16.32. A. The primary distribution is Binomial with m = 1000 and q = 0.2, with mean 200 and
variance 160. The mean of the compound distribution is: (200)(20) = 4000.
The variance of the compound distribution is: (200)(20) + (20²)(160) = 68,000.
Annual budget is: 10(4000 + √68,000) = 42,608.

Section 17, Mixed Frequency Distributions

One can mix frequency models together by taking a weighted average of different frequency
models. This can involve either a discrete mixture of several different frequency distributions or a
continuous mixture over a portfolio as a parameter varies.

For example, one could mix together Poisson Distributions with different means.123

Discrete Mixtures:

Assume there are four types of risks, each with claim frequency given by a Poisson distribution:
Average Annual A Priori
Type Claim Frequency Probability
Excellent 1 40%
Good 2 30%
Bad 3 20%
Ugly 4 10%

Recall that for a Poisson Distribution with parameter λ the chance of having n claims is given by:

f(n) = λ^n e^(-λ) / n!.
So for example for an Ugly risk with λ = 4, the chance of n claims is: 4^n e^(-4) / n!.
For an Ugly risk the chance of 6 claims is: 4⁶ e^(-4) / 6! = 10.4%.
Similarly the chance of 6 claims for Excellent, Good, or Bad risks are: 0.05%, 1.20%, and 5.04%
respectively.

If we have a risk but do not know what type it is, we weight together the 4 different chances of
having 6 claims, using the a priori probabilities of each type of risk in order to get the chance of
having 6 claims: (0.4)(0.05%) + (0.3)(1.20%) + (0.2)(5.04%) + (0.1)(10.42%) = 2.43%.
The table below displays similar values for other numbers of claims.

The probabilities in the final column represent the assumed distribution of the number of claims for
the entire portfolio of risks.124 This “probability for all risks” is the mixed distribution. While the mixed
distribution is easily computed by weighting together the four Poisson distributions, it is not itself a
Poisson nor other well known distribution.
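As a quick illustration (not part of the original text), the 2.43% figure can be reproduced with a few lines of Python; the weights and Poisson means below are the ones from the table of the four types above:

    from math import exp, factorial

    def poisson_pmf(n, lam):
        return lam**n * exp(-lam) / factorial(n)

    weights = [0.4, 0.3, 0.2, 0.1]   # a priori probabilities of Excellent, Good, Bad, Ugly
    lambdas = [1, 2, 3, 4]           # Poisson mean for each type

    # Mixed density at n = 6 claims: weight the four Poisson densities together.
    print(sum(w * poisson_pmf(6, lam) for w, lam in zip(weights, lambdas)))   # about 0.0243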

123
The parameter of a Poisson is its mean. While one can mix together other frequency distributions, for example
Binomials or Negative Binomials, you are most likely to be asked about mixing Poissons. (It is unclear what if anything
they will ask on this subject.)
124
Prior to any observations. The effect of observations will be discussed in “Mahlerʼs Guide to Buhlmann Credibility”
and “Mahlerʼs Guide to Conjugate Priors.”

Number of Probability for Probability for Probability for Probability for Probability for
Claims Excellent Risks Good Risks Bad Risks Ugly Risks All Risks
0 0.3679 0.1353 0.0498 0.0183 0.1995
1 0.3679 0.2707 0.1494 0.0733 0.2656
2 0.1839 0.2707 0.2240 0.1465 0.2142
3 0.0613 0.1804 0.2240 0.1954 0.1430
4 0.0153 0.0902 0.1680 0.1954 0.0863
5 0.0031 0.0361 0.1008 0.1563 0.0478
6 0.0005 0.0120 0.0504 0.1042 0.0243
7 0.0001 0.0034 0.0216 0.0595 0.0113
8 0.0000 0.0009 0.0081 0.0298 0.0049
9 0.0000 0.0002 0.0027 0.0132 0.0019
10 0.0000 0.0000 0.0008 0.0053 0.0007
11 0.0000 0.0000 0.0002 0.0019 0.0002
12 0.0000 0.0000 0.0001 0.0006 0.0001
13 0.0000 0.0000 0.0000 0.0002 0.0000
14 0.0000 0.0000 0.0000 0.0001 0.0000
SUM 1.0000 1.0000 1.0000 1.0000 1.0000

The density function of the mixed distribution, is the mixture of the density function for
specific values of the parameter that is mixed.

Moments of Mixed Distributions:

The overall (a priori) mean can be computed in either one of two ways.

First one can weight together the means for each type of risks, using their (a priori) probabilities:
(0.4)(1) + (0.3)(2) + (0.2)(3) + (0.1)(4) = 2.

Alternately, one can compute the mean of the mixed distribution:


(0)(0.1995) + (1)(0.2656) + (2)( 0.2142) + ... = 2.

In either case, the mean of this mixed distribution is 2.

The mean of a mixed distribution is the mixture of the means for specific values of the
parameter λ: E[X] = Eλ [E[X | λ]].

One can calculate the second moment of a mixture in a similar manner.

Exercise: What is the second moment of a Poisson distribution with λ = 3?


[Solution: Second Moment = Variance + Mean² = 3 + 3² = 12.]

In general, the second moment of a mixture is the mixture of the second moments.

In the case of this mixture, the second moment is:


(0.4)(2) + (0.3)(6) + (0.2)(12) + (0.1)(20) = 7.

One can verify this second moment, by working directly with the mixed distribution:

Probability for All Risks    Number of Claims    Square of # of Claims
0.1995    0    0
0.2656    1    1
0.2142    2    4
0.1430    3    9
0.0863    4    16
0.0478    5    25
0.0243    6    36
0.0113    7    49
0.0049    8    64
0.0019    9    81
0.0007    10    100
0.0002    11    121
0.0001    12    144
0.0000    13    169
0.0000    14    196
Weighted Average    2.000    7.000

Exercise: What is the variance of this mixed distribution?


[Solution: 7 - 2² = 3.]

First one mixes the moments, and then computes the variance of the mixture from its
first and second moments.125

In general, the nth moment of a mixed distribution is the mixture of the nth moments for
specific values of the parameter λ: E[X^n] = Eλ[E[X^n | λ]].126
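For instance, a brief Python check (illustrative, using the same four types as above) that mixing the moments reproduces mean 2, second moment 7, and variance 3:

    weights = [0.4, 0.3, 0.2, 0.1]
    lambdas = [1, 2, 3, 4]

    mean = sum(w * lam for w, lam in zip(weights, lambdas))               # mixture of the means
    second = sum(w * (lam + lam**2) for w, lam in zip(weights, lambdas))  # mixture of the second moments
    print(mean, second, second - mean**2)                                 # 2.0, 7.0, 3.0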

There is nothing unique about assuming four types of risks. If one had assumed, for example, 100
different types of risks with mean frequencies from 0.1 to 10, there would have been no change in
the conceptual complexity of the situation, although the computational complexity would have
increased. This discrete example can be extended to a continuous case.

125
As discussed in “Mahlerʼs Guide to Buhlmann Credibility,” one can split the variance of a mixed distribution into
two pieces, the Expected Value of the Process Variance and the Variance of the Hypothetical Means.
126
Third and higher moments are more likely to be asked about for Loss Distributions.
Mixtures of Loss Distributions are discussed in “Mahlerʼs Guide to Loss Distributions.”

Continuous Mixtures:

We have seen how one can mix a discrete number of Poisson Distributions.127 For a continuous
mixture, the mixed distribution is given as the integral of the product of the distribution of the
parameter λ times the Poisson density function given λ.128

g(x) = ∫ f(x; λ ) u(λ) dλ .


The density function of the mixed distribution, is the mixture of the density function for
specific values of the parameter that is mixed.

Exercise: The claim count N for an individual insured has a Poisson distribution with mean λ.
λ is uniformly distributed between 0.3 and 0.8.
Find the probability that a randomly selected insured will have one claim.
[Solution: For the Poisson Distribution, f(1 | λ) = λe^(-λ).
(1/0.5) ∫[0.3, 0.8] λ e^(-λ) dλ = (2)(-λe^(-λ) - e^(-λ)), evaluated from λ = 0.3 to λ = 0.8,
= (2)(1.3e^(-0.3) - 1.8e^(-0.8)) = 30.85%.]
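A short numerical sketch of this continuous mixture (illustrative only), approximating the integral of f(1 | λ) times the uniform density on [0.3, 0.8] with a midpoint sum:

    from math import exp

    a, b, steps = 0.3, 0.8, 100000
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        lam = a + (i + 0.5) * h
        total += lam * exp(-lam) * (1.0 / (b - a)) * h   # f(1 | lambda) times u(lambda)
    print(total)   # about 0.3085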

Continuous mixtures can be performed of either frequency distributions or loss distributions.129


Such a continuous mixture is called a Mixture Distribution.130

Mixture Distribution ⇔ Continuous Mixture of Models.

Mixture Distributions can be created from other frequency distributions than the Poisson.

For example, if f is a Binomial with fixed m, one could mix on the parameter q:

g(x) = ∫ f(x; q) u(q) dq .


For example, if f is a Negative Binomial with fixed r, one could mix on the parameter β:

g(x) = ∫ f(x; β) u(β) dβ .


If f is a Negative Binomial with fixed r, one could instead mix on the parameter p = 1/(1+β).
127
One can mix other frequency distributions besides the Poisson.
128
The very important Gamma-Poisson situation is discussed in a subsequent section.
129
See the section on Continuous Mixtures of Models in “Mahlerʼs Guide to Loss Distributions”.
130
See Section 5.2.4 of Loss Models.

Moments of Continuous Mixtures:

As in the case of discrete mixtures, the nth moment of a continuous mixture is the mixture of the nth
moments for specific values of the parameter λ: E[X^n] = Eλ[E[X^n | λ]].

Exercise: What is the mean for a mixture of Poissons?


[Solution: For a given value of lambda, the mean of a Poisson Distribution is λ . We need to weight

these first moments together via the density of lambda u(λ): ∫ λ u(λ) dλ = mean of u.]
If for example, λ were uniformly distributed from 0.1 to 0.5, then the mean of the mixed distribution
would be 0.3.

In general, the mean of a mixture of Poissons is the mean of the mixing distribution.131 For the case of
a mixture of Poissons via a Gamma Distribution with parameters α and θ, the mean of the mixed
distribution is that of the Gamma, αθ.132

Exercise: What is the Second Moment for Poissons mixed via a Gamma Distribution with
parameters α and θ?

[Solution: For a given value of lambda, the second moment of a Poisson Distribution is λ + λ².
We need to weight these second moments together via the density of lambda: λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α).
Second Moment = ∫[0, ∞] (λ + λ²) λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α) dλ = {θ^(-α) / Γ(α)} ∫[0, ∞] (λ^α + λ^(α+1)) e^(-λ/θ) dλ
= {θ^(-α) / Γ(α)} {Γ(α+1)θ^(α+1) + Γ(α+2)θ^(α+2)} = αθ + α(α+1)θ².]

Since the mean of the mixed distribution is that of the Gamma, αθ, the variance of the mixed
distribution is: αθ + α(α+1)θ² - (αθ)² = αθ + αθ².
As will be discussed, the mixed distribution is a Negative Binomial Distribution, with r = α and
β = θ. Thus the variance of the mixed distribution is: αθ + αθ² = rβ + rβ² = rβ(1+β), which is in fact
the variance of a Negative Binomial Distribution.
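A sketch (not part of the original text) comparing the Gamma-mixed Poisson density, computed by numerical integration, with the Negative Binomial density with r = α and β = θ; the parameters α = 3 and θ = 2 are chosen only for illustration:

    from math import exp, factorial, gamma

    alpha, theta = 3.0, 2.0   # illustrative Gamma parameters

    def gamma_pdf(lam):
        return lam**(alpha - 1) * exp(-lam / theta) * theta**(-alpha) / gamma(alpha)

    def mixed_pmf(n, steps=20000, top=80.0):
        # integrate the Poisson density at n against the Gamma density of lambda
        h = top / steps
        total = 0.0
        for i in range(steps):
            lam = (i + 0.5) * h
            total += lam**n * exp(-lam) / factorial(n) * gamma_pdf(lam) * h
        return total

    def neg_binomial_pmf(n, r, beta):
        return gamma(r + n) / (gamma(r) * factorial(n)) * beta**n / (1 + beta)**(r + n)

    for n in range(5):
        print(n, round(mixed_pmf(n), 6), round(neg_binomial_pmf(n, alpha, theta), 6))   # the two columns agree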
131
This result will hold whenever the parameter being mixed is the mean, as it was in the case of the Poisson.
132
The Gamma-Poisson will be discussed in a subsequent section.

Factorial Moments of Mixed Distributions:

The nth factorial moment of a mixed distribution is the mixture of the nth factorial moments for specific
values of the parameter ζ:

E[(X)(X-1) ... (X+1-n)] = Eζ[E[(X)(X-1) ... (X+1-n) | ζ]].

When we are mixing Poissons, the factorial moments of the mixed distribution have a simple form.

nth factorial moment of mixed Poisson = E[(X)(X-1) ... (X+1-n)] = Eλ[E[(X)(X-1) ... (X+1-n) | λ]] =
Eλ[nth factorial moment of Poisson] = Eλ[λ^n] = nth moment of the mixing distribution.133

Exercise: Given Poissons are mixed via a distribution u(θ), what are the mean and variance of the
mixed distribution?
[Solution: The mean of the mixed distribution = first factorial moment =
mean of the mixing distribution.
The second moment of the mixed distribution =
second factorial moment + first factorial moment =
second moment of the mixing distribution + mean of the mixing distribution.
Variance of the mixed distribution =
second moment of mixed distribution - (mean of mixed distribution)2 =
second moment of the mixing distribution + mean of the mixing distribution -
(mean of the mixing distribution)2 =
Variance of the mixing distribution + Mean of the mixing distribution.]

When mixing Poissons, Mean of the Mixed Distribution = Mean of the Mixing Distribution,
and the Variance of the Mixed Distribution =
Variance of the Mixing Distribution + Mean of the Mixing Distribution.

Therefore, for a mixture of Poissons, the variance of the mixed distribution is always
greater than the mean of the mixed distribution.

For example, for a Gamma mixing distribution, the variance of the mixed Poisson is:
Variance of the Gamma + Mean of the Gamma = αθ² + αθ.
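A simulation sketch (illustrative; the Gamma parameters below are arbitrary) of this relationship, drawing λ from a Gamma and then a Poisson count, so that the sample variance comes out near αθ² + αθ:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, theta = 3.0, 2.0                       # Gamma mixing distribution: mean 6, variance 12
    lam = rng.gamma(alpha, theta, size=500_000)   # a Poisson mean for each simulated insured
    counts = rng.poisson(lam)                     # the Poisson count given that mean
    print(counts.mean(), counts.var())            # near 6, and near 12 + 6 = 18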

133
See equation 8.24 in Insurance Risk Models by Panjer & Willmot.

Probability Generating Functions of Mixed Distributions:

The Probability Generating Function of the mixed distribution, is the mixture of the probability
generating functions for specific values of the parameter λ:

P(z) = ∫ P(z; λ) u(λ) dλ .


Exercise: What is the Probability Generating Function for Poissons mixed via a Gamma Distribution
with parameters α and θ?

[Solution: For a given value of lambda, the p.g.f. of a Poisson Distribution is e^(λ(z-1)).
We need to weight these Probability Generating Functions together via the density of lambda:
λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α).
P(z) = ∫[0, ∞] e^(λ(z-1)) λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α) dλ = {θ^(-α) / Γ(α)} ∫[0, ∞] λ^(α-1) e^(-λ(1/θ + 1 - z)) dλ
= {θ^(-α) / Γ(α)} Γ(α) (1/θ + 1 - z)^(-α) = {1 - θ(z-1)}^(-α).]

This is the p.g.f. of a Negative Binomial Distribution with r = α and β = θ. This is one way to establish
that when Poissons are mixed via a Gamma Distribution, the mixed distribution is always a Negative
Binomial Distribution, with r = α = shape parameter of the Gamma and
β = θ = scale parameter of the Gamma.134

134
The Gamma-Poisson frequency process is the subject of an important subsequent section.

Mixing Poissons:135

In the very important case of mixing Poisson frequency distributions, the p.g.f. of the mixed
distribution can be put in terms of the Moment Generating Function of the mixing distribution of λ.
The Moment Generating Function of a distribution is defined as: MX(t) = E[e^(tx)].136

For a mixture of Poissons:


Pmixed distribution(z) = Emixing distribution of λ[PPoisson(z)] = Emixing distribution of λ[exp[λ(z - 1)]] =

M mixing distribution of λ(z - 1).

Thus when mixing Poissons, Pmixed distribution(z) = Mmixing distribution of λ(z - 1).137

Exercise: Apply the above formula for probability generating functions to Poissons mixed via a
Gamma Distribution.
[Solution: The m.g.f. of a Gamma Distribution with parameters α and θ is: (1 - θt)−α.
Therefore, the p.g.f. of the mixed distribution is:
M mixing distribution(z - 1) = {1 - θ(z - 1)}−α.

Comment: This is the p.g.f. of a Negative Binomial Distribution, with r = α and β = θ.


Therefore, the mixture of Poissons via a Gamma, with parameters α and θ, is a Negative Binomial
Distribution, with r = α and β = θ.]

MX(t) = EX[e^(tx)] = EX[(e^t)^x] = PX[e^t]. Therefore, when mixing Poissons:
Mmixed distribution(t) = Pmixed distribution(e^t) = Mmixing distribution of λ(e^t - 1).

Exercise: Apply the above formula for moment generating functions to Poissons mixed via an
Inverse Gaussian Distribution with parameters µ and θ.
[Solution: The m.g.f. of an Inverse Gaussian Distribution with parameters µ and θ is:
exp[(θ/µ) (1 - √(1 - 2µ²t/θ))].
Therefore, the moment generating function of the mixed distribution is:
Mmixing distribution of λ(e^t - 1) = exp[(θ/µ) {1 - √(1 - 2µ²(e^t - 1)/θ)}].]

135
See Section 7.3.2 of Loss Models, not on the syllabus.
136
See Definition 3.8 in Loss Models and “Mahlerʼs Guide to Aggregate Distributions.”
The moment generating functions of loss distributions are shown in Appendix B, when they exist.
137
See Equation 7.14 in Loss Models, not on the syllabus.

Exercise: The p.g.f. of the Zero-Truncated Negative Binomial Distribution is:
P(z) = [{1 - β(z-1)}^(-r) - (1+β)^(-r)] / [1 - (1+β)^(-r)] = 1 + [{1 - β(z-1)}^(-r) - 1] / [1 - (1+β)^(-r)], z < 1 + 1/β.
What is the moment generating function of a compound Poisson-Extended Truncated Negative
Binomial Distribution, with parameters λ = (θ/µ){(1 + 2µ²/θ)^0.5 - 1}, r = -1/2, and β = 2µ²/θ?

[Solution: The p.g.f. of a Poisson Distribution with parameter λ is: P(z) = e^(λ(z-1)).
For a compound distribution, the m.g.f. can be written in terms of the p.g.f. of the primary distribution
and the m.g.f. of the secondary distribution:
Mcompound dist.(t) = Pprimary[Msecondary(t)] = Pprimary[Psecondary(e^t)] = exp[λ{Psecondary(e^t) - 1}]
= exp[λ ({1 - β(e^t - 1)}^(-r) - 1) / {1 - (1+β)^(-r)}]
= exp[(θ/µ) {√(1 + 2µ²/θ) - 1} {√(1 - 2(µ²/θ)(e^t - 1)) - 1} / {1 - √(1 + 2µ²/θ)}]
= exp[(θ/µ) {1 - √(1 - 2µ²(e^t - 1)/θ)}].
Comment: This is the same as the m.g.f. of Poissons mixed via an Inverse Gaussian Distribution
with parameters µ and θ.]

Since their moment generating functions are equal, if a Poisson is mixed by an Inverse
Gaussian as per Loss Models, with parameters µ and θ, then the mixed distribution is a
compound Poisson-Extended Truncated Negative Binomial Distribution as per Loss Models,
with parameters: λ = (θ/µ)(√(1 + 2µ²/θ) - 1), r = -1/2, and β = 2µ²/θ.138

This is an example of a general result:139 If one mixes Poissons and the mixing distribution is infinitely
divisible,140 then the resulting mixed distribution can also be written as a compound Poisson
distribution, with a unique secondary distribution.

The Inverse Gaussian Mixing Distribution was infinitely divisible and the result of mixing the
Poissons was a Compound Poisson Distribution with a particular Extended Truncated Negative
Binomial Distribution as a secondary distribution.
138
See Example 7.17 in Loss Models, not on the syllabus.
139
See Theorem 7.9 in Loss Models, not on the syllabus.
140
As discussed previously, if a distribution is infinitely divisible, then if one takes the probability generating function
to any positive power, one gets the probability generating function of another member of the same family of
distributions. Examples of infinitely divisible distributions include: Poisson, Negative Binomial, Compound Poisson,
Compound Negative Binomial, Normal, Gamma, and Inverse Gaussian.

Another example is mixing Poissons via a Gamma. The Gamma is infinitely divisible, and therefore
the mixed distribution can be written as a compound distribution. As discussed previously, the
mixed distribution is a Negative Binomial. It turns out that the Negative Binomial can also be written
as a Compound Poisson with a logarithmic secondary distribution.

Exercise: The logarithmic frequency distribution has:


f(x) = {β / (1+β)}^x / {x ln(1+β)}, x = 1, 2, 3,...    P(z) = 1 - ln[1 - β(z-1)] / ln[1+β], z < 1 + 1/β.

Determine the probability generating function of a Compound Poisson with a logarithmic secondary
distribution.
[Solution: Pcompound distribution(z) = Pprimary[Psecondary[z]] = exp[λ{Psecondary[z] - 1}]
= exp[-λ ln[1 - β(z-1)] / ln[1+β]] = exp[{-λ / ln[1+β]} ln[1 - β(z-1)]]
= {1 - β(z-1)}^(-λ/ln[1+β]).]

The p.g.f. of the Negative Binomial is: P(z) = {1 - β(z -1)}-r. This is the same form as the probability
generating function obtained in the exercise, with r = λ/ln[1+β] and β = β.
Therefore, a Compound Poisson with a logarithmic secondary distribution is a Negative Binomial
Distribution with parameters r = λ/ln[1 + β] and β = β.141
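A brief numerical sketch (illustrative; the parameters are arbitrary) confirming that the two probability generating functions agree:

    from math import exp, log

    lam, beta = 1.5, 0.7
    r = lam / log(1 + beta)

    def pgf_compound(z):
        # Poisson primary with mean lam; logarithmic secondary with parameter beta
        p_secondary = 1 - log(1 - beta * (z - 1)) / log(1 + beta)
        return exp(lam * (p_secondary - 1))

    def pgf_negative_binomial(z):
        return (1 - beta * (z - 1)) ** (-r)

    for z in (0.0, 0.25, 0.5, 0.9):
        print(round(pgf_compound(z), 10), round(pgf_negative_binomial(z), 10))   # identical columns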

When mixing Poisson frequency distributions, the p.g.f. of the mixed distribution can also be put in
terms of the p.g.f. of the mixing distribution of λ. For a mixture of Poissons:
Pmixed distribution(z) = Emixing distribution of λ[PPoisson(z)] = Emixing distribution of λ[exp[λ(z - 1)] ]=

Emixing distribution of λ[exp[z - 1]λ] = Pmixing distribution of λ(exp[z - 1]).

Thus when mixing Poissons, Pmixed distribution(z) = Pmixing distribution of λ(exp[z - 1]).

For example, assume that a Poisson is mixed via a Negative Binomial, in other words each insured
is Poisson with mean λ, but λ in turn follows a Negative Binomial across a group of insureds.
For the Negative Binomial, P(z) = {1 - β(z-1)}-r.
Thus the mixture has probability generating function: {1 - β(exp[z - 1] - 1)}-r.

141
See Example 7.5 in Loss Models, not on the syllabus.

Mixing versus Adding:

The number of accidents Alice has is Poisson with mean 3%.


The number of accidents Bob has is Poisson with mean 5%.
The number of accidents Alice and Bob have are independent.

Exercise: Determine the probability that Alice and Bob have a total of two accidents.
[Solution: Their total number of accidents is Poisson with mean 8%. 0.08² e^(-0.08) / 2 = 0.30%.
Comment: An example of adding two Poisson variables.]

Exercise: We choose either Alice or Bob at random.


Determine the probability that the chosen person has two accidents.
[Solution: (50%)(0.03² e^(-0.03) / 2) + (50%)(0.05² e^(-0.05) / 2) = 0.081%.
Comment: A 50%-50% mixture of two Poisson Distributions with means 3% and 5%.
Mixing is different than adding.]
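A two-calculation sketch (illustrative) contrasting adding with mixing for Alice and Bob:

    from math import exp, factorial

    def poisson_pmf(n, lam):
        return lam**n * exp(-lam) / factorial(n)

    # Adding: the total for Alice and Bob together is Poisson with mean 0.08.
    print(poisson_pmf(2, 0.08))                                        # about 0.0030

    # Mixing: a person chosen at random is a 50%-50% mixture of the two Poissons.
    print(0.5 * poisson_pmf(2, 0.03) + 0.5 * poisson_pmf(2, 0.05))     # about 0.00081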

Problems:

Use the following information for the next three questions:


Each insuredʼs claim frequency follows a Poisson process.
There are three types of insureds as follows:
Type A Priori Probability Mean Annual Claim Frequency (Poisson Parameter)
A 60% 1
B 30% 2
C 10% 3

17.1 (1 point) What is the chance of a single individual having 4 claims in a year?
A. less than 0.03
B. at least 0.03 but less than 0.04
C. at least 0.04 but less than 0.05
D. at least 0.05 but less than 0.06
E. at least 0.06

17.2 (1 point) What is the mean of this mixed distribution?


A. 1.1 B. 1.2 C. 1.3 D. 1.4 E. 1.5

17.3 (2 points) What is the variance of this mixed distribution?


A. less than 2.0
B. at least 2.0 but less than 2.1
C. at least 2.1 but less than 2.2
D. at least 2.2 but less than 2.3
E. at least 2.3

17.4 (7 points) Each insured has its annual number of claims given by a Geometric Distribution
with mean β. Across a portfolio of insureds, β is distributed as follows: π(β) = 3/(1+β)⁴, 0 < β.
(a) Determine the algebraic form of the density of this mixed distribution.
(b) List the first several values of this mixed density.
(c) Determine the mean of this mixed distribution.
(d) Determine the variance of this mixed distribution.

17.5 (1 point) Each insuredʼs claim frequency follows a Binomial Distribution, with m = 5.
There are three types of insureds as follows:
Type A Priori Probability Binomial Parameter q
A 60% 0.1
B 30% 0.2
C 10% 0.3
What is the chance of a single individual having 3 claims in a year?
A. less than 0.03
B. at least 0.03 but less than 0.04
C. at least 0.04 but less than 0.05
D. at least 0.05 but less than 0.06
E. at least 0.06

Use the following information for the following four questions:


• The claim count N for an individual insured has a Poisson distribution with mean λ.
• λ is uniformly distributed between 0 and 4.

17.6 (2 points) Find the probability that a randomly selected insured will have no claims.
A. Less than 0.22
B. At least 0.22 but less than 0.24
C. At least 0.24 but less than 0.26
D. At least 0.26 but less than 0.28
E. At least 0.28

17.7 (2 points) Find the probability that a randomly selected insured will have one claim.
A. Less than 0.22
B. At least 0.22 but less than 0.24
C. At least 0.24 but less than 0.26
D. At least 0.26 but less than 0.28
E. At least 0.28

17.8 (1 point) What is the mean claim frequency?

17.9 (1 point) What is the variance of the mixed frequency distribution?

17.10 (4 points) For a given value of q, the number of claims is Binomial with parameters m and q.
However, m is distributed via a Negative Binomial with parameters r and β.
What is the mixed distribution of the number of claims?

Use the following information for the next 3 questions:

Assume that given q, the number of claims observed for one risk in m trials is given by a Binomial
distribution with mean mq and variance mq(1-q). Also assume that the parameter q varies between
0 and 1 for the different risks, with q following a Beta distribution:
g(q) = {Γ(a+b) / (Γ(a) Γ(b))} q^(a-1) (1-q)^(b-1), with mean a/(a+b) and variance ab / {(a+b)²(a+b+1)}.

17.11 (2 points) What is the unconditional mean frequency?


A. m/(a+b)    B. ma/(a+b)    C. mab/(a+b)
D. ma / {(a+b)(a+b+1)}    E. ma / {(a+b)²(a+b+1)}

17.12 (4 points) What is the unconditional variance?


A. m²a/(a+b)    B. m²a / {(a+b)(a+b+1)}    C. m²ab / {(a+b)(a+b+1)}
D. m(m+a+b) ab / {(a+b)²(a+b+1)}    E. m(m+a+b) ab / {(a+b)(a+b+1)(a+b+2)}

17.13 (4 points) If a = 2 and b = 4, then what is the probability of observing 5 claims in 7 trials for an
individual insured?
A. less than 0.068
B. at least 0.068 but less than 0.070
C. at least 0.070 but less than 0.072
D. at least 0.072 but less than 0.074
E. at least 0.074

17.14 (2 points)
Each insuredʼs claim frequency follows a Negative Binomial Distribution, with r = 0.8.
There are two types of insureds as follows:
Type A Priori Probability β
A 70% 0.2
B 30% 0.5
What is the chance of an insured picked at random having 1 claim next year?
A. 13% B. 14% C. 15% D. 16% E. 17%

17.15 (3 points) For a given value of q, the number of claims is Binomial with parameters m and q.
However, m is distributed via a Poisson with mean λ.
What is the mixed distribution of the number of claims?

Use the following information for the next two questions:


The number of claims a particular policyholder makes in a year follows a distribution with parameter
p: f(x) = p(1-p)^x, x = 0, 1, 2, ....
The values of the parameter p for the individual policyholders in a portfolio follow a Beta Distribution,
with parameters a = 4, b = 5, and θ = 1: g(p) = 280 p³ (1-p)⁴, 0 ≤ p ≤ 1.

17.16 (2 points) What is the a priori mean annual claim frequency for the portfolio?
A. less than 1.5
B. at least 1.5 but less than 1.6
C. at least 1.6 but less than 1.7
D. at least 1.7 but less than 1.8
E. at least 1.8

17.17 (3 points) For an insured picked at random from this portfolio, what is the probability
of observing 2 claims next year?
A. 9% B. 10% C. 11% D. 12% E. 13%

Use the following information for the next 2 questions:


(i) An individual insured has an annual claim frequency that follow a Poisson distribution with
mean λ.
(ii) Across the portfolio of insureds, the parameter λ has probability density function:

Π(λ) = (0.8)(40e^(-40λ)) + (0.2)(10e^(-10λ)).

17.18 (1 point) What is the expected annual frequency?


(A) 3.6% (B) 3.7% (C) 3.8% (D) 3.9% (E) 4.0%

17.19 (2 points) For an insured picked at random, what is the probability that he will have at least
one claim in the coming year?
(A) 3.6% (B) 3.7% (C) 3.8% (D) 3.9% (E) 4.0%

17.20 (4 points) For a given value of q, the number of claims is Binomial with parameters m and q.
However, m is distributed via a Binomial with parameters 5 and 0.1.
What is the mixed distribution of the number of claims?

Use the following information for the next four questions:


For a given value of q, the number of claims is Binomial distributed with parameters m = 3 and q.
In turn q is distributed uniformly from 0 to 0.4.

17.21 (2 points) What is the chance that zero claims are observed?
A. Less than 0.52
B. At least 0.52 but less than 0.53
C. At least 0.53 but less than 0.54
D. At least 0.54 but less than 0.55
E. At least 0.55

17.22 (2 points) What is the chance that one claim is observed?


A. Less than 0.32
B. At least 0.32 but less than 0.33
C. At least 0.33 but less than 0.34
D. At least 0.34 but less than 0.35
E. At least 0.35

17.23 (2 points) What is the chance that two claims are observed?
A. Less than 0.12
B. At least 0.12 but less than 0.13
C. At least 0.13 but less than 0.14
D. At least 0.14 but less than 0.15
E. At least 0.15

17.24 (2 points) What is the chance that three claims are observed?
A. Less than 0.01
B. At least 0.01 but less than 0.02
C. At least 0.02 but less than 0.03
D. At least 0.03 but less than 0.04
E. At least 0.04

17.25 (2 points) For students at a certain college, 40% do not own cars and do not drive.
For the rest of the students, their accident frequency is Poisson with λ = 0.07.
Let T = the total number of accidents for a group of 100 students picked at random.
What is the variance of T?
A. 4.0 B. 4.1 C. 4.2 D. 4.3 E. 4.4

Use the following information for the next 7 questions:


On his daily walk, Clumsy Klem loses coins at a Poisson rate.
At random, on half the days, Klem loses coins at a rate of 0.2 per minute.
On the other half of the days, Klem loses coins at a rate of 0.6 per minute.
The rate on any day is independent of the rate on any other day.

17.26 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the sixth
minute of todayʼs walk.
(A) 0.21 (B) 0.23 (C) 0.25 (D) 0.27 (E) 0.29

17.27 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the first
two minutes of todayʼs walk.
A. Less than 32%
B. At least 32%, but less than 34%
C. At least 34%, but less than 36%
D. At least 36%, but less than 38%
E. At least 38%

17.28 (2 points) Let A = the number of coins that Clumsy Klem loses during the first minute of
todayʼs walk. Let B = the number of coins that Clumsy Klem loses during the first minute of
tomorrowʼs walk. Calculate Prob[A + B = 1].
(A) 0.30 (B) 0.32 (C) 0.34 (D) 0.36 (E) 0.38

17.29 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the third
minute of todayʼs walk and exactly one coin during the fifth minute of todayʼs walk.
(A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09

17.30 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the third
minute of todayʼs walk and exactly one coin during the fifth minute of tomorrowʼs walk.
(A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09

17.31 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the first
four minutes of todayʼs walk and exactly one coin during the first four minutes of tomorrowʼs walk.
A. Less than 8.5%
B. At least 8.5%, but less than 9.0%
C. At least 9.0%, but less than 9.5%
D. At least 9.5%, but less than 10.0%
E. At least 10.0%

17.32 (3 points) Calculate the probability that Clumsy Klem loses exactly one coin during the first
2 minutes of todayʼs walk, and exactly two coins during the following 3 minutes of todayʼs walk.
(A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09

Use the following information for the next two questions:


Each insured has its accident frequency given by a Poisson Distribution with mean λ.
For a portfolio of insureds, λ is distributed as follows on the interval from a to b:
f(λ) = (d+1) λ^d / (b^(d+1) - a^(d+1)), 0 ≤ a ≤ λ ≤ b ≤ ∞.

17.33 (2 points) If the parameter d = -1/2, and if a = 0.2 and b = 0.6, what is the mean frequency?
A. less than 0.35
B. at least 0.35 but less than 0.36
C. at least 0.36 but less than 0.37
D. at least 0.37 but less than 0.38
E. at least 0.38

17.34 (2 points) If the parameter d = -1/2, and if a = 0.2 and b = 0.6, what is the variance of the
frequency?
A. less than 0.39
B. at least 0.39 but less than 0.40
C. at least 0.40 but less than 0.41
D. at least 0.41 but less than 0.42
E. at least 0.42

17.35 (3 points) Let X be a 50%-50% weighting of two Binomial Distributions.


The first Binomial has parameters m = 6 and q = 0.8.
The second Binomial has parameters m = 6 and q unknown.
For what value of q, does the mean of X equal the variance of X?
A. 0.3 B. 0.4 C. 0.5 D. 0.6 E. 0.7

Use the following information for the next 2 questions:


(i) Claim counts for individual insureds follow a Poisson distribution.
(ii) Half of the insureds have expected annual claim frequency of 4%.
(iii) The other half of the insureds have expected annual claim frequency of 10%.

17.36 (1 point) An insured is picked at random.


What is the probability that this insured has more than 1 claim next year?
(A) 0.21% (B) 0.23% (C) 0.25% (D) 0.27% (E) 0.29%

17.37 (1 point) A large group of such insured is observed for one year.
What is the variance of the distribution of the number of claims observed for individuals?
(A) 0.070 (B) 0.071 (C) 0.072 (D) 0.073 (E) 0.074

Use the following information for the next three questions:

An insurance company sells two types of policies with the following characteristics:
Type of Policy Proportion of Total Policies Annual Claim Frequency
I 25% Poisson with λ = 0.25
II 75% Poisson with λ = 0.50

17.38 (1 point) What is the probability that an insured picked at random will have no claims next
year?
A. 50% B. 55% C. 60% D. 65% E. 70%

17.39 (1 point) What is the probability that an insured picked at random will have one claim next
year?
A. less than 30%
B. at least 30% but less than 35%
C. at least 35% but less than 40%
D. at least 40% but less than 45%
E. at least 45%

17.40 (1 point) What is the probability that an insured picked at random will have two claims next
year?
A. 4% B. 6% C. 8% D. 10% E. 12%

17.41 (3 points) The Spiders sports team will play a best of 3 games playoff series.
They have an 80% chance to win each home game and only a 40% chance to win each road game.
The results of each game are independent of the results of any other game.
It has yet to be determined whether one or two of the three games will be home games for the
Spiders, but you assume these two possibilities are equally likely.
What is the chance that the Spiders win their playoff series?
A. 63% B. 64% C. 65% D. 66% E. 67%

17.42 (4 points) The number of claims is modeled as a two point mixture of Poisson Distributions,
with weight p to a Poisson with mean λ1 and weight (1-p) to a Poisson with mean λ2.

(a) For the mixture, determine the ratio of the variance to the mean as a function of λ1, λ2, and p.

(b) With the aid of a computer, for λ1 = 10% and λ2 = 20%,


graph this ratio as a function of p for 0 ≤ p ≤ 1.

Use the following information for the next five questions:


For a given value of q, the number of claims is Binomial distributed with parameters m = 4 and q.
In turn q is distributed from 0 to 0.6 via: π(q) = (2500/99) q² (1-q).

17.43 (3 points) What is the chance that zero claims are observed?
A. 12% B. 14% C. 16% D. 18% E. 20%

17.44 (3 points) What is the chance that one claim is observed?


A. 26% B. 26% C. 28% D. 30% E. 32%

17.45 (3 points) What is the chance that two claims are observed?
A. 26% B. 26% C. 28% D. 30% E. 32%

17.46 (2 points) What is the chance that three claims are observed?
A. 19% B. 21% C. 23% D. 25% E. 27%

17.47 (2 points) What is the chance that four claims are observed?
A. 3% B. 4% C. 5% D. 6% E. 7%

17.48 (3 points) Use the following information:


• There are two types of insurance policies.
• Three quarters are low risk policies, while the remaining one quarter are high risk policies.
• The annual claims from each type of policy are Poisson.
• The mean number of claims from a high risk policy is 0.4.
• The variance of the mixed distribution of the number of claims is 0.2575.
Determine the mean annual claims from a low risk policy.
A. 12% B. 14% C. 16% D. 18% E. 20%

17.49 (8 points) One has a mixture of Poisson Distributions each with mean λ.
The mixing distribution of λ is Poisson with mean µ.
(a) (2 points) Determine the mean and variance of the mixture.
(b) (3 points) Determine the form of Probability Generating Function of the mixture.
(c) (3 points) Use the Probability Generating Function to determine the mean and variance of
the mixture, and verify that they match the results in part (a).

17.50 (8.5 points) Use the following information:


• The number of families immigrating to the principality of Genovia per month is a 50%-50% mixture
of Poisson Distributions with λ = 4 and λ = 10.
• The number of members per family immigrating is a 30%-70% mixture of
zero-truncated Binomial Distributions with m = 2 and q = 0.5, or m = 8 and q = 0.4.
• The number of families and their sizes are independent.
(a) (2 points) Determine the mean and variance of the distribution of the number of families.
(b) (4 points) Determine the mean and variance of the distribution of the sizes of families.
(c) (1.5 points) Determine the mean and variance of the distribution of the number of people
immigrating per month.
(d) (1 point) Using the Normal Approximation with continuity correction, estimate the probability
that at least 25 people will immigrate next month.

17.51 (3 points) One has a mixture of Geometric Distributions with mean β.


The mixing distribution of β is Beta with a = 4, b = 3, and θ = 1.
Determine the mean and variance of the mixture.

Use the following information for the next three questions:


• Frequency is a 50%-50% mixture of Poissons with means 1 and 2.
• Severity is uniform from 0 to 1000.
• Frequency and severity are independent.

17.52 (2 points) What is the probability of exactly 2 claims of size greater than 600?
(There can be any number of small claims.)
A. 10% B. 11% C. 12% D. 13% E. 14%

17.53 (2 points) What is the probability of exactly 2 claims of size less than 600?
(There can be any number of large claims.)
A. 13% B. 14% C. 15% D. 16% E. 17%

17.54 (2 points) What is the probability of exactly 2 claims of size greater than 600
and exactly 2 claims of size less than 600?
A. 1.0% B. 1.2% C. 1.4% D. 1.6% E. 1.8%

17.55 (4, 11/82, Q.48) (3 points)


Let f(x|θ) = frequency distribution for a particular risk having parameter θ.
f(x|θ) = θ(1-θ)^x, where θ is in the interval [p, 1], p is a fixed value such that 0 < p < 1,
and x is a non-negative integer.
g(θ) = distribution of θ within a given class of risks. g(θ) = -1 / {θ ln(p)}, for p ≤ θ ≤ 1.
Find the frequency distribution for the class of risks.
A. -(x+1) / {p² ln(p)}    B. {(1-p)^x - p^(x+1)} / {(x+1) ln(p)}
C. -(1-p)^(x+1) / {(x+1) ln(p)}    D. -(x+1) p^x / ln(p)
E. None A, B, C, or D.

17.56 (2, 5/88, Q.33) (1.5 points) Let X have a binomial distribution with parameters m and q,
and let the conditional distribution of Y given X = x be Poisson with mean x.
What is the variance of Y?
A. x B. mq C. mq(1 - q) D. mq2 E. mq(2 - q)

17.57 (4, 5/88, Q.32) (2 points) Let N be the random variable which represents the number of
claims observed in a one year period. N is Poisson distributed with a probability density function
with parameter θ: P[N = n | θ] = e-θ θn /n!, n = 0, 1, 2, ...
The probability of observing no claims in a year is less than .450.
Which of the following describe possible probability distributions for θ?
1. θ is uniformly distributed on (0, 2).

2. The probability density function of θ is f(θ) = e−θ for θ > 0.


3. P[θ = 1] = 1 and P[θ ≠ 1] = 0.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

17.58 (3, 11/00, Q.13 & 2009 Sample Q.114) (2.5 points)
A claim count distribution can be expressed as a mixed Poisson distribution.
The mean of the Poisson distribution is uniformly distributed over the interval [0, 5].
Calculate the probability that there are 2 or more claims.
(A) 0.61 (B) 0.66 (C) 0.71 (D) 0.76 (E) 0.81

17.59 (SOA3, 11/04, Q.32 & 2009 Sample Q.130) (2.5 points)
Bob is a carnival operator of a game in which a player receives a prize worth W = 2^N if the player
has N successes, N = 0, 1, 2, 3,…
Bob models the probability of success for a player as follows:
(i) N has a Poisson distribution with mean Λ.
(ii) Λ has a uniform distribution on the interval (0, 4).
Calculate E[W].
(A) 5 (B) 7 (C) 9 (D) 11 (E) 13

17.60 (CAS3, 11/06, Q.19) (2.5 points)


In 2006, annual claim frequency follows a negative binomial distribution with parameters β and r.
β follows a uniform distribution on the interval (0, 2) and r = 4.
Calculate the probability that there is at least 1 claim in 2006.
A. Less than 0.85
B. At least 0.85, but less than 0.88
C. At least 0.88, but less than 0.91
D. At least 0.91, but less than 0.94
E. At least 0.94

17.61 (SOA M, 11/06, Q.39 & 2009 Sample Q.288) (2.5 points)
The random variable N has a mixed distribution:
(i) With probability p, N has a binomial distribution with q = 0.5 and m = 2.
(ii) With probability 1 - p, N has a binomial distribution with q = 0.5 and m = 4.
Which of the following is a correct expression for Prob(N = 2)?
(A) 0.125p²
(B) 0.375 + 0.125p
(C) 0.375 + 0.125p²
(D) 0.375 - 0.125p²
(E) 0.375 - 0.125p

Solutions to Problems:

17.1. D. Chance of observing 4 accidents is θ⁴ e^(-θ) / 24. Weight the chances of observing 4
accidents by the a priori probability of θ.
A Priori Poisson Chance of
Type Probability Parameter 4 Claims
A 0.6 1 0.0153
B 0.3 2 0.0902
C 0.1 3 0.1680
Average 0.053

17.2. E. (60%)(1) + (30%)(2) + (10%)(3) = 1.5.

17.3. A. For a Type A insured, the second moment is: variance + mean² = 1 + 1² = 2.
For a Type B insured, the second moment is: variance + mean² = 2 + 2² = 6.
For a Type C insured, the second moment is: variance + mean² = 3 + 3² = 12.
The second moment of the mixture is: (60%)(2) + (30%)(6) + (10%)(12) = 4.2.
The variance of the mixture is: 4.2 - 1.5² = 1.95.
Alternately, the Expected Value of the Process Variance is:
(60%)(1) + (30%)(2) + (10%)(3) = 1.5.
The Variance of the Hypothetical Means is:
(60%)(1 - 1.5)² + (30%)(2 - 1.5)² + (10%)(3 - 1.5)² = 0.45.
Total Variance = EPV + VHM = 1.5 + 0.45 = 1.95.
Comment: For the mixed distribution, the variance is greater than the mean.

17.4. (a) For the Geometric distribution, f(x) = β^x / (1+β)^(x+1).
For the mixed distribution, f(x) = ∫[0, ∞] f(x; β) π(β) dβ = ∫[0, ∞] {β^x / (1+β)^(x+1)} {3 / (1+β)⁴} dβ = 3 ∫[0, 1] u³ (1-u)^x du,
where u = 1/(1+β), 1 - u = β/(1+β), and du = -dβ/(1+β)².
This integral is of the Beta variety; its value of Γ(x+1) Γ(3+1) / Γ(x + 1 + 3 + 1)
follows from the fact that the density of a Beta Distribution integrates to one over its support.
Therefore, f(x) = (3){Γ(x+1) Γ(3+1) / Γ(x + 1 + 3 + 1)} = (3)(x!)(3!)/(x+4)! =
18 / {(x+1)(x+2)(x+3)(x+4)}.
(b) The densities from 0 to 20 are:
3/4, 3/20, 1/20, 3/140, 3/280, 1/168, 1/280, 1/440, 1/660, 3/2860, 3/4004, 1/1820, 3/7280,
3/9520, 1/4080, 1/5168, 1/6460, 1/7980, 3/29260, 3/35420, 1/14168.
(c) The mean of this mixed distribution is: ∫[0, ∞] β π(β) dβ = ∫[0, ∞] β {3 / (1+β)⁴} dβ = 3 ∫[0, 1] u (1-u) du
= (3)(1/2 - 1/3) = 1/2.
(d) The second moment of a Geometric is: variance + mean² = β(1+β) + β² = β + 2β².
∫[0, ∞] β² π(β) dβ = ∫[0, ∞] β² {3 / (1+β)⁴} dβ = 3 ∫[0, 1] (1-u)² du = 3/3 = 1.
Therefore, the second moment of this mixed distribution is: 1/2 + (2)(1) = 2.5.
The variance of this mixed distribution is: 2.5 - 0.5² = 2.25.
Comment: This is a Yule Distribution as discussed In Example 7.13 of Loss Models, with a = 3.
The mixed density can also be written in terms of a complete Beta function: 3 β[4, x+1].
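As a numerical check (illustrative, not part of the original solution), the closed form 18/{(x+1)(x+2)(x+3)(x+4)} can be compared with direct integration of the Geometric density against π(β):

    def closed_form(x):
        return 18.0 / ((x + 1) * (x + 2) * (x + 3) * (x + 4))

    def by_integration(x, steps=200000, top=500.0):
        # integrate beta^x / (1+beta)^(x+1) times pi(beta) = 3/(1+beta)^4 over beta
        h = top / steps
        total = 0.0
        for i in range(steps):
            b = (i + 0.5) * h
            total += (b**x / (1 + b)**(x + 1)) * (3.0 / (1 + b)**4) * h
        return total

    for x in range(4):
        print(x, round(closed_form(x), 5), round(by_integration(x), 5))   # 0.75, 0.15, 0.05, 0.02143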

17.5. B. Chance of observing 3 claims is 10q³(1-q)². Weight the chances of observing 3 claims
by the a priori probability of q.
A Priori q Chance of
Type Probability Parameter 3 Claims
A 0.6 0.1 0.0081
B 0.3 0.2 0.0512
C 0.1 0.3 0.1323
Average 0.033

17.6. C. The chance of no claims for a Poisson is: e^(-λ).
We average over the possible values of λ:
(1/4) ∫[0, 4] e^(-λ) dλ = (1/4)(-e^(-λ)), evaluated from λ = 0 to λ = 4, = (1/4)(1 - e^(-4)) = 0.245.

17.7. B. The chance of one claim for a Poisson is: λe^(-λ).
We average over the possible values of λ:
(1/4) ∫[0, 4] λ e^(-λ) dλ = (1/4)(-λe^(-λ) - e^(-λ)), evaluated from λ = 0 to λ = 4, = (1/4)(1 - 5e^(-4)) = 0.227.

Comment: The densities of this mixed distribution from 0 to 9: 0.245421, 0.227105, 0.190474,
0.141632, 0.0927908, 0.0537174, 0.0276685, 0.0127834, 0.00534086, 0.00203306.

17.8. E[λ] = (0 + 4)/2 = 2.

17.9. The second moment of a Poisson is: variance + mean² = λ + λ².
E[λ + λ²] = E[λ] + E[λ²] = mean of uniform distribution + second moment of uniform distribution
= 2 + {2² + (4 - 0)²/12} = 2 + 4 + 1.333 = 7.333.
variance = second moment - mean² = 7.333 - 2² = 3.333.

17.10. The p.g.f. of each Binomial is: {1 + q(z-1)}^m.
The p.g.f. of the mixture is the mixture of the p.g.f.s:
Pmixture[z] = Σ f(m) {1 + q(z-1)}^m = p.g.f. of f at: 1 + q(z-1).
However, f(m) is Negative Binomial, with p.g.f.: {1 - β(z-1)}^(-r).
Therefore, Pmixture[z] = {1 - β(1 + q(z-1) - 1)}^(-r) = {1 - βq(z-1)}^(-r).
However, this is the p.g.f. of a Negative Binomial Distribution with parameters r and qβ, which is
therefore the mixed distribution.
Alternately, the mixed distribution at k is:
Alternately, the mixed distribution at k is:
m=∞ m=∞
βm
∑ Prob[k | m] Prob[m] = ∑
m! k m - k (r + m- 1)!
q (1- q) =
(m- k)! k! (r -1)! m! (1+ β)r + m
m=k m=k

m=∞
q k βk ⎛ (1- q) β ⎞ m - k

(r + k -1)! (r + m- 1)!
⎜ ⎟ =
(1+ β) r + k (r - 1)! k! (m- k)! (r +k - 1)! ⎝ 1 + β ⎠
m=k

n=∞ n
q k βk (r + k + n -1)! ⎛ (1- q) β ⎞

(r + k -1)!
⎜ ⎟ =
(1+ β) r + k (r - 1)! k! n! (r +k - 1)! ⎝ 1 + β ⎠
n=0

q k βk (r + k -1)! ⎛ 1 ⎞ r+k
⎜ ⎟ =
(1+ β) r + k (r - 1)! k! ⎝ 1 - (1- q)β / (1+β) ⎠

q k βk (r + k -1)! (1+ β)r + k (r + k -1)! (qβ)k


= .
(1+ β) r + k (r - 1)! k! (1 + qβ)r + k (r - 1)! k! (1 + qβ)r + k

This is a Negative Binomial Distribution with parameters r and qβ.
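A simulation sketch (illustrative; the parameters are arbitrary) supporting this result; note that numpy's negative_binomial sampler is parameterized by the success probability p = 1/(1+β):

    import numpy as np

    rng = np.random.default_rng(0)
    r, beta, q = 2.0, 3.0, 0.4
    p = 1.0 / (1.0 + beta)

    m = rng.negative_binomial(r, p, size=500_000)   # m is Negative Binomial with parameters r and beta
    claims = rng.binomial(m, q)                     # given m, the claim count is Binomial(m, q)

    # A Negative Binomial with parameters r and q*beta has mean r*q*beta = 2.4
    # and variance r*q*beta*(1 + q*beta) = 5.28.
    print(claims.mean(), claims.var())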


Comment: The sum was simplified using the fact that the Negative Binomial densities sum to 1:
i=∞ i=∞
(s + i -1)! ⎛ α ⎞ i
∑ ∑
(s + i -1)! αi
1= .⇒ = (1+α)s.
i! (s - 1)! (1+ α) s + i i! (s - 1)! ⎝ 1+ α ⎠
i=0 i=0

i=∞

∑ i! (s - 1)! γ i = (1-1γ)s .
(s + i -1)!

i=0

Where, γ = α/(1+α). ⇒ α = γ/(1-γ). ⇒ 1+ α = 1/(1-γ).



17.11. B. The conditional mean given q is: mq. The unconditional mean can be obtained by
integrating the conditional means versus the distribution of q:
1 1
Γ(a + b) a - 1
E[X] = ∫0 E[X | q] g(q) dq =
∫0 mq
Γ(a) Γ(b)
q (1- q)b - 1 dq =

1
Γ(a + b) Γ(a + b) Γ(a +1) Γ(b) Γ(a +1) Γ(a+ b)
m
Γ(a) Γ(b) ∫0 qa (1- q)b - 1 dq = m
Γ(a) Γ(b) Γ(a + b + 1)
=m
Γ(a) Γ(a + b + 1)
= ma / (a+b).

Alternately,
1 1
E[X] = ∫0 E[X | q] g(q) dq = m ∫0 q g(q) dq = m (mean of Beta Distribution) = ma / (a+b).
Γ(a + b) a-1
Comment: The Beta distribution with θ = 1 has density from 0 to 1 of: x (1-x)b-1.
Γ(a) Γ(b)

Γ(a) Γ(b)
Therefore, the integral from zero to one of xa-1(1-x)b-1 is: .
Γ(a + b)

17.12. D. The conditional variance given q is: mq(1-q) = mq - mq2 . Thus the conditional second
moment given q is: mq - mq2 + (mq)2 = mq + (m2 - m)q2 . The unconditional second moment can
be obtained by integrating the conditional second moments versus the distribution of q:
1 1
Γ(a + b)
E[X2 ] =
∫0 E[X2 | q] g(q) dq = ∫0 {mq + (m2 - m)q2} Γ(a)Γ(b) qa - 1(1- q)b - 1 dq =
1 1
Γ(a + b) Γ(a + b)
m
Γ(a)Γ(b) ∫0 qa (1- q)b - 1 dq + (m2 - m)
Γ(a)Γ(b) ∫0 qa + 1(1- q)b - 1 dq =
Γ(a + b) Γ(a +1) Γ(b) Γ(a + b) Γ(a + 2) Γ(b)
m + (m2 - m) =
Γ(a)Γ(b) Γ(a + b + 1) Γ(a)Γ(b) Γ(a + b + 2)

Γ(a +1) Γ(a+ b) Γ(a + 2) Γ(a + b)


m + (m2 - m) =
Γ(a) Γ(a + b + 1) Γ(a) Γ(a + b + 2)

a a (a + 1) a
m + (m2 - m) . Since the mean is m , the variance is:
a + b (a + b)(a + b + 1) a + b

a a (a + 1) a2
m + (m2 - m) - m2 =
a + b (a + b)(a + b + 1) (a + b)2

a
m {(a+b+1)(a+b) + (m-1)(a+1)(a+b) - ma(a+b+1)} =
(a + b)2 (a+ b +1)

a (m + a + b) a b
m (ab + b2 + mb) = m .
(a + b)2 (a+ b +1) (a + b)2 (a + b + 1)
1 1 1
Alternately, E[X2 ] =
∫0 E[X2
∫0
| q] g(q) dq = m q g(q)dq + (m2 - m) ∫0 q2 g(q)dq =
m (mean of Beta Distribution) + (m2 - m) (second moment of the Beta Distribution) =
a a (a + 1)
m + (m2 - m) . Then proceed as before.
a + b (a + b)(a + b + 1)
Comment: This is an example of the Beta-Binomial Conjugate Prior Process. See “Mahlerʼs Guide
to Conjugate Priors.” The unconditional distribution is sometimes called a “Beta-Binomial”
Distribution. See Example 7.12 in Loss Models or Kendallʼs Advanced Theory of Statistics by
Stuart and Ord.

17.13. E. The probability density of q is a Beta Distribution with parameters a and b:


Γ(a + b) a-1
q (1-q)b-1.
Γ(a)Γ(b)
One can compute the unconditional density via integration:
1
Γ(a + b) a - 1
f(n) = ∫0 f(n | q)
Γ(a)Γ(b)
q (1- q)b - 1 dq =

1
Γ(a + b)
∫0
m!
qn (1- q)m - n qa - 1 (1- q)b - 1 dq =
Γ(a)Γ(b) n! (m- n)!

1
Γ(a + b) Γ(m +1)
Γ(a)Γ(b) Γ(n+ 1) Γ(m+ 1- n) ∫0 qa + n - 1 (1- q)b + m - n - 1 dq =
Γ(a + b) Γ(m +1) Γ(a + n) Γ(b + m- n) Γ(a + b) Γ(m+ 1) Γ(a+ n) Γ(b + m- n)
= .
Γ(a)Γ(b) Γ(n+ 1) Γ(m+ 1- n) Γ(a+ b + m) Γ(a) Γ(b) Γ(n + 1) Γ(m+1- n) Γ(a + b + m)
For n = 5, a = 2, b = 4, and m = 7.
Γ(6) Γ(8) Γ(7) Γ(6) 5! 7! 6! 5!
f(5) = = = 0.07576.
Γ(2) Γ(4) Γ(6) Γ(3) Γ(13) 1! 3! 5! 2! 12!
Comment: Beyond what you are likely to be asked on your exam.
The probability of observing other number of claims in 7 trials is as follows:
n 0 1 2 3 4 5 6 7
f(n) 0.15152 0.21212 0.21212 0.17677 0.12626 0.07576 0.03535 0.01010
F(n) 0.15152 0.36364 0.57576 0.75253 0.87879 0.95455 0.98990 1.00000
This is an example of the “Binomial-Beta” distribution with: a = 2, b = 4, and m = 7.


17.14. B. For a Negative Binomial Distribution, f(1) = rβ / (1+β)^(r+1).
For Type A: f(1) = (0.8)(0.2) / (1.2^1.8) = 11.52%.
For Type B: f(1) = (0.8)(0.5) / (1.5^1.8) = 19.28%.
(70%) (11.52%) + (30%) (19.28%) = 13.85%.

17.15. The p.g.f. of each Binomial is: {1 + q(z-1)}m.


The p.g.f. of the mixture is the mixture of the p.g.f.s:
Pmixture[z] = Σ f(m){1 + q(z-1)}m = p.g.f. of f at: 1 + q(z-1).

However, f(m) is Poisson, with p.g.f.: exp[λ(z-1)].


Therefore, Pmixture[z] = exp[λ{1 + q(z-1) - 1}] = exp[λq(z-1)].

However, this is the p.g.f. of a Poisson Distribution with mean qλ, which is therefore the mixed
distribution.
Alternately, the mixed distribution at k is:
∞ ∞
e - λ λm
∑ Prob[k | m]Prob[m] = ∑
m! k m - k
q (1- q)
(m- k)! k! m!
m=k m=k

∞ ∞
qk e - λ λk λ m- k qk e - λ λk λn qk e - λ λk
=
k! ∑ (1- q)m - k
(m- k)!
=
k! ∑ (1- q)n
n!
=
k!
exp[(1-q)λ]
m=k n=0

= (qλ)k e−λq / k!. This is a Poisson Distribution with mean qλ.
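A quick simulation sketch (illustrative) of the same result: with m Poisson and the claim count Binomial(m, q), the mixture behaves like a Poisson with mean qλ, so its sample mean and variance are both near qλ:

    import numpy as np

    rng = np.random.default_rng(0)
    lam, q = 4.0, 0.3
    m = rng.poisson(lam, size=500_000)   # m is Poisson with mean lam
    claims = rng.binomial(m, q)          # given m, claims are Binomial(m, q)
    print(claims.mean(), claims.var())   # both near q * lam = 1.2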

17.16. C. This is a Geometric Distribution (a Negative Binomial with r =1), parameterized


somewhat differently than in Loss Models, with p = 1/(1 + β). Therefore for a given value of p the
mean is: µ(p) = β = (1-p)/p. In order to get the average mean over the whole portfolio we need to
take the integral of µ(p) g(p) dp.
∫[0, 1] µ(p) g(p) dp = ∫[0, 1] {(1-p)/p} 280 p³ (1-p)⁴ dp = 280 ∫[0, 1] p² (1-p)⁵ dp = 280 Γ(3) Γ(6) / Γ(3+6)
= 280 (2!)(5!) / 8! = 5/3.
Comment: Difficult! Special case of mixing a Negative Binomial (for r fixed) via a Beta Distribution.
See Example 7.13 in Loss Models, where the mixed distribution is called the Generalized Waring.
For the Generalized Waring in general, the a priori mean turns out to be rb/(a-1). For r =1, b = 5 and
a = 4, the a priori mean is (1)(5)/3 = 5/3.

17.17. D. The probability density of p is a Beta Distribution with parameters a and b:


Γ(a + b) a-1
p (1-p)b-1.
Γ(a)Γ(b)

One can compute the unconditional density at n via integration:


1 1
Γ(a + b) a - 1 Γ(a + b)
f(n) = ∫0 f(n | p)
Γ(a)Γ(b)
p (1- p)b - 1 dp =
Γ(a)Γ(b) ∫0 p(1- p)x pa - 1(1- p)b - 1 dp =
1
Γ(a + b) Γ(a + b) Γ(a +1) Γ(b + n) Γ(a + b) Γ(b + n)
Γ(a)Γ(b) ∫0 pa(1- p)b + n - 1 dp = Γ(a)Γ(b) Γ(a + b + n +1)
=a
Γ(b) Γ(a + b + n +1)
.

Γ(9) Γ(5 + n) 8! (n+ 4)! (n + 4)!


For a = 4, b = 5: f(n) = 4 =4 = 6720 .
Γ(5) Γ(10 + n) 4! (n+ 9)! (n + 9)!

6!
f(2) = 6720 = 12.1%.
11!
Γ(a + b) a-1
Comment: The Beta distribution with θ = 1 has density from 0 to 1 of: x (1-x)b-1.
Γ(a)Γ(b)

Therefore, the integral from zero to of xa-1(1-x)b-1 is: Γ(a)Γ(b) / Γ(a+b).


This is an example of a Generalized Waring Distribution, with r = 1, a = 4 and b = 5.
See Example 7.13 in Loss Models. The probabilities of observing 0 to 20 claims is as follows:
0.444444, 0.222222, 0.121212, 0.0707071, 0.043512, 0.027972, 0.018648, 0.0128205,
0.00904977, 0.00653595, 0.00481596, 0.00361197, 0.00275198, 0.00212653, 0.00166424,
0.00131752, 0.00105402, 0.000851323, 0.00069367, 0.000569801, 0.000471559.
Since the densities must add to unity:
∞ ∞

∑ ∑ Γ(a+ b + n +1) = a Γ(a


Γ(a + b) Γ(b + n) Γ(b+ n) Γ(b)
1= a .⇒ .
Γ(b) Γ(a + b + n +1) + b)
n=0 n=0

17.18. E. E[λ] = the mean of the prior mixed exponential = weighted average of the means of the
two exponential distributions = (0.8)(1/40) + (0.2)(1/10) = 4.0%.

17.19. C. Given λ, f(0) = e^(-λ).
∫[0, ∞] f(0; λ) π(λ) dλ = 32 ∫[0, ∞] e^(-41λ) dλ + 2 ∫[0, ∞] e^(-11λ) dλ = (32/41) + (2/11) = 0.9623.
Prob[at least one claim] = 1 - 0.9623 = 3.77%.

17.20. The p.g.f. of each Binomial is: {1 + q(z-1)}^m.

The p.g.f. of the mixture is the mixture of the p.g.f.s:
Pmixture[z] = Σ f(m) {1 + q(z-1)}^m = p.g.f. of f at: 1 + q(z-1).
However, f(m) is Binomial with parameters 5 and 0.1, with p.g.f.: {1 + 0.1(z-1)}^5.
Therefore, Pmixture[z] = {1 + 0.1(1 + q(z-1) - 1)}^5 = {1 + 0.1q(z-1)}^5.
However, this is the p.g.f. of a Binomial Distribution with parameters 5 and 0.1q, which is therefore
the mixed distribution.
Alternately, the mixed distribution at k ≤ 5 is:

Σ_{m=k}^{5} Prob[k | m] Prob[m] = Σ_{m=k}^{5} {m! / ((m-k)! k!)} q^k (1-q)^(m-k) {5! / ((5-m)! m!)} 0.1^m 0.9^(5-m)

= q^k 0.1^k {5! / ((5-k)! k!)} Σ_{m=k}^{5} {(5-k)! / ((m-k)! (5-m)!)} (1-q)^(m-k) 0.1^(m-k) 0.9^(5-m)

= q^k 0.1^k {5! / ((5-k)! k!)} Σ_{n=0}^{5-k} {(5-k)! / (n! (5-k-n)!)} (1-q)^n 0.1^n 0.9^(5-k-n)

= q^k 0.1^k {5! / ((5-k)! k!)} {(1-q)(0.1) + 0.9}^(5-k) = {5! / ((5-k)! k!)} (0.1q)^k (1 - 0.1q)^(5-k).

This is a Binomial Distribution with parameters 5 and 0.1q.
Comment: The sum was simplified using the Binomial expansion:

(x + y)^m = Σ_{i=0}^{m} x^i y^(m-i) m! / {i! (m-i)!}.
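A quick numerical illustration of this result in Python, assuming SciPy is available; the value q = 0.3 is my own illustrative choice, not part of the problem:

from scipy.stats import binom

q = 0.3
mixed = [sum(binom.pmf(m, 5, 0.1) * binom.pmf(k, m, q) for m in range(6)) for k in range(6)]
direct = [binom.pmf(k, 5, 0.1 * q) for k in range(6)]
print(mixed)
print(direct)   # the two lists of probabilities agree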

17.21. D. Given q, we have a Binomial with parameters m = 3 and q. The chance that we observe
zero claims is: (1-q)^3. The distribution of q is uniform: π(q) = 2.5 for 0 ≤ q ≤ 0.4.

f(0) = ∫_0^0.4 f(0 | q) π(q) dq = ∫_0^0.4 (1-q)^3 (2.5) dq = (-2.5/4)(1-q)^4 ]_{q=0}^{q=0.4}

= (-0.625)(0.6^4 - 1^4) = 0.544.

17.22. B. Given q, we have a Binomial with parameters m = 3 and q.
The chance that we observe one claim is: 3q(1-q)^2 = 3q - 6q^2 + 3q^3.

P(c=1) = ∫_0^0.4 P(c=1 | q) π(q) dq = ∫_0^0.4 (3q - 6q^2 + 3q^3) (2.5) dq =

(2.5)(1.5q^2 - 2q^3 + 0.75q^4) ]_{q=0}^{q=0.4} = (2.5)(0.24 - 0.128 + 0.0192) = 0.328.

17.23. A. Given q, we have a Binomial with parameters m = 3 and q.
The chance that we observe two claims is: 3q^2 (1-q) = 3q^2 - 3q^3.

P(c=2) = ∫_0^0.4 P(c=2 | q) π(q) dq = ∫_0^0.4 (3q^2 - 3q^3) (2.5) dq = (2.5)(q^3 - 0.75q^4) ]_{q=0}^{q=0.4}

= (2.5)(0.064 - 0.0192) = 0.112.

17.24. B. Given q, we have a Binomial with parameters m = 3 and q.
The chance that we observe three claims is: q^3.

P(c=3) = ∫_0^0.4 P(c=3 | q) π(q) dq = ∫_0^0.4 (q^3) (2.5) dq = (2.5)(q^4 / 4) ]_{q=0}^{q=0.4}

= (2.5)(0.0064) = 0.016.

Comment: Since we have a Binomial with m = 3, the only possibilities are 0, 1, 2 or 3 claims.
Therefore, the probabilities for 0, 1, 2 and 3 claims (calculated in this and the prior three questions)
add to one: 0.544 + 0.328 + 0.112 + 0.016 = 1.
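These four probabilities are easy to confirm numerically. A minimal Python sketch, assuming SciPy is available:

from scipy.integrate import quad
from scipy.stats import binom

# Binomial(m = 3, q) mixed over q uniform on [0, 0.4], so pi(q) = 2.5
probs = [quad(lambda q: binom.pmf(k, 3, q) * 2.5, 0, 0.4)[0] for k in range(4)]
print(probs)        # [0.544, 0.328, 0.112, 0.016]; the four values add to 1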

17.25. D. This is a 40%-60% mixture of zero and a Poisson with λ = 0.07.

The second moment of the Poisson is: variance + mean^2 = 0.07 + 0.07^2 = 0.0749.
The mean of the mixture is: (40%)(0) + (60%)(0.07) = 0.042.
The second moment of the mixture is: (40%)(0) + (60%)(0.0749) = 0.04494.
The variance of the mixture is: 0.04494 - 0.042^2 = 0.0432, per student.
For a group of 100 students the variance is: (100)(0.0432) = 4.32.

17.26. C. For λ = 0.2, f(1) = 0.2e^-0.2 = 0.1638. For λ = 0.6, f(1) = 0.6e^-0.6 = 0.3293.
Prob[1 coin] = (0.5)(0.1638) + (0.5)(0.3293) = 24.65%.

17.27. A. Over two minutes, the mean is either 0.4: f(1) = 0.4e^-0.4 = 0.2681,
or the mean is 1.2: f(1) = 1.2e^-1.2 = 0.3614.
Prob[1 coin] = (0.5)(0.2681) + (0.5)(0.3614) = 31.48%.

17.28. C. Prob[0 coins during a minute] = (0.5)e^-0.2 + (0.5)e^-0.6 = 0.6838.

Prob[1 coin during a minute] = (0.5)(0.2e^-0.2) + (0.5)(0.6e^-0.6) = 0.2465.
Prob[A + B = 1] = Prob[A = 0]Prob[B = 1] + Prob[A = 1]Prob[B = 0] = (2)(0.6838)(0.2465) = 33.71%.
Comment: Since the minutes are on different days, their lambdas are picked independently.

17.29. C.
Prob[1 coin during third minute and 1 coin during fifth minute | λ = 0.2] = (0.2e^-0.2)(0.2e^-0.2) = 0.0268.
Prob[1 coin during third minute and 1 coin during fifth minute | λ = 0.6] = (0.6e^-0.6)(0.6e^-0.6) = 0.1084.
(0.5)(0.0268) + (0.5)(0.1084) = 6.76%.
Comment: Since the minutes are on the same day, they have the same λ, whichever it is.

17.30. B. Prob[1 coin during a minute] = (0.5)(0.2e^-0.2) + (0.5)(0.6e^-0.6) = 0.2465.

Since the minutes are on different days, their lambdas are picked independently.
Prob[1 coin during 1 minute today and 1 coin during 1 minute tomorrow] =
Prob[1 coin during a minute] Prob[1 coin during a minute] = 0.2465^2 = 6.08%.

17.31. A. Prob[1 coin during 4 minutes] = (0.5)(0.8e^-0.8) + (0.5)(2.4e^-2.4) = 0.2866.

Since the time intervals are on different days, their lambdas are picked independently.
Prob[1 coin during 4 minutes today and 1 coin during 4 minutes tomorrow] =
Prob[1 coin during 4 minutes] Prob[1 coin during 4 minutes] = 0.2866^2 = 8.33%.

17.32. B. Prob[1 coin during two minutes and 2 coins during following 3 minutes | λ = 0.2] =
(0.4e^-0.4) (0.6^2 e^-0.6 / 2) = 0.0265.
Prob[1 coin during two minutes and 2 coins during following 3 minutes | λ = 0.6] =
(1.2e^-1.2) (1.8^2 e^-1.8 / 2) = 0.0968. (0.5)(0.0265) + (0.5)(0.0968) = 6.17%.

17.33. E. ∫_a^b λ f(λ) dλ = ∫_a^b λ (d+1) λ^d / (b^(d+1) - a^(d+1)) dλ = {(d+1) / (b^(d+1) - a^(d+1))} λ^(d+2) / (d+2) ]_{λ=a}^{λ=b}

= {(d+1)/(d+2)} (b^(d+2) - a^(d+2)) / (b^(d+1) - a^(d+1)) = (0.5/1.5) (0.6^1.5 - 0.2^1.5) / (0.6^0.5 - 0.2^0.5) = 0.3821.

17.34. B. ∫_a^b λ^2 f(λ) dλ = ∫_a^b λ^2 (d+1) λ^d / (b^(d+1) - a^(d+1)) dλ = {(d+1) / (b^(d+1) - a^(d+1))} λ^(d+3) / (d+3) ]_{λ=a}^{λ=b}

= {(d+1)/(d+3)} (b^(d+3) - a^(d+3)) / (b^(d+1) - a^(d+1)) = (0.5/2.5) (0.6^2.5 - 0.2^2.5) / (0.6^0.5 - 0.2^0.5) = 0.15943.

For fixed λ, the second moment of a Poisson is: λ + λ^2.

Therefore, the second moment of the mixture is: E[λ] + E[λ^2] = 0.3821 + 0.15943 = 0.5415.
Therefore, the variance of the mixture is: 0.5415 - 0.3821^2 = 0.3955.
Alternately, Variance[λ] = Second Moment[λ] - Mean[λ]^2 = 0.15943 - 0.3821^2 = 0.0134.
The variance of frequency for a mixture of Poissons is: E[λ] + Var[λ] = 0.3821 + 0.0134 = 0.3955.
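A minimal Python check of these two moments, assuming SciPy is available and using the parameter values implied by the solution (d = -0.5, a = 0.2, b = 0.6):

from scipy.integrate import quad

d, a, b = -0.5, 0.2, 0.6
f = lambda x: (d + 1) * x**d / (b**(d + 1) - a**(d + 1))
m1 = quad(lambda x: x * f(x), a, b)[0]       # 0.3821
m2 = quad(lambda x: x**2 * f(x), a, b)[0]    # 0.15943
print(m1, m2, m1 + m2 - m1**2)               # variance of the mixture: 0.3955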

17.35. A. E[X] = (0.5)(6)(0.8) + (0.5)(6)q = 2.4 + 3q.

The second moment of a Binomial is: mq(1-q) + (mq)^2 = mq - mq^2 + m^2 q^2.
E[X^2] = (0.5){(6)(0.8) - (6)(0.8^2) + (6^2)(0.8^2)} + (0.5){6q - 6q^2 + 36q^2} = 12 + 3q + 15q^2.
Var[X] = 12 + 3q + 15q^2 - (2.4 + 3q)^2 = 6.24 - 11.4q + 6q^2.
E[X] = Var[X]. ⇒ 2.4 + 3q = 6.24 - 11.4q + 6q^2. ⇒ 6q^2 - 14.4q + 3.84 = 0.
q = {14.4 ± √(14.4^2 - (4)(6)(3.84))} / 12 = {14.4 ± 10.7331} / 12 = 2.094 or 0.3056.
Comment: 0 ≤ q ≤ 1. When one mixes distributions, the variance increases. As discussed in
“Mahlerʼs Guide to Buhlmann Credibility,” Var[X] = E[Var[X | q]] + Var[E[X | q]] ≥ E[Var[X | q]].
Since for a Binomial Distribution, the variance is less than the mean, for a mixture of Binomial
Distributions, the variance can be either less than, greater than, or equal to the mean.

17.36. D. For λ = 0.04, Prob[more than 1 claim] = 1 - e^-0.04 - 0.04e^-0.04 = 0.00077898.
For λ = 0.10, Prob[more than 1 claim] = 1 - e^-0.10 - 0.10e^-0.10 = 0.00467884.
Prob[more than 1 claim] = (0.5)(0.00077898) + (0.5)(0.00467884) = 0.273%.

17.37. B. For λ = 0.04, the mean is 0.04 and the 2nd moment is: λ + λ^2 = 0.04 + 0.04^2 = 0.0416.

For λ = 0.10, the mean is 0.10 and the second moment is: λ + λ^2 = 0.10 + 0.10^2 = 0.11.
Therefore, the mean of the mixture is: (0.5)(0.04) + (0.5)(0.10) = 0.07, and the second moment of
the mixture is: (0.5)(0.0416) + (0.5)(0.11) = 0.0758.
The variance of the mixed distribution is: 0.0758 - 0.07^2 = 0.0709.
Alternately, Variance[λ] = Second Moment[λ] - Mean[λ]^2 =
(0.5)(0.04^2) + (0.5)(0.1^2) - 0.07^2 = 0.0009.
The variance of frequency for a mixture of Poissons is:
Expected Value of the Process Variance + Variance of the Hypothetical Means =
E[λ] + Var[λ] = 0.07 + 0.0009 = 0.0709.

17.38. D. (25%)(e^-0.25) + (75%)(e^-0.5) = 65.0%.

17.39. A. (25%)(0.25e^-0.25) + (75%)(0.5e^-0.5) = 27.6%.

17.40. B. (25%)(0.25^2 e^-0.25 / 2) + (75%)(0.5^2 e^-0.5 / 2) = 6.3%.

17.41. D. If there is one home game and two road games, then the distribution of road wins is:
2 @ 16%, 1 @ 48%, 0 @ 36%.
Thus the chance of winning at least 2 games is:
Prob[win 2 road] + Prob[win 1 road] Prob[win one home] = 16% + (48%)(80%) = 0.544.
If instead there is one road game and two home games, then the distribution of home wins is:
2 @ 64%, 1 @ 32%, 0 @ 4%.
Thus the chance of winning at least 2 games is:
Prob[win 2 home] + Prob[win one home] Prob[win 1 road] = 64% + (32%)(40%) = 0.768.
Thus the chance the Spiders win the series is:
(50%)(0.544) + (50%)(0.768) = 65.6%.
Comment: This is a 50%-50% mixture of two situations.
(Each situation has its own distribution of games won.)
While in professional sports there is a home field advantage, it is not usually this big.
Note that for m = 3 and q = (0.8 + 0.4)/2 = 0.6, the probability of at least two wins is:
0.6^3 + (3)(0.6^2)(0.4) = 0.648 ≠ 0.656.

17.42. The mean of the mixture is: p λ1 + (1-p) λ2.

The second moment of a Poisson is: variance + mean^2 = λ + λ^2.

Therefore, the second moment of the mixture is: p (λ1 + λ1^2) + (1-p)(λ2 + λ2^2).

Variance of the mixture is: p (λ1 + λ1^2) + (1-p)(λ2 + λ2^2) - {p λ1 + (1-p) λ2}^2.

For the mixture, the ratio of the variance to the mean is:

{p (λ1 + λ1^2) + (1-p)(λ2 + λ2^2)} / {p λ1 + (1-p) λ2} - {p λ1 + (1-p) λ2}.

For λ1 = 10% and λ2 = 20%, the ratio of the variance to the mean is:

{p (0.11) + (1-p)(0.24)} / {p (0.1) + (1-p)(0.2)} - {p (0.1) + (1-p)(0.2)} = (0.24 - 0.13p) / (0.2 - 0.1p) - (0.2 - 0.1p).
Here is a graph of the ratio of the variance to the mean as a function of p:
[Graph: the ratio of the variance to the mean rises from 1 at p = 0 to a maximum of a little over 1.017 for p near 0.6, then returns to 1 at p = 1.]
Comment: For either p = 0 or p = 1, this ratio is 1.
For either p = 0 or p = 1, we have a single Poisson and the mean is equal to the variance.
For 0 < p < 1, mixing increases the variance, and the variance of the mixture is greater than its mean.
For example, for p = 80%: (0.24 - 0.13p) / (0.2 - 0.1p) - (0.2 - 0.1p) = 1.013.
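The ratio is easy to tabulate in Python (no special packages needed; this just evaluates the formula above):

def ratio(p):
    mean = p * 0.1 + (1 - p) * 0.2
    second = p * (0.1 + 0.1**2) + (1 - p) * (0.2 + 0.2**2)
    return (second - mean**2) / mean

print(ratio(0.8))               # 1.013
print(ratio(0.0), ratio(1.0))   # both 1: a single Poisson has variance equal to its mean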

17.43. B. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe
zero claims is: (1-q)^4. The distribution of q is: π(q) = (2500/99) q^2 (1-q), for 0 ≤ q ≤ 0.6.

f(0) = ∫_0^0.6 f(0 | q) π(q) dq = (2500/99) ∫_0^0.6 q^2 (1-q)^5 dq =

(2500/99) ∫_0^0.6 q^2 - 5q^3 + 10q^4 - 10q^5 + 5q^6 - q^7 dq =

(2500/99) {0.6^3/3 - (5)(0.6^4)/4 + (10)(0.6^5)/5 - (10)(0.6^6)/6 + (5)(0.6^7)/7 - 0.6^8/8} = 14.28%.

17.44. D. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe
one claim is: 4q(1-q)^3. The distribution of q is: π(q) = (2500/99) q^2 (1-q).

f(1) = ∫_0^0.6 f(1 | q) π(q) dq = (4)(2500/99) ∫_0^0.6 q^3 (1-q)^4 dq =

(10,000/99) ∫_0^0.6 q^3 - 4q^4 + 6q^5 - 4q^6 + q^7 dq =

(10,000/99) {0.6^4/4 - (4)(0.6^5)/5 + (6)(0.6^6)/6 - (4)(0.6^7)/7 + 0.6^8/8} = 29.81%.

17.45. E. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe
two claims is: 6q^2 (1-q)^2. The distribution of q is: π(q) = (2500/99) q^2 (1-q).

f(2) = ∫_0^0.6 f(2 | q) π(q) dq = (6)(2500/99) ∫_0^0.6 q^4 (1-q)^3 dq = (5000/33) ∫_0^0.6 q^4 - 3q^5 + 3q^6 - q^7 dq =

(5000/33) {0.6^5/5 - (3)(0.6^6)/6 + (3)(0.6^7)/7 - 0.6^8/8} = 32.15%.

17.46. A. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe
three claims is: 4q^3 (1-q). The distribution of q is: π(q) = (2500/99) q^2 (1-q).

f(3) = ∫_0^0.6 f(3 | q) π(q) dq = (4)(2500/99) ∫_0^0.6 q^5 (1-q)^2 dq = (10,000/99) ∫_0^0.6 q^5 - 2q^6 + q^7 dq =

(10,000/99) {0.6^6/6 - (2)(0.6^7)/7 + 0.6^8/8} = 18.96%.

17.47. C. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe
four claims is: q^4. The distribution of q is: π(q) = (2500/99) q^2 (1-q).

f(4) = ∫_0^0.6 f(4 | q) π(q) dq = (2500/99) ∫_0^0.6 q^6 (1-q) dq = (2500/99) ∫_0^0.6 q^6 - q^7 dq =

(2500/99) {0.6^7/7 - 0.6^8/8} = 4.80%.
Comment: Since we have a Binomial with m = 4, the only possibilities are 0, 1, 2, 3 or 4 claims.
Therefore, the probabilities for 0, 1, 2, 3, and 4 claims must add to one:
14.28% + 29.81% + 32.15% + 18.96% + 4.80% = 1.
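A short Python check of these five probabilities, assuming SciPy is available:

from scipy.integrate import quad
from scipy.stats import binom

pi = lambda q: (2500 / 99) * q**2 * (1 - q)      # density of q on [0, 0.6]
probs = [quad(lambda q: binom.pmf(k, 4, q) * pi(q), 0, 0.6)[0] for k in range(5)]
print(probs)        # [0.1428, 0.2981, 0.3215, 0.1896, 0.0480]
print(sum(probs))   # 1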

17.48. E. Let x be the mean for the low risk policies.

The mean of the mixture is: (3/4)x + 0.4/4 = 0.75x + 0.1.
The second moment of the mixture is the mixture of the second moments:
(3/4)(x + x^2) + (0.4 + 0.4^2)/4 = 0.75x^2 + 0.75x + 0.14.
Thus the variance of the mixture is:
0.75x^2 + 0.75x + 0.14 - (0.75x + 0.1)^2 = 0.1875x^2 + 0.6x + 0.13.
Thus, 0.2575 = 0.1875x^2 + 0.6x + 0.13. ⇒ 0.1875x^2 + 0.6x - 0.1275 = 0. ⇒

x = {-0.6 ± √(0.6^2 - (4)(0.1875)(-0.1275))} / {(2)(0.1875)} = 0.20, taking the positive root.
Comment: You can try the choices and see which one works.

17.49. (a) The mean of the mixture is: E[λ] = µ.

The second moment of each Poisson is: λ + λ^2.

Thus the second moment of the mixture is: E[λ + λ^2] = E[λ] + E[λ^2] = µ + (µ + µ^2) = 2µ + µ^2,
since λ is itself Poisson with mean µ, so that E[λ^2] = Var[λ] + E[λ]^2 = µ + µ^2.

Thus the variance of the mixture is: 2µ + µ^2 - µ^2 = 2µ.

Alternately, the process variance given λ is λ.
Thus the Expected Value of the Process Variance is E[λ] = µ.
The Variance of the Hypothetical Means is the variance of the distribution of the lambdas, which is µ.
Total variance is: EPV + VHM = µ + µ = 2µ.
(b) E_N[z^n | λ] is the Probability Generating Function of a Poisson with mean λ: exp[λ(z-1)].

P[z] = E_N[z^n] = Σ_{λ=0}^{∞} π[λ] E_N[z^n | λ] = Σ_{λ=0}^{∞} π[λ] exp[λ(z-1)] = Σ_{λ=0}^{∞} {e^-µ µ^λ / λ!} exp[λ(z-1)]

= e^-µ Σ_{λ=0}^{∞} {µ e^(z-1)}^λ / λ! = e^-µ exp[µ e^(z-1)] = exp[µ(e^(z-1) - 1)].

Alternately, when mixing Poissons, P_mixed distribution(z) = P_mixing distribution of λ(e^(z-1)).

Thus the mixture has probability generating function: exp[µ(e^(z-1) - 1)].


(c) Pʼ(z) = exp[µ(e^(z-1) - 1)] µ e^(z-1). Mean = Pʼ(1) = µ.

Pʼʼ(z) = exp[µ(e^(z-1) - 1)] (µ e^(z-1))^2 + exp[µ(e^(z-1) - 1)] µ e^(z-1). Pʼʼ(1) = µ^2 + µ.

Second factorial moment = E[N(N-1)] = Pʼʼ(1) = µ^2 + µ.

Thus E[N^2] = µ^2 + µ + E[N] = µ^2 + µ + µ = µ^2 + 2µ.

Var[N] = E[N^2] - E[N]^2 = µ^2 + 2µ - µ^2 = 2µ.


Comment: Parts (b) and (c) are beyond what you should be asked on your exam.
The alternate solution to part (a) is discussed in “Mahlerʼs Guide to Buhlmann Credibility.”

17.50. (a) mean = (0.5)(4) + (0.5)(10) = 7.

The second moment of each Poisson is its variance plus the square of its mean.
Second moment of the mixture is: (0.5)(4 + 4^2) + (0.5)(10 + 10^2) = 65.
Thus the variance of the mixture is: 65 - 7^2 = 16.

(b) The mean of each zero truncated Binomial is: mq / {1 - (1-q)^m}.

For the first one: (2)(0.5) / (1 - 0.5^2) = 1.3333. For the second one: (8)(0.4) / (1 - 0.6^8) = 3.2547.
Mean of the mixture is: (0.3)(1.3333) + (0.7)(3.2547) = 2.6783.

The variance of each zero truncated Binomial is: mq {(1-q) - (1 - q + mq)(1-q)^m} / {1 - (1-q)^m}^2.

Variance for the first one: (2)(0.5) {(0.5) - (1 - 0.5 + 1)(0.5^2)} / {1 - 0.5^2}^2 = 0.2222.

Variance for the second one: (8)(0.4) {(0.6) - (1 - 0.4 + 3.2)(0.6^8)} / {1 - 0.6^8}^2 = 1.7749.

The second moment of each zero truncated Binomial is its variance plus the square of its mean.
Second moment of the mixture is: (0.3)(0.2222 + 1.3333^2) + (0.7)(1.7749 + 3.2547^2) = 9.2575.
Thus the variance of the mixture is: 9.2575 - 2.6783^2 = 2.0842.
Alternately, the second moment of a Binomial Distribution is: mq(1-q) + (mq)^2 = mq(1 + mq - q).

Thus the second moment of a zero truncated Binomial Distribution is: mq(1 + mq - q) / {1 - (1-q)^m}.

Second moment for the first one: (2)(0.5)(1 + 1 - 0.5) / (1 - 0.5^2) = 2.
Second moment for the second one: (8)(0.4)(1 + 3.2 - 0.4) / (1 - 0.6^8) = 12.3678.
Second moment of the mixture is: (0.3)(2) + (0.7)(12.3678) = 9.2575.
Thus the variance of the mixture is: 9.2575 - 2.6783^2 = 2.0842.
(c) Mean of the compound distribution: (7)(2.6783) = 18.748.
Variance of the compound distribution is: (7)(2.0842) + (2.6783^2)(16) = 129.36.
(Treat size of families as severity in a collective risk model of aggregate losses.)
(d) Prob[at least 25 people] = 1 - Φ[(24.5 - 18.748) / √129.36] = 1 - Φ[0.51] = 30.5%.

17.51. Mean of the mixture is: E[β] = mean of Beta Distribution = a/(a+b) = 4/7.

The second moment of the Geometric is: β(1+β) + β^2 = β + 2β^2.

Therefore, the second moment of the mixture is:
E[β + 2β^2] = E[β] + 2E[β^2] = mean of Beta Distribution + twice second moment of Beta Distribution
= a/(a+b) + 2a(a+1) / {(a+b)(a+b+1)} = 4/7 + (2)(4)(5) / {(7)(8)} = 1.2857.
Therefore, the variance of the mixture is: 1.2857 - (4/7)^2 = 0.959.
Alternately, the process variance is β(1+β).

Thus the expected value of the process variance is: E[β(1+β)] = E[β] + E[β^2]
= mean of Beta Distribution + second moment of Beta Distribution
= a/(a+b) + a(a+1) / {(a+b)(a+b+1)} = 4/7 + (4)(5) / {(7)(8)} = 0.92857.
The variance of the hypothetical means is: Var[β] =
second moment of Beta Distribution - square of mean of Beta Distribution =
(4)(5) / {(7)(8)} - (4/7)^2 = 0.03061.
Variance of the mixture is: EPV + VHM = 0.92857 + 0.03061 = 0.959.
Comment: The alternate solution is discussed in “Mahlerʼs Guide to Buhlmann Credibility.”

17.52. A. The probability of a claim being large is 40%.

Thus conditional on λ = 1, the number of large claims is Poisson with mean 0.4, while conditional on
λ = 2 the number of large claims is Poisson with mean 0.8.
Thus the probability of 2 large claims is:
(0.5)(0.4^2 e^-0.4 / 2) + (0.5)(0.8^2 e^-0.8 / 2) = 0.0987.

17.53. D. The probability of a claim being small is 60%.

Thus conditional on λ = 1, the number of small claims is Poisson with mean 0.6, while conditional on
λ = 2 the number of small claims is Poisson with mean 1.2.
Thus the probability of 2 small claims is:
(0.5)(0.6^2 e^-0.6 / 2) + (0.5)(1.2^2 e^-1.2 / 2) = 0.1578.

17.54. E. In order to have 2 large and 2 small claims, one has to have 4 claims in total.
The probability of 4 claims is: (0.5)(1^4 e^-1 / 24) + (0.5)(2^4 e^-2 / 24) = 0.05278.
Given that one has 4 claims, the probability that 2 are large and 2 are small is:
(6)(0.4^2)(0.6^2) = 0.3456.
Thus the probability of 2 large and 2 small claims is: (0.05278)(0.3456) = 0.01824.
Alternately, Prob[2 small & 2 large | λ = 1] =
(density at 2 for a Poisson with mean 0.6) (density at 2 for a Poisson with mean 0.4) =
(0.6^2 e^-0.6 / 2) (0.4^2 e^-0.4 / 2) = 0.0052975.
Prob[2 small & 2 large | λ = 2] =
(density at 2 for a Poisson with mean 1.2) (density at 2 for a Poisson with mean 0.8) =
(1.2^2 e^-1.2 / 2) (0.8^2 e^-0.8 / 2) = 0.0311812.
Thus for the mixture the probability of 2 large and 2 small claims is:
(0.5)(0.0052975) + (0.5)(0.0311812) = 0.01824.
Comment: While for each Poisson the number of large and small claims are independent,
the number of large and small claims are not independent for the mixture.
(0.0987)(0.1578) = 0.01557 ≠ 0.01824.
Given a lot of large claims, the probability that λ is 2 is bigger and thus the probability of a lot of
small claims is also higher.
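A minimal Python check of 17.52-17.54, including the lack of independence noted in the comment (this assumes SciPy is available):

from scipy.stats import poisson

p2_large = 0.5 * poisson.pmf(2, 0.4) + 0.5 * poisson.pmf(2, 0.8)   # 0.0987
p2_small = 0.5 * poisson.pmf(2, 0.6) + 0.5 * poisson.pmf(2, 1.2)   # 0.1578
p_both = 0.5 * poisson.pmf(2, 0.6) * poisson.pmf(2, 0.4) \
       + 0.5 * poisson.pmf(2, 1.2) * poisson.pmf(2, 0.8)           # 0.01824
print(p2_large, p2_small, p_both, p2_large * p2_small)   # 0.01557 is not 0.01824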

17.55. C. The frequency distribution for the class =

∫_p^1 f(x | θ) g(θ) dθ = ∫_p^1 -(1-θ)^x / ln(p) dθ = (1-θ)^(x+1) / {(x+1) ln(p)} ]_{θ=p}^{θ=1} = -(1-p)^(x+1) / {(x+1) ln(p)}.
Comment: 4,11/82, Q.48, rewritten. Note that f(x|θ) is a geometric distribution.


The mixed frequency distribution for the class is a logarithmic distribution, with
β = 1/p -1 and x+1 running from 1 to infinity (so that f(0) is the logarithmic distribution at 1, f(1) is the
logarithmic at 2, etc. The support of the logarithmic is 1,2,3,...)

17.56. E. Var[Y] = EX[VAR[Y | X]] + VARX[E[Y | X]] = EX[x] + VARX[x] = mq + mq(1 - q) =


mq(2 - q).
Comment: Total Variance = Expected Value of the Process Variance + Variance of the Hypothetical
Means. See “Mahlerʼs Guide to Buhlmann Credibility.”


17.57. E. P(Y=0) = ∫ P(Y=0 | θ) f(θ) dθ = ∫ e^-θ f(θ) dθ.
For the first case, f(θ) = 1/2, for 0 ≤ θ ≤ 2:

P(Y=0) = ∫_0^2 e^-θ / 2 dθ = (1 - e^-2)/2 = 0.432.

For the second case, f(θ) = e^-θ, for θ > 0, and

P(Y=0) = ∫_0^∞ e^-2θ dθ = 1/2.

For the third case, P(Y=0) = e^-1 = 0.368. In the first and third cases P(Y=0) < 0.45.
Comment: Three separate problems in which you need to calculate P(Y=0) given
three different distributions of θ.

17.58. A. The chance of zero or one claim for a Poisson distribution is: e^-λ + λe^-λ.
We average over the possible values of λ:

Prob(0 or 1 claim) = (1/5) ∫_0^5 e^-λ + λe^-λ dλ = (1/5)(-2e^-λ - λe^-λ) ]_{λ=0}^{λ=5} = (1/5)(2 - 7e^-5) = 0.391.

Probability that there are 2 or more claims = 1 - Prob(0 or 1 claim) = 1 - 0.391 = 0.609.

17.59. E. P(z) ≡ E[z^N]. The p.g.f. of the Poisson Distribution is: P(z) = e^(λ(z-1)).

Therefore, for the Poisson, E[z^N] = e^(λ(z-1)). E[2^N | λ] = P(2) = e^(λ(2-1)) = e^λ.

E[W] = ∫_0^4 E[2^N | λ] (1/4) dλ = (1/4) ∫_0^4 e^λ dλ = (1/4)(e^4 - 1) = 13.4.

17.60. A. For a Negative Binomial with r = 4, f(0) = 1/(1+β)^4.

Prob[0 claims] = ∫_0^2 {1/(1+β)^4} (1/2) dβ = -1/{6(1+β)^3} ]_{β=0}^{β=2} = (1/6)(1 - 1/27) = 0.1605.

Prob[at least 1 claim] = 1 - 0.1605 = 0.8395.



17.61. E. For q = 0.5 and m = 2, f(2) = 0.5^2 = 0.25.

For q = 0.5 and m = 4, f(2) = (4 choose 2)(0.5^2)(0.5^2) = 0.375.

Probability that the mixed distribution is 2 is: p(0.25) + (1-p)(0.375) = 0.375 - 0.125p.
Comment: The solution cannot involve p^2, eliminating choices A, C, and D.

Section 18, Gamma Function142

The quantity x^(α-1) e^-x is finite for x ≥ 0 and α ≥ 1.

Since it declines quickly to zero as x approaches infinity, its integral from zero to ∞ exists.
This is the much studied and tabulated (complete) Gamma Function.

Γ(α) = ∫_0^∞ t^(α-1) e^-t dt = θ^-α ∫_0^∞ t^(α-1) e^(-t/θ) dt, for α > 0, θ > 0.

We prove the equality of these two integrals by making the change of variables t = x/θ:

∫_0^∞ t^(α-1) e^-t dt = ∫_0^∞ (x/θ)^(α-1) e^(-x/θ) dx/θ = θ^-α ∫_0^∞ x^(α-1) e^(-x/θ) dx.

For positive integer α, Γ(α) = (α-1)!.  In general, Γ(α) = (α-1) Γ(α-1).

Γ(1) = 1. Γ(2) = 1. Γ(3) = 2. Γ(4) = 6. Γ(5) = 24. Γ(6) = 120. Γ(7) = 720. Γ(8) = 5040.

One does not need to know how to compute the complete Gamma Function for noninteger alpha.
Many computer programs will give values of the complete Gamma Function.
Γ(1/2) = √π.  Γ(3/2) = 0.5√π.  Γ(-1/2) = -2√π.  Γ(-3/2) = (4/3)√π.

For α ≥ 10: lnΓ(α) ≅ (α - 0.5) lnα - α + ln(2π)/2 + 1/(12α) - 1/(360α^3) + 1/(1260α^5) - 1/(1680α^7)
+ 1/(1188α^9) - 691/(360,360α^11) + 1/(156α^13) - 3617/(122,400α^15). 143

For α < 10 use the recursion relationship Γ(α) = (α−1) Γ(α−1).


The Gamma function is undefined at the negative integers and zero.

For large α: Γ(α) ≅ e^-α α^(α-1/2) √(2π), which is Stirlingʼs formula.144

The ratios of two Gamma functions with arguments that differ by an integer can be computed in
terms of a product of factors, just as one would with a ratio of factorials.

142
See Appendix A of Loss Models. Also see the Handbook of Mathematical Functions, by M. Abramowitz, et. al.
143
See Appendix A of Loss Models, and the Handbook of Mathematical Functions, by M. Abramowitz, et. al.
144
See the Handbook of Mathematical Functions, by M. Abramowitz, et. al.

Exercise: What is Γ(7) / Γ(4)?


[Solution: Γ(7) / Γ(4) = 6! / 3! = (6)(5)(4) = 120.]

Exercise: What is Γ(7.2) / Γ(4.2)?


[Solution: Γ(7.2) / Γ(4.2) = 6.2! / 3.2! = (6.2)(5.2)(4.2) = 135.4.]

Note that even when the arguments are not integer, the ratio still involves a product of factors.
The solution of the last exercise depended on the fact that 7.2 - 4.2 = 3 is an integer.
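These ratios are easy to confirm in Python using the standard library (math.gamma evaluates the complete Gamma function; none of this is needed on the exam):

from math import gamma

print(gamma(7) / gamma(4))        # 120.0
print(gamma(7.2) / gamma(4.2))    # 135.408
print(6.2 * 5.2 * 4.2)            # the same product of factors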

Integrals involving e^-x and powers of x can be written in terms of the Gamma function:

∫_0^∞ t^(α-1) e^(-t/θ) dt = Γ(α) θ^α,  or for integer n:  ∫_0^∞ t^n e^(-ct) dt = n! / c^(n+1).

Exercise: What is the integral from 0 to ∞ of: t^3 e^(-t/10)?

[Solution: With α = 4 and θ = 10, this integral is: Γ(4) 10^4 = (6)(10,000) = 60,000.]

This formula for “gamma-type” integrals is very useful for working with anything involving the Gamma
distribution, for example the Gamma-Poisson process. It follows from the definition of the Gamma
function and a change of variables.

The Gamma density in the Appendix of Loss Models is: θ−α xα−1 e−x/θ / Γ(α).
Since this probability density function must integrate to unity, the above formula for gamma-type
integrals follows. This is a useful way to remember this formula on the exam.

Incomplete Gamma Function:

As shown in Appendix A of Loss Models, the Incomplete Gamma Function is defined as:

Γ(α ; x) = ∫_0^x t^(α-1) e^-t dt / Γ(α).

Γ(α ; 0) = 0. Γ(α ; ∞) = Γ(α)/Γ(α) = 1. As discussed below, the Incomplete Gamma Function with
the introduction of a scale parameter θ is the Gamma Distribution.

Computing Incomplete Gamma Functions:

Exercise: Via integration by parts, put Γ(2 ; x) in terms of Exponentials and powers of x.

[Solution: Γ(2 ; x) = ∫_0^x t e^-t dt / Γ(2) = ∫_0^x t e^-t dt = (-e^-t - t e^-t) ]_{t=0}^{t=x} = 1 - e^-x - xe^-x.

Comment: ∫ t e^(-t/θ) dt = -θt e^(-t/θ) - θ^2 e^(-t/θ).]


One can prove via integration by parts that Γ(α ; x) = Γ(α-1 ; x) - x^(α-1) e^-x / Γ(α).145
This recursion formula for integer alpha is: Γ(n ; x) = Γ(n-1 ; x) - x^(n-1) e^-x / (n-1)!.

Combined with the fact that Γ(1 ; x) = ∫_0^x e^-t dt = 1 - e^-x, this leads to the following formula for the
Incomplete Gamma for positive integer alpha:146

Γ(α ; x) = 1 - Σ_{i=0}^{α-1} x^i e^-x / i! = Σ_{i=α}^{∞} x^i e^-x / i!.

Exercise: Compute Γ[4; 6.5].

[Solution: Γ[4; 6.5] = 1 - e^-6.5 (1 + 6.5 + 6.5^2/2 + 6.5^3/6) = 0.8882.]
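As a computer check, SciPy's regularized lower incomplete gamma function gives the same value as the Poisson sum (this assumes SciPy is available):

from math import exp
from scipy.special import gammainc   # regularized lower incomplete gamma, i.e. Gamma(alpha; x)

print(gammainc(4, 6.5))                                       # 0.8882
print(1 - exp(-6.5) * (1 + 6.5 + 6.5**2 / 2 + 6.5**3 / 6))    # same, via the Poisson sum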

Relationship of the Gamma to Poisson Processes:147

In general, assume claims are given by a Poisson Process with claims intensity λ. Then the claims in
the time interval from (0, 1) are Poisson Distributed with mean λ. One can calculate the chance that
there are at least n claims in two different ways.

First, the chance of at least n claims is a sum of Poisson densities:

1 - F(n-1) = 1 - Σ_{i=0}^{n-1} e^-λ λ^i / i! = Σ_{i=n}^{∞} e^-λ λ^i / i!.
145
See for example, Formula 6.5.13 in the Handbook of Mathematical Functions, by Abramowitz, et. al.
146
See Theorem A.1 in Appendix A of Loss Models.
147
Not on the syllabus. See “Mahlerʼs Guide to Poisson Processes,” for CAS Exam 3ST.

On the other hand, the times between claims are independent, identically distributed Exponential
Distributions, each with mean θ = 1/λ.148
Thus, the time of the nth claim is a sum of n independent, identically distributed Exponentials with
θ = 1/λ, and thus a Gamma Distribution with α = n and θ = 1/λ.

Thus the nth claim has distribution function at time t of: Γ[α ; t/θ] = Γ[n ; λt].
The chance of at least n claims by time one is the probability that the nth claim occurs by time one,
which is: Γ[n ; λ].

Comparing the two results: Γ[n ; λ] = 1 - Σ_{i=0}^{n-1} e^-λ λ^i / i! = Σ_{i=n}^{∞} e^-λ λ^i / i!.

Thus, the Incomplete Gamma Function with positive integer shape parameter α can be written in
terms of a sum of Poisson densities:
Γ[α ; x] = 1 - Σ_{i=0}^{α-1} x^i e^-x / i! = Σ_{i=α}^{∞} x^i e^-x / i!.

Integrals Involving Exponentials times Powers:

One can use the incomplete Gamma Function to handle integrals involving te^(-t/θ).

∫_0^x t e^(-t/θ) dt = ∫_0^(x/θ) θs e^-s θ ds = θ^2 ∫_0^(x/θ) s e^-s ds = θ^2 Γ(2 ; x/θ) Γ(2) = θ^2 {1 - e^(-x/θ) - (x/θ)e^(-x/θ)}.

∫_0^x t e^(-t/θ) dt = θ^2 {1 - e^(-x/θ) - (x/θ)e^(-x/θ)}.

Exercise: What is the integral from 0 to 3.4 of: te^(-t/10)?

[Solution: (10^2) {1 - e^(-3.4/10) - (3.4/10)e^(-3.4/10)} = 4.62.]

148
This fact is used to derive the special algorithm to simulate a Poisson Distribution,
as discussed in “Mahlerʼs Guide to Simulation.”

Such integrals can also be done via integration by parts, or as discussed below using the formula for
the present value of a continuously increasing annuity, or one can make use of the formula for the
Limited Expected Value of an Exponential Distribution:149

∫_0^x t e^(-t/θ) dt = θ ∫_0^x t {e^(-t/θ) / θ} dt = θ {E[X ∧ x] - xS(x)} =

θ {θ(1 - e^(-x/θ)) - xe^(-x/θ)} = θ^2 {1 - e^(-x/θ) - (x/θ)e^(-x/θ)}.150

When the upper limit is infinity, the integral simplifies:

∫_0^∞ t e^(-t/θ) dt = θ^2. 151

In a similar manner, one can use the incomplete Gamma Function to handle integrals involving
t^n e^(-t/θ), for n integer:

∫_0^x t^n e^(-t/θ) dt = θ^(n+1) ∫_0^(x/θ) s^n e^-s ds = θ^(n+1) Γ(n+1 ; x/θ) Γ(n+1) = n! θ^(n+1) {1 - Σ_{i=0}^{n} (x/θ)^i e^(-x/θ) / i!}.

Exercise: What is the integral from 0 to 3.4 of: t^3 e^(-t/10)?

[Solution: ∫_0^x t^3 e^(-t/θ) dt = θ^4 ∫_0^(x/θ) s^3 e^-s ds = θ^4 Γ(4 ; x/θ) Γ(4) =

6θ^4 {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)^2 e^(-x/θ)/2 - (x/θ)^3 e^(-x/θ)/6}.

For θ = 10 and x = 3.4, this is:
60,000 {1 - e^-0.34 - 0.34e^-0.34 - 0.34^2 e^-0.34/2 - 0.34^3 e^-0.34/6} = 25.49.]
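The same value can be checked two ways in Python, by direct numerical integration and by the incomplete Gamma formula above (this assumes SciPy is available):

from math import exp, factorial
from scipy.integrate import quad

print(quad(lambda t: t**3 * exp(-t / 10), 0, 3.4)[0])    # 25.49
theta, x, n = 10, 3.4, 3
u = x / theta
print(factorial(n) * theta**(n + 1)
      * (1 - exp(-u) * sum(u**i / factorial(i) for i in range(n + 1))))   # 25.49 again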

149
See Appendix A of Loss Models.
150
E[X ∧ x] is the limited expected value, as discussed in “Mahler's Guide to Loss Distributions.”
151
If one divided by θ, then the integrand would be t times the density of an Exponential Distribution. Therefore, the
given integral is θ(mean of an Exponential Distribution) = θ2.

Continuously Increasing Annuities:

The present value of a continuously increasing annuity of term n, with force of interest δ, is:152

(Ia)_n = (a_n - n e^(-nδ)) / δ,

where the present value of a continuous annuity of term n, with force of interest δ, is:

a_n = (1 - e^(-nδ)) / δ.

However, the present value of a continuously increasing annuity can also be written as the integral
from 0 to n of te^(-tδ). Therefore,

∫_0^n t e^(-tδ) dt = {(1 - e^(-nδ))/δ - ne^(-nδ)} / δ = (1 - e^(-nδ))/δ^2 - ne^(-nδ)/δ.
0

Those who remember the formula for the present value of an increasing continuous annuity will find
writing such integrals involving te-tδ in terms of increasing annuities to be faster than doing integration
by parts.

Exercise: What is the integral from 0 to 3.4 of: te^(-t/10)?

[Solution: {(1 - e^(-3.4/10))/0.1 - (3.4)e^(-3.4/10)} / 0.1 = (2.882 - 2.420)/0.1 = 4.62.
Comment: Matches the answer gotten above using Incomplete Gamma Functions.
4.62 is the present value of a continuously increasing annuity with term 3.4 years and force of interest
10%.]

152
See for example, The Theory of Interest by Kellison.

Gamma Distribution:153

The Gamma Distribution can be defined in terms of the Incomplete Gamma Function,
F(x) = Γ(α ; x/θ). Note that Γ(α; ∞) = Γ(α) / Γ(α) = 1 and Γ(α; 0) = 0, so we have as required for a
distribution function F(∞) = 1 and F(0) = 0.

f(x) = (x/θ)^α e^(-x/θ) / {x Γ(α)} = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}, x > 0.

Exercise: What is the mean of a Gamma Distribution?

[Solution: ∫_0^∞ x f(x) dx = ∫_0^∞ x x^(α-1) e^(-x/θ) / {θ^α Γ(α)} dx = {∫_0^∞ x^α e^(-x/θ) dx} / {θ^α Γ(α)}

= Γ(α+1) θ^(α+1) / {θ^α Γ(α)} = {Γ(α+1)/Γ(α)} θ = αθ.]
0 0

Exercise: What is the nth moment of a Gamma Distribution?

[Solution: ∫_0^∞ x^n f(x) dx = ∫_0^∞ x^n x^(α-1) e^(-x/θ) / {θ^α Γ(α)} dx = {∫_0^∞ x^(n+α-1) e^(-x/θ) dx} / {θ^α Γ(α)}

= Γ(α+n) θ^(α+n) / {θ^α Γ(α)} = {Γ(α+n)/Γ(α)} θ^n = (α+n-1)(α+n-2)...(α) θ^n.
Comment: This is the formula shown in Appendix A of Loss Models.]

Exercise: What is the 3rd moment of a Gamma Distribution with α = 5 and θ = 2.5?
[Solution: (α+n-1)(α+n-2)...(α) θ^n = (5+3-1)(5+3-2)(5)(2.5^3) = (7)(6)(5)(15.625) = 3281.25.]

Relation to the Chi-Square Distribution:

A Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with shape parameter
of ν/2 and scale parameter of 2: χ²_ν(x) = Γ(ν/2 ; x/2).


Therefore, one can look up values of the Incomplete Gamma Function (for half integer or integer
values of α) by using the cumulative values of the Chi-Square Distribution.

153
See “Mahlerʼs Guide to Loss Distributions.”

For example, Γ(6;10) = the Chi-Square Distribution for 2 x 6 = 12 degrees of freedom at a value of
2 x 10 = 20. For the Chi-Square with 12 d.f. there is a 0.067 chance of a value greater than 20.154
Thus, the value of the distribution function is: χ²_12(20) = 1 - 0.067 = 0.933 = Γ(6;10).

Note that Γ(6; 10) = Σ_{i=6}^{∞} 10^i e^-10 / i! = e^-10 (10^6/6! + 10^7/7! + 10^8/8! + ...) = 0.933.

Relation to the Poisson Distribution:

The distribution function of a Poisson with mean λ is: F(x) = 1 - Γ(x+1 ; λ).

For example, for λ = 1.745, F(3) = 1 - Γ(4 ; 1.745).


Now a Chi-Square Distribution with 8 degrees of freedom is a Gamma Distribution with α = 4 and
θ = 2. Thus, Γ(4 ; 1.745) = χ²_8(3.490). From the Chi-Square Table, χ²_8(3.490) = 0.1.155

Thus for λ = 1.745, F(3) = 1 - 0.1 = 0.9.


One can verify this directly: F(3) = e^-1.745 (1 + 1.745 + 1.745^2/2 + 1.745^3/6) = 0.900.
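Both of the numerical facts above are easy to confirm in Python (this assumes SciPy is available; the Chi-Square and Poisson relationships are exactly those stated above):

from scipy.special import gammainc
from scipy.stats import poisson, chi2

print(gammainc(6, 10), chi2.cdf(20, 12))                 # both about 0.933
print(1 - gammainc(4, 1.745), poisson.cdf(3, 1.745))     # both about 0.900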

Inverse Gamma Distribution:156

By employing the change of variables y = 1/x, integrals involving e^(-1/x) and powers of 1/x can be
written in terms of the Gamma function:

∫_0^∞ t^-(α+1) e^(-θ/t) dt = Γ(α) θ^-α.

The Inverse Gamma Distribution can be defined in terms of the Incomplete Gamma Function,
F(x) = 1 - Γ[α ; θ/x].

The density of the Inverse Gamma is: θ^α e^(-θ/x) / {x^(α+1) Γ[α]}, for 0 < x < ∞.

A good way to remember the result for integrals from zero to infinity of powers of 1/x times
Exponentials of 1/x, is that the density of the Inverse Gamma Distribution must integrate to unity.
154
The 0.067 is via computer. From the table, one can tell that the survival function at 20 is between 10% and 5%.
155
I chose the numbers for this example, so that this distribution function value happens to appear in the table.
156
See “Mahlerʼs Guide to Loss Distributions,” and Appendix A of Loss Models.

Problems:

18.1 (1 point) What is the value of the integral from zero to infinity of: x^5 e^-8x?
A. less than 0.0004
B. at least 0.0004 but less than 0.0005
C. at least 0.0005 but less than 0.0006
D. at least 0.0006 but less than 0.0007
E. at least 0.0007

18.2 (1 point) What is the density at x = 8 of the Gamma distribution with parameters
α = 3 and θ = 10?
A. less than 0.012
B. at least 0.012 but less than 0.013
C. at least 0.013 but less than 0.014
D. at least 0.014 but less than 0.015
E. at least 0.015


18.3 (1 point) Determine ∫ x- 6 e - 4 / x dx .
0

A. less than 0.02


B. at least 0.02 but less than 0.03
C. at least 0.03 but less than 0.04
D. at least 0.04 but less than 0.05
E. at least 0.05

18.4 (2 points) What is the integral from 6.3 to 8.4 of x^2 e^-x / 2?


Hint: Use the Chi-Square table.
A. less than 0.01
B. at least 0.01 but less than 0.03
C. at least 0.03 but less than 0.05
D. at least 0.05 but less than 0.07
E. at least 0.07

18.5 (2 points) What is the integral from 4 to 8 of: xe^(-x/5)?


A. 7 B. 8 C. 9 D. 10 E. 11

Use the following information for the next 3 questions:

Define the following distribution function in terms of the Incomplete Gamma Function:
F(x) = Γ[α ; ln(x)/θ], 1 < x.

18.6 (2 points) What is the probability density function corresponding to this distribution function?
A. θ^α x^α e^(-θ/x) / Γ[α]
B. ln[x]^(α-1) / {θ^α x^(1+1/θ) Γ[α]}
C. θ^α x^(α+1) e^(-θ/x) / Γ[α]
D. ln[x]^α / {θ^α x^(1+1/θ) Γ[α]}
E. None of the above

18.7 (2 points) What is the mean of this distribution?


A. θ/(α−1)
B. θ/α
C. θ(α−1)
D. θα
E. None of the above

18.8 (3 points) If α = 5 and θ = 1/7, what is the 3rd moment of this distribution?
A. less than 12
B. at least 12 but less than 13
C. at least 13 but less than 14
D. at least 14 but less than 15
E. at least 15

Solutions to Problems:

18.1. B. Γ(5+1) / 8^(5+1) = 5! / 8^6 = 0.000458.

18.2. D. θ^-α x^(α-1) e^(-x/θ) / Γ(α) = (10^-3)(8^2) e^-0.8 / Γ(3) = 0.0144.

18.3. B. The density of the Inverse Gamma is: θ^α e^(-θ/x) / {x^(α+1) Γ(α)}, 0 < x < ∞.

Since this density integrates to one, x^-(α+1) e^(-θ/x) integrates to θ^-α Γ(α).

Thus taking α = 5 and θ = 4, x^-6 e^(-4/x) integrates to: 4^-5 Γ(5) = 24 / 4^5 = 0.0234.
Comment: Alternately, one can make the change of variables y = 1/x.

18.4. C. The integrand is that of the Incomplete Gamma Function for α = 3:

x^(α-1) e^-x / Γ(α) = x^2 e^-x / 2. Thus the integral is: Γ(3; 8.4) - Γ(3; 6.3).
Since the Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with shape
parameter of ν/2 and scale parameter of 2: χ²_6(x) = Γ(3 ; x/2).

Looking up the Chi-Square Distribution for 6 degrees of freedom, the distribution function is 99% at
16.8 and 95% at 12.6.
99% = Γ(3; 16.8/2) = Γ(3; 8.4), and 95% = Γ(3; 12.6/2) = Γ(3; 6.3).
Thus Γ(3; 8.4) - Γ(3; 6.3) = 0.99 - 0.95 = 0.04.
Comment: The particular integral can be done via repeated integration by parts.
One gets: -e^-x {(x^2/2) + x + 1}. Evaluating at the limits of 8.4 and 6.3 gives the same result.

18.5. A. ∫_0^x t e^(-t/θ) dt = θ^2 {1 - e^(-x/θ) - (x/θ)e^(-x/θ)}. Set θ = 5.

∫_4^8 t e^(-t/5) dt = ∫_0^8 t e^(-t/5) dt - ∫_0^4 t e^(-t/5) dt = (5^2) {1 - e^(-x/5) - (x/5)e^(-x/5)} ]_{x=4}^{x=8}

= (25) {e^-0.8 + (0.8)e^-0.8 - e^-1.6 - (1.6)e^-1.6} = 7.10.

Comment: Can also be done using integration by parts or the increasing annuity technique.

18.6. B. Let y = ln(x)/θ. If y follows a Gamma Distribution with parameters α and 1, then x follows a
LogGamma Distribution with parameters α and θ. If y follows a Gamma Distribution with
parameters α and 1, then f(y) = y^(α-1) e^-y / Γ(α). Then the density of x is given by:

f(y)(dy/dx) = {(ln(x)/θ)^(α-1) exp(-ln(x)/θ) / Γ(α)} / (xθ) = {ln(x)}^(α-1) / {θ^α x^(1+1/θ) Γ(α)}.

Comment: This is called the LogGamma Distribution and bears the same relationship to the Gamma
Distribution as the LogNormal bears to the Normal Distribution.
Note that the support for the LogGamma is 1 to ∞, since when y = 0, x = exp(0θ) = 1.

18.7. E. ∫_1^∞ x f(x) dx = ∫_1^∞ θ^-α {ln(x)}^(α-1) / {x^(1/θ) Γ(α)} dx.

Let y = ln(x)/θ, and thus x = exp(θy), dx = exp(θy) θ dy; then the integral for the first moment is:

∫_0^∞ {θ^-α (θy)^(α-1) / (exp(y) Γ(α))} exp(θy) θ dy = ∫_0^∞ y^(α-1) exp[-y(1-θ)] / Γ(α) dy = (1-θ)^-α.

18.8. E. The formula for the nth moment is derived as follows:

∫_1^∞ x^n f(x) dx = ∫_1^∞ x^n θ^-α {ln(x)}^(α-1) / {x^(1+1/θ) Γ(α)} dx = ∫_1^∞ x^(n-1-1/θ) θ^-α {ln(x)}^(α-1) / Γ(α) dx.

Let y = ln(x)/θ, and thus x = exp(θy), dx = exp(θy) θ dy; then the integral for the nth moment is:

∫_0^∞ exp[(n-1-1/θ)θy] {θ^-α (θy)^(α-1) / Γ(α)} exp(θy) θ dy = ∫_0^∞ y^(α-1) exp[-y(1-nθ)] / Γ(α) dy = (1-nθ)^-α, for nθ < 1.

Thus the 3rd moment with α = 5 and θ = 1/7 is: (1 - nθ)^-α = (1 - 3/7)^-5 = 16.41.
Comment: One could plug in n = 3 and the value of the parameters at any stage in the computation.
I have chosen to do so at the very end.

Section 19, Gamma-Poisson Frequency Process157

The single most important specific example of mixing frequency distributions, is mixing Poisson
Frequency Distributions via a Gamma Distribution. Each insured in a portfolio is assumed to have a
Poisson distribution with mean λ. Across the portfolio, λ is assumed to be distributed via a Gamma
Distribution. Due to the mathematical properties of the Gamma and Poisson there are some specific
relationships. For example, as will be discussed, the mixed distribution is a Negative Binomial
Distribution.

Prior Distribution:

The number of claims a particular policyholder makes in a year is assumed to be Poisson with mean
λ. For example, the chance of having 6 claims is given by: λ^6 e^-λ / 6!

Assume the λ values of the portfolio of policyholders are Gamma distributed with α = 3 and
θ = 2/3, and therefore probability density function:158

f(λ) = 1.6875 λ^2 e^-1.5λ, λ ≥ 0.

This prior Gamma Distribution of Poisson parameters is displayed below:

[Graph of the prior Gamma density f(λ), plotted for Poisson parameter λ from 0 to 6; the density rises to a maximum of about 0.4 near λ = 4/3 and then declines toward zero.]

157
Section 6.3 of Loss Models.
Additional aspects of the Gamma-Poisson are discussed in “Mahlerʼs Guide to Conjugate Priors.”
158
For the Gamma Distribution, f(x) = θ−αxα−1 e- x/θ/ Γ(α).

The Prior Distribution Function is given in terms of the Incomplete Gamma Function:
F(λ) = Γ(3; 1.5λ). So for example, the a priori chance that the λ value lies between
4 and 5 is: F(5) - F(4) = Γ(3; 7.5) - Γ(3; 6) = 0.9797 - 0.9380 = 0.0417.
Graphically, this is the area between 4 and 5 and under the prior Gamma.

Mixed Distribution:

If we have a risk and do not know what type it is, in order to get the chance of having 6 claims, one
would weight together the chances of having 6 claims, using the a priori probabilities and integrating
from zero to infinity:159

∫_0^∞ {λ^6 e^-λ / 6!} f(λ) dλ = ∫_0^∞ {λ^6 e^-λ / 6!} 1.6875 λ^2 e^-1.5λ dλ = 0.00234375 ∫_0^∞ λ^8 e^-2.5λ dλ.

This integral can be written in terms of the (complete) Gamma function:

∫_0^∞ λ^(α-1) e^(-λ/θ) dλ = Γ(α) θ^α.

Thus ∫_0^∞ λ^8 e^-2.5λ dλ = Γ(9) 2.5^-9 = (8!)(0.4)^9 ≅ 10.57.

Thus the chance of having 6 claims ≅ (0.00234375)(10.57) ≅ 2.5%.

More generally, if the distribution of Poisson parameters λ is given by a Gamma distribution
f(λ) = θ^-α λ^(α-1) e^(-λ/θ) / Γ(α), we compute the chance of having n accidents by integrating from zero
to infinity:

∫_0^∞ {λ^n e^-λ / n!} f(λ) dλ = ∫_0^∞ {λ^n e^-λ / n!} λ^(α-1) e^(-λ/θ) / {θ^α Γ(α)} dλ = {1 / (n! θ^α Γ(α))} ∫_0^∞ λ^(n+α-1) e^(-λ(1 + 1/θ)) dλ

= {1 / (n! θ^α Γ(α))} Γ(n+α) / (1 + 1/θ)^(n+α) = Γ(n+α) θ^n / {Γ(α) n! (1+θ)^(n+α)} = α(α+1)...(α+n-1) θ^n / {n! (1+θ)^(n+α)}.

The mixed distribution is in the form of the Negative Binomial distribution with parameters
r = α and β = θ:

Probability of n accidents = r(r+1)...(r+n-1) β^n / {n! (1+β)^(n+r)}.
159
Note the way both the Gamma and the Poisson have factors involving powers of λ and e−λ and these similar
factors combine in the product.

For the specific case dealt with previously: n = 6, α = 3 and θ = 2/3.


Therefore, the mixed Negative Binomial Distribution has parameters r = α = 3 and β = θ = 2/3.
Thus the chance of having 6 claims is: (3)(4)(5)(6)(7)(8) (2/3)^6 / {6! (1 + 2/3)^(6+3)} = 2.477%.

This is the same result as calculated above.

This mixed Negative Binomial Distribution is displayed below, through 10 claims:

[Bar chart of the mixed Negative Binomial probabilities for 0 through 10 claims; f(0) ≈ 0.216 and the probabilities decline steadily thereafter.]

On the exam, one should not go through the calculation above. Rather remember that the mixed
distribution is a Negative Binomial.

When Poissons are mixed via a Gamma Distribution,


the mixed distribution is always a Negative Binomial Distribution,
with r = α = shape parameter of the Gamma and
β = θ = scale parameter of the Gamma.

r goes with alpha, beta rhymes with theta.
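For those who want to see the result emerge numerically, here is a minimal Python check of the 2.477% above (this assumes SciPy is available; note that SciPy's nbinom uses the parameterization (n, p) with n = r and p = 1/(1+β), a conversion that is mine and not from Loss Models):

from scipy.integrate import quad
from scipy.stats import poisson, gamma, nbinom

alpha, theta = 3, 2 / 3
mixed = quad(lambda lam: poisson.pmf(6, lam) * gamma.pdf(lam, alpha, scale=theta), 0, 50)[0]
print(mixed, nbinom.pmf(6, alpha, 1 / (1 + theta)))    # both 0.02477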



Note that the overall (a priori) mean can be computed in either one of two ways.
First one can weight together the means for each type of risk, using the a priori probabilities.
This is E[λ] = the mean of the prior Gamma = αθ = 3(2/3) = 2.
Alternately, one can compute the mean of the mixed distribution: the mean of a Negative Binomial is
rβ = 3(2/3) = 2. Of course the two results match.

Exponential-Poisson:160

It is important to note that the Exponential distribution is a special case of the Gamma
distribution, for α = 1.

For the important special case α = 1, we have an Exponential distribution of λ: f(λ) = e^(-λ/θ)/θ, λ ≥ 0.
The mixed distribution is a Negative Binomial Distribution with r = 1 and β = θ.

For the Exponential-Poisson, the mixed distribution is a Geometric Distribution with


β = θ.

Mixed Distribution for the Gamma-Poisson, When Observing Several Years of Data:

One can observe for a period of time longer than a year. If an insured has a Poisson parameter of λ
for each individual year, with λ the same for each year, and the years are independent, then for
example one has a Poisson parameter of 7λ for 7 years. The chances of such an insured having a
given number of claims over 7 years is given by a Poisson with parameter 7λ. For a portfolio of
insureds, each of its Poisson parameters is multiplied by 7. This is mathematically just like inflation.

If before their each being multiplied by 7, the Poisson parameters follow a Gamma distribution with
parameter α and θ, then after being multiplied by 7 they follow a Gamma with parameters α and
7θ.161 Thus the mixed distribution for 7 years of data is given by a Negative Binomial with
parameters r = α and β = 7θ.

160
See for example 3/11/01, Q.27.
161
Under uniform inflation, the scale parameter of the Gamma Distribution is multiplied by the inflation factor.
See “Mahlerʼs Guide to Loss Distributions.”

In general, if one observes a Gamma-Poisson situation for Y years, and each insuredʼs
Poisson parameter does not change over time, then the distribution of Poisson
parameters for Y years is given by a Gamma Distribution with parameters α and Yθ, and
the mixed distribution for Y years of data is given by a Negative Binomial Distribution,
with parameters r = α and β = Yθ.162

Exercise: Assume that the number of claims in a year for each insured has a Poisson Distribution with
mean λ. The distribution of λ over the portfolio of insureds is a Gamma Distribution with parameters
α = 3 and θ = 0.01.
What is the mean annual claim frequency for the portfolio of insureds?
[Solution: The mean annual claims frequency = mean of the (prior) Gamma = αθ = (3)(0.01) = 3%.]

Exercise: Assume that the number of claims in a year for each insured has a Poisson Distribution with
mean λ. For each insured, λ does not change over time. For each insured, the numbers of claims in
one year is independent of the number of claims in another year. The distribution of λ over the
portfolio of insureds is a Gamma Distribution with parameters α = 3 and θ = 0.01.
An insured is picked at random and observed for 9 years.
What is the chance of observing exactly 4 claims from this insured?
[Solution: The mixed distribution for 9 years of data is given by a Negative Binomial Distribution with
parameters r = α = 3 and β = Yθ = (9)(0.01) = 0.09.

f(4) = {(4 + 3 - 1)! / (4! 2!)} 0.09^4 / (1 + 0.09)^(3+4) = 0.054%.]

If Lois has a low expected annual claim frequency, for example 2%, then over 9 years she has a
Poisson Distribution with mean 18%. Her chance of having 4 claims during these nine years is:
0.18^4 e^-0.18 / 24 = 0.004%.

If Hi has a very high expected annual claim frequency, for example 20%, then over 9 years he has a
Poisson Distribution with mean 180%. His chance of having 4 claims during these nine years is:
1.8^4 e^-1.8 / 24 = 7.23%.

Drivers such as Lois with a low λ in one year are assumed to have the same low λ every year.
Such good drivers have an extremely small chance of having four claims in 9 years.

162
“Each insuredʼs Poisson parameter does not change over time.” If Alanʼs lambda is 4% this year, it is 4% next year,
and every year. Similarly, if Bonnieʼs lambda is 3% this year , then it is 3% every year.
Unless stated otherwise, on the exam assume lambda does not vary over time.

Drivers such as Hi with a very high λ in one year are assumed to have the same high λ every year.
Such drivers have a significant chance of having four claims in 9 years. It is such very bad drivers
which contribute significantly to the 0.054% probability of four claims in 9 years for an insured picked
at random.

This situation in which for a given insured λ is the same over time, contrasts with that in which λ
changes randomly each year.

Exercise: Assume that the number of claims in a year for each insured has a Poisson Distribution with
mean λ. For each insured, λ changes each year at random; the λ in one year is independent of the λ
in another year.
The distribution of λ is a Gamma Distribution with parameters α = 3 and θ = 0.01.
An insured is picked at random and observed for 9 years.
What is the chance of observing exactly 4 claims from this insured?
[Solution: The mixed distribution for 1 year of data is given by a Negative Binomial Distribution with
parameters r = α = 3 and β = θ = 0.01. Over 9 years, we get a sum of 9 independent Negative
Binomials, with r = (9)(3) = 27 and β = 0.01.
f(4) = {(4 + 27 - 1)! / (4! 26!)} 0.01^4 / (1 + 0.01)^(27+4) = 0.00020.]

This is different than the Gamma-Poisson process in which we assume that the lambda for an
individual insured is the same each year. For the Gamma-Poisson the β parameter is multiplied by
Y, while here the r parameter is multiplied by Y. This situation in which instead λ changes each year
is mathematically the same as if we assume an insured each year has a Negative Binomial
Distribution.

For example, assume an insured has a Negative Binomial with parameters r and β. Assume the
numbers of claims in one year is independent of the number of claims in another year. Then over Y
years, we add up Y independent identically distributed Negative Binomials; over Y years, the
frequency distribution for this insured is Negative Binomial with parameters Yr and β.

Exercise: Assume that the number of claims in a year for an insured has a Negative Binomial
Distribution with parameters r = 3 and β = 0.01. What is the mean annual claim frequency?
[Solution: rβ = (3)(0.01) = 3%.]

Exercise: Assume that the number of claims in a year for an insured has a Negative Binomial
Distribution with parameters r = 3 and β = 0.01. The numbers of claims in one year is independent
of the number of claims in another year. What is the chance of observing exactly 4 claims over 9
years from this insured?
[Solution: Over 9 years, the frequency distribution for this insured is Negative Binomial with
parameters r = (9)(3) = 27 and β = 0.01.
f(4) = {(4 + 27 - 1)! / (4! 26!)} 0.01^4 / (1 + 0.01)^(27+4) = 0.00020.]

Even though both situations had a 3% mean annual claim frequency, the probability of observing 4
claims over 9 years was higher in the Gamma-Poisson situation with λ the same each year for a
given insured, than when we assumed λ changed each year or equivalently an insured had the
same Negative Binomial Distribution each year. In the Gamma-Poisson situation with λ the same
each year for a given insured, we were more likely to see extreme results such as 4 claims in 9
years, since there is a small probability of picking at random an insured with a high expected annual
claim frequency, such as Hi with λ = 20%.
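The contrast between the two situations can be seen quickly in Python (assuming SciPy is available; as before, SciPy's nbinom takes (n, p) with p = 1/(1+β), a convention outside Loss Models):

from scipy.stats import nbinom

# lambda fixed over time: Negative Binomial with r = 3, beta = (9)(0.01) = 0.09
print(nbinom.pmf(4, 3, 1 / 1.09))      # 0.00054, i.e. 0.054%
# lambda redrawn each year: sum of 9 Negative Binomials, so r = 27, beta = 0.01
print(nbinom.pmf(4, 27, 1 / 1.01))     # 0.00020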

Thinning a Negative Binomial Distribution:

Since the Gamma-Poisson is one source of the Negative Binomial Distribution, it can be used to aid
our understanding of the Negative Binomial Distribution.

For example, assume we have a Negative Binomial Distribution with r = 4 and β = 2.


We can think of that as resulting from a mixture of Poisson Distributions, with λ distributed via a
Gamma Distribution with α = 4 and θ = 2.163

Assume frequency and severity are independent, and that 30% of losses are “large.”
Then for each insured, his large losses are Poisson with mean 0.3λ. If λ is distributed via a Gamma
with α = 4 and θ = 2, then 0.3λ is distributed via a Gamma with α = 4 and θ = (0.3)(2) = 0.6.164
The large losses are a Gamma-Poisson Process, and therefore, across the whole portfolio, the
distribution of large losses is Negative Binomial, with r = 4 and β = 0.6.

163
While this may not be real world situation that the Negative Binomial is modeling, since the results are
mathematically identical, we can assume it is for the purpose of deriving general mathematical results.
164
When a variable is Gamma Distributed, then a constant times that variable is also Gamma Distributed, with the same
shape parameter, but with the scale parameter multiplied by that constant. See the discussion of uniform inflation in
”Mahlerʼs Guide to Loss Distributions.”

In this manner one can show, as has been discussed previously, that if losses are Negative
Binomial with parameters r and β, then if we take a fraction t of all the losses in a manner independent
of frequency, then these selected losses are Negative Binomial with parameters r and tβ.165

Returning to the example, the small losses for an individual insured are Poisson with mean 0.7λ.
Since λ is Gamma distributed, 0.7λ is distributed via a Gamma with α = 4 and θ = (0.7)(2) = 1.4.
Therefore, across the whole portfolio, the distribution of small losses is Negative Binomial, with r = 4
and β = 1.4.

Thus as in the Poisson situation, the overall process has been thinned into two similar processes.
However, unlike the Poisson case, these two Negative Binomials are not independent.

If for example, we observe a lot of large losses, such as 5, it is more likely that the observation
came from an insured with a large λ. This implies we are more likely to also have observed a higher
than average number of small losses. The number of large losses and the number of small losses
are positively correlated.166

Correlation of Number of Small and Large Losses, Negative Binomial:

Assume the number of losses follow a Negative Binomial Distribution with parameters r and β, and
that “large” losses are t of all the losses. As previously, assume each insured is Poisson with mean
λ, and λ is distributed via a Gamma with α = r and θ = β.

Then the number of large losses is a Gamma-Poisson with α = r and θ = tβ.


Posterior to observing L large losses, the distribution of the mean frequency for large losses is
Gamma with α = r + L and 1/θ = 1/(tβ) + 1 ⇒ θ = tβ/(1+ tβ).167
Since the mean frequency of large losses is t times the mean frequency, posterior to observing L
large losses, the distribution of the mean frequency is Gamma with α = r + L and θ = β/(1+ tβ).

Therefore, given we have observed L large losses, the small losses are
Gamma-Poisson with α = r + L, and θ = (1-t)β/(1+ tβ).

165
This can be derived via probability generating functions. See Example 8.8 in Loss Models.
166
In the case of thinning a Binomial, the number of large and small loses would be negatively correlated.
167
See “Mahlerʼs Guide to Conjugate Priors.”

One computes the correlation between the number of small losses, S, and the number of large
losses, L, as follows:

E[LS] = E_L[E[LS | L]] = E_L[L E[S | L]] = E_L[L (r + L) (1-t)β / (1 + tβ)] =

{(1-t)β / (1 + tβ)} {r E_L[L] + E_L[L^2]} = {(1-t)β / (1 + tβ)} {r(rtβ) + rtβ(1 + tβ) + (rtβ)^2} = (1-t) t β^2 r (1+r).168

Cov[L, S] = E[LS] - E[L]E[S] = (1-t) t β^2 r (1+r) - (rtβ)(r(1-t)β) = β^2 r t (1-t).

Corr[L, S] = β^2 r t (1-t) / √[ rtβ(1 + tβ) r(1-t)β{1 + (1-t)β} ] = 1 / √[ {1 + 1/(tβ)} {1 + 1/((1-t)β)} ] > 0.
tβ (1- t)β

For example, assume we have a Negative Binomial Distribution with r = 4 and β = 2.


Assume frequency and severity are independent, and that 30% of losses are “large.”
Then the number of large losses are Negative Binomial with r = 4 and β = 0.6, and the number of
small losses are Negative Binomial with r = 4 and β = 1.4.
The correlation of the number of large and small losses is:

1 / √[ {1 + 1/(tβ)} {1 + 1/((1-t)β)} ] = 1 / √[ (1 + 1/0.6)(1 + 1/1.4) ] = 0.468.
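A small simulation reproduces this correlation. A Python sketch assuming NumPy is available (the seed and sample size are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(1)
lam = rng.gamma(shape=4, scale=2, size=200_000)   # lambda ~ Gamma(alpha = 4, theta = 2)
large = rng.poisson(0.3 * lam)                    # large losses, given lambda
small = rng.poisson(0.7 * lam)                    # small losses, given lambda
print(np.corrcoef(large, small)[0, 1])            # close to the theoretical 0.468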

168
Large losses are Negative Binomial with parameters r and tβ. Thus, EL [L2 ] = Var[L] + E[L]2 = rtβ(1 + tβ) + (rtβ)2 .

Problems:

Use the following information to answer the next 2 questions:

The number of claims a particular insured makes in a year is Poisson with mean λ.
λ for a particular insured remains the same each year.
The values of the Poisson parameter λ (for annual claim frequency) for the insureds in a portfolio
follow a Gamma distribution, with parameters α = 3 and θ = 1/12.

19.1 (2 points) What is the chance that an insured picked at random from the portfolio will have no
claims over the next three years?
A. less than 35%
B. at least 35% but less than 40%
C. at least 40% but less than 45%
D. at least 45% but less than 50%
E. at least 50%

19.2 (2 points) What is the chance that an insured picked at random from the portfolio will have one
claim over the next three years?
A. less than 35%
B. at least 35% but less than 40%
C. at least 40% but less than 45%
D. at least 45% but less than 50%
E. at least 50%

19.3 (2 points) The distribution of the annual number of claims for an insured chosen at random is
modeled by the negative binomial distribution with mean 0.6 and variance 0.9.
The number of claims for each individual insured has a Poisson distribution and the means of these
Poisson distributions are gamma distributed over the population of insureds.
Calculate the variance of this gamma distribution.
(A) 0.20 (B) 0.25 (C) 0.30 (D) 0.35 (E) 0.40

19.4 (2 points) The number of claims a particular policyholder makes in a year has a
Poisson distribution with mean µ. The µ-values for policyholders follow a gamma distribution with
variance equal to 0.3. The resulting distribution of policyholders by number of claims is a Negative
Binomial with parameters r and β such that the variance is equal to 0.7.
What is the value of r(1+β)?
A. less than 0.90
B. at least 0.90 but less than 0.95
C. at least 0.95 but less than 1.00
D. at least 1.00 but less than 1.05
E. at least 1.05

Use the following information for the next 3 questions:

Assume that the number of claims for an individual insured is given by a Poisson distribution with
mean (annual) claim frequency λ and variance λ. Also assume that the parameter λ varies for the
different insureds, with λ following a Gamma distribution:

g(λ) = θ^(-α) λ^(α-1) e^(-λ/θ) / Γ(α), for 0 < λ < ∞, with mean αθ and variance αθ².

19.5 (2 points) An insured is picked at random and observed for one year.
What is the chance of observing 2 claims?
A. αθ² / (1+θ)^(α+2)
B. α(α+1)θ² / (1+θ)^(α+2)
C. α(α+1)θ² / {2 (1+θ)^(α+2)}
D. α²(α+1)θ² / {6 (1+θ)^(α+2)}
E. α²(α+1)(α+2)θ² / {6 (1+θ)^(α+2)}

19.6 (2 points) What is the unconditional mean frequency?


A. αθ B. (α-1)θ C. α(α-1)θ2 D. α(α-1)θ2 E. α(α-1)(α+1)θ2/2

19.7 (3 points) What is the unconditional variance?


A. αθ2 B. αθ + αθ2 C. αθ + α2θ2 D. α2θ2 E. α(α+1) θ

Use the following information for the next 8 questions:


As he walks, Clumsy Klem loses coins at a Poisson rate. The Poisson rate, expressed in coins per
minute, is constant during any one day, but varies from day to day according to a gamma distribution
with mean 0.2 and variance 0.016.
The denominations of coins are randomly distributed: 50% of the coins are worth 5;
30% of the coins are worth 10; and 20% of the coins are worth 25.

19.8 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the tenth
minute of todayʼs walk.
(A) 0.09 (B) 0.11 (C) 0.13 (D) 0.15 (E) 0.17

19.9 (3 points) Calculate the probability that Clumsy Klem loses exactly two coins during the first 10
minutes of todayʼs walk.
(A) 0.12 (B) 0.14 (C) 0.16 (D) 0.18 (E) 0.20

19.10 (4 points) Calculate the probability that the worth of the coins Clumsy Klem loses during his
one-hour walk today is greater than 300.
A. 1% B. 3% C. 5% D. 7% E. 9%

19.11 (2 points) Calculate the probability that the sum of the worth of the coins Clumsy Klem loses
during his one-hour walks each day for the next 5 days is greater than 900.
A. 1% B. 3% C. 5% D. 7% E. 9%

19.12 (2 points) During the first 10 minutes of todayʼs walk, what is the chance that Clumsy Klem
loses exactly one coin of worth 5, and possibly coins of other denominations?
A. 31% B. 33% C. 35% D. 37% E. 39%

19.13 (3 points) During the first 10 minutes of todayʼs walk, what is the chance that Clumsy Klem
loses exactly one coin of worth 5, and no coins of other denominations?
A. 11.6% B. 12.0% C. 12.4% D. 12.8% E. 13.2%

19.14 (3 points) Let A be the number of coins Clumsy Klem loses during the first minute of his walk
today. Let B be the number of coins Clumsy Klem loses during the first minute of his walk tomorrow.
What is the probability that A + B = 3?
A. 0.2% B. 0.4% C. 0.6% D. 0.8% E. 1.0%

19.15 (3 points) Let A be the number of coins Clumsy Klem loses during the first minute of his walk
today. Let B be the number of coins Clumsy Klem loses during the first minute of his walk tomorrow.
Let C be the number of coins Clumsy Klem loses during the first minute of his walk the day after
tomorrow. What is the probability that A + B + C = 2?
A. 8% B. 10% C. 12% D. 14% E. 16%

19.16 (2 points) For an insurance portfolio the distribution of the number of claims a particular
policyholder makes in a year is Poisson with mean λ.
The λ-values of the policyholders follow the Gamma distribution, with parameters α = 4,
and θ = 1/9.
The probability that a policyholder chosen at random will experience x claims is given by which of
the following?
A. {(x + 3)! / (x! 3!)} 0.9^4 0.1^x
B. {(x + 3)! / (x! 3!)} 0.1^4 0.9^x
C. {(x + 8)! / (x! 8!)} 0.75^4 0.25^x
D. {(x + 8)! / (x! 8!)} 0.25^4 0.75^x
E. None of A, B, C, or D.

19.17 (2 points) The number of claims a particular policyholder makes in a year has a Poisson
distribution with mean λ. The λ-values for policyholders follow a Gamma distribution.
This Gamma Distribution has a variance equal to one quarter that of the resulting Negative Binomial
distribution of policyholders by number of claims.
What is the value of the β parameter of this Negative Binomial Distribution?
A. 1/6 B. 1/5 C. 1/4 D. 1/3 E. Can not be determined

19.18 (1 point) Use the following information:


• The random variable representing the number of claims for a single policyholder follows
a Poisson distribution.
• For a portfolio of policyholders, the Poisson parameters follow a Gamma distribution
representing the heterogeneity of risks within that portfolio.
• The random variable representing the number of claims in a year of a policyholder,
chosen at random, follows a Negative Binomial distribution with parameters:
r = 4 and β = 3/17.
Determine the variance of the Gamma distribution.
(A) 0.110 (B) 0.115 (C) 0.120 (D) 0.125 (E) 0.130

19.19 (2 points) Tom will generate via simulation 100,000 values of the random variable X as
follows:
(i) He will generate the observed value λ from a distribution with density λe−λ/1.4 /1.96.
(ii) He then generates x from the Poisson distribution with mean λ.
(iii) He repeats the process 99,999 more times: first generating a value λ, then
generating x from the Poisson distribution with mean λ.
Calculate the expected number of Tomʼs 100,000 simulated values of X that are 6.
(A) 4200 (B) 4400 (C) 4600 (D) 4800 (E) 5000

19.20 (2 points) In the previous question, let V = the variance of a single simulated set of 100,000
values. What is the expected value of V?
A. 0 B. 2.8 C. 3.92 D. 5.6 E. 6.72

19.21 (2 points) Dick will generate via simulation 100,000 values of the random variable X as
follows:
(i) He will generate the observed value λ from a distribution with density λ e−λ/1.4 /1.96.
(ii) He will then generate 100,000 independent values from the Poisson distribution
with mean λ.
Calculate the expected number of Dickʼs 100,000 simulated values of X that are 6.
(A) 4200 (B) 4400 (C) 4600 (D) 4800 (E) 5000

19.22 (2 points) In the previous question, let V = the variance of a single simulated set of 100,000
values. What is the expected value of V?
A. 0 B. 2.8 C. 3.92 D. 5.6 E. 6.72

19.23 (1 point) Harry will generate via simulation 100,000 values of the random variable X as
follows:
(i) He will generate the observed value λ from a distribution with density

λ e−λ/1.4 /1.96.
(ii) He then generates x from the Poisson distribution with mean λ.
(iii) He will then copy 99,999 times this value of x.
Calculate the expected number of Harryʼs 100,000 simulated values of X that are 6.
(A) 4200 (B) 4400 (C) 4600 (D) 4800 (E) 5000

19.24 (1 point) In the previous question, let V = the variance of a single simulated set of 100,000
values. What is the expected value of V?
A. 0 B. 2.8 C. 3.92 D. 5.6 E. 6.72

Use the following information for the next 7 questions:


• The number of vehicles arriving at an amusement park per day is Poisson with mean λ.
• λ varies from day to day via a Gamma Distribution with α = 40 and θ = 10.
• The value of λ on one day is independent of the value of λ on another day.
• The number of people leaving each vehicle is:
1 + a Negative Binomial Distribution with r = 1.6 and β = 6.
• The amount of money spent at the amusement park by each person is
LogNormal with µ = 5 and σ = 0.8.

19.25 (1 point) What is the variance of the number of vehicles that will show up tomorrow at the
amusement park?
A. 4,000 B. 4,400 C. 4,800 D. 5,200 E. 5,600

19.26 (1 point) What is the variance of the number of vehicles that will show up over the next 7
days at the amusement park?
A. 25,000 B. 27,000 C. 29,000 D. 31,000 E. 33,000

19.27 (2 points) What is the variance of the number of people that will show up tomorrow at the
amusement park?
A. 480,000 B. 490,000 C. 500,000 D. 510,000 E. 520,000

19.28 (1 point) What is the variance of the number of people that will show up over the next 7 days
at the amusement park?
A. 2.8 million B. 3.0 million C. 3.2 million D. 3.4 million E. 3.6 million

19.29 (3 points) What is the standard deviation of the money spent tomorrow at the amusement
park?
A. 150,000 B. 160,000 C. 170,000 D. 180,000 E. 190,000

19.30 (1 point) What is the standard deviation of the money spent over the next 7 days at the
amusement park?
A. 360,000 B. 370,000 C. 380,000 D. 390,000 E. 400,000

19.31 (2 points) You simulate the amount of the money spent over the next 7 days at the
amusement park. You run this simulation a total of 1000 times.
How many runs do you expect in which less than 5 million is spent?
A. 1 B. 2 C. 3 D. 4 E. 5

Use the following information for the next 6 questions:


• For each individual driver, the number of accidents in a year follows a Poisson Distribution.
• For each individual driver, the mean of their Poisson Distribution λ is the same each year.
• For each individual driver, the number of accidents each year is independent of other years.
• The number of accidents for different drivers are independent.
• λ varies between drivers via a Gamma Distribution with mean 0.08 and variance 0.0032.
• Moe, Larry, and Curly are each drivers.

19.32 (2 points) What is the probability that Moe has exactly one accident next year?
A. 6.9% B. 7.1% C. 7.3% D. 7.5% E. 7.7%

19.33 (2 points) What is the probability that Larry has exactly 2 accidents over the next 3 years?
A. 2.25% B. 2.50% C. 2.75% D. 3.00% E. 3.25%

19.34 (2 points) What is the probability that Moe, Larry, and Curly have a total of exactly 2
accidents during the next year?
A. 2.25% B. 2.50% C. 2.75% D. 3.00% E. 3.25%

19.35 (2 points) What is the probability that Moe, Larry, and Curly have a total of exactly 3
accidents during the next four years?
A. 5.2% B. 5.4% C. 5.6% D. 5.8% E. 6.0%

19.36 (3 points) What is the probability that Moe has no accidents next year, Larry has exactly one
accident over the next two years, and Curly has exactly two accidents over the next three years?
A. 0.3% B. 0.4% C. 0.5% D. 0.6% E. 0.7%

19.37 (9 points) Let M = the number of accidents Moe has next year.
Let L = the number of accidents Larry has over the next two years.
Let C = the number of accidents Curly has over the next three years.
Determine the probability that: M + L + C = 3.
A. 0.9% B. 1.1% C. 1.3% D. 1.5% E. 1.7%

Use the following information to answer the next 3 questions:


The number of claims a particular policyholder makes in a year is Poisson. The values of the Poisson
parameter (for annual claim frequency) for the individual policyholders in a portfolio of 10,000 follow a
Gamma distribution, with parameters α = 4 and θ = 0.1.
You observe this portfolio for one year and divide it into three groups based on how many claims
you observe for each policyholder: Group A: Those with no claims.
Group B: Those with one claim. Group C: Those with two or more claims.

19.38 (1 point) What is the expected size of Group A?


(A) 6200 (B) 6400 (C) 6600 (D) 6800 (E) 7000

19.39 (1 point) What is the expected size of Group B?


(A) 2400 (B) 2500 (C) 2600 (D) 2700 (E) 2800

19.40 (1 point) What is the expected size of Group C?


(A) 630 (B) 650 (C) 670 (D) 690 (E) 710

19.41 (3 points) The claims from a particular insured in a time period t are Poisson with mean λt.
The values of λ for the individual insureds in a portfolio follow a Gamma distribution,
with parameters α = 3 and θ = 0.02.
For an insured picked at random what is the average wait until the first claim?
A. 17 B. 19 C. 21 D. 23 E. 25

19.42 (2 points) Use the following information:


• Frequency for an individual is a 80-20 mixture of two Poissons with means λ and 3λ.
• The distribution of λ is Exponential with a mean of 0.1.
For an insured picked at random, what is the probability of seeing two claims?
A. 1.2% B. 1.3% C. 1.4% D. 1.5% E. 1.6%

19.43 (2 points) Claim frequency follows a Poisson distribution with parameter λ.

λ is distributed according to: g(λ) = 25 λ e-5λ.


Determine the probability that there will be at least 2 claims during the next year.
A. 5% B. 7% C. 9% D. 11% E. 13%

Use the following information for the next two questions:


• 60% of claims are small.
• 40% of claims are large.
• The annual number of claims from a particular insured is Poisson with mean λ.
• λ is distributed across a group of insureds via a Gamma with α = 2 and θ = 0.5.
• You pick an insured at random and observe for one year.

19.44 (2 points) What is the variance of the number of small claims?


A. 0.78 B. 0.80 C. 0.82 D. 0.84 E. 0.86

19.45 (2 points) What is the variance of the number of large claims?


A. 0.40 B. 0.42 C. 0.44 D. 0.46 E. 0.48

19.46 (CAS9, 11/94, Q.7) (1 point)


For a group of insureds, each insured has a frequency which is Poisson. There are different
assumptions for the distribution across this group of the probability of having an accident.
Which of the following statements are true?
1. If the distribution of the probability of having an accident is constant,
then the distribution of the risks by number of accidents is Poisson.
2. If the distribution of the probability of having an accident is Gamma,
then the distribution of risks by number of accidents is Negative Binomial.
3. If the distribution of the probability of having an accident is Poisson,
then the distribution of risks by number of accidents is Negative Binomial.
A. 2 only B. 3 only C. 1 and 2 D. 1 and 3 E. 1, 2, and 3

19.47 (4B, 11/96, Q.15) (2 points) You are given the following:
• The number of claims for a single policyholder follows a Poisson distribution with mean λ .
• λ follows a gamma distribution.
• The number of claims for a policyholder chosen at random follows a distribution
with mean 0.10 and variance 0.15.
Determine the variance of the gamma distribution.
A. 0.05 B. 0.10 C. 0.15 D. 0.25 E. 0.30

19.48 (4B, 11/96, Q.26) (2 points) You are given the following:
• The probability that a single insured will produce 0 claims during the next
exposure period is e−λ.

• λ varies by insured and follows a distribution with density function

f(λ) = 36λe-6λ, 0 < λ < ∞.


Determine the probability that a randomly selected insured will produce 0 claims during the next
exposure period.
A. Less than 0.72
B. At least 0.72, but less than 0.77
C. At least 0.77, but less than 0.82
D. At least 0.82, but less than 0.87
E. At least 0.87

19.49 (Course 3 Sample Exam, Q.12) The annual number of accidents for an individual driver
has a Poisson distribution with mean λ. The Poisson means, λ, of a heterogeneous population of
drivers have a gamma distribution with mean 0.1 and variance 0.01.
Calculate the probability that a driver selected at random from the population will have 2 or more
accidents in one year.
A. 1/121 B. 1/110 C. 1/100 D. 1/90 E. 1/81

19.50 (3, 5/00, Q.4) (2.5 points) You are given:


(i) The claim count N has a Poisson distribution with mean Λ .
(ii) Λ has a gamma distribution with mean 1 and variance 2.
Calculate the probability that N = 1.
(A) 0.19 (B) 0.24 (C) 0.31 (D) 0.34 (E) 0.37

19.51 (3, 5/01, Q.3 & 2009 Sample Q.104) (2.5 points) Glen is practicing his simulation skills.
He generates 1000 values of the random variable X as follows:
(i) He generates the observed value λ from the gamma distribution with α = 2 and
θ = 1 (hence with mean 2 and variance 2).
(ii) He then generates x from the Poisson distribution with mean λ.
(iii) He repeats the process 999 more times: first generating a value λ, then
generating x from the Poisson distribution with mean λ.
(iv) The repetitions are mutually independent.
Calculate the expected number of times that his simulated value of X is 3.
(A) 75 (B) 100 (C) 125 (D) 150 (E) 175

19.52 (3, 5/01, Q.15 & 2009 Sample Q.105) (2.5 points) An actuary for an automobile insurance
company determines that the distribution of the annual number of claims for an insured chosen at
random is modeled by the negative binomial distribution with mean 0.2 and variance 0.4.
The number of claims for each individual insured has a Poisson distribution and the means of these
Poisson distributions are gamma distributed over the population of insureds.
Calculate the variance of this gamma distribution.
(A) 0.20 (B) 0.25 (C) 0.30 (D) 0.35 (E) 0.40

19.53 (3, 11/01, Q.27) (2.5 points) On his walk to work, Lucky Tom finds coins on the ground at a
Poisson rate. The Poisson rate, expressed in coins per minute, is constant during any one day, but
varies from day to day according to a gamma distribution with mean 2 and variance 4.
Calculate the probability that Lucky Tom finds exactly one coin during the sixth minute of todayʼs
walk.
(A) 0.22 (B) 0.24 (C) 0.26 (D) 0.28 (E) 0.30

19.54 (2 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin
during the first two minutes of todayʼs walk.
(A) 0.12 (B) 0.14 (C) 0.16 (D) 0.18 (E) 0.20

19.55 (3 points) In 3, 11/01, Q.27, let A = the number of coins that Lucky Tom finds during the first
minute of todayʼs walk. Let B = the number of coins that Lucky Tom finds during the first minute of
tomorrowʼs walk. Calculate Prob[A + B = 1].
(A) 0.09 (B) 0.11 (C) 0.13 (D) 0.15 (E) 0.17

19.56 (3 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin
during the third minute of todayʼs walk and exactly one coin during the fifth minute of todayʼs walk.
A. Less than 4.5%
B. At least 4.5%, but less than 5.0%
C. At least 5.0%, but less than 5.5%
D. At least 5.5%, but less than 6.0%
E. At least 6.0%

19.57 (3 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin
during the first minute of todayʼs walk, exactly two coins during the second minute of todayʼs walk,
and exactly three coins during the third minute of todayʼs walk.
A. Less than 0.2%
B. At least 0.2%, but less than 0.3%
C. At least 0.3%, but less than 0.4%
D. At least 0.4%, but less than 0.5%
E. At least 0.5%

19.58 (2 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin
during the first minute of todayʼs walk and exactly one coin during the fifth minute of tomorrowʼs walk.
(A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09

19.59 (2 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin
during the first three minutes of todayʼs walk and exactly one coin during the first three minutes of
tomorrowʼs walk.
(A) 0.005 (B) 0.010 (C) 0.015 (D) 0.020 (E) 0.025

19.60 (3, 11/02, Q.5 & 2009 Sample Q.90) (2.5 points) Actuaries have modeled auto
windshield claim frequencies. They have concluded that the number of windshield claims filed per
year per driver follows the Poisson distribution with parameter λ, where λ follows the gamma
distribution with mean 3 and variance 3.
Calculate the probability that a driver selected at random will file no more than 1 windshield claim
next year.
(A) 0.15 (B) 0.19 (C) 0.20 (D) 0.24 (E) 0.31

19.61 (CAS3, 11/03, Q.15) (2.5 points)


Two actuaries are simulating the number of automobile claims for a book of business.
For the population that they are studying:
i) The claim frequency for each individual driver has a Poisson distribution.
ii) The means of the Poisson distributions are distributed as a random variable, Λ.
iii) Λ has a gamma distribution.
In the first actuary's simulation, a driver is selected and one year's experience is generated. This
process of selecting a driver and simulating one year is repeated N times.
In the second actuary's simulation, a driver is selected and N years of experience are generated for
that driver.
Which of the following is/are true?
I. The ratio of the number of claims the first actuary simulates to the number of claims the
second actuary simulates should tend towards 1 as N tends to infinity.
II. The ratio of the number of claims the first actuary simulates to the number of claims the
second actuary simulates will equal 1, provided that the same uniform random
numbers are used.
Ill. When the variances of the two sequences of claim counts are compared the first actuary's
sequence will have a smaller variance because more random numbers are used in
computing it.
A. I only B. I and II only C. I and Ill only D. II and Ill only E. None of I, II, or Ill is true

19.62 (CAS3, 5/05, Q.10) (2.5 points) Low Risk Insurance Company provides liability coverage
to a population of 1,000 private passenger automobile drivers.
The number of claims during a given year from this population is Poisson distributed.
If a driver is selected at random from this population, his expected number of claims per year is a
random variable with a Gamma distribution such that α = 2 and θ = 1.
Calculate the probability that a driver selected at random will not have a claim during the year.
A. 11.1% B. 13.5% C. 25.0% D. 33.3% E. 50.0%

19.63 (2 points) In CAS3, 5/05, Q.10, what is the probability that at most 265 of these 1000
drivers will not have a claim during the year?
A. 75% B. 78% C. 81% D. 84% E. 87%

19.64 (2 points) In CAS3, 5/05, Q.10, what is the probability that these 1000 drivers will have a
total of more than 2020 claims during the year?
A. 31% B. 33% C. 35% D. 37% E. 39%

19.65 (4 points) In CAS3, 5/05, Q.10, let A be the number of these 1000 drivers that have one
claim during the year and B be the number of these 1000 drivers that have two claims during
the year. Determine the correlation of A and B.
A. -0.32 B. -0.30 C. -0.28 D. -0.26 E. -0.24

Solutions to Problems:

19.1. E. The Poisson parameters over three years are three times those on an annual basis.
Therefore they are given by a Gamma distribution with α = 3 and θ = 3/12 = 1/4.
(The mean frequency is now 3/4 per three years rather than 3/12 = 1/4 on an annual basis. It might
be helpful to recall that θ is the scale parameter for the Gamma Distribution.)
The mixed distribution is a Negative Binomial, with parameters r = α = 3 and β = θ = 1/4.
f(0) = 1/(1+β)^r = 1/1.25^3 = 0.512.
Comment: Over one year, the mixed distribution is Negative Binomial, with parameters r = α = 3
and β = θ = 1/12. Thus for a driver picked at random, the probability of no claims next year is:
1/(1 + 1/12)^3 = 0.7865. Then one might be tempted to think that the probability of no claims over
the next three years for a driver picked at random is: 0.7865^3 = 0.4865.
However, drivers with a low λ in one year are assumed to have the same low λ every year.
Such good drivers have a large chance of having 0 claims in 3 years.
Drivers with a high λ in one year are assumed to have the same high λ every year.
Such drivers have a smaller chance of having 0 claims in 3 years. 
As discussed in "Mahlerʼs Guide to Conjugate Priors," a driver who has no claims the first year has a
posterior distribution of lambda that is Gamma, with α = 3 + 0 = 3, and 1/θ = 12 + 1 = 13.
Therefore, for a driver with no claims in year one, the mixed distribution in year two is Negative
Binomial with parameters r = α = 3 and β = θ = 1/13. Thus for a driver with no claims in year one, the
probability of no claims in year two is: 1/(1 + 1/13)^3 = 0.8007.
A driver who has no claims in the first two years has a posterior distribution of lambda that is Gamma,
with α = 3 + 0 = 3, and 1/θ = 12 + 2 = 14.
Therefore, for a driver with no claims in the first two years, the mixed distribution in year three is
Negative Binomial with parameters r = α = 3 and β = θ = 1/14. Thus for a driver with no claims in the
first two years, the probability of no claims in year three is: 1/(1 + 1/14)^3 = 0.8130.
Prob[0 claims in three years] = (0.7865)(0.8007)(0.8130) = 0.512 ≠ 0.4865.
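A quick numerical check of this solution, not from the Guide, computing 0.512 both directly and through the year-by-year posterior argument in the Comment; the Python variable names are my own.

alpha, theta, years = 3, 1/12, 3
# Direct: over 3 years the mixed distribution is Negative Binomial with r = alpha, beta = years*theta.
p_direct = 1 / (1 + years * theta) ** alpha
# Sequential: after k claim-free years the posterior Gamma has 1/theta = 12 + k.
p_seq = 1.0
for k in range(years):
    p_seq *= 1 / (1 + 1 / (12 + k)) ** alpha
print(p_direct, p_seq)   # both 0.512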

19.2. A. From the previous solution, f(1) = rβ/(1+β)^(r+1) = (3)(1/4)/1.25^4 = 0.3072.



19.3. C. The mean of the Negative Binomial is rβ = .6, while the variance is rβ(1+β) = .9.
Therefore, 1 + β = 0.9/0.6 = 1.5, and β = 0.5. Therefore r = 1.2.
For a Gamma-Poisson, α = r = 1.2 and θ = β = 0.5.

Therefore, the variance of the Gamma Distribution is: αθ² = (1.2)(0.5^2) = 0.3.
Comment: Similar to 3, 5/01, Q.15.

19.4. B. For the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to:
mean of the Gamma + variance of the Gamma. Thus mean of Gamma + 0.3 = 0.7. Therefore,
mean of Gamma = 0.4 = αθ. Variance of Gamma = 0.3 = αθ2 . Therefore, θ = 0.3 / 0.4 = 3/4.
α = 0.4/θ = 0.5332. r = α = 0.5332 and β = θ = 3/4. r(1+β) = 0.5332 (7/4) = 0.933.

19.5. C. The conditional chance of 2 claims given λ is e^(-λ) λ²/2. The unconditional chance can be
obtained by integrating the conditional chances versus the distribution of λ:

f(2) = ∫0^∞ f(2 | λ) g(λ) dλ = ∫0^∞ {e^(-λ) λ²/2} {λ^(α-1) e^(-λ/θ) / (Γ(α) θ^α)} dλ
= {1 / (2 Γ(α) θ^α)} ∫0^∞ λ^(α+1) e^(-λ(1 + 1/θ)) dλ = {1 / (2 Γ(α) θ^α)} Γ(α+2) / (1 + 1/θ)^(α+2)
= α(α+1)θ² / {2 (1+θ)^(α+2)}.

Comment: The mixed distribution is a Negative Binomial with r = α and β = θ.
f(2) = {r(r+1)/2} β² / (1+β)^(r+2) = α(α+1)θ² / {2 (1+θ)^(α+2)}.
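As a numerical sanity check, not from the Guide, one can integrate the mixture directly and compare with the closed form; the values α = 3 and θ = 0.25 and the use of scipy are my own illustrative choices.

from scipy import integrate, stats

alpha, theta = 3, 0.25
integrand = lambda lam: stats.poisson.pmf(2, lam) * stats.gamma.pdf(lam, a=alpha, scale=theta)
f2_numerical, _ = integrate.quad(integrand, 0, 50)      # mixture integral for f(2)
f2_closed = alpha * (alpha + 1) * theta**2 / (2 * (1 + theta) ** (alpha + 2))
print(f2_numerical, f2_closed)                          # both about 0.123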

19.6. A. The conditional mean given λ is: λ. The unconditional mean can be obtained by
integrating the conditional means versus the distribution of λ:

E[X] = ∫0^∞ E[X | λ] g(λ) dλ = ∫0^∞ λ {λ^(α-1) e^(-λ/θ) / (Γ(α) θ^α)} dλ
= {1 / (Γ(α) θ^α)} ∫0^∞ λ^α e^(-λ/θ) dλ = Γ(α+1) θ^(α+1) / {Γ(α) θ^α} = αθ.

Alternately, E[X] = ∫0^∞ E[X | λ] g(λ) dλ = ∫0^∞ λ g(λ) dλ = Mean of the Gamma Distribution = αθ.

19.7. B. The conditional mean given λ is: λ. The conditional variance given λ is: λ. Thus the
conditional second moment given λ is: λ + λ². The unconditional second moment can be obtained
by integrating the conditional second moments versus the distribution of λ:

E[X²] = ∫0^∞ E[X² | λ] g(λ) dλ = ∫0^∞ (λ + λ²) {λ^(α-1) e^(-λ/θ) / (Γ(α) θ^α)} dλ
= {1 / (Γ(α) θ^α)} ∫0^∞ λ^α e^(-λ/θ) dλ + {1 / (Γ(α) θ^α)} ∫0^∞ λ^(α+1) e^(-λ/θ) dλ
= Γ(α+1) θ^(α+1) / {Γ(α) θ^α} + Γ(α+2) θ^(α+2) / {Γ(α) θ^α} = αθ + α(α+1)θ².

Since the mean is αθ, the variance is: αθ + α(α+1)θ² - α²θ² = αθ + αθ².
Comment: Note that one integrates the conditional second moments in order to obtain the
unconditional second moment. If instead one integrated the conditional variances, one would obtain
the Expected Value of the Process Variance (in this case αθ), which is only one piece of the total
unconditional variance. One would need to also add the Variance of the Hypothetical Means
(which in this case is αθ²), in order to obtain the total variance of αθ + αθ². The mixed distribution is a
Negative Binomial with r = α and β = θ. It has variance: rβ(1+β) = αθ + αθ².

19.8. D. For the Gamma, mean = αθ = 0.2, and variance = αθ2 = 0.016.
Thus θ = 0.016/0.2 = 0.08 and α = 2.5.
This is a Gamma-Poisson, with mixed distribution a Negative Binomial, with r = α = 2.5 and
β = θ = 0.08. f(1) = rβ/(1+β)^(1+r) = (2.5)(0.08) / (1 + 0.08)^3.5 = 0.153.
Comment: Similar to 3, 11/01, Q.27.

19.9. E. Over 10 minutes, the rate of loss is Poisson, with 10 times that for one minute.
λ has a Gamma distribution with α = 2.5 and θ = 0.08 ⇒
10λ has a Gamma distribution with α = 2.5, and θ = (10)(.08) = 0.8.
The mixed distribution is a Negative Binomial, with r = α = 2.5 and β = θ = 0.8.
f(2) = {r(r+1)/2} β²/(1+β)^(2+r) = {(2.5)(3.5)/2} 0.8^2/(1.8)^4.5 = 0.199.

19.10. B. Mean value of a coin is: (50%)(5) + (30%)(10) + (20%)(25) = 10.5.


2nd moment of the value of a coin is: (50%)(5^2) + (30%)(10^2) + (20%)(25^2) = 167.5.
Over 60 minutes, the rate of loss is Poisson, with 60 times that for one minute.
λ has a Gamma distribution with α = 2.5 and θ = .08 ⇒
60λ has a Gamma distribution with α = 2.5 and θ = (60)(.08) = 4.8.
The mixed distribution is a Negative Binomial, with r = α = 2.5 and β = θ = 4.8.
Therefore, the mean number of coins: rβ = (2.5)(4.8) = 12,
and the variance of number of coins: rβ(1+β) = (2.5)(4.8)(5.8) = 69.6.
The mean worth is: (10.5)(12) = 126.
Variance of worth is:
(mean frequency)(variance of severity) + (mean severity)²(variance of frequency) =
(12)(167.5 - 10.5^2) + (10.5^2)(69.6) = 8360.4.
Prob[worth > 300] ≅ 1 - Φ[(300.5 - 126)/√8360.4] = 1 - Φ[1.91] = 2.81%.
Klem loses money in units of 5 cents or more.
Therefore, if he loses more than 300, he loses 305 or more.
Thus it might be better to approximate the probability as:
1 - Φ[(304.5 - 126)/√8360.4] = 1 - Φ[1.95] = 2.56%.
Along this same line of thinking, one could instead approximate the probability by taking the
probability from 302.5 to infinity: 1 - Φ[(302.5 - 126)/√8360.4] = 1 - Φ[1.93] = 2.68%.
Comment: The formula used is for the variance of aggregate losses, which is covered in section 5 of
“Mahlerʼs Guide to Aggregate Distributions.”
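A Monte Carlo sketch of this question, not from the Guide, can be compared with the Normal approximations above; it uses thinning of the Poisson by coin denomination, and the seed and simulation count are my own illustrative choices.

import numpy as np

rng = np.random.default_rng(seed=2)
n_sims = 200_000
lam = rng.gamma(shape=2.5, scale=0.08, size=n_sims)   # per-minute rate for each simulated day
n5 = rng.poisson(0.5 * 60 * lam)                      # coins worth 5 lost in one hour (thinned Poisson)
n10 = rng.poisson(0.3 * 60 * lam)                     # coins worth 10
n25 = rng.poisson(0.2 * 60 * lam)                     # coins worth 25
worth = 5 * n5 + 10 * n10 + 25 * n25
print((worth > 300).mean())                           # compare with the 2.6% to 2.8% range above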

19.11. E. From the previous solution, for a day chosen at random, the worth has mean 126 and
variance 8360.4. The worth over five days is the sum of 5 independent variables; the sum of 5
days has mean: (5)(126) = 630 and variance: (5)(8360.4) = 41,802.
Prob[worth > 900] ≅ 1 - Φ[(900.5 - 630)/√41,802] = 1 - Φ[1.32] = 9.34%.
Klem loses money in units of 5 cents or more.
Therefore, if he loses more than 900, he loses 905 or more.
It might be better to approximate the probability as:
Prob[worth > 900] = Prob[worth ≥ 905] ≅ 1 - Φ[(904.5 - 630)/√41,802] = 1 - Φ[1.34] = 9.01%.

One might have instead approximated as: 1 - Φ[(902.5 - 630)/√41,802] = 1 - Φ[1.33] = 9.18%.

19.12. A. 50% of the coins are worth 5, so if the overall process is Poisson with mean λ, then
losing coins of worth 5 is Poisson with mean 0.5λ.
Over 10 minutes it is Poisson with mean 5λ.

λ has a Gamma distribution with α = 2.5 and θ = 0.08 ⇒


5λ has a Gamma distribution with α = 2.5 and θ = (5)(.08) = 0.4.
The mixed distribution is a Negative Binomial, with r = α = 2.5 and β = θ = 0.4.
f(1) = rβ/(1 + β)^(r+1) = (2.5)(0.4)/(1.4^3.5) = 30.8%.

19.13. D. Losing coins of worth 5, 10, and 25 are three independent Poisson Processes.
Over 10 minutes losing coins of worth 5 is Poisson with mean 5λ.
Over 10 minutes losing coins of worth 10 is Poisson with mean 3λ.
Over 10 minutes losing coins of worth 25 is Poisson with mean 2λ.

Prob[1 coin @ 5] Prob[0 coins @ 10] Prob[0 coins @ 25] = 5λe^(-5λ) e^(-3λ) e^(-2λ) = 5λe^(-10λ).

λ has a Gamma distribution with α = 2.5 and θ = 0.08. 1/θ = 12.5.
⇒ f(λ) = 12.5^2.5 λ^1.5 e^(-12.5λ) / Γ(2.5).

∫0^∞ 5λ e^(-10λ) f(λ) dλ = ∫0^∞ 5λ e^(-10λ) 12.5^2.5 λ^1.5 e^(-12.5λ) / Γ(2.5) dλ
= {(5) 12.5^2.5 / Γ(2.5)} ∫0^∞ λ^2.5 e^(-22.5λ) dλ = {(5) 12.5^2.5 / Γ(2.5)} Γ(3.5) / 22.5^3.5
= (5)(2.5) 12.5^2.5 / 22.5^3.5 = 12.8%.

Comment: While given lambda, each Poisson Process is independent, the mixed Negative
Binomials are not independent, since each day we use the same lambda (appropriately thinned) for
each denomination of coin.
From the previous solution, the probability of one coin worth 5 is 30.80%.
The distribution of coins worth ten is Negative Binomial with r = 2.5 and β = (3)(0.08) = 0.24.
Therefore, the chance of seeing no coins worth 10 is: 1/1.24^2.5 = 58.40%.
The distribution of coins worth 25 is Negative Binomial with r = 2.5 and β = (2)(0.08) = 0.16.
Therefore, the chance of seeing no coins worth 25 is: 1/1.16^2.5 = 69.0%.
However, (30.80%)(58.40%)(69.00%) = 12.4% ≠ 12.8%, the correct solution.
One can not multiply the three probabilities together, because the three events are not
independent. The three probabilities each depend on the same lambda value for the given day.

19.14. E. A is Poisson with mean λA, where λA is a random draw from a Gamma Distribution with

α = 2.5 and θ = 0.08. B is Poisson with mean λB, where λB is a random draw from a Gamma

Distribution with α = 2.5 and θ = 0.08. Since A and B are from walks on different days, λA and λB
are independent random draws from the same Gamma.
Thus λA + λB is from a Gamma Distribution with α = 2.5 + 2.5 = 5 and θ = 0.08.

Thus A + B is from a Negative Binomial Distribution with r = 5 and β = .08.


The density at 3 of this Negative Binomial Distribution is: {(5)(6)(7)/3!} 0.08^3/1.08^8 = 0.97%.
Alternately, A and B are independent Negative Binomials each with r = 2.5 and β = .08.
Thus A + B is a Negative Binomial Distribution with r = 5 and β = 0.08. Proceed as before.
Alternately, for A and B the densities for each are:
f(0) = 1/(1+β)^r = 1/1.08^2.5 = 0.825, f(1) = rβ/(1+β)^(1+r) = (2.5)(0.08)/1.08^3.5 = 0.153,
f(2) = {r(r+1)/2} β²/(1+β)^(2+r) = {(2.5)(3.5)/2} 0.08^2/1.08^4.5 = 0.0198,
f(3) = {r(r+1)(r+2)/3!} β³/(1+β)^(3+r) = {(2.5)(3.5)(4.5)/6} 0.08^3/1.08^5.5 = 0.00220.
Prob[A + B = 3] =
Prob[A=0]Prob[B=3] + Prob[A=1]Prob[B=2] + Prob[A=2]Prob[B=1] + Prob[A=3]Prob[B=0] =
(0.825)(0.00220) + (0.153)(0.0198) + (0.0198)(0.153) + (0.00220)(0.825) = 0.97%.
Comment: For two independent Gamma Distributions with the same θ:
Gamma(α1, θ) + Gamma(α2, θ) = Gamma(α1 + α2, θ).

19.15. B. λA + λB + λC is from a Gamma Distribution with α = (3)(2.5) = 7.5 and θ = .08.


Thus A + B + C is from a Negative Binomial Distribution with r = 7.5 and β = .08.
The density at 2 of this Negative Binomial Distribution is: {(7.5)(8.5)/2!} 0.08^2/1.08^9.5 = 9.8%.

19.16. A. Mixing a Poisson via a Gamma leads to a negative binomial overall frequency
distribution. The negative binomial has parameters r = α = 4 and β = θ = 1/9.
f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r) = {(4)(5)...(x+3)/x!} (1/9)^x / (10/9)^(x+4)
= {(x+3)! / (x! 3!)} 0.9^4 0.1^x.

19.17. D. For the Gamma-Poisson, β = θ and r = α. Therefore, the variance of the Gamma = αθ2
= rβ2 . Total Variance = Variance of the mixed Negative Binomial = rβ(1+β). Thus for the Gamma-
Poisson we have: (Var. of the Gamma)/(Var. of the Negative Binomial) = β/(1+β)

= 1/{1 + 1/β}. Thus in this case 1/{1 + 1/β} = 0.25. ⇒ β = 1/3.



19.18. D. The parameters of the Gamma can be gotten from those of the Negative Binomial,
α = r = 4, θ = β = 3/17. Then the Variance of the Gamma = αθ2 = 0.125.
Alternately, the variance of the Gamma is the Variance of the Hypothetical Means =
Total Variance - Expected Value of the Process Variance =
Variance of the Negative Binomial - Mean of the Gamma =
Variance of the Negative Binomial - Mean of Negative Binomial =
rβ(1+β) - rβ = rβ2 = (4)(3/17)2 = 0.125.

19.19. D. This is a Gamma-Poisson with α = 2 and θ = 1.4. The mixed distribution is Negative
Binomial with r = α = 2, and β = θ = 1.4.

For a Negative Binomial Distribution, f(6) = {(r)(r+1)(r+2)(r+3)(r+4)(r+5)/6!} β^6/(1+β)^(r+6)
= {(2)(3)(4)(5)(6)(7)/720} (1.4^6)/(2.4^8) = 0.04788.
Thus we expect (100,000)(0.04788) = 4788 out of 100,000 simulated values to be 6.
Comment: Similar to 3, 5/01, Q.3. One need know nothing about simulation, in order to answer
these questions.

19.20. E. Each year is a random draw from a different Poisson with unknown λ.
The simulated set consists of random draws each from different Poisson Distributions.
Thus each simulated set is a mixed distribution for a Gamma-Poisson, a Negative Binomial
Distributions with r = α = 2, and β = θ = 1.4.
E[V] = variance of this Negative Binomial = (2)(1.4)(1 + 1.4) = 6.72.
Alternately, Expected Value of the Process Variance is:
E[P.V. | λ] = E[λ] = αθ = (2)(1.4) = 2.8.

Variance of the Hypothetical Means is: Var[E[N | λ]] = Var[λ] = αθ² = (2)(1.4^2) = 3.92.
Total Variance is: EPV + VHM = 2.8 + 3.92 = 6.72.
Comment: Difficult! In other words, Var[X] = E[Var[X|Y]] + Var[E[X|Y]].

19.21. D. This is a Gamma-Poisson with α = 2 and θ = 1.4. The mixed distribution is Negative
Binomial with r = α = 2, and β = θ = 1.4.
For this Negative Binomial Distribution, 100,000f(6) = 4788.

19.22. B. Each year is a random draw from the same Poisson with unknown λ.
The simulated set is from this Poisson Distribution with mean λ. V = λ.
E[V] = E[λ] = mean of the Gamma = αθ = (2)(1.4) = 2.8.
Comment: Difficult! What Tom did was simulate one year each from 100,000 randomly selected
insureds. What Dick did was pick a random insured and simulate 100,000 years for that insured; each
year is an independent random draw from the same Poisson distribution with unknown λ. The two
situations are different, even though they have the same mean. In Dickʼs case there is no variance
associated with the selection of the parameter lambda; the only variance is associated with the
variance of the Poisson Distribution. In Tomʼs case there is variance associated with the selection of
the parameter lambda as well as variance is associated with the variance of the Poisson Distribution.
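A short simulation sketch, not from the Guide, makes the contrast between Tomʼs and Dickʼs procedures concrete; the seed is an arbitrary choice, and the sample variances will fluctuate from run to run.

import numpy as np

rng = np.random.default_rng(seed=3)
n, alpha, theta = 100_000, 2, 1.4

# Tom: a fresh lambda for every value, so his set is a mixed (Negative Binomial) sample.
tom = rng.poisson(rng.gamma(alpha, theta, size=n))
# Dick: a single lambda for the whole set, so his set is a plain Poisson sample.
dick = rng.poisson(rng.gamma(alpha, theta), size=n)

print(tom.var())    # close to 6.72 on every run
print(dick.var())   # close to the particular lambda drawn; averages 2.8 over many runs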

19.23. D. This is a Gamma-Poisson with α = 2 and θ = 1.4.


The mixed distribution is Negative Binomial with r = α = 2, and β = θ = 1.4.
For this Negative Binomial Distribution, 100000f(6) = 4788.

19.24. A. Since all 100,000 values in the simulated set are the same, V = 0. E[V] = 0.
Comment: Contrast Tom, Dick, and Harryʼs simulations. Even though they all have the same mean,
they are simulating somewhat different situations.

19.25. B. The number of vehicles is Negative Binomial with r = α = 40 and β = θ = 10.


It has variance: rβ(1 + β) = (40)(10)(11) = 4400.

19.26. D. This is the sum of 7 independent variables, each with variance 4400.
(7)(4400) = 30,800.
Comment: Although λ is constant on any given day, it varies from day to day. A day picked at
random is a Negative Binomial with r = 40 and β = 10. The sum of seven independent Negative
Binomials is a Negative Binomial with r = (7)(40) = 280 and β = 10.
This has variance: (280)(10)(11) = 30,800.
If instead λ had been the same for a whole week, the answer would have changed.
In that case, one would get a Negative Binomial with r = 40 and β = (7)(10) = 70, with variance:
(40)(70)(71) = 198,800.

19.27. E. The mean number of people per vehicle is: 1 + (1.6)(6) = 10.6.
The variance of the people per vehicle is: (1.6)(6)(1 + 6) = 67.2.
Variance of the number of people is: (400)(67.2) + (10.6^2)(4400) = 521,264.

19.28. E. This is the sum of 7 independent variables. (7)(521,264) = 3,648,848.

19.29. A. The number of people has mean: (400)(10.6) = 4240, and variance: 521,264.
The LogNormal has mean: exp[5 + 0.8^2/2] = 204.38, second moment:
exp[(2)(5) + (2)(0.8^2)] = 79,221, and variance: 79,221 - 204.38^2 = 37,450.
Variance of the money spent: (4240)(37,450) + (204.38^2)(521,264) = 21,933 million.
√(21,933 million) = 148,098.

19.30. D. This is the sum of 7 independent variables, with variance:
(7)(21,933 million) = 153,531 million. √(153,531 million) = 391,830.

19.31. C. The mean amount spent per day is: (4240)(204.38) = 866,571.
Over 7 days the mean amount spent is: (7)(866,571) = 6,065,997, with variance 153,531 million.
Prob[amount spent < 5 million] ≅ Φ[(5 million - 6.0660 million)/√(153,531 million)] = Φ(-2.72) = 0.33%.
So we expect: (1000)(0.33%) = 3 such runs.

19.32. B. For the Gamma, mean = αθ = 0.08, and variance = αθ2 = 0.0032.
Thus θ = 0.04 and α = 2.
This is a Gamma-Poisson, with mixed distribution a Negative Binomial:
with r = α = 2 and β = θ = 0.04.
f(1) = rβ/(1+β)^(r+1) = (2)(0.04)/(1.04)^3 = 7.11%.
Comment: The fact that it is the next year rather than some other year is irrelevant.

19.33. C. For one year, each insured's mean is λ, and λ is distributed via a Gamma with:
θ = 0.04 and α = 2.
Over three years, each insured's mean is 3λ, and 3λ is distributed via a Gamma with:
θ = (3)(0.04) = 0.12, and α = 2.
This is a Gamma-Poisson, with mixed distribution a Negative Binomial:
with r = α = 2 and β = θ = 0.12.
f(2) = r(r+1)β²/{2(1+β)^(r+2)} = (2)(3)(0.12^2)/{2(1.12)^4} = 2.75%.

19.34. B. For one year, each insured's mean is λ, and λ is distributed via a Gamma with:
θ = 0.04 and α = 2.
This is a Gamma-Poisson, with mixed distribution a Negative Binomial:
with r = α = 2 and β = θ = 0.04.
We add up three individual independent drivers and we get a Negative Binomial with:
r = 2 + 2 + 2 = 6, and β = 0.04.
f(2) = r(r+1)β²/{2(1+β)^(r+2)} = (6)(7)(0.04^2)/{2(1.04)^8} = 2.46%.

Comment: The Negative Binomial Distributions here and in the previous solution have the same
mean, however the densities are not the same. Here is a graph of the ratios of the densities of the
Negative Binomial in the previous solution and those of the Negative Binomial here:
[Graph omitted: the ratio of the two densities on the vertical axis, plotted against n = 0, 1, 2, 3, 4, 5 on the horizontal axis.]

19.35. E. For one year, each insured's mean is λ, and λ is distributed via a Gamma with:
θ = 0.04 and α = 2.
Over four years, each insured's mean is 4λ, and 4λ is distributed via a Gamma with:
θ = (4)(0.04) = 0.16, and α = 2.
This is a Gamma-Poisson, with mixed distribution a Negative Binomial:
with r = α = 2 and β = θ = 0.16.
We add up three individual independent drivers and we get a Negative Binomial with:
r = 2 + 2 + 2 = 6, and β = 0.16.
f(3) = r(r+1)(r+2)β³/{6(1+β)^(r+3)} = (6)(7)(8)(0.16^3)/{6(1.16)^9} = 6.03%.

19.36. A. The number of accidents Moe has over one year is Negative Binomial:
with r = α = 2 and β = θ = 0.04.
f(0) = 1/(1+β)^r = 1/(1.04)^2 = 0.9246.
The number of accidents Larry has over two years is Negative Binomial:
with r = α = 2 and β = 2θ = 0.08.
f(1) = rβ/(1+β)^(r+1) = (2)(0.08)/(1.08)^3 = 0.1270.
The number of accidents Curly has over three years is Negative Binomial:
with r = α = 2 and β = 3θ = 0.12.
f(2) = r(r+1)β²/{2(1+β)^(r+2)} = (2)(3)(0.12^2)/{2(1.12)^4} = 0.0275.
Prob[Moe = 0, Larry = 1, and Curly = 2] = (0.9246)(0.1270)(0.0275) = 0.32%.



19.37. D. The number of accidents Moe has over one year is Negative Binomial:
with r = α = 2 and β = θ = 0.04.
f(0) = 0.9246. f(1) = 0.0711. f(2) = 0.0041. f(3) = 0.0002.
The number of accidents Larry has over two years is Negative Binomial:
with r = α = 2 and β = 2θ = 0.08.
f(0) = 0.8573. f(1) = 0.1270. f(2) = 0.0141. f(3) = 0.0014.
The number of accidents Curly has over three years is Negative Binomial:
with r = α = 2 and β = 3θ = 0.12.
f(0) = 0.7972. f(1) = 0.1708. f(2) = 0.0275. f(3) = 0.0039.
We need to list all of the possibilities:
Prob[M = 0, L = 0, C = 3] + Prob[M = 0, L = 1, C = 2] + Prob[M = 0, L = 2, C = 1] +
Prob[M = 0, L = 3, C = 0] + Prob[M = 1, L = 0, C = 2] + Prob[M = 1, L = 1, C = 1] +
Prob[M = 1, L = 2, C = 0] + Prob[M = 2, L = 0, C = 1] + Prob[M = 2, L = 1, C = 0] +
Prob[M = 3, L = 0, C = 0] =
(0.9246) {(0.8573)(0.0039) + (0.1270)(0.0275) + (0.0141)(0.1708) + (0.0014)(0.7972)} +
(0.0711) {(0.8573)(0.0275) + (0.1270)(0.1708) + (0.0141)(0.7972)} +
(0.0041) {(0.8573)(0.1708) + (0.1270)(0.7972)} + (0.0002)(0.8573)(0.7972) = 1.475%.
Comment: Adding up the three independent drivers, M + L + C does not follow a Negative
Binomial, since the betas are not the same.
Note that the solution to the previous question is one of the possibilities here.
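As an alternative to enumerating the cases by hand, a short convolution sketch, not from the Guide, reproduces the answer; the scipy parameterization nbinom.pmf(k, r, 1/(1+β)) and the truncation point are my own choices.

import numpy as np
from scipy import stats

kmax = 3   # only counts up to 3 can contribute to P[M + L + C = 3]
k = np.arange(kmax + 1)
pm = stats.nbinom.pmf(k, 2, 1 / 1.04)   # Moe:   r = 2, beta = 0.04
pl = stats.nbinom.pmf(k, 2, 1 / 1.08)   # Larry: r = 2, beta = 0.08
pc = stats.nbinom.pmf(k, 2, 1 / 1.12)   # Curly: r = 2, beta = 0.12
total = np.convolve(np.convolve(pm, pl), pc)
print(total[3])   # about 0.0147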

19.38. D, 19.39. B, & 19.40. D.


The mixed distribution is a Negative Binomial with r = α = 4 and β = θ = 0.1.
f(0) = (1+β)^(-r) = 1.1^(-4) = 0.6830. Expected size of group A: 6830.
f(1) = rβ(1+β)^(-(r+1)) = (4)(0.1)(1.1)^(-5) = 0.2484. Expected size of group B: 2484.
Expected size of group C: 10000 - (6830 + 2484) = 686.

19.41. E. For an individual insured, the probability of no claims by time t is the density at zero of a
Poisson Distribution with mean λt: exp[-λt].
In other words, the probability the first claim occurs by time t is: 1 - exp[-λt].
This an Exponential Distribution with mean 1/λ.
Thus, for an individual the average wait until the first claim is 1/λ.
(This is a general result for Poisson Processes.)
For a Gamma Distribution, E[X^k] = θ^k Γ(α+k) / Γ(α); taking k = -1, E[X^(-1)] = 1/{θ(α-1)}, for α > 1.
Lambda is Gamma Distributed, thus E[1/λ] = 1/{θ(α-1)} = 1/{(0.02)(3 - 1)} = 25.

19.42. C. There is an 80% chance we get a random draw from the Poisson with mean λ.
In which case, we have a Gamma-Poisson with α = 1 and θ = 0.1.
The mixed distribution is Negative Binomial with r = 1 and β = 0.1. f(2) = 0.1^2 / 1.1^3 = 0.751%.
There is a 20% chance we get a random draw from the Poisson with mean 3λ.
3λ follows an Exponential with mean 0.3.
We have a Gamma-Poisson with α = 1 and θ = 0.3.
The mixed distribution is Negative Binomial with r = 1 and β = 0.3. f(2) = 0.3^2 / 1.3^3 = 4.096%.
Thus the overall probability of two claims is: (0.8)(0.751%) + (0.2)(4.096%) = 1.420%.

19.43. B. This is a Gamma-Poisson with α = 2 and θ = 1/5.


Thus the mixed distribution is Negative Binomial with r = 2 and β = 1/5.
For this Negative Binomial: f(0) = 1/(1 + 1/5)^2 = 25/36. f(1) = (2)(1/5)/(1 + 1/5)^3 = 25/108.
Probability of at least 2 claims is: 1 - 25/36 - 25/108 = 8/108 = 2/27 = 7.41%.

19.44. A. The mixed distribution is Negative Binomial with r = 2 and β = 0.5.


Thinning, small claims are Negative Binomial with r = 2 and β = (60%)(0.5) = 0.3.
Variance of the number of small claims is: (2)(0.3)(1.3) = 0.78.
Alternately, for each insured, the number of small claims is Poisson with mean: 0.6 λ.
0.6 λ follows a Gamma Distribution with α = 2 and θ = (0.6)(0.5) = 0.3.
Thus the mixed distribution for small claims is Negative Binomial with r = 2 and β = 0.3.
Variance of the number of small claims is: (2)(0.3)(1.3) = 0.78.

19.45. E. The mixed distribution is Negative Binomial with r = 2 and β = 0.5.


Thinning, large claims are Negative Binomial with r = 2 and β = (40%)(0.5) = 0.2.
Variance of the number of large claims is: (2)(0.2)(1.2) = 0.48.
Alternately, for each insured, the number of large claims is Poisson with mean: 0.4 λ.
0.4 λ follows a Gamma Distribution with α = 2 and θ = (0.4)(0.5) = 0.2.
Thus the mixed distribution for large claims is Negative Binomial with r = 2 and β = 0.2.
Variance of the number of large claims is: (2)(0.2)(1.2) = 0.48.
Comment: The number of small and large claims is positively correlated.
The distribution of claims of all sizes is Negative Binomial with r = 2 and β = 0.5;
it has a variance of: (2)(0.5)(1.5) = 1.5 > 1.26 = 0.78 + 0.48.
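This positive correlation ties back to the covariance formula derived at the start of this section; the small check below is not from the Guide, and uses r = 2, β = 0.5 and t = 0.4 as the large-claim fraction (variable names are my own).

r, beta, t = 2, 0.5, 0.4
var_total = r * beta * (1 + beta)                        # 1.5
var_small = r * (1 - t) * beta * (1 + (1 - t) * beta)    # 0.78
var_large = r * t * beta * (1 + t * beta)                # 0.48
cov_from_totals = (var_total - var_small - var_large) / 2
cov_from_formula = beta**2 * r * t * (1 - t)
print(cov_from_totals, cov_from_formula)                      # both 0.12
print(cov_from_formula / (var_small * var_large) ** 0.5)      # correlation, about 0.196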

19.46. C. Statements 1 and 2 are true, while #3 is false.


The mixture of a Poisson by a Poisson is not a Negative Binomial Distribution, but a much more
complicated distribution.

19.47. A. For the Gamma-Poisson, the mixed distribution is a Negative Binomial with mean rβ and
variance = rβ(1+β). Thus we have rβ = 0.1 and 0.15/0.1 = 1+β. Thus β = 0.5, and r = 0.1/0.5 = 0.2.
The parameters of the Gamma follow from those of the Negative Binomial: α = r = 0.2 and

θ = β = 0.5. The variance of the Gamma is αθ2 = 0.05.


Alternately, the total variance is 0.15.
For the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to: mean of the
Gamma + variance of the Gamma.
Therefore, the variance of the Gamma = 0.15 - 0.10 = 0.05.

19.48. B. ∫0^∞ e^(-λ) f(λ) dλ = ∫0^∞ e^(-λ) 36λ e^(-6λ) dλ = 36 ∫0^∞ λ e^(-7λ) dλ = (36){Γ(2)/7^2} = 36/49 = 0.735.
Alternately, assume that the frequency for a single insured is given by a Poisson with a mean of λ.

(This is consistent with the given information that the chance of 0 claims is e^(-λ).) In that case one would
have a Gamma-Poisson process and the mixed distribution is a Negative Binomial. The given
Gamma distribution of λ has α = 2 and θ = 1/6. The mixed Negative Binomial has r = α = 2 and
β = θ = 1/6, and f(0) = (1+β)^(-r) = (1 + 1/6)^(-2) = 36/49.
Comment: Note that while the situation described is consistent with a Gamma-Poisson, it need not
be a Gamma-Poisson.

19.49. A. One can solve for the parameters of the Gamma, αθ = 0.1, and αθ2 = 0.01,
therefore θ = 0.1 and α = 1.
The mixed distribution is a Negative Binomial with parameters r = α = 1 and β = θ = 0.1,

a Geometric Distribution. f(0) = 1/(1+β) = 1/1.1 = 10/11. f(1) = β/(1+β)^2 = 0.1/1.1^2 = 10/121.
The chance of 2 or more accidents is: 1 - f(0) - f(1) = 1 - 10/11 - 10/121 = 1/121.

19.50. A. mean of Gamma = αθ = 1 and variance of Gamma = αθ2 = 2.


Therefore, θ = 2 and α = 1/2.
The mixed distribution is a Negative Binomial with r = α = 1/2 and β = θ = 2.
f(1) = rβ/(1+β)^(1+r) = (1/2)(2)/(3^(3/2)) = 0.192.
Alternately, f(1) = ∫0^∞ f(1 | λ) g(λ) dλ = ∫0^∞ λ e^(-λ) {(λ/2)^(1/2) e^(-λ/2) / (λ Γ(1/2))} dλ
= {1 / (Γ(1/2) √2)} ∫0^∞ λ^(1/2) e^(-3λ/2) dλ = {1 / (Γ(1/2) √2)} Γ(3/2) (2/3)^(3/2)
= {Γ(3/2) / Γ(1/2)} (2) 3^(-3/2) = (1/2)(2) 3^(-3/2) = 0.192.

19.51. C. This is a Gamma-Poisson with α = 2 and θ = 1.


The mixed distribution is Negative Binomial with r = α = 2 , and β = θ = 1.
For a Negative Binomial Distribution,
f(3) = {(r)(r+1)(r+2)/3!} β^3/(1+β)^(r+3) = {(2)(3)(4)/6}(1^3)/(2^5) = 1/8.
Thus we expect (1000)(1/8) = 125 out of 1000 simulated values to be 3.

19.52. A. The mean of the Negative Binomial is rβ = 0.2, while the variance is: rβ(1+β) = 0.4.

Therefore, 1 + β = 2 ⇒ β = 1 and r = 0.2. For a Gamma-Poisson, α = r = 0.2 and θ = β = 1.

Therefore, the variance of the Gamma Distribution is: αθ² = (0.2)(1^2) = 0.2.
Alternately, for the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to:
mean of the Gamma + variance of the Gamma. Variance of the Gamma =
Variance of the Negative Binomial - Mean of the Gamma =
Variance of the Negative Binomial - Overall Mean =
Variance of the Negative Binomial - Mean of the Negative Binomial = 0.4 - 0.2 = 0.2.

19.53. A. For the Gamma, mean = αθ = 2, and variance = αθ2 = 4. Thus θ = 2 and α = 1.
This is a Gamma-Poisson, with mixed distribution a Negative Binomial, with r = α = 1 and β = θ = 2.

This is a Geometric with f(1) = β/(1+β)^2 = 2/(1+2)^2 = 2/9 = 0.222.

Alternately, λ is distributed via an Exponential with mean 2, f(λ) = e−λ/2/2.



Prob[1 claim] = ∫0^∞ Prob[1 claim | λ] f(λ) dλ = ∫0^∞ λ e^(-λ) e^(-λ/2)/2 dλ = (1/2) ∫0^∞ λ e^(-3λ/2) dλ
= (1/2) (2/3)^2 Γ(2) = (1/2)(4/9)(1!) = 2/9 = 0.222.

Alternately, for the Gamma-Poisson, the variance of the mixed Negative Binomial = total variance =
E[Var[N | λ]] + Var[E[N | λ]] = E[λ] + Var[λ] = mean of the Gamma + variance of the Gamma = 2 + 4
= 6. The mean of the mixed Negative Binomial = overall mean = E[λ] = mean of the Gamma = 2.

Therefore, rβ = 2 and rβ(1+β) = 6. ⇒ r =1 and β = 2.

f(1) = β/(1+β)^2 = 2/(1+2)^2 = 2/9 = 0.222.


Comment: The fact that it is the sixth rather than some other minute is irrelevant.

19.54. C. Over two minutes (on the same day) we have a Poisson with mean 2λ.

λ ∼ Gamma(α, θ) = Gamma (1, 2).

2λ ∼ Gamma(α, 2θ) = Gamma (1, 4), as per inflation.


Mixed Distribution is Negative Binomial, with r = α = 1 and β = θ = 4.

f(1) = β/(1 + β)^2 = 4/(1 + 4)^2 = 16%.


Comment: If one multiplies a Gamma variable by a constant, one gets another Gamma with the
same alpha and with the new theta equal to that constant times the original theta.

19.55. D. A ∼ Negative Binomial with r = 1 and β = 2.

B ∼ Negative Binomial with r = 1 and β = 2.

A + B ∼ Negative Binomial with r = 2 and β = 2.


f(1) = r β / (1 + β)^(1+r) = (2)(2) / (1 + 2)^3 = 14.8%.
Alternately, the number of coins found in the minutes are independent Poissons with means λ1 and

λ 2 . Total number found is Poisson with mean λ1 + λ2 .

λ 1 + λ2 ∼ Gamma(2α, θ) = Gamma (2, 2).

Mixed Negative Binomial has r = 2 and β = 2. Proceed as before.


Alternately, P[A + B = 1] = P[A = 1]P[B = 0] + P[A = 0]P[B = 1] = (2/9)(1/3) + (1/3)(2/9) = 14.8%.
Comment: The sum of two independent Gamma variables with the same theta, is another Gamma
with the same theta and with the new alpha equal to the sum of the alphas.

19.56. E. Prob[1 coin during minute 3 | λ] = λe−λ. Prob[1 coin during minute 5 | λ] = λe−λ.

The Gamma has θ = 2 and α = 1, an Exponential. π(λ) = e−λ/2/2.


Prob[1 coin during minute 3 and 1 coin during minute 5] =

∫ Prob[1 coin during minute 3 | λ] Prob[1 coin during minute 5 | λ] π(λ) dλ =


∫0^∞ (λe^(-λ)) (λe^(-λ)) (e^(-λ/2)/2) dλ = ∫0^∞ λ² e^(-2.5λ)/2 dλ = Γ(3) (1/2.5)^3 / 2 = (1/2)(2/2.5^3) = 6.4%.

Comment: It is true that Prob[1 coin during minute 3] = Prob[1 coin during minute 5] = 2/9.
(2/9)(2/9) = 4.94%. However, since the two probabilities both depend on the same lambda, they
are not independent.

19.57. D. Prob[1 coin during minute 1 | λ] = λe^(-λ). Prob[2 coins during minute 2 | λ] = λ² e^(-λ)/2.
Prob[3 coins during minute 3 | λ] = λ³ e^(-λ)/6.

The Gamma has θ = 2 and α = 1, an Exponential. π(λ) = e−λ/2/2.


Prob[1 coin during minute 1, 2 coins during minute 2, and 3 coins during minute 3] =

∫ Prob[1 coin minute 1 | λ] Prob[2 coins minute 2 | λ] Prob[3 coins minute 3 | λ] π(λ) dλ =
∫0^∞ (λe^(-λ)) (λ² e^(-λ)/2) (λ³ e^(-λ)/6) (e^(-λ/2)/2) dλ = ∫0^∞ λ^6 e^(-3.5λ)/24 dλ = Γ(7) (1/3.5)^7 / 24
= (720/24)/3.5^7 = 0.466%.
Comment: Prob[1 coin during minute 1] = 2/9. Prob[2 coins during minute 2] = 4/27.
Prob[3 coins during minute 3] = 8/81. (2/9)(4/27)(8/81) = 0.325%. However, since the three
probabilities depend on the same lambda, they are not independent.
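A numerical check of this integral, not from the Guide, again makes the common-lambda dependence visible; scipy and the upper integration limit are my own illustrative choices.

from scipy import integrate, stats

def joint_given_lambda(lam):
    # P[1 coin in minute 1] * P[2 coins in minute 2] * P[3 coins in minute 3], given lambda,
    # weighted by the Exponential (mean 2) density of lambda
    return (stats.poisson.pmf(1, lam) * stats.poisson.pmf(2, lam) * stats.poisson.pmf(3, lam)
            * stats.expon.pdf(lam, scale=2))

p, _ = integrate.quad(joint_given_lambda, 0, 60)
print(p)   # about 0.0047, versus (2/9)(4/27)(8/81) = 0.00325 under a false independence assumption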

19.58. A. From a previous solution, for one minute, the mixed distribution is Geometric with β = 2.

f(1) = β/(1+β)^2 = 2/(1+2)^2 = 2/9 = 0.2222.


Since the minutes are on different days, their lambdas are picked independently.
Prob[1 coin during 1 minute today and 1 coin during 1 minute tomorrow] =
Prob[1 coin during a minute] Prob[1 coin during a minute] = 0.2222^2 = 4.94%.

19.59. C. Over three minutes (on the same day) we have a Poisson with mean 3λ.

λ ∼ Gamma(α, θ) = Gamma (1, 2).

3λ ∼ Gamma(α, 3θ) = Gamma (1, 6).


Mixed Distribution is Negative Binomial, with r = α = 1 and β = θ = 6.

f(1) = β/(1 + β)^2 = 6/(1 + 6)^2 = 0.1224.


Since the time intervals are on different days, their lambdas are picked independently.
Prob[1 coin during 3 minutes today and 1 coin during 3 minutes tomorrow] =
Prob[1 coin during 3 minutes] Prob[1 coin during 3 minutes] = 0.1224^2 = 1.50%.

19.60. E. Gamma has mean = αθ = 3 and variance = αθ2 = 3 ⇒ θ = 1 and α = 3.


The Negative Binomial mixed distribution has r = α = 3 and β = θ = 1.
f(0) = 1/(1+β)^3 = 1/8. f(1) = rβ/(1+β)^4 = 3/16. F(1) = 1/8 + 3/16 = 5/16 = 0.3125.

19.61. E. Assume the prior Gamma, used by both actuaries, has parameters α and θ.
The first actuary is simulating N drivers from a Gamma-Poisson frequency process.
The number of claims from a random driver is Negative Binomial with r = α and β = θ.
The total number of claim is a sum of N independent, identically distributed Negative Binomials,
which is Negative Binomial with parameters r = Nα and β = θ.
The second actuary is simulating N years for a single driver.
An individual who is Poisson with mean λ, over N years is Poisson with mean Nλ.
I. The Negative Binomial Distribution simulated by the first actuary has mean Nαθ.
The Poisson simulated by the second actuary has mean Nλ, where λ depends on which driver the
second actuary has picked at random. There is no reason why the mean number of claims simulated
by the two actuaries should be the same. Thus statement I is not true.
II. The number of claims simulated will usually be different, since they are from two different
distributions. Thus statement II is not true.
III. The first actuaryʼs Negative Binomial has variance αθ(1 + θ). The second actuaryʼs simulated
sequence has an expected variance of λ, where λ depends on which driver the second actuary has
picked at random. The expected variance for the second actuaryʼs simulated sequence could be
higher or lower than the first actuaryʼs, depending on which driver he has picked. Thus statement III is
not true.

19.62. C. Gamma-Poisson. The mixed distribution is Negative Binomial with r = α = 2 and


β = θ = 1. f(0) = 1/(1 + β)^r = 1/(1 + 1)^2 = 1/4.

19.63. E. From the previous solution, the probability that each driver does not have a claim is 1/4.
Thus for 1000 independent drivers, the number of drivers with no claims is Binomial with m = 1000
and q = 1/4. This Binomial has mean mq = 250, and variance
mq(1 - q) = 187.5. Using the Normal Approximation with continuity correction,
Prob[At most 265 claim-free drivers] ≅ Φ[(265.5 - 250)/√187.5] = Φ[1.13] = 87.08%.

19.64. D. The distribution of number of claims from a single driver is Negative Binomial with
r = 2 and β = 1. The distribution of the sum of 1000 independent drivers is Negative Binomial with
r = (1000)(2) = 2000 and β = 1. This Negative Binomial has mean rβ = 2000, and
variance rβ(1 + β) = 4000. Using the Normal Approximation with continuity correction,

Prob[more than 2020 claims] ≅ 1 - Φ[(2020.5 - 2000)/√4000] = 1 - Φ[0.32] = 37.45%.


Alternately, the mean of the sum of 1000 independent drivers is 1000 times the mean of single
driver: (1000) (2) = 2000.
The variance of the sum of 1000 independent drivers is 1000 times the variance of single driver:
(1000) (2) (1) (1+1) = 4000. Proceed as before.
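The two normal-approximation values in 19.63 and 19.64 are easy to reproduce; here is a short Python sketch (illustrative only; the helper Phi is built from math.erf):

from math import erf, sqrt

def Phi(z):
    # Standard Normal distribution function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# 19.63: Binomial(m = 1000, q = 1/4); P[at most 265 claim-free drivers].
print(Phi((265.5 - 250.0) / sqrt(187.5)))            # about 0.871

# 19.64: Negative Binomial sum with mean 2000 and variance 4000; P[more than 2020 claims].
print(1.0 - Phi((2020.5 - 2000.0) / sqrt(4000.0)))   # about 0.373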

19.65. C. The distribution of number of claims from a single driver is Negative Binomial with r = 2
and β = 1. f(0) = 1/4. f(1) = rβ/(1 + β)r+1 = (2)(1)/(1 + 1)3 = 1/4.

f(2) = {r(r + 1)/2}β2/(1 + β)r+2 = {(2)(3)/2}(12 )/(1 + 1)4 = 3/16.


The number of drivers with given numbers of claims is a multinomial distribution,
with parameters 1000, f(0), f(1), f(2), ... = 1000, 1/4, 1/4, 3/16, ....
The covariance of the number of drivers with 1 claim and the number with 2 claims is:
-(1000)(1/4)(3/16) = -46.875.
The variance of the number of drivers with 1 claim is: (1000)(1/4)(1 - 1/4) = 187.5.
The variance of the number of drivers with 2 claims is: (1000)(3/16)(1 - 3/16) = 152.34.
The correlation of the number of drivers with 1 claim and the number with 2 claims is:
-46.875 / √[(187.5)(152.34)] = -0.277.
Comment: Well beyond what you are likely to be asked on your exam!
The multinomial distribution is discussed in A First Course in Probability by Ross.
The correlation is: -√[ f(1) f(2) / ({1 - f(1)} {1 - f(2)}) ] = -1/√13 = -0.277.
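A brute-force simulation gives roughly the same correlation; the following Python sketch is illustrative only (the number of trials and the seed are arbitrary):

import random

random.seed(2)
f1, f2 = 1/4, 3/16                 # P[1 claim] and P[2 claims] for one driver, from the solution
trials, drivers = 2000, 1000
ones, twos = [], []
for _ in range(trials):
    n1 = n2 = 0
    for _ in range(drivers):
        u = random.random()
        if u < f1:
            n1 += 1
        elif u < f1 + f2:
            n2 += 1
    ones.append(n1)
    twos.append(n2)

def mean(v):
    return sum(v) / len(v)

cov = mean([a * b for a, b in zip(ones, twos)]) - mean(ones) * mean(twos)
var1 = mean([a * a for a in ones]) - mean(ones) ** 2
var2 = mean([b * b for b in twos]) - mean(twos) ** 2
print(cov / (var1 * var2) ** 0.5)   # roughly -0.28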

Section 20, Tails of Frequency Distributions

Actuaries are sometimes interested in the behavior of a frequency distribution as the number of
claims gets very large.169 The question of interest is how quickly the density and survival function go
to zero as x approaches infinity. If the density and survival function go to zero more slowly, one
describes that as a "heavier-tailed distribution."

Those frequency distributions which are heavier-tailed than the Geometric distribution are often
considered to have heavy tails, while those lighter-tailed than the Geometric are considered to have light
tails.170 There are a number of general methods by which one can distinguish which distribution or
empirical data set has the heavier tail. Lighter-tailed distributions have more moments that exist.
For the frequency distributions on the exam all of the moments exist.

Nevertheless, the three common frequency distributions differ in their tail behavior. Since the
Binomial has finite support, f(x) = 0 for x > m, it is very light-tailed. The Negative Binomial has its
variance greater than its mean, so that the Negative Binomial is heavier-tailed than the Poisson which
has its variance equal to its mean.

From lightest to heaviest tailed, the frequency distribution in the (a,b,0) class are:
Binomial, Poisson, Negative Binomial r > 1, Geometric, Negative Binomial r < 1.

Skewness:

The larger the skewness, the heavier-tailed the distribution. The Binomial distribution for
q > 0.5 is skewed to the left (has negative skewness.) The Binomial distribution for q < 0.5, the
Poisson distribution, and the Negative Binomial distribution are skewed to the right (have positive
skewness); they have a few very large values and many smaller values. A symmetric distribution
has zero skewness. Therefore, the Binomial Distribution for q = 0.5 has zero skewness.

Mean Residual Lives/ Mean Excess Loss:

As with loss distributions one can define the concept of the mean residual life.
The Mean Residual Life, e(x) is defined as:
e(x) = (average number of claims for those insureds with more than x claims) - x.
Thus we only count those insureds with more than x and only that part of each number of claims
greater than x.171 Heavier-tailed distributions have their mean residual life increase to infinity, while
lighter-tailed distributions have their mean residual life approach a constant or decline to zero.
169
Actuaries are more commonly concerned with the tail behavior of loss distributions, as discussed in “Mahlerʼs
Guide to Loss Distributions.”
170
See Section 6.3 of Loss Models.
171
Thus the Mean Residual Life is the mean of the frequency distribution truncated and shifted at x.

One complication is that for discrete distributions this definition is discontinuous at the integers. For
example, assume we are interested in the mean residual life at 3. As we take the limit from below
we include those insureds with 3 claims in our average; as we approach 3 from above, we donʼt
include insureds with 3 claims in our average.

Define e(3-) as the limit as x approaches 3 from below of e(x). Similarly, one can define e(3+) as the
limit as x approaches 3 from above of e(x). Then it turns out that e(0-) = mean, in analogy to the
situation for continuous loss distributions. For purposes of comparing tail behavior of frequency
distributions, one can use either e(x-) or e(x+). I will use the former, since the results using e(x-) are
directly comparable to those for the continuous size of loss distributions. At integer values of x:
e(x-) = {Σ over i ≥ x of (i - x) f(i)} / {Σ over i ≥ x of f(i)} = {Σ over i ≥ x of (i - x) f(i)} / S(x-1).

One can compute the mean residual life for the Geometric Distribution, letting q = β/(1+β) and thus
1 - q = 1/(1+β):
e(x-) S(x-1) = Σ over i ≥ x+1 of (i - x) f(i) = Σ over i ≥ x+1 of (i - x) β^i / (1+β)^(i+1)
= {1/(1+β)} Σ over i ≥ x+1 of (i - x) q^i

= {1/(1+β)} {Σ over i ≥ x+1 of q^i + Σ over i ≥ x+2 of q^i + Σ over i ≥ x+3 of q^i + ...}
= (1 - q) {q^(x+1)/(1-q) + q^(x+2)/(1-q) + q^(x+3)/(1-q) + ...}

= q^(x+1) + q^(x+2) + q^(x+3) + ... = q^(x+1)/(1-q) = {β/(1+β)}^(x+1) (1+β) = β^(x+1)/(1+β)^x.

In a previous section, the survival function for the Geometric distribution was computed as:
S(x) = {β/(1+β)}^(x+1). Therefore, S(x-1) = {β/(1+β)}^x.
Thus e(x-) = {β^(x+1)/(1+β)^x} / {β/(1+β)}^x = β.

The mean residual life for the Geometric distribution is constant.172 As discussed previously, the
Geometric distribution is the discrete analog of the Exponential distribution which also has a constant
mean residual life.173
172
e(x-) = β = E[X].
173
The Exponential and Geometric distributions have constant mean residual lives due to their memoryless property
as discussed in Section 6.3 of Loss Models.
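A quick numerical check that e(x-) = β for the Geometric (an illustrative Python sketch; β = 1.7 and the summation cutoff are arbitrary choices):

beta = 1.7
q = beta / (1.0 + beta)

def f(i):
    # Geometric density: beta^i / (1+beta)^(i+1) = q^i (1-q).
    return q**i * (1.0 - q)

for x in (1, 3, 7):
    numerator = sum((i - x) * f(i) for i in range(x, 2000))
    survival = sum(f(i) for i in range(x, 2000))        # S(x-1)
    print(x, numerator / survival)                      # each is about 1.7 = beta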

As discussed previously, the Negative Binomial is the discrete analog of the Gamma Distribution.
The tail behavior of the Negative Binomial is analogous to that of the Gamma.174 The mean residual
life for a Negative Binomial goes to a constant. For r < 1, e(x-) increases to β, the mean of the
corresponding Geometric, while for r>1, e(x-) decreases to β as x approaches infinity. For r = 1, one
has the Geometric Distribution with e(x-) constant.
Using the relation between the Poisson Distribution and the Incomplete Gamma Function, it
turns out that for the Poisson e(x-) = (λ - x) + λ^x e^-λ / {Γ(x) Γ(x; λ)}.175 The mean residual life e(x-) for the
Poisson Distribution declines to zero as x approaches infinity.176 177 This is another way of seeing that
the Poisson has a lighter tail than the Negative Binomial Distribution.

Summary:

Here are the common frequency distributions, arranged from lightest to heaviest righthand tail:

Frequency Distribution Skewness Righthand Tail Behavior Tail Similar to


Binomial, q > 0.5 negative Finite Support

Binomial, q = 0.5 zero Finite Support

Binomial, q < 0.5 positive Finite Support

Poisson positive e(x-) → 0, approximately as 1/x Normal Distribution

Negative Binomial, r >1 positive e(x-) decreases to β Gamma, α > 1

Geometric positive e(x-) constant = β Exponential

(Negative Binomial, r =1) (Gamma, α = 1)

Negative Binomial, r < 1 positive e(x-) increases to β Gamma, α < 1


174
See “Mahlerʼs Guide to Loss Distributions”, for a discussion of the mean residual life for the Gamma and other size
of loss distributions. For a Gamma Distribution with α>1, e(x) decreases towards a horizontal asymptote θ.
For a Gamma Distribution with α<1, e(x) increases towards a horizontal asymptote θ.
175
For the Poisson F(x) = 1 - Γ(x+1; λ).
176
It turns out that e(x-) ≅ λ / x for very large x. This is similar to the tail behavior for the Normal Distribution.
While e(x-) declines to zero, e(x+) for the Poisson Distribution declines to one as x approaches infinity.
177
This follows from the fact that the Poisson is a limit of Negative Binomial Distributions. For a sequence of Negative
Binomial distributions with rβ = λ as r → ∞ (and β → 0), in the limit one approaches a Poisson Distribution with the
mean λ. The tails of each Negative Binomial have e(x-) decreasing to β as x approaches infinity.
As β → 0, the limits of e(x-) → 0.

Skewness and Kurtosis of the Poisson versus the Negative Binomial:178

The Poisson has skewness: 1/√λ.
The Negative Binomial has skewness: (1 + 2β) / √[rβ(1 + β)].

Therefore, if a Poisson and Negative Binomial have the same mean, λ = rβ, then the ratio of the
skewness of the Negative Binomial to that of the Poisson is: (1 + 2β) / √(1 + β) > 1.

The Poisson has kurtosis: 3 + 1/λ.
The Negative Binomial has kurtosis: 3 + (6β^2 + 6β + 1) / {rβ(1 + β)}.

Therefore, if a Poisson and Negative Binomial have the same mean, λ = rβ, then the ratio of the
kurtosis minus 3 of the Negative Binomial to that of the Poisson is:179 (6β^2 + 6β + 1) / (1 + β) > 1.

Tails of Compound Distributions:

Compound frequency distributions can have longer tails than either their primary or secondary
distribution. If the primary distribution is the number of accidents, and the secondary distribution is the
number of claims, then one can have a large number of claims either due to a large number of
accidents, or an accident with a large number of claims, or a combination of the two. Thus there is
more chance for an unusually large number of claims.

Generally the longer-tailed the primary distribution and the longer-tailed the secondary distribution,
the longer-tailed the compound distribution. The skewness of a compound distribution can be rather
large.

178
See “The Negative Binomial and Poisson Distributions Compared,” by Leroy J. Simon, PCAS 1960.
179
The kurtosis minus 3 is sometimes called the excess.

Tails of Mixed Distributions:

Mixed distributions can also have long tails. For example, the Gamma Mixture of Poissons is a
Negative Binomial, with a longer tail than the Poisson. As with compound distributions, with mixed
distributions there is more chance for an unusually large number of claims. One can either have an
unusually large number of claims for a typical value of the parameter, have an unusual value of the
parameter which corresponds to a large expected claim frequency, or a combination of the two.
Generally the longer tailed the distribution type being mixed and the longer tailed the mixing
distribution, the longer tailed the mixed distribution.

Tails of Aggregate Loss Distributions:

Actuaries commonly look at the combination of frequency and severity. This is termed the aggregate
loss distribution. The tail behavior of this aggregate distribution is determined by the behavior of the
heavier-tailed of the frequency and severity distributions.180

Since the common frequency distributions have tails that are similar to the Gamma Distribution or
lighter and the common severity distributions for Casualty Insurance have tails at least as heavy as
the Gamma, actuaries working on liability or workers compensation insurance are usually most
concerned with the heaviness of the tail of the severity distribution. It is the rare extremely large
claims that then are of concern.

However, natural catastrophes such as hurricanes or earthquakes can be examples where a


large number of claims can be the concern.181 (Tens of thousands of homeowners claims, even
limited to, for example, 1/4 million dollars each, can add up to a lot of money!) In that case the
tail of the frequency distribution could be heavier than a Negative Binomial.

180
See for example Panjer & Willmot, Insurance Risk Models.
181
Natural catastrophes are now commonly modeled using simulation models that incorporate the science of the
particular physical phenomenon and the particular distribution of insured exposures.

Problems:

20.1 (1 point) Which of the following frequency distributions have positive skewness?
1. Negative Binomial Distribution with r = 3, β = 0.4.
2. Poisson Distribution with λ = 0.7.
3. Binomial Distribution with m = 3, q = 0.7.
A. 1, 2 only
B. 1, 3 only
C. 2, 3 only
D. 1, 2, and 3
E. The correct answer is not given by A, B, C, or D.

Use the following information for the next five questions:

Five friends: Oleg Puller, Minnie Van, Bob Alou, Louis Liu, and Shelly Fish, are discussing studying
for their next actuarial exam. Theyʼve counted 10,000 pages worth of readings and agree that on
average they expect to find about 2000 “important ideas”. However, they are debating how many
of these pages there are expected to be with 3 or more important ideas.

20.2 (2 points) Oleg assumes the important ideas are distributed as a Binomial with
q = 0.04 and m = 5.
How many pages should Oleg expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.3 (2 points) Minnie assumes the important ideas are distributed as a Poisson with λ = 0.20.
How many pages should Minnie expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.4 (2 points) Bob assumes the important ideas are distributed as a Negative Binomial with
β = 0.1 and r = 2. How many pages should Bob expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.5 (3 points) Louis assumes the important ideas are distributed as a compound
Poisson-Poisson distribution, with λ1 = 1 and λ2 = 0.2.
How many pages should Louis expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.6 (3 points) Shelly assumes the important ideas are distributed as a compound
Poisson-Poisson distribution, with λ1 = 0.2 and λ2 = 1.
How many pages should Shelly expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80


20.7 (3 points) Define Riemannʼs zeta function as: ζ(s) = Σ from k=1 to ∞ of 1/k^s, s > 1.
Let the zeta distribution be: f(x) = 1 / {x^(ρ+1) ζ(ρ+1)}, x = 1, 2, 3, ..., ρ > 0.
Determine the moments of the zeta distribution.

20.8 (4B, 5/99, Q.29) (2 points) A Bernoulli distribution, a Poisson distribution, and a uniform
distribution each has mean 0.8. Rank their skewness from smallest to largest.
A. Bernoulli, uniform, Poisson B. Poisson, Bernoulli, uniform
C. Poisson, uniform, Bernoulli D. uniform, Bernoulli, Poisson
E. uniform, Poisson, Bernoulli

Solutions to Problems:

20.1. A. 1. True. The skewness of any Negative Binomial Distribution is positive.


2. True. The skewness of any Poisson Distribution is positive. 3. False. The skewness of the
Binomial Distribution depends on the value of q. For q > .5, the skewness is negative.

20.2. A. f(x) = {5! / (x! (5 - x)!)} 0.04^x 0.96^(5-x).
One needs to sum the chances of having x = 0, 1, and 2 :
n 0 1 2
f(n) 0.81537 0.16987 0.01416
F(n) 0.81537 0.98524 0.99940
Thus the chance of 3 or more important ideas is: 1 - 0.99940 = 0.00060.
Thus we expect: (10000)(0.00060) = 6.0 such pages.

20.3. B. f(x) = e^-0.2 0.2^x / x!.


One needs to sum the chances of having x = 0, 1, and 2 :
n 0 1 2
f(n) 0.81873 0.16375 0.01637
F(n) 0.81873 0.98248 0.99885
Thus the chance of 3 or more important ideas is: 1 - 0.99885 = 0.00115.
Thus we expect: (10000)(0.00115) = 11.5 such pages.

20.4. C. f(x) = {(x + 2 - 1)! / (x! (2 - 1)!)} (0.1)^x / (1.1)^(x+2) = (x+1) (10/11)^2 (1/11)^x.
One needs to sum the chances of having x = 0, 1, and 2:
n 0 1 2
f(n) 0.82645 0.15026 0.02049
F(n) 0.82645 0.97671 0.99720
Thus the chance of 3 or more important ideas is: 1 - 0.99720 = 0.00280.
Thus we expect: (10000)(0.00280) = 28.0 such pages.
Comment: Note that the distributions of important ideas in these three questions all have a mean of
.2. Since the Negative Binomial has the longest tail, it has the largest expected number of pages
with lots of important ideas. Since the Binomial has the shortest tail, it has the smallest expected
number of pages with lots of important ideas.
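The three tail probabilities in 20.2 through 20.4 can be reproduced with a few lines of Python (an illustrative sketch; math.comb requires Python 3.8 or later):

from math import comb, exp, factorial

def binomial(x, m=5, q=0.04):
    return comb(m, x) * q**x * (1.0 - q)**(m - x)

def poisson(x, lam=0.2):
    return exp(-lam) * lam**x / factorial(x)

def negative_binomial(x, r=2, beta=0.1):
    return comb(x + r - 1, x) * beta**x / (1.0 + beta)**(x + r)

for name, density in [("Binomial", binomial), ("Poisson", poisson),
                      ("Negative Binomial", negative_binomial)]:
    tail = 1.0 - sum(density(x) for x in range(3))
    print(name, 10000.0 * tail)        # about 6.0, 11.5, and 28.0 pages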

20.5. D. For the Primary Poisson a = 0 and b = λ1 = 1. The secondary Poisson has density at zero

of e^-0.2 = 0.8187. The p.g.f. of the Primary Poisson is P(z) = e^(z-1). The density of the compound
distribution at zero is the p.g.f. of the primary distribution at 0.8187: e^(0.8187-1) = 0.83421.
The densities of the secondary Poisson Distribution with λ = 0.2 are:
n s(n)
0 0.818731
1 0.163746
2 0.016375
3 0.001092
4 0.000055
5 0.000002
Use the Panjer Algorithm, c(x) = {1/(1 - a s(0))} Σ from j=1 to x of (a + jb/x) s(j) c(x-j) = (1/x) Σ from j=1 to x of j s(j) c(x-j).

c(1) = (1/1) (1) s(1) c(0) = (.16375)(.83421) = .13660.


c(2) = (1/2) {(1)s(1) c(1) + (2)s(2)c(0)} = (1/2){(0.16375)(0.13660) + (2)(0.01638)(0.83421)} =
0.02485. Thus c(0) + c(1) + c(2) = 0.83421 + 0.13660 + 0.02485 = 0.99566.
Thus the chance of 3 or more important ideas is: 1 - 0.99566 = 0.00434.
Thus we expect: (10000)(0.00434) = 43.4 such pages.
Comment: The Panjer Algorithm (recursive method) is discussed in “Mahlerʼs Guide to Aggregate
Distributions.”
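Here is a minimal Python sketch of the recursion used above (the function names are my own; for a Poisson primary, a = 0 and b = λ1):

from math import exp, factorial

def compound_poisson_densities(lam1, secondary, n_max):
    # Panjer recursion with a Poisson primary: c(0) = exp[lam1 (s(0) - 1)],
    # c(x) = (lam1/x) sum over j = 1..x of j s(j) c(x-j).
    c = [exp(lam1 * (secondary(0) - 1.0))]
    for x in range(1, n_max + 1):
        c.append(lam1 / x * sum(j * secondary(j) * c[x - j] for j in range(1, x + 1)))
    return c

secondary = lambda j, lam2=0.2: exp(-lam2) * lam2**j / factorial(j)
c = compound_poisson_densities(1.0, secondary, 2)
print(c)                 # about [0.834, 0.137, 0.025]
print(1.0 - sum(c))      # chance of 3 or more: about 0.0043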

20.6. E. For the Primary Poisson a = 0 and b = λ1 = 0.2.

The secondary Poisson has density at zero of e^-1 = 0.3679.

The p.g.f. of the Primary Poisson is P(z) = e^(0.2(z-1)).
The density of the compound distribution at zero is the p.g.f. of the primary distribution at 0.3679:
e^(0.2(0.3679-1)) = 0.88124.
The densities of the secondary Poisson Distribution with λ = 1 are:
n s(n)
0 0.367879
1 0.367879
2 0.183940
3 0.061313
4 0.015328
5 0.003066
Use the Panjer Algorithm, c(x) = {1/(1 - a s(0))} Σ from j=1 to x of (a + jb/x) s(j) c(x-j) = (0.2/x) Σ from j=1 to x of j s(j) c(x-j).

c(1) = (0.2/1) (1) s(1) c(0) = (0.2)(0.3679)(0.88124) = 0.06484.


c(2) = (0.2/2) {(1)s(1) c(1) + (2)s(2)c(0)} = (.1){(0.3679)(0.06484)+(2)(0.1839)(0.88124)} =
0.03480. Thus c(0) + c(1) + c(2) = 0.88124 + 0.06484 + 0.03480 = 0.98088.
Thus the chance of 3 or more important ideas is: 1 - 0.98088 = 0.01912.
Thus we expect: (10000)(0.01912) = 191.2 such pages.
Comment: This Poisson-Poisson has a mean of .2, but an even longer tail than the previous
Poisson-Poisson which has the same mean. Note that it has a variance of (0.2)(1) + (1)2 (0.2) =
0.40, while the previous Poisson-Poisson has a variance of (1)(0.2) + (0.2)2 (1) = 0.24.
The Negative Binomial has a variance of (2)(0.1)(1.1) = 0.22.
The variance of the Poisson is 0.20. The variance of the Binomial is (5)(0.04)(0.96) = 0.192.

20.7. E[X^n] = Σ from x=1 to ∞ of x^n {1/x^(ρ+1)} / ζ(ρ+1) = Σ from x=1 to ∞ of {1/x^(ρ+1-n)} / ζ(ρ+1) = ζ(ρ+1-n) / ζ(ρ+1), n < ρ.

Comment: You are extremely unlikely to be asked about the zeta distribution!
The zeta distribution is discrete and has a heavy righthand tail similar to a Pareto Distribution or a
Single Parameter Pareto Distribution, with only some of its moments existing.
The zeta distribution is mentioned in Exercise A.12 at the end of Section 7 in Loss Models.
ζ(2) = π2/6. ζ(4) = π4/90. See the Handbook of Mathematical Functions.

20.8. A. The uniform distribution is symmetric, so it has a skewness of zero.


The Poisson has a positive skewness.
The Bernoulli has a negative skewness for q = 0.8 > 0.5.
Comment: For the Poisson with mean µ, the skewness is 1/√µ.
For the Bernoulli, the skewness is: (1 - 2q) / √[q(1 - q)] = {1 - (2)(0.8)} / √[(0.8)(1 - 0.8)] = -1.5.
If one computes for this Bernoulli, the third central moment E[(X - 0.8)^3] =
0.2(0 - 0.8)^3 + 0.8(1 - 0.8)^3 = -0.096. Thus the skewness is: -0.096 / {(0.8)(1 - 0.8)}^1.5 = -1.5.

Section 21, Important Formulas and Ideas

Here are what I believe are the most important formulas and ideas from this study guide to know for
the exam.

Basic Concepts (Section 2)

The mean is the average or expected value of the random variable.


The mode is the point at which the density function reaches its maximum.
The median, the 50th percentile, is the first value at which the distribution function is ≥ 0.5.
The 100pth percentile is the first value at which the distribution function is ≥ p.
Variance = second central moment = E[(X - E[X])2 ] = E[X2 ] - E[X]2 .
Standard Deviation = Square Root of Variance.

Binomial Distribution (Section 3)

f(x) = (m choose x) q^x (1 - q)^(m-x) = {m! / (x! (m - x)!)} q^x (1 - q)^(m-x), 0 ≤ x ≤ m.
Mean = mq Variance = mq(1-q)
Probability Generating Function: P(z) = {1 + q(z-1)}^m
The Binomial Distribution for m =1 is a Bernoulli Distribution.
X is Binomial with parameters q and m1 , and Y is Binomial with parameters q and m2 ,
X and Y independent, then X + Y is Binomial with parameters q and m1 + m2 .

Poisson Distribution (Section 4)

f(x) = λ^x e^-λ / x!, x ≥ 0

Mean = λ Variance = λ

Probability Generating Function: P(z) = e^(λ(z-1)), λ > 0.

A Poisson is characterized by a constant independent claim intensity and vice versa.


The sum of two independent variables each of which is Poisson with parameters λ1 and

λ 2 is also Poisson, with parameter λ1 + λ2 .

If frequency is given by a Poisson and severity is independent of frequency, then the


number of claims above a certain amount (in constant dollars) is also a Poisson.

Geometric Distribution (Section 5)

f(x) = β^x / (1 + β)^(x+1).

Mean = β Variance = β(1+β)

Probability Generating Function: P(z) = 1 / {1 - β(z-1)}.
For a Geometric Distribution, for n > 0, the chance of at least n claims is: {β/(1+β)}^n.
For a series of independent identical Bernoulli trials, the chance of the first success following x failures
is given by a Geometric Distribution with mean
β = (chance of a failure) / (chance of a success).

The Geometric shares with the Exponential distribution, the “memoryless property.” If one were to
truncate and shift a Geometric Distribution, then one obtains the same Geometric Distribution.

Negative Binomial Distribution (Section 6)

f(x) = {r(r+1)...(r+x-1) / x!} β^x / (1+β)^(x+r). Mean = rβ Variance = rβ(1+β)

Negative Binomial for r = 1 is a Geometric Distribution.


The Negative Binomial Distribution with parameters β and r, with r integer, can be
thought of as the sum of r independent Geometric distributions with parameter β.
If X is Negative Binomial with parameters β and r1 , and Y is Negative Binomial with parameters β

and r2 , X and Y independent, then X + Y is Negative Binomial with parameters β and r1 + r2 .

For a series of independent identical Bernoulli trials, the chance of success number r following x
failures is given by a Negative Binomial Distribution with parameters r and
β = (chance of a failure) / (chance of a success).

Normal Approximation (Section 7)

In general, let µ be the mean of the frequency distribution, while σ is the standard
deviation of the frequency distribution, then the chance of observing at least i claims
and not more than j claims is approximately:
Φ[{(j + 0.5) - µ} / σ] - Φ[{(i - 0.5) - µ} / σ].

Normal Distribution
F(x) = Φ[(x-µ)/σ]
f(x) = φ[(x-µ)/σ] / σ = exp[-(x - µ)^2 / (2σ^2)] / {σ √(2π)}, -∞ < x < ∞.
φ(x) = exp[-x^2 / 2] / √(2π), -∞ < x < ∞.

Mean = µ Variance = σ2
Skewness = 0 (distribution is symmetric) Kurtosis = 3

Skewness (Section 8)

Skewness = third central moment / STDDEV^3 = E[(X - E[X])^3] / STDDEV^3
= {E[X^3] - 3 E[X] E[X^2] + 2 E[X]^3} / Variance^(3/2).

A symmetric distribution has zero skewness.

Binomial Distribution with q < 1/2 ⇔ positive skewness ⇔ skewed to the right.

Binomial Distribution q = 1/2 ⇔ symmetric ⇒ zero skewness.

Binomial Distribution q > 1/2 ⇔ negative skewness ⇔ skewed to the left.


Poisson and Negative Binomial have positive skewness.

Probability Generating Function (Section 9)

Probability Generating Function, p.g.f.:



P(z) = Expected Value of z^n = E[z^n] = Σ from n=0 to ∞ of f(n) z^n.

The Probability Generating Function of the sum of independent frequencies is the product of the
individual Probability Generating Functions.
The distribution determines the probability generating function and vice versa.
f(n) = {d^n P(z) / dz^n, evaluated at z = 0} / n!. f(0) = P(0). Pʼ(1) = Mean.

If a distribution is infinitely divisible, then if one takes the probability generating function to any
positive power, one gets the probability generating function of another member of the same family
of distributions. Examples of infinitely divisible distributions include: Poisson, Negative Binomial,
Compound Poisson, Compound Negative Binomial, Normal, Gamma.

Factorial Moments (Section 10)

nth factorial moment = µ(n) = E[X(X-1) .. (X+1-n)].

µ(n) = d^n P(z) / dz^n, evaluated at z = 1. Pʼ(1) = E[X]. Pʼʼ(1) = E[X(X-1)].

(a, b, 0) Class of Distributions (Section 11)

For each of these three frequency distributions: f(x+1) / f(x) = a + {b / (x+1)}, x = 0, 1, ...
where a and b depend on the parameters of the distribution:

Distribution a b f(0)
Binomial -q/(1-q) (m+1)q/(1-q) (1-q)m
Poisson 0 λ e−λ
Negative Binomial β/(1+β) (r-1)β/(1+β) 1/(1+β)r

Distribution Mean Variance Variance Over Mean


Binomial mq mq(1-q) 1-q < 1 Variance < Mean
Poisson λ λ 1 Variance = Mean
Negative Binomial rβ rβ(1+β) 1+β > 1 Variance > Mean

Distribution Thinning by factor of t Adding n independent, identical copies


Binomial q → tq m → nm

Poisson λ → tλ λ → nλ

Negative Binomial β → tβ r → nr

For X and Y independent:


X Y X+Y
Binomial(q, m1 ) Binomial(q, m2 ) Binomial(q, m1 + m2 )
Poisson(λ 1) Poisson(λ 2) Poisson(λ 1 + λ2)

Negative Binomial(β, r1 ) Negative Bin.(β, r2 ) Negative Bin.(β, r1 + r2 )

Accident Profiles (Section 12)

For the Binomial, Poisson and Negative Binomial Distributions:


(x+1) f(x+1) / f(x) = a(x + 1) + b, where a and b depend on the parameters of the distribution.
a < 0 for the Binomial, a = 0 for the Poisson, and a > 0 for the Negative Binomial Distribution.

Thus if data is drawn from one of these three distributions, then we expect (x+1) f(x+1) / f(x) for this
data to be approximately linear with slope a; the sign of the slope, and thus the sign of a,
distinguishes between these three distributions of the (a, b, 0) class.
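Here is a small Python sketch of an accident profile (illustrative only; the parameter choices are arbitrary):

from math import comb, exp, factorial

def poisson(x, lam=2.0):
    return exp(-lam) * lam**x / factorial(x)

def negative_binomial(x, r=3, beta=0.5):
    return comb(x + r - 1, x) * beta**x / (1.0 + beta)**(x + r)

# (x+1) f(x+1)/f(x): constant (slope 0) for the Poisson,
# linear with positive slope beta/(1+beta) for the Negative Binomial.
print([round((x + 1) * poisson(x + 1) / poisson(x), 3) for x in range(5)])
print([round((x + 1) * negative_binomial(x + 1) / negative_binomial(x), 3) for x in range(5)])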

Zero-Truncated Distributions (Section 13)

In general if f is a distribution on 0, 1, 2, 3, ..., then pk^T = f(k) / {1 - f(0)} is a distribution on 1, 2, 3, ....

Distribution Density of the Zero-Truncated Distribution
Binomial {m! / (x! (m - x)!)} q^x (1 - q)^(m-x) / {1 - (1 - q)^m}, x = 1, 2, 3, ..., m
Poisson {e^-λ λ^x / x!} / {1 - e^-λ}, x = 1, 2, 3, ...
Negative Binomial {r(r+1)...(r+x-1) / x!} {β^x / (1+β)^(x+r)} / {1 - 1/(1+β)^r}, x = 1, 2, 3, ...

The moments of a zero-truncated distribution are given in terms of those of the corresponding
untruncated distribution, f, by: E_Truncated[X^n] = E_f[X^n] / {1 - f(0)}.

The Logarithmic Distribution has support on the positive integers: f(x) = {β/(1+β)}^x / {x ln(1+β)}.

The (a,b,1) class of frequency distributions is a generalization of the (a,b,0) class.


As with the (a,b,0) class, the recursion formula applies: (density at x+1) / (density at x) = a + b/(x+1).
However, it need only apply now for x ≥ 1, rather than x ≥ 0.

Members of the (a,b,1) family include: all the members of the (a,b,0) family, the zero-truncated
versions of those distributions: Zero-Truncated Binomial, Zero-Truncated Poisson,
Extended Truncated Negative Binomial, and the Logarithmic Distribution.
In addition the (a,b,1) class includes the zero-modified distributions corresponding to these.

Zero-Modified Distributions (Section 14)

If f is a distribution on 0, 1, 2, 3, ..., and 0 < p0^M < 1, then the distribution with probability p0^M at zero
and pk^M = f(k) (1 - p0^M) / {1 - f(0)}, k = 1, 2, 3, ..., is a distribution on 0, 1, 2, 3, ....

The moments of a zero-modified distribution are given in terms of those of f by:
E_Modified[X^n] = (1 - p0^M) E_f[X^n] / {1 - f(0)} = (1 - p0^M) E_Truncated[X^n].

Compound Frequency Distributions (Section 15)

A compound frequency distribution has a primary and secondary distribution, each of which is a
frequency distribution. The primary distribution determines how many independent random draws
from the secondary distribution we sum.

p.g.f. of compound distribution = p.g.f. of primary dist.[p.g.f. of secondary dist.]


P(z) = P1 [P2 (z)].
compound density at 0 = p.g.f. of the primary at the density at 0 of the secondary.

Moments of Compound Distributions (Section 16)

Mean of Compound Dist. = (Mean of Primary Dist.)(Mean of Sec. Dist.)


Variance of Compound Dist. = (Mean of Primary Dist.)(Var. of Sec. Dist.)
+ (Mean of Secondary Dist.)^2 (Variance of Primary Dist.).
In the case of a Poisson primary distribution with mean λ, the variance of the compound distribution
could be rewritten as: λ(2nd moment of Second. Dist.).
The third central moment of a compound Poisson distribution = λ(3rd moment of Sec. Dist.).
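As a numeric illustration, using the compound Poisson-Poisson from the solution to problem 20.6 above (a short sketch added here for convenience):

# Compound Poisson-Poisson with lam1 = 0.2 (primary) and lam2 = 1 (secondary).
lam1, lam2 = 0.2, 1.0
mean_secondary, var_secondary = lam2, lam2

general_formula = lam1 * var_secondary + mean_secondary**2 * lam1
poisson_shortcut = lam1 * (var_secondary + mean_secondary**2)    # lam1 x (2nd moment of secondary)
print(general_formula, poisson_shortcut)                         # both 0.4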

Mixed Frequency Distributions (Section 17)

The density function of the mixed distribution, is the mixture of the density function for
specific values of the parameter that is mixed.

The nth moment of a mixed distribution is the mixture of the nth moments.
First one mixes the moments, and then computes the variance of the mixture from its first and
second moments.

The Probability Generating Function of the mixed distribution, is the mixture of the probability
generating functions for specific values of the parameter.
For a mixture of Poissons, the variance is always greater than the mean.

Gamma Function (Section 18)

The (complete) Gamma Function is defined as:
Γ(α) = ∫ from 0 to ∞ of t^(α-1) e^-t dt = θ^-α ∫ from 0 to ∞ of t^(α-1) e^(-t/θ) dt, for α ≥ 0, θ ≥ 0.

Γ(α) = (α-1)! Γ(α) = (α-1) Γ(α-1)

∫ from 0 to ∞ of t^(α-1) e^(-t/θ) dt = Γ(α) θ^α.

The Incomplete Gamma Function is defined as:
Γ(α; x) = ∫ from 0 to x of t^(α-1) e^-t dt / Γ(α).

Gamma-Poisson Frequency Process (Section 19)

If one mixes Poissons via a Gamma, then the mixed distribution is in the form of the
Negative Binomial distribution with r = α and β = θ.

If one mixes Poissons via a Gamma Distribution with parameters α and θ, then over a period of
length Y, the mixed distribution is Negative Binomial with r = α and β = Yθ.

For the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to:
mean of the Gamma + variance of the Gamma.

Var[X] = E[Var[X | λ]] + Var[E[X | λ]]. Mixing increases the variance.

Tails of Frequency Distributions (Section 20)

From lightest to heaviest tailed, the frequency distribution in the (a,b,0) class are:
Binomial, Poisson, Negative Binomial r > 1, Geometric, Negative Binomial r < 1.
Mahlerʼs Guide to
Loss Distributions
Exam C

prepared by
Howard C. Mahler, FCAS
Copyright 2016 by Howard C. Mahler.

Study Aid 2016-C-2

Howard Mahler
hmahler@mac.com
www.howardmahler.com/Teaching
Mahlerʼs Guide to Loss Distributions
Copyright 2016 by Howard C. Mahler.

The Loss Distributions concepts in Loss Models, by Klugman, Panjer, and Willmot,
are demonstrated.1

Information in bold or sections whose title is in bold are more important for passing the exam. Larger
bold type indicates it is extremely important. Information presented in italics (and sections whose
titles are in italics) should not be needed to directly answer exam questions and should be skipped
on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be
useful in practical applications.

Highly Recommended problems are double underlined.


Recommended problems are underlined.
Solutions to the problems in each section are at the end of that section.

Note that problems include both some written by me and some from past exams.2 The latter are
copyright by the Casualty Actuarial Society and SOA and are reproduced here solely to aid
students in studying for exams. The solutions and comments are solely the responsibility of the
author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments
may seem critical of certain questions, this is intended solely to aid you in studying and in no way is
intended as a criticism of the many volunteers who work extremely long and hard to produce quality
exams.

Greek letters used in Loss Models:

α = alpha, β = beta, γ = gamma, θ = theta, λ = lambda, µ = mu, σ = sigma, τ = tau


β = beta, used for the Beta and incomplete Beta functions.
Γ = Gamma, used for the Gamma and incomplete Gamma functions.
Φ = Phi, used for the Normal distribution. φ = phi, used for the Normal density function.

Π = Pi is used for the continued product just as


Σ = Sigma is used for the continued sum

1
In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to
answer exam questions, but it will not be specifically tested.
2
In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus.
Loss Distributions as per Loss Models

Distribution Name; Distribution Function F(x); Probability Density Function f(x)

Exponential: F(x) = 1 - e^(-x/θ); f(x) = e^(-x/θ) / θ
Single Parameter Pareto: F(x) = 1 - (θ/x)^α, x > θ; f(x) = α θ^α / x^(α+1), x > θ
Weibull: F(x) = 1 - exp[-(x/θ)^τ]; f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x
Gamma: F(x) = Γ[α; x/θ]; f(x) = (x/θ)^α e^(-x/θ) / {x Γ(α)} = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}
LogNormal: F(x) = Φ[(ln(x) - µ)/σ]; f(x) = exp[-(ln(x) - µ)^2 / (2σ^2)] / {x σ √(2π)}
Pareto: F(x) = 1 - {θ/(θ+x)}^α; f(x) = α θ^α / (θ+x)^(α+1)
Inverse Gaussian: F(x) = Φ[(x/µ - 1) √(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1) √(θ/x)];
 f(x) = √(θ/(2π)) exp[-θ (x/µ - 1)^2 / (2x)] / x^1.5
Inverse Gamma: F(x) = 1 - Γ[α; θ/x]; f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]}
Moments of Loss Distributions as per Loss Models

Distribution Name; Mean; Variance; Moments E[X^n]

Exponential: mean = θ; variance = θ^2; E[X^n] = n! θ^n
Single Parameter Pareto: mean = αθ/(α-1); variance = αθ^2 / {(α-1)^2 (α-2)}; E[X^n] = α θ^n / (α-n), α > n
Weibull: mean = θ Γ[1 + 1/τ]; variance = θ^2 {Γ[1 + 2/τ] - Γ[1 + 1/τ]^2}; E[X^n] = θ^n Γ[1 + n/τ]
Gamma: mean = αθ; variance = αθ^2; E[X^n] = θ^n (α)(α+1)...(α+n-1) = θ^n Γ[α+n] / Γ[α]
LogNormal: mean = exp[µ + σ^2/2]; variance = exp[2µ + σ^2] (exp[σ^2] - 1); E[X^n] = exp[nµ + n^2 σ^2/2]
Pareto: mean = θ/(α-1); variance = αθ^2 / {(α-1)^2 (α-2)}; E[X^n] = n! θ^n / {(α-1)...(α-n)}, α > n
Inverse Gaussian: mean = µ; variance = µ^3/θ; E[X^n] = e^(θ/µ) √(2θ/(µπ)) µ^n K_(n-1/2)(θ/µ)
Inverse Gamma: mean = θ/(α-1); variance = θ^2 / {(α-1)^2 (α-2)}; E[X^n] = θ^n / {(α-1)...(α-n)}, α > n

Section # Pages Section Name


A 1 9-10 Ungrouped Data
2 11-27 Statistics of Ungrouped Data
3 28-47 Coefficient of Variation, Skewness, and Kurtosis
4 48-57 Empirical Distribution Function
5 58-62 Limited Losses
B 6 63-64 Losses Eliminated
7 67-72 Excess Losses
8 73-79 Excess Loss Variable
9 80-87 Layers of Loss
10 88-101 Average Size of Losses in an Interval
11 102 Grouped Data
C 12 103-116 Working with Grouped Data
13 117-129 Uniform Distribution
14 130-143 Statistics of Grouped Data
15 144-161 Policy Provisions
16 162-174 Truncated Data
17 175-184 Censored Data
D 18 185-217 Average Sizes
19 218-223 Percentiles
20 224-228 Definitions
21 229-233 Parameters of Distributions
22 234-254 Exponential Distribution
E 23 255-270 Single Parameter Pareto Distribution
24 271-334 Common Two Parameter Distributions
25 335-351 Other Two Parameter Distributions
26 352-365 Three Parameter Distributions
F 27 366-379 Beta Function and Distribution
28 380-389 Transformed Beta Distribution
29 390-402 Producing Additional Distributions
G 30 403-429 Tails of Loss Distributions
31 430-494 Limited Expected Values
32 495-536 Limited Higher Moments
H 33 537-568 Mean Excess Loss
34 569-597 Hazard Rate
35 598-619 Loss Elimination Ratios and Excess Ratios
I 36 620-713 The Effects of Inflation
37 714-803 Lee Diagrams
38 804-862 N-Point Mixtures of Models
39 863-904 Continuous Mixtures of Models
40 905-929 Spliced Models
41 930-936 Extreme Value Distributions
42 937-940 Relationship to Life Contingencies
43 941-960 Gini Coefficient
44 961-980 Important Ideas & Formulas
Exam 3/M Questions by Section of this Study Aid
Section Sample 5/00 11/00 5/01 11/01 11/02
1
2
3
4
5
6
7
8 35
9
10
11
12
13 33
14 31
15
16
17
18 5
19
20
21
22
23 37
24 8 24
25
26
27
28
29
30
31 25 27 37
32 21
33
34
35 17
36 18 30 41-42 6
37
38 10 28
39 17 28
40

The CAS/SOA did not release the 5/02 and 5/03 exams.
Sec. CAS 3 SOA 3 CAS 3 CAS 3 SOA 3 CAS 3 SOA M
11/03 11/03 5/04 11/04 11/04 5/05 5/05
1
2
3
4
5
6
7
8
9
10
11
12 26
13 5 39
14
15 35 32
16
17
18 22
19
20
21
22 17 34 20
23
24 21 25 35
25
26
27
28
29
30 16
31 21 3 7
32 28
33 24 4 9
34 19 7 27 30
35 29
36 17, 29, 34 33 18
37 20 23 33 30
38 18 28 29 34
39 10
40

Starting in 11/03, the CAS and SOA gave separate exams.


The SOA did not release its 5/04 exam.
Sec. CAS 3 SOA M CAS 3 CAS 3 SOA M
11/05 11/05 5/06 11/06 11/06
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18 26 6
19
20
21
22 20
23
24 8 25, 36
25
26
27
28
29 19 27
30
31 22 14 37 20, 31
32
33 10 38
34 11 13 10, 11, 16
35 29
36 21, 33 28 26, 39 30
37 28
38 32 32 20
39 17, 20
40 35 18

The SOA did not release its 5/06 exam.


Course 4 Exam Questions by Section of this Study Aid3

Section Sample 5/00 11/00 5/01 11/01 11/02 11/03 11/04 5/05 11/05 11/06 5/07
1
2
3 3 3
4
5 36
6
7
8
9
10
11
12 7
13
14 2 7
15
16
17
18 6 18
19
20
21
22 26
23
24 39
25
26
27
28
29
30
31
32 37 13
33
34
35
36
37
38 13
39
40
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.
3
Questions on more advanced ideas are in “Mahlerʼs Guide to Fitting Loss Distributions”.

Section 1, Ungrouped Data

There are 130 losses of sizes:

300 37,300 86,600 150,300 423,200


400 39,500 88,600 171,800 437,900
2,800 39,900 91,700 173,200 442,700
4,500 41,200 96,600 177,700 457,800
4,900 42,800 96,900 183,000 463,000
5,000 45,900 106,800 183,300 469,300
7,700 49,200 107,800 190,100 469,600
9,600 54,600 111,900 209,400 544,300
10,400 56,700 113,000 212,900 552,700
10,600 57,200 113,200 225,100 566,700
11,200 57,500 115,000 226,600 571,800
11,400 59,100 117,100 233,200 596,500
12,200 60,800 119,300 234,200 737,700
12,900 62,500 122,000 244,900 766,100
13,400 63,600 123,100 253,400 846,100
14,100 66,400 126,600 261,300 852,700
15,500 66,900 127,300 261,800 920,300
19,300 68,100 127,600 273,300 981,100
19,400 68,900 127,900 276,200 988,300
22,100 71,100 128,000 284,300 1,078,800
24,800 72,100 131,300 316,300 1,117,600
29,600 79,900 132,900 322,600 1,546,800
32,200 80,700 134,300 343,400 2,211,000
32,500 83,200 134,700 350,700 2,229,700
33,700 84,500 135,800 395,800 3,961,000
34,300 84,600 146,100 406,900 4,802,200

Each individual value is shown, rather than the data being grouped into intervals.
The type of data shown here is called individual or ungrouped data.

Some students will find it helpful to put this data set on a computer and follow along with the
computations in the study guide to the best of their ability.4 The best way to learn is by doing.

4
Even this data set is far bigger than would be presented on an exam. In many actual applications, there are many
thousands of claims, but such a large data set is very difficult to present in a Study Aid. It is important to realize that
with modern computers, actuaries routinely deal with such large data sets. There are other situations where all that is
available is a small data set such as presented here.

This ungrouped data set is used in many examples throughout this study guide:

300, 400, 2800, 4500, 4900, 5000, 7700, 9600, 10400, 10600, 11200, 11400, 12200, 12900,
13400, 14100, 15500, 19300, 19400, 22100, 24800, 29600, 32200, 32500, 33700, 34300,
37300, 39500, 39900, 41200, 42800, 45900, 49200, 54600, 56700, 57200, 57500, 59100,
60800, 62500, 63600, 66400, 66900, 68100, 68900, 71100, 72100, 79900, 80700, 83200,
84500, 84600, 86600, 88600, 91700, 96600, 96900, 106800, 107800, 111900, 113000,
113200, 115000, 117100, 119300, 122000, 123100, 126600, 127300, 127600, 127900,
128000, 131300, 132900, 134300, 134700, 135800, 146100, 150300, 171800, 173200,
177700, 183000, 183300, 190100, 209400, 212900, 225100, 226600, 233200, 234200,
244900, 253400, 261300, 261800, 273300, 276200, 284300, 316300, 322600, 343400,
350700, 395800, 406900, 423200, 437900, 442700, 457800, 463000, 469300, 469600,
544300, 552700, 566700, 571800, 596500, 737700, 766100, 846100, 852700, 920300,
981100, 988300, 1078800, 1117600, 1546800, 2211000, 2229700, 3961000, 4802200

Section 2, Statistics of Ungrouped Data

For the ungrouped data in Section 1:

Average of X = E[X] = 1st moment = Σ xi / n = 40,647,700/130 ≅ 312,674.6.

Average of X^2 = E[X^2] = 2nd moment about the origin = Σ xi^2 / n = 4.9284598 x 10^11.

(empirical) Mean = X = 312,674.6.

(empirical) Variance = E[X^2] - E[X]^2 = 3.9508 x 10^11.
(empirical) Standard Deviation = Square Root of Variance = 6.286 x 10^5.

Mean:

The mean is the average or expected value of the random variable.


Mean of the variable X = E[X].
Empirical Mean of a sample of random draws from X = X .
The mean of the data allows you to set the scale for the fitted distribution.
In general means add: E[X + Y] = E[X] + E[Y].
Also multiplying a variable by a constant multiplies the mean by the same constant;
E[kX] = kE[X]. The mean is a linear operator, E[aX + bY] = aE[X] + bE[Y].

Mode:

The mean differs from the mode which represents the value most likely to occur. For a continuous
distribution function the mode is the point at which the density function reaches its
maximum. For the empirical data in Section 1 there is no clear mode5. For discrete distributions, for
example frequency distributions, the mode has the same definition but is easier to pick out. If one
multiplies all the claims by a constant, the mode is multiplied by that same constant.

Median:

The median is that value such that half of the claims are on either side. At the median the
distribution function is 0.5. The median is the 50th percentile. For a discrete loss distribution, one
may linearly interpolate in order to estimate the median. If one multiplies all the claims by a constant,
the median is multiplied by that same constant.

5
One would expect a curve fit to this data to have a mode much smaller than the mean.

The sample median for the data in Section 1 is about $121 thousand.6 This is much less than the
sample mean of about $313 thousand. While the mean can be affected greatly by a few large
claims, the median is affected equally by each claim size.

For a continuous7 distribution with positive skewness typically: mean > median > mode (alphabetical
order.) The situation is reversed for negative skewness. Also usually the median is closer to the
mean than to the mode (just as it is in the dictionary.)8

Variance:

The variance is the expected value of the squared difference of the variable and its mean.
The variance is the second central moment.

Var[X] = E[(X - E[X])2 ] = E[X2 ] - E[X]2 =


second moment minus the square of the first moment.

For the Ungrouped Data in Section 1, we calculate the empirical variance as:

(1/N) Σ(Xi - X)^2 = E[X^2] - E[X]^2 = 4.92845 x 10^11 - 312,674.6^2 = 3.9508 x 10^11.

Thus if X is in dollars, then Var[X] is in dollars squared.


Multiplying a variable by a constant multiplies the variance by the square of that constant;
Var[kX] = k2 Var[X]. In particular, Var[-X] = Var[X].

Exercise: Var[X] = 6. What is Var[3X]?


[Solution: Var[3X] = 32 Var[X] = (9)(6) = 54.]

For independent random variables the variances add.9


If X and Y are independent, then Var [X + Y] = Var [X] + Var [Y].
Also If X and Y are independent, then Var[aX + bY] = a2 Var[X] + b2 Var[Y].
In particular, Var [X - Y] = Var[X] + Var[Y], for X and Y independent.

6
The 65th out 130 claims is $119,300 and the 66th claim is $122,000. As discussed in “Mahlerʼs Guide to Fitting
Loss Distributions”, a point estimate for the median would be at the (.5)(1+130) = 65.5th claim. So one would linearly
interpolate half way between the 65th and 66th claim to get a point estimate of the median of:
(.5)(119300) +(.5) (122000) = $120,650.
7
For frequency distributions the relationship may be different due to the fact that only certain discrete values can
appear as the mode or median.
8
See page 49 of Kendallʼs Advanced Theory of Statistics, Volume 1 (1994) by Stuart & Ord.
9
In general Var[X+Y] = Var[X] + Var[Y] + 2Cov[X,Y], where Cov[X,Y] = E[XY] - E[X]E[Y] = covariance of X and Y.
For X and Y independent E[XY] = E[X]E[Y] and Cov[X,Y] = 0.

Exercise: Var[X] = 6. Var[Y] = 8. X and Y are independent. What is Var[3X + 10Y]?


[Solution: Var[3X + 10Y] = 9Var[X] + 100Var[Y] = 54 + 800 = 854.]

Note that if X and Y are independent and identically distributed, then Var[X1 + X2 ] =
2 Var [X]. Adding up n such variables gives a variable with variance = nVar[X].

Exercise: Var[X] = 6. What is Var[X1 + X2 + X3 ]?


[Solution: Var[X1 + X2 + X3 ] = Var[X] + Var[X] + Var[X] = 3Var[X] = (3)(6) = 18.]

Averaging consists of summing n random draws, and then dividing by n.


Averaging n such variables gives a variable with variance:
Var[(1/n)ΣXi] = Var[ΣXi] / n2 = n Var[X] / n2 = Var[X] / n.

Thus the sample mean has a variance that is inversely proportional to the number of
points. Therefore, the sample mean has a standard deviation that is inversely proportional to the
square root of the number of points.

Exercise: Var[X] = 6.
What is the variance of the average of 100 independent random draws from X?
[Solution: Var[X] / n = 6/100 = .06.]

While variances add for independent variables, more generally:

Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y].

Exercise: Var[X] = 6, Var[Y] = 8, and Cov[X, Y] = 5. What is Var[X + Y]?


[Solution: Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y] = 6 + 8 + (2)(5) = 24.]

Covariances and Correlations:

The Covariance of two variables X and Y is defined by:


Cov[X, Y] = E[XY] - E[X]E[Y].

Exercise: E[X] = 3, E[Y] = 5, and E[XY] = 25. What is the covariance of X and Y?
[Solution: Cov[X,Y] = E[XY] - E[X]E[Y] = 25 - (3)(5) = 10.]

Since Cov[X,X] = E[X2 ] - E[X]E[X] = Var[X], the covariance is a generalization of the variance.

Covariances have the following useful properties:


Cov[X, aY] = aCov[X, Y].
Cov[X, Y] = Cov[Y, X].
Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z].

The Correlation of two random variables is defined in terms of their covariances:

Corr[X, Y] = Cov[X, Y] / √(Var[X] Var[Y]).

Exercise: Var[X] = 6, Var[Y] = 8, and Cov[X, Y] = 5. What is Corr[X, Y]?
[Solution: Corr[X, Y] = Cov[X, Y] / √(Var[X] Var[Y]) = 5 / √((6)(8)) = 0.72.]

The correlation is always between -1 and +1.


Corr[X, Y] = Corr[Y, X]
Corr[X, X] = 1
Corr[X, -X] = -1
Corr[X, aY] = Corr[X, Y] if a > 0; 0 if a = 0; -Corr[X, Y] if a < 0.

Corr[X, aX] = 1 if a > 0

Two variables that are proportional with a positive proportionality constant are perfectly correlated
and have a correlation of one. Closely related variables, such as height and weight, have a
correlation close to but less than one. Unrelated variables have a correlation near zero. Inversely
related variables, such as the average temperature and use of heating oil, are negatively correlated.

Standard Deviation:

The standard deviation is the square root of the variance.


If X is in dollars, then the standard deviation of X is also in dollars.
STDDEV[kX] = kSTDDEV[X].

Exercise: Var[X] = 16. Var[Y] = 9. X and Y are independent.


What is the standard deviation of X + Y?
[Solution: Var[X + Y] = 16 + 9 = 25. StdDev[X + Y] = √25 = 5.
Comment: Standard deviations do not add. In the exercise, 4 + 3 ≠ 5.]

Exercise: Var[X] = 16.


What is the standard deviation of the average of 100 independent random draws from X?
[Solution: variance of the average = Var[X] / n = 16/100 = .16.
standard deviation of the average = √0.16 = 0.4.
Alternately, StdDev[X] = √16 = 4.
standard deviation of the average = StdDev[X] / √n = 4/10 = 0.4.]

Sample Variance:

Sample Mean = ΣXi / N = X .

Note that the variance as calculated above, Σ(Xi - X)^2 / N, is a biased estimator of the variance of the
distribution from which this data set was drawn.

The sample variance is an unbiased estimator of the variance of the distribution from
which a data set was drawn:10

Sample variance ≡ Σ(Xi - X)^2 / (N - 1).

For the Ungrouped Data in Section 1, we calculate the sample variance as:
Σ(Xi - X)^2 / (N - 1) = 3.9814 x 10^11.

For 130 data points, the sample variance is the empirical variance multiplied by:
N / (N - 1) = 130/129.

10
Bias is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
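Here is a short Python illustration of the N versus N - 1 divisors, using a tiny made-up data set (not the 130 losses of Section 1):

data = [1, 3, 8]
n = len(data)
xbar = sum(data) / n
empirical_variance = sum((x - xbar)**2 for x in data) / n        # biased: divides by N
sample_variance = sum((x - xbar)**2 for x in data) / (n - 1)     # unbiased: divides by N - 1
print(xbar, empirical_variance, sample_variance)                 # 4.0, 8.67, 13.0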

Problems:

Use the following information for the next two questions:


Prob[ X = 1] = 70%, Prob[X = 5] = 20%, Prob[X =10] = 10%.

2.1 (1 point) What is the mean of X?


A. less than 3.0
B. at least 3.0 but less than 3.5
C. at least 3.5 but less than 4.0
D. at least 4.0 but less than 4.5
E. at least 4.5

2.2 (1 point) What is the variance of X?


A. less than 6
B. at least 6 but less than 7
C. at least 7 but less than 8
D. at least 8 but less than 9
E. at least 9

Use the following data set for the next two questions:
4, 7, 13, 20.

2.3 (1 point) What is the mean?


A. 8 B. 9 C. 10 D. 11 E. 12

2.4 (1 point) What is the sample variance?


A. 30 B. 35 C. 40 D. 45 E. 50

2.5 (1 point) You are given the following:


• Let X be a random variable X.
• Y is defined to be X/2.
Determine the correlation coefficient of X and Y.
A. 0.00 B. 0.25 C. 0.50 D. 0.75 E. 1.00

2.6 (2 points) X and Y are two independent variables.


E[X] = 3. Var[X] = 5. E[Y] = 6. Var[Y] = 2.
Let Z = XY. Determine the standard deviation of Z.
A. 14 B. 16 C. 18 D. 20 E. 22

2.7 (1 point) Let X1 , X2 , ..., X20, be 20 independent, identically distributed


random variables, each of which has variance of 17. If one estimates the mean by the average of
the 20 observed values, what is the variance of this estimate?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

2.8 (1 point) Let X and Y be independent random variables.


Which of the following statements are true?
1. If Z is the sum of X and Y, the variance of Z is the sum of the variance of X and the
variance of Y.
2. If Z is the difference between X and Y, the variance of Z is the difference between the
variance of X and the variance of Y.
3. If Z is the product of X and Y, then the expected value of Z is the product of the
expected values of X and Y.
A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A ,B, C, or D

Use the following information for the next two questions:


Height of Husband (inches): 66 68 69 71 73
Height of Wife (inches): 64 63 67 65 69

2.9 (2 points) What is the covariance of the heights of husbands and wives?
A. 2 B. 3 C. 4 D. 5 E. 6

2.10 (2 points) What is the correlation of the heights of husbands and wives?
A. 70% B. 75% C. 80% D. 85% E. 90%

2.11 (1 point) f(x) = (x - 10) / 200, 10 < x < 30. Determine the mean.

2.12 (1 point) Let x and y be two independent random draws from a continuous distribution.
Demonstrate that the mean squared difference between x and y is twice the variance of the
distribution.

2.13 (2, 5/83, Q.24) (1.5 points) Let the random variable X have the density function
f(x) = kx for 0 < x < √(2/k). If the mode of this distribution is at x = √2/4,
then what is the median of this distribution?
A. √2/6 B. 1/4 C. √2/4 D. √2/24 E. 1/2

2.14 (2, 5/83, Q.30) (1.5 point) Below are shown the probability density functions of two
symmetric bounded distributions with the same median.

Which of the following statements about the means and standard deviations of the two distributions
are true?
A. µII > µI and σII = σI B. µII > µI and σII > σI C. µI = µII and σII < σI

D. µI = µII and σI < σII E. Cannot be determined from the given information

2.15 (2, 5/83, Q.49) (1.5 point) Let X and Y be random variables with Var(X) = 4,
Var(Y) = 9, and Var(X - Y) = 16. What is Cov(X, Y)?
A. -3/2 B. -1/2 C. 1/2 D. 3/2 E. 13/16

2.16 (2, 5/85, Q.5) (1.5 points)


Let X and Y be random variables with variances 2 and 3, respectively, and covariance -1.
Which of the following random variables has the smallest variance?
A. 2X + Y B. 2X - Y C. 3X - Y D. 4X E. 3Y

2.17 (2, 5/85, Q.11) (1.5 points) Let X and Y be independent random variables, each with
density f(t) = 1/(2θ) for -θ < t < θ. If Var(XY) = 64/9, then what is θ?
A. 1 B. 2 C. 4√3/3 D. 2√2 E. 8√3/3

2.18 (2, 5/85, Q.47) (1.5 points) Let X be a random variable with finite variance.
If Y = 15 - X, then determine Corr[X, (X + Y)X].
A. -1 B. 0 C. 1/15 D. 1
E. Cannot be determined from the information given.

2.19 (4, 5/86, Q.31) (1 point) Which of the following statements are true about the distribution of a
random variable X?
1. If X is discrete, the value of X which occurs most frequently is the mode.
2. If X is continuous, the expected value of X is equal to the mode of X.
3. The median of X is the value of X which divides the distribution in half.
A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

2.20 (4, 5/86, Q.32) (1 point) Let X, Y and Z be random variables.


Which of the following statements are true?
1. The variance of X is the second moment about the origin of X.
2. If Z is the product of X and Y, then the expected value of Z is the product of the expected
values of X and Y.
3. The expected value of X is equal to the expectation over all possible values of Y, of the
conditional expectation of X given Y.
A. 2 B. 3 C. 1, 2 D. 1, 3 E. 2, 3

2.21 (2, 5/88, Q.43) (1.5 points) X, Y, and Z have means 1, 2, and 3, respectively, and variances
4, 5, and 9, respectively. The covariance of X and Y is 2, the covariance of X and Z is 3, and the
covariance of Y and Z is 1.
What are the mean and variance, respectively, of the random variable 3X + 2Y - Z?
A. 4 and 31 B. 4 and 65 C. 4 and 67 D. 14 and 13 E. 14 and 65

2.22 (4, 5/89, Q.26) (1 point) If the random variables X and Y are not independent, which of the
following equations will still be true?
1. E(X + Y) = E(X) + E(Y)
2. E(XY) = E(X) E(Y)
3. Var (X + Y) = Var (X) + Var (Y)
A. 1 B. 2 C. 1, 2 D. 1, 3 E. None of the above

2.23 (2, 5/90, Q.18) (1.7 points) Let X be a continuous random variable with density function
f(x) = x(4 - x)/9 for 0 < x < 3. What is the mode of X?
A. 4/9 B. 1 C. 1/2 D. 7/4 E. 2

2.24 (2, 5/92, Q.2) (1.7 points) Let X be a random variable such that E(X) = 2, E(X3 ) = 9, and
E[(X - 2)3 ] = 0. What is Var(X)?
A. 1/6 B. 13/6 C. 25/6 D. 49/6 E. 17/2

2.25 (4B, 5/93, Q.9) (1 point) If X and Y are independent random variables, which of the following
statements are true?
1. Var[X + Y] = Var[X] + Var[Y]
2. Var[X - Y] = Var[X] + Var[Y]
3. Var[aX + bY] = a2 E[X2 ] - a(E[X])2 + b2 E[Y2 ] - b(E[Y])2
A. 1 B. 1,2 C. 1,3 D. 2,3 E. 1,2,3

2.26 (4B, 5/94, Q.5) (2 points) Two honest, six-sided dice are rolled, and the results D1 and D2
are observed. Let S = D1 + D2 . Which of the following are true concerning the conditional
distribution of D1 given that S<6?
1. The mean is less than the median.
2. The mode is less than the mean.
3. The probability that D1 = 2 is 1/3.
A. 2 B. 3 C. 1, 2 D. 2, 3 E. None of A, B, C, or D

2.27 (Course 160 Sample Exam #3, 1994, Q.1) (1.9 points) You are given:
(i) T is the failure time random variable.
(ii) f(t) = {(10 - t)/10}⁹ for 0 < t ≤ 10.
Calculate the ratio of the mean of T to the median of T.
(A) 0.67 (B) 0.74 (C) 1.00 (D) 1.36 (E) 1.49

2.28 (2, 2/96, Q.25) (1.7 points) The sum of the sample mean and median of ten distinct data
points is equal to 20. The largest data point is equal to 15. Calculate the sum of the sample mean
and median if the largest data point were replaced by 25.
A. 20 B. 21 C. 22 D. 30 E. 31

2.29 (Course 1 Sample Exam, Q.9) (1.9 points)


The distribution of loss due to fire damage to a warehouse is:
Amount of Loss Probability
0 0.900
500 0.060
1,000 0.030
10,000 0.008
50,000 0.001
100,000 0.001
Given that a loss is greater than zero, calculate the expected amount of the loss.
A. 290 B. 322 C. 1,704 D. 2,900 E. 32,222

2.30 (1, 5/00, Q.8) (1.9 points) A probability distribution of the claim sizes for an auto insurance
policy is given in the table below:
Claim Size Probability
20 0.15
30 0.10
40 0.05
50 0.20
60 0.10
70 0.10
80 0.30
What percentage of the claims are within one standard deviation of the mean claim size?
(A) 45% (B) 55% (C) 68% (D) 85% (E) 100%

2.31 (IOA 101, 9/00, Q.2) (3 points) Consider a random sample of 47 white-collar workers and a
random sample of 24 blue-collar workers from the workforce of a large company. The mean salary
for the sample of white-collar workers is 28,470 and the standard deviation is 4,270; whereas the
mean salary for the sample of blue-collar workers is 21,420 and the standard deviation is 3,020.
Calculate the mean and the standard deviation of the salaries in the combined sample of 71
employees.

2.32 (1, 11/00, Q.1) (1.9 points) A recent study indicates that the annual cost of maintaining and
repairing a car in a town in Ontario averages 200 with a variance of 260.
If a tax of 20% is introduced on all items associated with the maintenance and repair of
cars (i.e., everything is made 20% more expensive), what will be the variance of the
annual cost of maintaining and repairing a car?
(A) 208 (B) 260 (C) 270 (D) 312 (E) 374

2.33 (1, 11/00, Q.38) (1.9 points) The profit for a new product is given by Z = 3X - Y - 5.
X and Y are independent random variables with Var(X) = 1 and Var(Y) = 2.
What is the variance of Z?
(A) 1 (B) 5 (C) 7 (D) 11 (E) 16

2.34 (1, 11/01, Q.7) (1.9 points) Let X denote the size of a surgical claim and let Y denote the size
of the associated hospital claim. An actuary is using a model in which E(X) = 5,
E(X2 ) = 27.4, E(Y) = 7, E(Y2 ) = 51.4, and Var(X+Y) = 8.
Let C1 = X+Y denote the size of the combined claims before the application of a 20%
surcharge on the hospital portion of the claim, and let C2 denote the size of the combined
claims after the application of that surcharge. Calculate Cov(C1 , C2 ).
(A) 8.80 (B) 9.60 (C) 9.76 (D) 11.52 (E) 12.32

2.35 (1, 5/03, Q.15) (2.5 points) An insurance policy pays a total medical benefit consisting of two
parts for each claim. Let X represent the part of the benefit that is paid to the surgeon, and let Y
represent the part that is paid to the hospital. The variance of X is 5000, the variance of Y is 10,000,
and the variance of the total benefit, X + Y, is 17,000.
Due to increasing medical costs, the company that issues the policy decides to increase
X by a flat amount of 100 per claim and to increase Y by 10% per claim.
Calculate the variance of the total benefit after these revisions have been made.
(A) 18,200 (B) 18,800 (C) 19,300 (D) 19,520 (E) 20,670

Solutions to Problems:

2.1. A. E[X] = (70%)(1) + (20%)(5) + (10%)(10) = 2.7.

2.2. D. E[X2 ] = (70%)(12 ) + (20%)(52 ) + (10%)(102 ) = 15.7. Var[X] = 15.7 - 2.72 = 8.41.
Alternately, Var[X] = (70%)(1 - 2.7)2 + (20%)(5 - 2.7)2 + (10%)(10 - 2.7)2 = 8.41.

2.3. D. mean = (4 + 7 + 13 + 20)/4 = 11.

2.4. E. sample variance = {(4 - 11)2 + (7 - 11)2 + (13 - 11)2 + (20 - 11)2 }/(4 -1) = 50.

2.5. E. Var[Y] = Var[X/2] = Var[X]/4.


Cov[X, Y] = Cov[X, X/2] = Cov[X, X]/2 = Var[X]/2.
Therefore, Corr[X, Y] = (Var[X]/2) / √(Var[X] Var[X]/4) = (Var[X]/2) / (Var[X]/2) = 1.
Comments: Two variables that are proportional with a positive proportionality constant are perfectly
correlated and have a correlation of one.

2.6. A. E[Z] = E[X]E[Y] = (3)(6) = 18.


E[Z2 ] = E[X2 ]E[Y2 ] = (5 + 32 )(2 + 62 ) = 532.
Var[Z] = 532 - 18² = 208. StdDev[Z] = √208 = 14.4.
Comment: In general if X and Y are independent, then:
Var[XY] = E[(XY)2 ] - E[XY]2 = E[X2 ]E[Y2 ] - E[X]2 E[Y]2 =
(Var[X] + E[X2 ]) (Var[Y] + E[Y]2 ) - E[X]2 E[Y]2 = E[Y2 ]Var[X] + E[X2 ]Var[Y] + Var[X] Var[Y].
See Lemma 5.1.2 in “Credibility For Treaty Reinsurance Excess Pricing,”
by Gary Patrik and Isaac Mashitz in the 1990 Cas Discussion Paper Program.

2.7. D. The estimated mean is: (1/20) Σ xi, summing over i = 1 to 20. Therefore,
Var(mean) = (1/20²) Σ Var[xi] = (20)(1/20²) Var(x) = 17/20 = 0.85.

Comment: Since the xi are independent, Var(x1 +x2 ) = Var(x1 )+Var(x2 ). Since they are identically
distributed Var(x1 )= Var(x2 ). Since Var(aY) = a2 Var(Y), Var(x1 /20) =

(1/202 )Var(x1 ). Note that as the number of observations n increases, the variance of the mean
decreases as 1/n.

2.8. E. 1. True, since X and Y are independent.


2. False. In general VAR[X - Y] = VAR[X] + VAR[Y] - 2COV[X, Y]. When X and Y are
independent, Cov[X, Y] = 0 and therefore, VAR[X - Y] = VAR[X] + VAR[Y].
3. True. In general E[XY] ≠ E[X]E[Y], although this is true if X and Y are independent.

2.9. C. Let H = husbands' heights and W = wives' heights. E[H] = 69.4. E[W] = 65.6.
E[HW] = {(66)(64) + (68)(63) + (69)(67) + (71)(65) + (73)(69)}/5 = 4556.6.
Cov[H, W] = 4556.6 - (69.4)(65.6) = 3.96.

2.10. B. Var[H] = {(66 - 69.4)² + (68 - 69.4)² + (69 - 69.4)² + (71 - 69.4)² + (73 - 69.4)²}/5 = 5.84.
Var[W] = {(64 - 65.6)² + (63 - 65.6)² + (67 - 65.6)² + (65 - 65.6)² + (69 - 65.6)²}/5 = 4.64.
Corr[H, W] = Cov[H, W] / √(Var[H] Var[W]) = 3.96 / √((5.84)(4.64)) = 0.761.
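Such empirical covariance and correlation calculations are easy to verify by machine. The following is a minimal Python sketch (not part of the original exam material; the variable names are just illustrative) that reproduces the biased covariance of 3.96 and the correlation of about 0.76 for the husband/wife heights.

```python
# Check of problems 2.9 and 2.10: biased (population) covariance and correlation.
husband = [66, 68, 69, 71, 73]
wife = [64, 63, 67, 65, 69]
n = len(husband)

mean_h = sum(husband) / n                                                # 69.4
mean_w = sum(wife) / n                                                   # 65.6
cov = sum(h * w for h, w in zip(husband, wife)) / n - mean_h * mean_w    # 3.96
var_h = sum((h - mean_h) ** 2 for h in husband) / n                      # 5.84
var_w = sum((w - mean_w) ** 2 for w in wife) / n                         # 4.64
corr = cov / (var_h * var_w) ** 0.5                                      # about 0.761

print(cov, corr)
```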

2.11. E[X] = ∫_10^30 x (x - 10)/200 dx = (1/200) ∫_10^30 x² dx - (1/20) ∫_10^30 x dx =
(30³ - 10³)/600 - (30² - 10²)/40 = 23.33.

2.12. E[(X - Y)2 ] = E[X2 - 2XY + Y2 ] = E[X2 ] - 2E[XY] + E[Y2 ] = E[X2 ] - 2 E[X]E[Y] + E[X2 ] =
2E[X2 ] - 2E[X]2 = 2 Var[X].

2.13. B. The mode is where the density is largest, which in this case is at the righthand endpoint of
the support, √(2/k). √(2/k) = √2/4. ⇒ √k = 4. ⇒ k = 16.

f(x) = 16x. F(x) = 8x². At the median, F(x) = 0.5. ⇒ 8x² = 0.5. ⇒ x = 1/4.

2.14. D. The means are each equal to the medians, since the distributions are symmetric.
The two medians are equal, therefore so are the means.
The second central moment of distribution II is larger than that of distribution I, since distribution II is
more dispersed around its mean. σII2 > σI2. ⇒ σII > σI.

2.15. A. 16 = Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y). Cov(X, Y) = -(16 - 4 - 9)/2 = -3/2.

2.16. A. Var[2X + Y] = (4)(2) + 3 + (2)(2)(-1) = 7.


Var[2X - Y] = (4)(2) + 3 - (2)(2)(-1) = 15. Var[3X - Y] = (9)(2) + 3 - (2)(3)(-1) = 27.
Var[4X] = (16)(2) = 32. Var[3Y] = (9)(3) = 27. 2X + Y has the smallest variance.

2.17. D. E[X] = E[Y] = 0. Var[X] = Var[Y] = (2θ)2 /12 = θ2/3. E[X2 ] = E[Y2 ] = θ2/3.

Var[XY] = E[(XY)2 ] - E[XY]2 = E[X2 ] E[Y2 ] - E[X]2 E[Y]2 = (θ2/3)(θ2/3) - 0 = θ4/9.

θ⁴/9 = 64/9. ⇒ θ² = 8. ⇒ θ = √8 = 2√2.

2.18. D. X + Y = 15. Corr[X, (X + Y)X] = Corr[X, 15X] = 1.


Comment: Two variables that are proportional with a positive constant have a correlation of 1.

2.19. C. 1. True. 2. False. The expected value of X is the mean. Usually the mean and the mode
are not equal. 3. True.

2.20. B. 1. False. The variance is the second central moment: VAR[X] = E[(X-E[X])2] =

E[X2] - E[X]2. The second moment around the origin is E[X2].


2. False. COVAR[X,Y] = E[XY] - E[X]E[Y], so statement 2 only holds when the covariance of X and
Y is zero. (This is true if X and Y are independent.) 3. True. E[X] = EY[E[X|Y]].

2.21. C. E[3X + 2Y - Z] = 3E[X] + 2E[Y] - E[Z] = (3)(1) + (2)(2) - 3 = 4.


Var[3X + 2Y - Z] = 9Var[X] + 4Var[Y] + Var[Z] + 12Cov[X, Y] - 6Cov[X, Z] - 4Cov[Y, Z] =
(9)(4) + (4)(5) + 9 + (12)(2) - (6)(3) - (4)(1) = 67.

2.22. A. Means always add, so statement 1 is true. E[XY] =E[X]E[Y] + COVAR[X,Y], therefore
E[XY] = E[X]E[Y] if and only if the covariance of X and Y is zero. Thus statement 2 is not true in
general. In general VAR[X+Y] = VAR[X] + VAR[Y] + 2 COVAR[X,Y]. If X and Y are independent
then their covariance is zero, and statement 3 would hold. However, statement 3 is not true in
general.

2.23. E. fʼ(x) = (4 - x)/9 - x/9 = 0. ⇒ x = 2. f(2) = 4/9. Check endpoints: f(0) = 0, f(3) = 1/3.

2.24. A. 0 = E[(X - 2)3 ] = E[X3 ] - 6E[X2 ] + 12E[X] - 8. ⇒ 9 - 6E[X2 ] + (12)(2) - 8 = 0.

⇒ E[X2 ] = 25/6. ⇒ Var(X) = E[X2 ] - E[X]2 = 25/6 - 22 = 1/6.

2.25. B. 1. True. 2. True. 3. For X and Y Independent, Var[aX + bY] = a2 Var[X] + b2 Var[Y] =
a2 E[X2 ] - a2 E[X]2 + b2 E[Y2 ] - b2 E[Y]2 , therefore Statement #3 is False.

2.26. A. When S < 6 we have the following equally likely possibilities:


D1    D2 = 1   D2 = 2   D2 = 3   D2 = 4   Possibilities   Conditional Density of D1 given S < 6
1       x        x        x        x           4                 4/10
2       x        x        x                    3                 3/10
3       x        x                             2                 2/10
4       x                                      1                 1/10
The mean of the conditional density function of D1 given that S<6 is:
(0.4)(1) + (0.3)(2) + (0.2)(3) + (0.1)(4) = 2.
The median is equal to 2, since the Distribution Function at 2 is 0.7 ≥ 0 .5, but at 1 it is 0.4 < 0.5.
The mode is 1, since that is the value at which the density is a maximum. 1. F, 2. T, 3. F.

2.27. D. S(t) = {(10 - t)/10}^10. E[T] = ∫_0^10 S(t) dt = 10/11 = 0.9091.
Set 0.5 = S(t) = {(10 - t)/10}^10. ⇒ Median = 10(1 - 0.5^0.1) = 0.6697.
Mean/Median = 0.9091/0.6697 = 1.357.

2.28. B. The sample median remains the same, while the sample mean is increased by
(25 - 15)/10 = 1. The sum of the sample mean and median is now: 20 + 1 = 21.

2.29. D. {(500)(0.060) + (1,000)(0.030) + (10,000)(0.008) + (50,000)(0.001) +


(100,000)(0.001)}/(.06 + .03 + .008 + .001 + .001) = 29000/.10 = 2900.

2.30. A. mean = (20)(0.15) + (30)(0.10) + (40)(0.05)+ (50)(0.20) + (60)(0.10) + (70)(0.10) +


(80)(0.30) = 55.
second moment = (202 )(0.15) + (302 )(0.10) + (402 )(0.05)+ (502 )(0.20) + (602 )(0.10) +
(70²)(0.10) + (80²)(0.30) = 3500. Standard deviation = √(3500 - 55²) = 21.79.
Prob[within one standard deviation of the mean] = Prob[33.21 ≤ X ≤ 76.79]
= 0.05 + 0.20 + 0.10 + 0.10 = 45%.

2.31. Total is: (47)(28470) + (24)(21420) = 1,852,170.


Overall mean is: 1,852,170/(47 + 24) = 26,087.
The overall second moment is: {(47)(4,270² + 28,470²) + (24)(3,020² + 21,420²)}/(47 + 24) =
706,800,730.
Overall variance is: 706,800,730 - 26,087² = 26,269,161.
Overall standard deviation is: √26,269,161 = 5125.

2.32. E. When one multiplies a variable by a constant, in this case 1.2, one multiplies the variance
by the square of that constant, in this case 1.2² = 1.44. (1.44)(260) = 374.4.

2.33. D. Var[Z] = Var[3X - Y - 5] = 9Var[X] + Var[Y] = (9)(1) + 2 = 11.

2.34. A. Var[X] = 27.4 - 52 = 2.4. Var[Y] = 51.4 - 72 = 2.4.


Cov[X, Y] = (Var[X+Y] - Var[X] - Var[Y])/2 = (8 - 2.4 - 2.4)/2 = 1.6.
Cov[C1 , C2 ] = Cov[X + Y, X + 1.2Y] = Cov[X, X] + Cov[X, 1.2 Y] + Cov[Y, X] + Cov[Y, 1.2 Y]
= Var[X] + 1.2 Cov[X, Y] + Cov[Y, X] + 1.2 Cov[Y, Y]
=Var[X] + 1.2 Var[Y] + 2.2 Cov[X, Y] = 2.4 + (1.2)(2.4) + (2.2)(1.6) = 8.8.

2.35. C. Cov[X, Y] = (Var[X + Y] - Var[X] - Var[Y])/2 = 1000.


Adding a flat amount of 100 does not affect the variance.
Var[X + 1.10Y] = Var[X] + 1.21Var[Y] + 2(1.1)Cov[X, Y] = 5000 + (1.21)(10000) + (2.2)(1000) =
19,300.

Section 3, Coefficient of Variation, Skewness, and Kurtosis

The coefficient of variation, skewness, and kurtosis, all help to describe the shape of a size of loss
distribution.

For the ungrouped data in Section 1:

Average of X³ = E[X³] = 3rd moment about the origin = Σ xi³ / n = 1.600225 x 10^18.

Average of X⁴ = E[X⁴] = 4th moment about the origin = Σ xi⁴ / n = 6.465278 x 10^24.

Coefficient of Variation = Standard Deviation / Mean = 6.286 x 10^5 / 312,675 = 2.01.

(Coefficient of) Skewness = γ1 = E[(X - E[X])³] / StdDev³ = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / StdDev³ = 4.83.

Kurtosis = γ2 = E[(X - E[X])⁴] / Variance² = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² = 30.3.

Coefficient of Variation:

The coefficient of variation (CV) = standard deviation / mean.


The coefficient of variation measures how dispersed the sizes of loss are around their mean. The
larger the coefficient of variation the more dispersed the distribution. The coefficient of variation helps
describe the shape of the distribution.

Exercise: Let 5 and 77 be the first two moments (around the origin) of a distribution.
What is the coefficient of variation of this distribution?
[Solution: Variance = 77 - 5² = 52. CV = √52 / 5 = 1.44.]

Since if X is in dollars then both the standard deviation and the mean are in dollars, the coefficient of
variation is a dimensionless quantity; i.e., it is a pure number which is not in any particular currency.
Thus the coefficient of variation of X is unaffected if X is multiplied by a constant.

When adding two independent random variables: CV[X + Y] = √(Var[X] + Var[Y]) / (E[X] + E[Y]).
In particular, when adding two independent identically distributed random variables: CV[X + X] = CV[X] / √2.

So if one adds up more and more independent identically distributed random variables, then the
coefficient of variation declines towards zero.11

The following formula for unity plus the square of the coefficient of variation follows directly from the
definition of the Coefficient of Variation.
CV² = Variance / E[X]² = (E[X²] - E[X]²) / E[X]² = (E[X²] / E[X]²) - 1.
Thus, 1 + CV² = E[X²] / E[X]² = 2nd moment divided by the square of the mean.

Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
Determine the empirical coefficient of variation.
[Solution: X = (300 + 600 + 1200 + 1500 + 2800)/5 = 1280. Variance =
{(300 - 1280)² + (600 - 1280)² + (1200 - 1280)² + (1500 - 1280)² + (2800 - 1280)²}/5 =
757,600. CV = √757,600 / 1280 = 0.680.
Alternately, the 2nd moment is: (300² + 600² + 1200² + 1500² + 2800²)/5 = 2,396,000.
1 + CV² = 2,396,000 / 1280² = 1.4624. ⇒ CV = 0.680.
Comment: Note the use of the biased estimator of the variance rather than the sample variance.
The CV would be the same using each of the losses divided by 100.]
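For those who like to confirm such computations numerically, here is a short Python sketch (illustrative only; the biased variance with n in the denominator is used, as in the exercise) showing both routes to the empirical CV of 0.680.

```python
# Empirical coefficient of variation, computed two ways.
losses = [300, 600, 1200, 1500, 2800]
n = len(losses)

mean = sum(losses) / n                                    # 1280
variance = sum((x - mean) ** 2 for x in losses) / n       # 757,600 (biased, n in denominator)
cv_direct = variance ** 0.5 / mean                        # 0.680

second_moment = sum(x ** 2 for x in losses) / n           # 2,396,000
cv_from_moments = (second_moment / mean ** 2 - 1) ** 0.5  # also 0.680, via 1 + CV^2 = E[X^2]/E[X]^2

print(cv_direct, cv_from_moments)
```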

Skewness:

The (coefficient of) skewness is defined as the 3rd central moment divided by the cube of the
standard deviation:
Skewness = γ1 = E[(X - E[X])³] / StdDev³.

The third central moment can be written in terms of moments around the origin:
E[(X - E[X])3 ] = E[X3 - 3X2 E[X] + 3XE[X]2 - E[X]3 ] = E[X3 ] - 3 E[X] E[X2 ] + 3E[X]E[X]2 - E[X]3
= E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 .

E[(X - E[X])3 ] = E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 .

Exercise: Let 5, 77, 812 be the first three moments (around the origin) of a distribution.
What is the skewness of this distribution?
[Solution: Variance = 77 - 52 = 52.
Third central moment = E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 = 812 - (3)(5)(77) + 2(53 ) = -93.
Skewness = Third central moment / Variance^1.5 = -93/52^1.5 = -0.248.]
11
This is the fundamental idea behind the usefulness of Credibility.

The skewness helps to describe the shape of the distribution. Typically size of loss distributions
have positive skewness, such as the following Pareto Distribution with mode of zero:12

[Graph of this Pareto density, declining from its mode at zero, for x from 0 to 100,000.]

Or the following LogNormal Distribution with a positive mode:

[Graph of this LogNormal density, with a positive mode, for x from 0 to 100,000.]
Positive skewness ⇔ skewed to the right.
There is a significant probability of very large results.

A symmetric distribution has zero skewness.13


While a symmetric curve has a skewness of zero, the converse is not true.
12
The Pareto Distribution is discussed in the subsequent section on Common Two Parameter Distributions.
13
For example, the Normal distribution has zero skewness.

The following Weibull Distribution has skewness of zero, but it is not symmetric:14

[Graph of this Weibull density for x from 0 to 100,000.]

The following Weibull Distribution has negative skewness and is skewed to the left:15

[Graph of this Weibull density, skewed to the left, for x from 0 to 100,000.]
14
With τ = 3.60235. See the Section on Common Two Parameter Distributions.
15
With τ = 6. The skewness depends on the value of the shape parameter tau. For τ > 3.60235, the Weibull has
negative skewness. For τ < 3.60235, the Weibull has positive skewness.

If X is in dollars, both the third central moments of X and the cube of the standard deviation are in
dollars cubed. Therefore the skewness is a dimensionless quantity; i.e., it is a pure number which is
not in any particular currency. Thus the skewness of X is unaffected if X is multiplied by a positive
constant. However, Skew[-X] = -Skew[X].16
Thus if X has positive skewness then -X has negative skewness.17

Exercise: The skewness of a random variable X is 3.5. What is the skewness of 1.1X?
[Solution: 3.5. The skewness is unaffected when a variable is multiplied by a positive constant.
Comment: This could be due to the impact of 10% inflation.]

Exercise: The skewness of a random variable X is 3.5. What is the skewness of -1.1X?
[Solution: -3.5. The skewness is multiplied by -1 when a variable is multiplied by a negative
constant.]

The numerator and the denominator of the skewness both involve central moments. The numerator
is the third central moment, while the denominator is the second central moment taken to the 3/2
power. Therefore they are unaffected by the addition or subtraction of a constant. Therefore, the
skewness of X + c is the same as the skewness of X. Translating a curve to the left or the right does
not change its shape; specifically it does not change its skewness.

Exercise: The skewness of a random variable X is 3.5.


What is the skewness of 10X + 7?
[Solution: 3.5. The skewness is unaffected when a variable is multiplied by a positive constant.
Also, the skewness is unaffected when a constant is added.]

Note that skewnesses do not add. However, since third central moments of independent variables
do add, one can derive useful formulas.18

16
The numerator of the skewness is negative of what it was, but the denominator is unaffected since the standard
deviation is never negative by definition.
17
If X is skewed to the right, then -X, which is X reflected in the Y-Axis, is skewed to the left.
18
For X and Y independent the 2nd and 3rd central moments add; the 4th central moment and higher central
moments do not add. Cumulants of independent variables add and the 2nd and 3rd central moments are equal to
the 2nd and 3rd cumulants. See for example, Practical Risk Theory for Actuaries, by Daykin, Pentikainen and
Pesonen.

For X and Y independent:


3rd central moment of X+Y = 3rd central moment of X + 3rd central moment of Y =
Skew[X] Var[X]^1.5 + Skew[Y] Var[Y]^1.5.
Thus for X and Y independent:
Skew[X + Y] = {Skew[X] Var[X]^1.5 + Skew[Y] Var[Y]^1.5} / (Var[X] + Var[Y])^1.5.

In particular, when adding two independent identically distributed random variables,


Skew[X + X] = Skew[X]/√2. As we add more identically distributed random variables the skewness
goes to zero; the sum goes to a symmetric distribution.19
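The formula for Skew[X + Y] can be checked by simulation. The sketch below is only an illustration (the Exponential scales and the sample size are arbitrary choices): it computes the empirical skewness of two independent samples, applies the formula above, and compares the result with the skewness computed directly from the sum.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew(sample):
    # Empirical skewness: third central moment over the (biased) standard deviation cubed.
    deviations = sample - sample.mean()
    return (deviations ** 3).mean() / (deviations ** 2).mean() ** 1.5

x = rng.exponential(scale=10, size=1_000_000)
y = rng.exponential(scale=25, size=1_000_000)

formula = (skew(x) * x.var() ** 1.5 + skew(y) * y.var() ** 1.5) / (x.var() + y.var()) ** 1.5
print(skew(x + y), formula)   # the two values should agree closely
```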

The coefficient of variation and the skewness are useful summary statistics that describe the shape of
the distribution. They give you an idea of which type of distribution is likely to fit. Note that the
Coefficient of Variation and Skewness do not depend on the scale parameter if any.

Most size of loss distributions have a positive skewness (skewed to the right), with a few very large
claims and many smaller claims. The more of the total dollars of loss represented by the rare large
claims, the more skewed the distribution.20

Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
Determine the empirical coefficient of skewness.
[Solution: From a previous exercise X = 1280, and the variance = 757,600. 3rd central moment =
{(300 - 1280)3 + (600 - 1280)3 + (1200 - 1280)3 + (1500 - 1280)3 + (2800 - 1280)3 }/5 =
453,264,000. Skewness = 453,264,000 / 757,600^1.5 = 0.687.
Comment: We have again used the “biased” estimate of the variance rather than the sample
variance. The skewness would be the same using each of the losses divided by 100.]

19
Note the relation to the central limit theorem, where a sum of standardized identical distributions goes to a
symmetric normal distribution.
20
As discussed subsequently, this situation is referred to as a heavy-tailed distribution.

Kurtosis:

The kurtosis is defined as the fourth central moment divided by the square of the variance.

Kurtosis = E[(X - E[X])⁴] / Variance².

Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
Determine the empirical kurtosis.
[Solution: From a previous exercise X = 1280, and the variance = 757,600. 4th central moment =
{(300 - 1280)4 + (600 - 1280)4 + (1200 - 1280)4 + (1500 - 1280)4 + (2800 - 1280)4 }/5 =
1,295,302,720,000. Kurtosis = 1,295,302,720,000 / 757,600² = 2.257.
Comment: The kurtosis would be the same using each of the losses divided by 100.]
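Here is a corresponding Python sketch (again only an illustration, using the biased variance) that reproduces the empirical skewness of 0.687 and the empirical kurtosis of 2.257 for these five losses.

```python
# Empirical skewness and kurtosis of the losses 300, 600, 1200, 1500, 2800.
losses = [300, 600, 1200, 1500, 2800]
n = len(losses)
mean = sum(losses) / n

def central_moment(k):
    return sum((x - mean) ** k for x in losses) / n

variance = central_moment(2)                       # 757,600
skewness = central_moment(3) / variance ** 1.5     # about 0.687
kurtosis = central_moment(4) / variance ** 2       # about 2.257
print(skewness, kurtosis)
```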

As with the skewness, the kurtosis is a dimensionless quantity, which describes the shape of the
distribution.21 Thus the kurtosis is unaffected when a variable is multiplied by a (non-zero) constant.
Since the fourth central moment is always non-negative, so is the kurtosis.
Large kurtosis ⇔ a heavy-tailed distribution.

Exercise: The kurtosis of a random variable X is 3.5. What is the kurtosis of 1.1X?
[Solution: 3.5. The kurtosis is unaffected when a variable is multiplied by a constant.
Comment: This exercise could be referring to the impact of 10% inflation.]

Exercise: The kurtosis of a random variable X is 3.5. What is the kurtosis of -1.1X?
[Solution: 3.5. The kurtosis is unaffected when a variable is multiplied by a non-zero constant.
Comment: Remember that the kurtosis is always positive.]

The numerator and the denominator of the kurtosis both involve central moments. The numerator is
the fourth central moment, while the denominator is the second central moment squared. Therefore
they are unaffected by the addition or subtraction of a constant; the kurtosis of X + c is the same as
the kurtosis of X. Translating a curve to the left or the right does not change its shape; it does not
change its kurtosis.

21
Both the numerator and denominator are in dollars to the fourth power.

Exercise: The kurtosis of a random variable X is 3.5. What is the kurtosis of 10X + 7?
[Solution: 3.5. The kurtosis is unaffected when a variable is multiplied by a non-zero constant. Also,
the kurtosis is unaffected when a constant is added.]

If X is a Normal Distribution, then (X-µ)/σ is a Standard Normal.


X = σ(Standard Normal + µ/σ) = a constant times (Standard Normal plus another constant.)
Thus all Normal Distributions have the same kurtosis as a Standard Normal.

It turns out that all Normal Distributions have a kurtosis of 3.

Distributions with a kurtosis less than 3 are lighter-tailed than a Normal Distribution. Distributions with a
kurtosis more than 3 are heavier-tailed than a Normal Distribution; they have their densities go to
zero more slowly as x approaches infinity than a Normal.
Most size of loss distributions encountered in practice have a kurtosis greater than 3.
For example, the kurtosis of a Gamma Distribution with shape parameter α is: 3 + 6/α.

Exercise: What is the 4th central moment in terms of moments around the origin?
[Solution: The 4th central moment is: E[(X - E[X])4 ] = E[X4 - 4E[X]X3 + 6E[X]2 X2 - 4E[X]3 X + E[X]4 ]
= E[X4 ] - 4E[X]E[X3 ] + 6E[X]2 E[X2 ] - 4E[X]3 E[X] + E[X]4
= E[X4 ] - 4E[X]E[X3 ] + 6E[X]2 E[X2 ] - 3E[X]4 .]

Thus we have the formula for the Kurtosis in terms of moments around the origin:
Kurtosis = γ2 = E[(X - E[X])⁴] / Variance² = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance².

The empirical kurtosis of the ungrouped data in Section 1 is:


{E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² =
{6.465278 x 10^24 - (4)(312,674.6)(1.600225 x 10^18) + (6)(312,674.6²)(4.9284598 x 10^11)
- (3)(312,674.6⁴)} / (3.9508 x 10^11)² = 30.3.

It should be noted that empirical estimates of the kurtosis are subject to large estimation errors, since
the empirical kurtosis is very heavily affected by the absence or presence of a few large claims.

Exercise: Let 5, 77, 812, 10423 be the first four moments (around the origin) of a distribution.
What is the kurtosis of this distribution?
[Solution: Variance = 77 - 5² = 52.
Kurtosis = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² =
{10423 - (4)(5)(812) + (6)(5²)(77) - (3)(5⁴)} / 52² = 1.43.]
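Since exam questions often give the first four moments about the origin, a small helper function is handy for converting them into the variance, skewness, and kurtosis. The sketch below is illustrative only; applied to the moments 5, 77, 812, and 10423 it returns a variance of 52, a skewness of about -0.248, and a kurtosis of about 1.43.

```python
def shape_stats(m1, m2, m3, m4):
    """Variance, skewness, and kurtosis from the first four moments about the origin."""
    variance = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    fourth_central = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return variance, third_central / variance ** 1.5, fourth_central / variance ** 2

print(shape_stats(5, 77, 812, 10423))   # (52, about -0.248, about 1.43)
```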

Jensenʼs Inequality states that for a convex function f, E[f(X)] ≥ f(E[X]).22


f(x) = x2 is an example of a convex function; its second derivative is positive.
Therefore, by Jensenʼs Inequality, E[X2 ] ≥ E[X]2 .23

Letting X = Y2 , we therefore have that E[Y4 ] ≥ E[Y2 ]2 .


The fourth moment is greater than or equal to the square of the second moment.

Letting X = (Y - µY)2 , we therefore have that E[(Y - µY)4 ] ≥ E[(Y - µY)2 ]2 .


The fourth central moment is greater than or equal to the square of the variance.

Therefore, the Kurtosis is always greater than or equal to one.


In fact, Kurtosis ≥ 1 + Skewness2 .24

Exercise: Let Prob[X = -10] = 50% and Prob[X = 10] = 50%.


Determine the skewness and kurtosis of X.
[Solution: Since this distribution is symmetric around its mean of 0, skewness = 0.
Variance = 102 = 100. Fourth Central Moment = 104 . Kurtosis = 104 /1002 = 1.]

Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
Determine the empirical kurtosis.
[Solution: From a previous exercise X = 1280, and the variance = 757,600. 4th central moment =
{(300 - 1280)4 + (600 - 1280)4 + (1200 - 1280)4 + (1500 - 1280)4 + (2800 - 1280)4 }/5 =
1,295,302,720,000. Kurtosis = 1,295,302,720,000 / 757,600² = 2.257.]

When computing the empirical coefficient of variation, skewness, or kurtosis, we use


the biased estimate of the variance, with n in the denominator, rather than the sample
variance. We do so since everyone else does.25

22
See for example Actuarial Mathematics.
23
This also follows from the fact that the variance is never negative.
24
See Exercise 3.19 in Volume I of Kendallʼs Advanced Theory of Statistics.
25
See for example, 4, 5/01, Q.3.

Problems:

3.1 (1 point) A size of loss distribution has moments as follows: First moment = 3,
Second moment = 50, Third Moment = 2000. Determine the skewness.
A. less than 6
B. at least 6 but less than 6.2
C. at least 6.2 but less than 6.4
D. at least 6.4 but less than 6.6
E. at least 6.6

Use the following information for the next 4 questions:


E[X] = 5, E[X2 ] = 42.8571, E[X3 ] = 584.184, E[X4 ] = 11,503.3.

3.2 (1 point) What is the variance of X?


A. less than 17
B. at least 17 but less than 18
C. at least 18 but less than 19
D. at least 19 but less than 20
E. at least 20

3.3 (1 point) What is the coefficient of variation of X?


A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

3.4 (2 points) What is the skewness of X?


A. less than 2.4
B. at least 2.4 but less than 2.5
C. at least 2.5 but less than 2.6
D. at least 2.6 but less than 2.7
E. at least 2.7

3.5 (3 points) What is the kurtosis of X?


A. less than 10
B. at least 10 but less than 11
C. at least 11 but less than 12
D. at least 12 but less than 13
E. at least 13

3.6 (1 point) Let X be a random variable. Which of the following statements are true?
1. A measure of skewness of X is E[X3 ] / Var[X]3/2.
2. The measure of skewness is positive if X has a heavy tail to the right.
3. If X is given by Standard Unit Normal, then X has kurtosis equal to one.
A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D

3.7 (3 points) There are 10,000 claims observed as follows:


Size of Claim Number of Claims
100 9000
200 800
300 170
400 30
Which of the following statements are true?
1. The mean of this distribution is 112.3.
2. The variance of this distribution is 14210.
3. The skewness of this distribution is positive.
A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D

Use the following information for the next three questions:


There are five losses of sizes: 5, 10, 20, 50, 100.

3.8 (2 points) What is the empirical coefficient of variation?


A. 0.90 B. 0.95 C. 1.00 D. 1.05 E. 1.1

3.9 (2 points) What is the empirical coefficient of skewness?


A. 0.8 B. 0.9 C. 1.0 D. 1.2 E. 1.4

3.10 (2 points) What is the empirical kurtosis?


A. 1.1 B. 1.4 C. 1.7 D. 2.0 E. 2.3

3.11 (3 points) f(x) = 2x, 0 < x < 1. Determine the skewness.


A. -0.6 B. -0.3 C. 0 D. 0.3 E. 0.6

3.12 (3 points) f(x) = 1, 0 < x < 1. Determine the skewness.


A. -0.6 B. -0.3 C. 0 D. 0.3 E. 0.6

3.13 (3 points) f(x) = 2(1 - x), 0 < x < 1. Determine the skewness.
A. -0.6 B. -0.3 C. 0 D. 0.3 E. 0.6

3.14 (4, 5/86, Q.33) (1 point)


Which of the following statements are true about the random variable X?
1. If X is given by a unit normal distribution, then X has its measure of skewness equal to one.
2. A measure of the skewness of X is E[X3 ]/(VAR[X])3 .
3. The measure of skewness of X is positive if X has a heavy tail to the right.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

3.15 (4, 5/88, Q.30) (1 point) Let X be a random variable with mean m, and let ar denote the rth
moment of X about the origin. Which of the following statements are true?
1. m = a1
2. The third central moment is equivalent to a3 + 3a2 - 2a1 3 .
3. The variance of X is the second central moment of X.
A. 1 B. 2 C. 2, 3 D. 1, 3 E. 1, 2 and 3

3.16 (4, 5/89, Q.27) (1 point) There are 30 claims for a total of $180,000.
Given the following claim size distribution, calculate the coefficient of skewness.
Claim Size ( $000 ) Number of Claims
2 2
4 6
6 12
8 10
A. Less than -.6
B. At least -.6, but less than -.2
C. At least -.2, but less than .2
D. At least .2, but less than .6
E. .6 or more

3.17 (2 points) In the previous question, determine the kurtosis.


A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0

3.18 (4B, 5/93, Q.34) (1 point) Claim severity has the following distribution:
Claim Size Probability
$100 0.05
$200 0.20
$300 0.50
$400 0.20
$500 0.05
Determine the distribution's measure of skewness.
A. -0.25 B. 0.00 C. 0.15 D. 0.35 E. Cannot be determined

3.19 (2 points) In the previous question, determine the kurtosis.


A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0

3.20 (4B, 5/95, Q.28) (2 points) You are given the following:
• For any random variable X with finite first three moments, the skewness of the distribution of X is
denoted Sk(X).
• X and Y are independent, identically distributed random variables with mean = 0 and
finite second and third moments.
Which of the following statements must be true?
1. 2Sk(X) = Sk(2X)
2. -Sk(Y) = Sk(-Y)
3. |Sk(X)| ≥ |Sk(X+Y)|
A. 2 B. 3 C. 1, 2 D. 2, 3 E. None of A, B, C, or D

3.21 (4B, 5/97, Q.21) (2 points) You are given the following:
• Both the mean and the coefficient of variation of a particular distribution are 2.
• The third moment of this distribution about the origin is 136.
Determine the skewness of this distribution.
Hint: The skewness of a distribution is defined to be the third central moment divided by the cube of
the standard deviation.
A. 1/4 B. 1/2 C. 1 D. 4 E. 17

3.22 (4B, 11/99. Q.29) (2 points) You are given the following:
• A is a random variable with mean 5 and coefficient of variation 1.
• B is a random variable with mean 5 and coefficient of variation 1.
• C is a random variable with mean 20 and coefficient of variation 1/2.
• A, B, and C are independent.
•X=A+B
•Y=A+C
Determine the correlation coefficient between X and Y.
A. -2/√10 B. -1/√10 C. 0 D. 1/√10 E. 2/√10

3.23 (4, 5/01, Q.3) (2.5 points) You are given the following times of first claim for five randomly
selected auto insurance policies observed from time t = 0: 1, 2, 3, 4, 5.
Calculate the kurtosis of this sample.
(A) 0.0 (B) 0.5 (C) 1.7 (D) 3.4 (E) 6.8

3.24 (4, 11/06, Q.3 & 2009 Sample Q.248) (2.9 points)
You are given a random sample of 10 claims consisting of
two claims of 400, seven claims of 800, and one claim of 1600.
Determine the empirical skewness coefficient.
(A) Less than 1.0
(B) At least 1.0, but less than 1.5
(C) At least 1.5, but less than 2.0
(D) At least 2.0, but less than 2.5
(E) At least 2.5

Solutions to Problems:

3.1. B. Variance = 50 - 3² = 41. Standard deviation = √41 = 6.403.
Skewness = {2000 - (3)(3)(50) + (2)(3³)} / 41^1.5 = (2000 - 450 + 54) / 262.5 = 6.1.

3.2. B. Variance = 42.8571 - 52 = 17.8571.


Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and

θ = 7. The variance for an Inverse Gaussian is: µ3/θ = 125 / 7 = 17.8571.

3.3. D. CV = √17.8571 / 5 = 0.845.

Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and
θ = 7. The coefficient of variation for an Inverse Gaussian is: √(µ/θ) = √(5/7) = 0.845.

3.4. C. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / StdDev³ =
{584.184 - (3)(5)(42.8571) + (2)(125)} / 17.8571^1.5 = 191.3275 / 75.4599 = 2.535.

Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and
θ = 7. The skewness for an Inverse Gaussian is 3√(µ/θ) = 3√(5/7) = 2.535.

3.5. E. Kurtosis = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² =
{11503.3 - (4)(5)(584.184) + (6)(25)(42.8571) - (3)(625)} / 17.8571² = 4373.19 / 318.88 = 13.71.

Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and
θ = 7. The kurtosis for an Inverse Gaussian is 3 + 15µ/θ = 3 + 75/7 = 96/7 = 13.71.

3.6. C. 1. The numerator of the skewness should be the third central moment:
E[(X - E[X])3 ] = E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 . Thus Statement 1 is not true in general.
2. Statement 2 is true. A good example is the Pareto Distribution.
3. The Normal Distribution has a kurtosis of three, thus Statement #3 is not true.

3.7. E. The mean is: 1,123,000 / 10,000 = 112.3. So Statement #1 is true.
The second moment is: 142,100,000 / 10,000 = 14,210, thus the variance is:
14,210 - 112.3² = 1598.7. Thus Statement #2 is false.

Size of Claim   Number of Claims   Size x Number   Size² x Number   Size³ x Number
100             9000               900,000         90,000,000        9,000,000,000
200             800                160,000         32,000,000        6,400,000,000
300             170                51,000          15,300,000        4,590,000,000
400             30                 12,000          4,800,000         1,920,000,000
Total           10,000             1,123,000       142,100,000       21,910,000,000

E[X] = 1,123,000 / 10,000 = 112.3. E[X²] = 142,100,000 / 10,000 = 14,210.
E[X³] = 21,910,000,000 / 10,000 = 2,191,000. StdDev = √1598.7 = 39.98.

Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / StdDev³ =
{2,191,000 - (3)(112.3)(14,210) + (2)(112.3)³} / 39.98³ = 3.7. Thus Statement #3 is true.

3.8. B. X = (5 + 10 + 20 + 50 + 100)/5 = 37.


Variance = {(5 - 37)2 + (10 - 37)2 + (20 - 37)2 + (50 - 37)2 + (100 - 37)2 }/5 = 1236.
CV = √1236 / 37 = 0.950.
Comment: Note the use of the biased estimator of the variance rather than the sample variance.

3.9. B. Third Central Moment = {(5 - 37)3 + (10 - 37)3 + (20 - 37)3 + (50 - 37)3 + (100 - 37)3 }/5
= 38,976. Skewness = 38,976 / 1236^1.5 = 0.897.

3.10. E. 4th Central Moment = {(5 - 37)4 + (10 - 37)4 + (20 - 37)4 + (50 - 37)4 + (100 - 37)4 }/5 =
3,489,012. Kurtosis = 3,489,012 / 1236² = 2.284.

3.11. A. E[X] = ∫_0^1 x f(x) dx = 2/3. E[X²] = ∫_0^1 x² f(x) dx = 1/2. E[X³] = ∫_0^1 x³ f(x) dx = 2/5.
Variance = (1/2) - (2/3)² = 1/18.
Third central moment = E[X³] - 3E[X]E[X²] + 2E[X]³ = 2/5 - (3)(2/3)(1/2) + 2(2/3)³ = -0.0074074.
Skewness = -0.0074074 / (1/18)^1.5 = -0.5657.
Comment: A Beta Distribution with a = 2, b = 1, and θ = 1.
Skewness = 2(b - a)√(a + b + 1) / {(a + b + 2)√(ab)} = -2√4 / {5√2} = -0.5657.

3.12. C. The distribution is symmetric around its mean of 1/2. ⇒ The skewness is 0.
E[X] = ∫_0^1 x f(x) dx = 1/2. E[X²] = ∫_0^1 x² f(x) dx = 1/3. E[X³] = ∫_0^1 x³ f(x) dx = 1/4.
Variance = (1/3) - (1/2)² = 1/12. Third central moment = E[X³] - 3E[X]E[X²] + 2E[X]³
= 1/4 - (3)(1/2)(1/3) + 2(1/2)³ = 0. Skewness = 0 / (1/12)^1.5 = 0.

3.13. E. E[X] = ∫_0^1 x f(x) dx = 1 - 2/3 = 1/3. E[X²] = ∫_0^1 x² f(x) dx = 2/3 - 1/2 = 1/6.
Variance = (1/6) - (1/3)² = 1/18. E[X³] = ∫_0^1 x³ f(x) dx = 1/2 - 2/5 = 1/10.
3rd central moment = E[X³] - 3E[X]E[X²] + 2E[X]³ = 1/10 - (3)(1/3)(1/6) + 2(1/3)³ = 0.0074074.
Skewness = 0.0074074 / (1/18)^1.5 = 0.5657.
Comment: A Beta Distribution with a = 1, b = 2, and θ = 1.
Skewness = 2(b - a)√(a + b + 1) / {(a + b + 2)√(ab)} = 2√4 / {5√2} = 0.5657.

3.14. C. 1. False . The Normal Distribution is symmetric with a skewness of zero. 2. False. The
numerator should be the third central moment, while the denominator should be the standard
deviation cubed. SKEW[X] = E[(X-E[X])3 ]/(VAR[X])3/2 . 3. True.

3.15. D. Statement one is true, the mean is the 1st moment around the origin: E[X].
Statement 2 is false. The 3rd central moment = E[(X-m)3 ] = a3 - 3a2 a1 + 2a1 3 ;

Statement 3 is true, the variance is the 2nd central moment: E[(X-m)2].


Comment: In the 3rd central moment each term must be in dollars cubed.

3.16. B. & 3.17. D. Calculate the moments:


Number Size of Square of Cube of
of Claims Claim Size of Claim Size of Claim
2 2 4 8
6 4 16 64
12 6 36 216
10 8 64 512
Average 6.000 39.200 270.400
E[X] = 6 , E[X2 ] = 39.2, E[X3 ] = {(2)(23 ) +(6)(43 ) +(12)(63 ) +(10)(83 ) }/(2+6+10+12) = 270.4.
Variance = E[X2 ] - E[X]2 = 39.2 - 62 = 3.2.

(Coefficient of) Skewness = {E[X3 ] - (3 E[X] E[X2 ]) + (2 E[X]3 )} / STDDEV3 =


{(270.4) - (3)(6)(39.2) +(2)(6)3 } / (3.2)3/2 = -3.2 / 5.724 = -0.56.
Alternately, Third central moment = {2(2 - 6)3 + 6(4 - 6)3 + 12(6 - 6)3 + 10(8 - 6)3 }/30 = -3.2.
Skewness = -3.2/ (3.2)3/2 = -0.56.
Fourth central moment = {2(2 - 6)4 + 6(4 - 6)4 + 12(6 - 6)4 + 10(8 - 6)4 }/30 = 25.6.
Kurtosis = (Fourth central moment)/Variance2 = 25.6/3.22 = 2.5.
Comment: The distribution is skewed to the left and therefore has a negative skewness.

3.18. B. A symmetric distribution has zero skewness.

3.19. E. Calculate the moments:


Probability Size of Square of
Claim ($00) Size of Claim
5% 1 1
20% 2 4
50% 3 9
20% 4 16
5% 5 25
Average 3.0 9.8
E[X] = 3 , E[X2 ] = 9.8. Variance = E[X2 ] - E[X]2 = 9.8 - 32 = 0.8.

4th central moment = (.05)(1 - 3)4 + (.2)(2 - 3)4 + (.5)(3 - 3)4 + (.2)(4 - 3)4 + (.05)(5 - 3)4 = 2.
Kurtosis = (Fourth central moment)/Variance2 = 2/0.82 = 3.125.
Comment: The kurtosis does not depend on the scale. So dividing all of the claim sizes by 100
makes the arithmetic easier, but does not affect the answer.

3.20. D. Statement 1 is false. The skewness is a dimensionless quantity; i.e., it is a pure number
which is not in any particular currency. Thus the skewness of X is unaffected if X is multiplied by a
positive constant. In this specific case both the 3rd central moment and the cube of the standard
deviation are multiplied by 23 = 8. Therefore the skewness which is their ratio is unaffected.
Statement 2 is true. The skewness is defined as the 3rd central moment divided by the cube of the
standard deviation. The former is multiplied by -1 since by definition the third central moment is E[(X-
E[X])3 ]. Alternately, recall that the third central moment = µ3 ′ - (3 µ1 ′ µ2 ′) + (2 µ1 ′3), each of whose
terms is multiplied by -1. (The odd powered moments around the origin are each multiplied by -1,
while the even powered moments are unaffected.) The cube of the standard deviation is unaffected
since the standard deviation is always positive. Statement 3 is true. Skewnesses do not add.
However, since third central moments of independent variables do add, for X and Y independent,
3rd central moment of X+Y = 3rd central moment of X + 3rd central moment of Y =
Skew[X]Var[X]1.5 + Skew[Y]Var[Y]1.5. Thus for X and Y independent,
Skew[X+Y] = {Skew[X]Var[X]1.5 + Skew[Y]Var[Y]1.5} / {Var[X] + Var [Y]}1.5.
In particular, when adding two independent identically distributed random variables,
Skew[X + Y] = Skew[X] / √2 ≤ Skew [X].
Comment: Long and difficult. Tests important concepts. Statement 2 says that if X is skewed to the
right, then -X is skewed to the left by the same amount.

3.21. B. We are given E[X3 ] = 136, E[X] = 2 and CV = σ/ E[X] = 2. Therefore σ = 4.


Therefore E[X2 ] = σ2 + E[X]2 = 42 + 22 = 20. Skewness = {E[X3 ] - (3 E[X] E[X2 ]) + (2 E[X]3 )} / σ3 =

{136 - (3)(20)(2) + 2(23 )} / 43 = 32/64 = 1/2.

3.22. D. Var[A] = {(mean)(CV)}2 = 25. Var[B] = {(5)(1)}2 = 25. Var[C] = {(20)(1/2)}2 = 100.
Var[X] = Var[A] + Var[B] = 25 + 25 = 50, since A and B are independent.
Var[Y] = Var[A] + Var[C] = 25 + 100 = 125, since A and C are independent.
Cov[X, Y] = Cov[A+B, A+C] = Cov[A, A] + Cov[A, C] + Cov[B, A] + Cov[B, C] =
Var[A] + 0 + 0 + 0 = 25.
Corr[X, Y] = Cov[X, Y] / √(Var[X] Var[Y]) = 25 / √((50)(125)) = 1/√10.
Comment: Since A, B, and C are independent, Cov[A, C] = Cov[B, A] = Cov[B, C] = 0.

3.23. C. Mean = (1 + 2 + 3 + 4 +5)/5 = 3.


Variance = 2nd central moment = {(1-3)2 + (2-3)2 + (3-3)2 + (4-3)2 + (5-3)2 }/5 = 2.
4th central moment = {(1-3)4 + (2-3)4 + (3-3)4 + (4-3)4 + (5-3)4 }/5 = 34/5.
Kurtosis = the fourth central moment divided by the variance squared = (34/5)/22 = 1.7.
Comment: We use the biased estimator of the variance rather than the sample variance.

3.24. B. E[X] = {(2)(400) + (7)(800) + 1600}/10 = 800.


E[X2 ] = {(2)(4002 ) + (7)(8002 ) + 16002 }/10 = 736,000.
E[X3 ] = {(2)(4003 ) + (7)(8003 ) + 16003 }/10 = 780,800,000.
Variance is: 736,000 - 8002 = 96,000.
Third Central Moment is: 780,800,000 - (3)(736,000)(800) + (2)(8003 ) = 38,400,000.
Skewness is: 38,400,000 / 96,000^1.5 = 1.291.
Alternately, Third Central Moment is: {(2)(400 - 800)3 + (7)(800 - 800)3 + (1600 - 800)3 }/10 =
38,400,000. Proceed as before.
Comment: If one divides all of the claim sizes by 100, then the skewness is unaffected.
Note that the denominator is not based on using the sample variance.

Section 4, Empirical Distribution Function

This section will discuss the Distribution and Survival Functions.

Cumulative Distribution Function:

For the Cumulative Distribution Function, F(x) = Prob[X ≤ x].

Various Distribution Functions are listed in Appendix A attached to your exam.


For example, for the Exponential Distribution, F(x) = 1 - e^(-x/θ).26

Exercise: What is the value at 3 of an Exponential Distribution with θ = 2?

[Solution: F(x) = 1 - e^(-x/θ). F(3) = 1 - e^(-3/2) = 0.777.]

Fʼ(x) = f(x) ≥ 0.
0 ≤ F(x) ≤ 1, nondecreasing, right-continuous, starts at 0 and ends at 1.27

Here is a graph of the Exponential Distribution with θ = 2:

[Graph of F(x) for x from 0 to 10, increasing from 0 toward 1.]

26
See Appendix A in the tables attached to the exam. The Exponential Distribution will be discussed in detail in a
subsequent section.
27
As x approaches y from above, F(x) approaches F(y). F would not be continuous at a jump discontinuity, but would
still be right continuous. See Section 2.2 of Loss Models.

Survival Function:

Similarly, we can define the Survival Function, S(x) = 1 - F(x) = Prob[X > x].

Sʼ(x) = -f(x) ≤ 0.
0 ≤ S(x) ≤ 1, nonincreasing, right-continuous, starts at 1 and ends at 0.28

S(x) = 1 - F(x) = Prob[X > x] = the Survival Function


= the tail probability of the Distribution Function F.

For example, for the Exponential Distribution, S(x) = 1 - F(x) = 1 - (1 - e^(-x/θ)) = e^(-x/θ).

Here is a graph of the Survival Function of an Exponential with θ = 2:

[Graph of S(x) for x from 0 to 10, decreasing from 1 toward 0.]

Exercise: What is S(5) for a Pareto Distribution29 with α = 2 and θ = 3?

[Solution: F(x) = 1 - {θ/(x+θ)}α. S(x) = {θ/(x+θ)}α. S(5) = {3/(3+5)}2 = 9/64 = 14.1%.]

In many situations you may find that the survival function is easier for you to use than the distribution
function. Whenever a formula has S(x), one can always use 1 - F(x) instead, and vice-versa.

28
See Definition 2.4 in Loss Models.
29
See Appendix A in the tables attached to the exam.
The Pareto Distribution will be discussed in a subsequent section.

Empirical Model:

The Empirical Model: probability of 1/(# data points) is assigned to each observed value.30

For the ungrouped data set in Section 1, the corresponding empirical model has density of 1/130 at
each of the 130 data points:
p(300) = 1/130, p(400) = 1/130, ..., p(4802200) = 1/130.

Exercise: The following observations: 17, 16, 16, 19 are taken from a random sample.
What is the probability function (pdf) of the corresponding empirical model?
[Solution: p(16) = 1/2, p(17) = 1/4, p(19) = 1/4.]

Empirical Distribution Function:

The Empirical Model is the density that corresponds to the Empirical Distribution Function:
Fn (x) = (# data points ≤ x)/(total # of data points).

The Empirical Distribution Function at x, is the observed number of claims less than or
equal to x divided by the total number of claims observed.

At each observed claim size the Empirical Distribution Function has a jump
discontinuity.

For example for the ungrouped data in Section 1, just prior to 37,300 the Empirical Distribution
Function is 26/130 = .2000, while at 37,300 it is 27/130 = .2077.

Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
What is the Empirical Distribution Function?
[Solution: Fn (x) is: 0 for x < 300, 1/5 for 300 ≤ x < 600, 2/5 for 600 ≤ x < 1200,
3/5 for 1200 ≤ x < 1500, 4/5 for 1500 ≤ x < 2800, 1 for x ≥ 2800.]

30
See Definition 4.8 in Loss Models.

Here is a graph of this Empirical Distribution Function:

[Graph: a step function rising from 0 to 1, with jumps of 1/5 at 300, 600, 1200, 1500, and 2800.]

The empirical distribution function is constant on intervals, with jumps up of 1/5 at each of the five
observed points. For example, it is 1/5 at 599.99999 but 2/5 at 600.
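In code, the Empirical Distribution Function is just a counting exercise. The sketch below (illustrative only) builds Fn for these five losses and shows the jump at 600 described above.

```python
def empirical_cdf(data):
    """Return the function F_n(x) = (# data points <= x) / n."""
    n = len(data)
    return lambda x: sum(1 for d in data if d <= x) / n

F = empirical_cdf([300, 600, 1200, 1500, 2800])
print(F(599.99999), F(600), F(1000))   # 0.2, 0.4, 0.4
```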

Mean and Variance of the Empirical Distribution Function:

Assume the losses are drawn from a Distribution Function F(x). Then each observed loss has a
chance of F(x) of being less than or equal to x. Thus the number of losses observed less than or
equal to x is a sum of N independent Bernoulli trials with chance of success F(x). Thus if one has a
sample of N losses, the number of losses observed less than or equal to x is Binomially distributed
with parameters N and F(x).

Therefore, the Empirical Distribution Function is (1/N) times a Binomial Distribution with parameters N
and F(x). Therefore, the Empirical Distribution Function has mean of F(x)
and a variance of: F(x){1-F(x)}/N.

Exercise: Assume 130 losses are independently drawn from an Exponential Distribution:
F(x) = 1 - e^(-x/300,000).
Then what is the distribution of the number of losses less than or equal to 100,000?
[Solution: The number of losses observed less than or equal to 100,000 is Binomially distributed
with parameters 130 and 1 - e^(-1/3) = 0.283.]

Exercise: Assume 130 losses are independently drawn from an Exponential Distribution:
F(x) = 1 - e^(-x/300,000).
Then what is the variance of the number of losses less than or equal to 100,000?
[Solution: The number of losses observed less than or equal to 100,000 is Binomially distributed
with parameters 130 and 1 - e^(-1/3) = 0.283.
Thus it has a variance of: (130)(0.283)(1 - 0.283) = 26.38.]

Exercise: 130 losses are independently drawn from an Exponential Distribution:


F(x) = 1 - e^(-x/300,000). What is the distribution of the empirical distribution function at 100,000?
[Solution: The number of losses observed less than or equal to 100,000 is Binomially distributed
with parameters 130, .283. The empirical distribution function at 100,000, Fn (100000), is the
percentage of losses ≤ 100,000. Thus the empirical distribution function at 100,000 is (1/130) times
a Binomial with parameters 130 and .283.]

Exercise: 130 losses are independently drawn from an Exponential Distribution:


F(x) = 1 - e^(-x/300,000).
What is the variance of the percentage of losses less than or equal to 100,000?
[Solution: Fn(100,000) is (1/130) times a Binomial with parameters 130 and 0.283. Thus it has a
variance of (1/130)²(130)(0.283)(1 - 0.283) = 0.00156.]

As the number of losses, N, increases, the variance of the estimate of the distribution decreases as
1/N. All other things being equal, the variance of the empirical distribution function is largest when
trying to estimate the middle of the distribution rather than either of the tails31.
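The binomial argument translates directly into a one-line variance formula for the empirical distribution function. Here is a small Python sketch (illustrative only) that reproduces the 0.00156 from the last exercise and shows the 1/N decrease.

```python
import math

def var_empirical_cdf(F_x, n):
    # The number of losses <= x is Binomial(n, F(x)), so F_n(x) has variance F(x)(1 - F(x))/n.
    return F_x * (1 - F_x) / n

F_100000 = 1 - math.exp(-100000 / 300000)     # 0.283 for the Exponential example above
print(var_empirical_cdf(F_100000, 130))       # about 0.00156
print(var_empirical_cdf(F_100000, 1300))      # ten times as many losses, one tenth the variance
```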

Empirical Survival Function:

The Empirical Survival Function is: 1 - Empirical Distribution Function.

Empirical Distribution Function at x is: (# losses ≤ x)/(total # of losses).32

Empirical Survival Function at x is: (# losses > x)/(total # of losses).

Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
What are the empirical distribution function and survival function at 1000?
[Solution: Fn (1000) = (# losses ≤ 1000) / (# losses) = 2/5.
S n (1000) = (# losses > 1000) / (# losses) = 3/5.]

31
F(x){1-F(x)} is largest for F(x) ≅ 1/2. However, small differences in the tail probabilities can be important.
32
More generally, the empirical distribution function is: (# observations ≤ x) / (total # of observations).

For 300, 600, 1,200, 1,500, and 2,800, here is a graph of the Empirical Survival Function:

[Graph: a step function falling from 1 to 0, with drops of 1/5 at 300, 600, 1200, 1500, and 2800.]

The empirical survival function is constant on intervals, with jumps down of 1/5 at each of the five
observed points. For example, it is 4/5 at 599.99999 but 3/5 at 600.

Exercise: Determine the area under this empirical survival function.


[Solution: (1)(300) + (.8)(300) + (.6)(600) + (.4)(300) + (.2)(1300) = 1280.]

The sample mean, X = (300 + 600 + 1200 +1500 + 2800)/5 = 1280.


The sample mean is equal to the integral of the empirical survival function.
As will be discussed in a subsequent section, the mean is equal to the integral of the survival
function, for those cases where the support of the survival function starts at zero.
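This identity can be confirmed directly for the five losses. The sketch below (illustrative only) adds up the rectangles under the empirical survival function and compares the area with the sample mean; both equal 1280.

```python
losses = sorted([300, 600, 1200, 1500, 2800])
n = len(losses)

# Area under the empirical survival function, which is constant between observed points.
area = 0.0
previous = 0.0
for i, x in enumerate(losses):
    survival = 1 - i / n          # value of S_n on the interval (previous, x]
    area += survival * (x - previous)
    previous = x

print(area, sum(losses) / n)      # both 1280
```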

Problems:

4.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62.
What is the empirical survival function at 30?
A. 1/6 B. 1/3 C. 1/2 D. 2/3 E. 5/6

4.2 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140.
What is the empirical distribution function at 50?
A. 20% B. 30% C. 40% D. 50% E. 60%

4.3 (2 points) F(200) = 0.9, F(d) = 0.25, and ∫_d^200 x f(x) dx = 75.
∫_d^200 F(x) dx + d = 150.
Determine d.
A. 60 B. 70 C. 80 D. 90 E. 100

Use the following information for the next two questions:


You are given the following graph of an empirical distribution function:
[Graph: the empirical distribution function equals 0 for x < 7, 0.4 for 7 ≤ x < 11, 0.6 for 11 ≤ x < 19, and 1 for x ≥ 19.]

4.4 (1 point) Determine the mean of the data.


(A) Less than 10
(B) At least 10, but less than 11
(C) At least 11, but less than 12
(D) At least 12, but less than 13
(E) At least 13

4.5 (1 point) For this data, determine the biased estimator of the variance, Σ(Xi - X)² / N.
(A) Less than 26
(B) At least 26, but less than 28
(C) At least 28, but less than 30
(D) At least 30, but less than 32
(E) At least 34

4.6 (CAS9, 11/99, Q.16) (1 point) Which of the following can cause distortions in a loss claim
size distribution derived from empirical data?
1. Claim values tend to cluster around target values, such as $5,000 or $10,000.
2. Individual claims may come from policies with different policy limits.
3. Final individual claim sizes are not always known.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 2, 3

4.7 (IOA 101, 9/01, Q.7) (3.75 points) The probability density function of a random variable X is
given by f(x) = kx(1 - ax2 ), 0 ≤ x ≤ 1, where k and a are positive constants.
(i) (2.25 points) Show that a ≤ 1, and determine the value of k in terms of a.
(ii) (1.5 points) For the case a = 1, determine the mean of X.

Solutions to Problems:

4.1. B. S(30) = (# losses > 30)/(# losses) = 2/6 = 1/3.

4.2. C. There are 2 losses of size ≤ 50. Empirical distribution function at 50 is: 2/5 = 0.4.

4.3. A. By integration by parts:
∫_d^200 F(x) dx = x F(x)]_d^200 - ∫_d^200 x f(x) dx = (200)F(200) - dF(d) - 75
= (200)(0.9) - (0.25)d - 75 = 105 - 0.25d.

⇒ 105 - 0.25d + d = 150. ⇒ d = 45/0.75 = 60.

4.4. D. From the empirical distribution function, 40% of the data is 7, 60% - 40% = 20% of the data
is 11, and 100% - 60% = 40% of the data is 19.
The mean is: (40%)(7) + (20%)(11) + (40%)(19) = 12.6.
Comment: If the data set was of size five, then it was: 7, 7, 11, 19, 19. The mean is: 63/5 = 12.6.
4.5. C. From the empirical distribution function, 40% of the data is 7,
60% - 40% = 20% of the data is 11, and 100% - 60% = 40% of the data is 19.
The mean is: (40%)(7) + (20%)(11) + (40%)(19) = 12.6.
The second moment is: (40%)(72 ) + (20%)(112 ) + (40%)(192 ) = 188.2.
Σ(Xi - X)² / N = 188.2 - 12.6² = 29.44.

4.6. E. All of these are true.


Item #3 is referring to the time between when the insurer knows about a claim and sets up a
reserve, and when the claim is paid and closed.

4.7. (i) f(x) ≥ 0. ⇒ 1 - ax² ≥ 0 for 0 ≤ x ≤ 1. ⇒ a ≤ 1.

The integral from 0 to 1 of f(x) = k(x - ax³) is: k(1/2 - a/4).
Setting this integral equal to one: k(1/2 - a/4) = 1. ⇒ k = 4/(2 - a).
(ii) k = 4/(2 - a) = 4/(2 - 1) = 4. f(x) = 4x - 4x³.
The integral from zero to one of xf(x) = 4x² - 4x⁴ is: 4/3 - 4/5 = 8/15.

Section 5, Limited Losses

The next few sections will introduce a number of related ideas: the Limited Loss Variable,
Limited Expected Value, Losses Eliminated, Loss Elimination Ratio, Excess Losses,
Excess Ratio, Excess Loss Variable, Mean Residual Life/ Mean Excess Loss, and
Hazard Rate/ Failure Rate.

X ∧ 1000 ≡ Minimum of X and 1000 = Limited Loss Variable.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is X ∧ 1000?
[Solution: X ∧ 1000 = $300, $600, $1000, $1000, $1000.]

If the insured had a policy with a $1000 policy limit (and no deductible), then the insurer would pay
$300, $600, $1000, $1000, and $1000, for a total of $3900 for these five losses.

The Limited Loss Variable33 corresponding to a limit L ⇔ X ∧ L⇔

censored from above at L ⇔ right censored at L34 ⇔

the payments with a policy limit L (and no deductible) ⇔ X for X < L, L for X ≥ L.

Limited Expected Value:

Limited Expected Value at 1000 = E[X ∧ 1000] =


an average over all sizes of loss of the minimum of 1000 and the size of loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the (empirical) limited expected value at $1000?
[Solution: E[X ∧ 1000] = (300 + 600 + 1000 + 1000 + 1000)/5 = 3900/5 = 780.]

In this case, the insurer pays 3900 on 5 losses or an average of 780 per loss.
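A minimal Python sketch (illustrative only; the names are mine) of the limited loss variable and its empirical mean for these five losses:

losses = [300, 600, 1200, 1500, 2800]
limit = 1000
limited = [min(x, limit) for x in losses]    # X ∧ 1000 for each loss: [300, 600, 1000, 1000, 1000]
lev = sum(limited) / len(limited)            # empirical E[X ∧ 1000] = 780.0
print(limited, lev)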

The mean of the limited loss variable corresponding to L = E[ X ∧ L] =


the average payment per loss with a policy limit of L.

Since E[X ∧ L] ≡ E[Min[X, L]] = average of numbers each ≤ L, E[ X ∧ L] ≤ L.

Since E[X ∧ L] ≡ E[Min[X, L]] = average of numbers each ≤ X, E[ X ∧ L] ≤ E[X].


33
See Definition 3.5 in Loss Models.
34
Censoring will be discussed in a subsequent section.

Exercise: For the ungrouped data in Section 1, what is the Limited Expected Value at $10,000?
[Solution: E[X ∧ 10000] is an average over all sizes of loss of the minimum of $10,000 and the size
of loss. So the first 8 losses in Section 1 would all enter into the average at their total size, while the
remaining 122 losses all enter at $10,000.
E[X ∧ 10000] = (35200 + (122)(10000)) / 130 = $1,255,200 / 130 = $9655.4.]

The limited expected value can be written as the sum of the contributions of the small losses and the
large losses. The (theoretical) Limited Expected Value (LEV),
E[X ∧ L], would be written for a continuous size of loss distribution as two pieces:
E[X ∧ L] = ∫_0^L x f(x) dx + L S(L)

= contribution of small losses + contribution of large losses.

The first piece represents the contribution of losses up to L in size, while the second piece
represents the contribution of those losses larger than L. The smaller losses each contribute their
size, while the larger losses each contribute L to the average.

For example, for the Exponential Distribution:

E[X ∧ L] = ∫_0^L (x e^(-x/θ)/θ) dx + L e^(-L/θ) = (-x e^(-x/θ) - θ e^(-x/θ))]_{x=0}^{x=L} + L e^(-L/θ) = θ(1 - e^(-L/θ)).35

35
See Appendix A of Loss Models and the tables attached to the exam.
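For readers who like numerical checks, here is a minimal Python sketch (illustrative only, assuming θ = 100 and L = 70 purely for the example) comparing the closed form θ(1 - e^(-L/θ)) to a direct Riemann-sum approximation of the defining integral:

import math

theta, L = 100.0, 70.0
closed_form = theta * (1.0 - math.exp(-L / theta))   # theta * (1 - e^(-L/theta))

# Midpoint Riemann sum of x f(x) on [0, L], plus the L * S(L) piece.
n = 100000
dx = L / n
integral = sum((i + 0.5) * dx * math.exp(-(i + 0.5) * dx / theta) / theta for i in range(n)) * dx
numeric = integral + L * math.exp(-L / theta)
print(closed_form, numeric)                          # both approximately 50.34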

Problems:

5.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62.
What is the Limited Expected Value, for a Limit of 25?
A. 15 B. 16 C. 17 D. 18 E. 19

5.2 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140.
What is the Limited Expected Value at 50?
A. 10 B. 20 C. 30 D. 40 E. 50

Use the following information for the next two questions:


Frequency is Poisson with λ = 20.
E[X] = $10,000.
E[X ∧ 25,000] = $8000.

5.3 (1 point) If there is no policy limit, what is the expected aggregate annual loss?

5.4 (1 point) If there is a 25,000 policy limit, what is the expected aggregate annual loss?

5.5 (2 points) For an insurance policy, you are given:


(i) The policy limit is 100,000 per loss, with no deductible.
(ii) Expected aggregate losses are 1,000,000 annually.
(iii) The number of losses follows a Poisson distribution.
(iv) The claim severity distribution has:
S(50,000) = 10%.
S(100,000) = 4%.
E[X ∧ 50,000] = 28,000.
E[X ∧ 100,000] = 32,000.
(v) Frequency and severity are independent.
Determine the probability that no losses will exceed 50,000 during the next year.
(A) 3.0% (B) 3.5% (C) 4.0% (D) 4.5% (E) 5.0%

5.6 (1 point) E[X ∧ 5000] = 3200.


Size Number of Losses Dollars of Loss
0 to 5000 170 ???
5001 to 25,000 60 700,000
over 25,000 20 ???
Determine E[X ∧ 25,000].
(A) 5600 (B) 5800 (C) 6000 (D) 6200 (E) 6400

5.7 (4, 11/01, Q.36) (2.5 points) For an insurance policy, you are given:
(i) The policy limit is 1,000,000 per loss, with no deductible.
(ii) Expected aggregate losses are 2,000,000 annually.
(iii) The number of losses exceeding 500,000 follows a Poisson distribution.
(iv) The claim severity distribution has
Pr(Loss > 500,000) = 0.0106
E[min(Loss; 500,000)] = 20,133
E[min(Loss; 1,000,000)] = 23,759
Determine the probability that no losses will exceed 500,000 during 5 years.
(A) 0.01 (B) 0.02 (C) 0.03 (D) 0.04 (E) 0.05

Solutions to Problems:

5.1. B. E[X ∧ 25] = (3 + 8 + 13 + 22 + 25 + 25) / 6 = 96 / 6 = 16.

5.2. D. E[X ∧ 50] = (15 + 35 + 50 + 50 + 50)/5 = 40.

5.3. (20)($10000) = $200,000.

5.4. (20)($8000) = $160,000.

5.5. D. 1,000,000 = expected annual aggregate loss = (mean frequency)E[X ∧ 100,000] =


(mean frequency) (32,000). ⇒ mean frequency = 1 million / 32,000 = 31.25 losses per year.
The expected number of losses exceeding 50,000 is: (31.25)S(50,000) = 3.125.
The large losses are Poisson; the chance of having zero of them is: e-3.125 = 4.4%.
Comment: Similar to 4, 11/01, Q.36.

5.6. E. E[X ∧ 25,000] =


E[X ∧ 5000] + (contribution above 5000 from medium claims)
+ (contribution above 5000 from large claims)
= 3200 + {700,000 - (60)(5000)} / 250 + (20)(25,000 - 5000) / 250 = 6400.
Alternately, let y be the dollars of loss on losses of size 0 to 5000.
Then, 3200 = E[X ∧ 5000] = {y + (5000) (60 + 20)} / 250. ⇒ y = 400,000.
E[X ∧ 25,000] = {400,000 + 700,000 + (20)(25,000)} / 250 = 6400.
Comment: Each loss of size more than 25,000, contributes an additional 20,000 to E[X ∧ 25,000],
compared to E[X ∧ 5000].
Each loss of size 5001 to 25,000 contributes an additional x - 5000 to E[X ∧ 25,000],
compared to E[X ∧ 5000].

5.7. A. 2,000,000 = expected annual aggregate loss = (mean frequency)E[X ∧ 1 million] =


(mean frequency) (23,759). Therefore, mean frequency = 2 million/ 23759 = 84.18 per year.
Assuming frequency and severity are independent, the expected number of losses exceeding
1/2 million is: (84.18)(.0106) = .892 per year.
Over 5 years we expect (5)(.892) = 4.461 losses > 1/2 million. Since we are told these losses are
Poisson Distributed, the chance of having zero of them is: e-4.461 = 0.012.

Section 6, Losses Eliminated

Assume an (ordinary) deductible of $10,000 and the ground up36 loss sizes from Section 1. Then
the insurer would pay nothing for the first 8 losses, each of which is less than the $10,000 deductible.
For the ninth loss of size $10,400, the insurer would pay $400 while the insured would have to
absorb $10,000. For a loss of $37,300, the insurer would pay:
$37,300 - $10,000 = $27,300. Similarly, for each of the larger losses $10,000 is eliminated, from
the point of view of the insurer.

The total dollars of loss eliminated is computed by summing up the sizes of loss for all losses less
than the deductible amount of $10,000, and adding to that the sum of $10,000 per each loss greater
than or equal to $10,000. In this case the losses eliminated are:
$35,200 + (122)($10,000) = $1,255,200. Note that the Empirical Losses Eliminated are a
continuous function of the deductible amount; a small increase in the deductible amount produces a
corresponding small increase in the empirical losses eliminated.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How many dollars of loss are eliminated by a deductible of $1000?
[Solution: $300 + $600 + (3)($1000) = $3900.]

Let N be the total number of losses.

Then the Losses Eliminated by a deductible d would be written for a continuous size of loss
distribution as the sum of the same two pieces, the contribution of the small losses plus the
contribution of the large losses:

N ∫_0^d x f(x) dx + N d S(d).

The first piece is the sum of losses less than d. (We have multiplied by the total number of losses
since f(x) is normalized to integrate to unity.) The second piece is the number of losses greater than
d times d per such loss. Note that the losses eliminated are just the number of losses times the
Limited Expected Value.

Losses Eliminated by deductible d are: N E[X ∧ d].

36
By “ground up” I mean the economic loss to the insured, prior to the impact of any deductible.

Loss Elimination Ratio:

The total losses in Section 1 are $40,647,700. Therefore, the $1,255,200 of losses eliminated by a
deductible of size $10,000 represent $1255200 / $40647700 = 3.09% of the total losses.
This corresponds to an empirical Loss Elimination Ratio (LER) at 10,000 of 3.09%.

Loss Elimination Ratio at d = LER(d) = (Losses Eliminated by a deductible of size d) / (Total Losses).

In general the LER is the ratio of the losses eliminated to the total losses. Since its numerator is
continuous while its denominator is independent of the deductible amount, the empirical loss
elimination ratio is a continuous function of the deductible amount.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the (empirical) loss elimination ratio at $1000?
[Solution: $3900 losses are eliminated out of a total of 300 + 600 + 1200 + 1500 + 2800 = 6400.
Therefore, LER(1000) = 3900/6400 = 60.9%.]

The loss elimination ratio at x can be written as:

LER(x) = (dollars of loss limited by x) / (total losses)
= {(dollars of loss limited by x) / N} / {(total losses) / N} = E[X ∧ x] / Mean.

LER(x) = E[X ∧ x] / E[X].

For example, for the ungrouped data in Section 1, E[X ∧ 10000] is equal to the losses eliminated
by a deductible of 10,000: $1,255,200, divided by the total number of losses 130.
E[X ∧ 10000] = 1,255,200 / 130 = 9655.4.
The mean is the total losses of $40,647,700 divided by 130. E[X] = 40,647,700/130 = 312,675.
Therefore, LER(10000) = E[X ∧ 10000] / E[X] = 9655.4 / 312675 = 3.09%,
matching the previous computation of LER(10000).
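A minimal Python sketch (illustrative only; the function name is mine) of the empirical loss elimination ratio; applied to the five-loss example above it reproduces LER(1000) = 60.9%:

def ler(losses, d):
    # losses eliminated by a deductible d, divided by total losses
    eliminated = sum(min(x, d) for x in losses)
    return eliminated / sum(losses)

print(ler([300, 600, 1200, 1500, 2800], 1000))    # 3900/6400 = 0.609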

Problems:

6.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62.
What is the Loss Elimination Ratio, for a deductible of 10?
A. less than 0.37
B. at least 0.37 but less than 0.39
C. at least 0.39 but less than 0.41
D. at least 0.41 but less than 0.43
E. at least 0.43

6.2 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140.
What is the Loss Elimination Ratio at 50?
A. 54% B. 57% C. 60% D. 63% E. 66%

6.3 (2 points) You observe the following payments on 6 losses with no deductible applied:
$200, $300, $400, $800, $900, $1,600.
Let A be the loss elimination ratio (LER) for a $500 deductible.
Let B be the loss elimination ratio (LER) for a $1000 deductible. Determine B - A.
A. 30% B. 35% C. 40% D. 45% E. 50%

6.4 (4, 5/89, Q.57) (1 point) Given the following payments on 6 losses, calculate the loss
elimination ratio (LER) for a $300 deductible (assume the paid losses had no deductible applied).
Paid Losses: $200, $300, $400, $800, $900, $1,600.
A. LER < .40 B. .40 < LER ≤ .41 C. .41 < LER < .42 D. .42 < LER ≤ .43 E. .43 < LER

6.5 (CAS5, 5/03, Q.38) (3 points) Given the information below, calculate the loss elimination ratio
for ABC Company's collision coverage in State X at a $250 deductible. Show all work.
• ABC insures 5,000 cars at a $250 deductible with the following fully credible data on the
collision claims:
Paid losses are $1,000,000 per year.
The average number of claims per year is 500.
• A fully credible study found that in State X:
The average number of car accidents per year involving collision damage was 10,000.
The average number of vehicles was 67,000.
• Assume ABC Company's expected ground-up claims frequency is equal to that of State X.
• Assume the average size of accidents that fall below the deductible is $150.

Solutions to Problems:

6.1. A. LER(10) = Losses Eliminated / Total Losses =


(3 + 8 + 10 + 10 + 10 + 10) / (3 + 8 + 13 + 22 + 35 + 62) = 51 / 143 = 0.357.

6.2. B. E[X] = (15+35+70+90+140)/ 5 = 70. LER(50) = E[X ∧ 50]/E[X] = 40/70 = 0.571.

6.3. A. The Losses Eliminated for a $500 deductible are: 200 + 300 + 400 + (3)(500) = 2400.
The total losses are 4200.
Thus LER(500) = Losses Eliminated / Total Losses = 2400/4200 = .571.
Losses Eliminated for a $1000 deductible are: 200 + 300 + 400 + 800 + 900 + 1000 = 3600.
Thus LER(1000) = Losses Eliminated / Total Losses = 3600/4200 = .857.
LER(1000) - LER(500) = .857 - .571 = 0.286.

6.4. B. The Losses Eliminated are: (200)+(300)+(4)(300) = 1700. The total losses are 4200.
Thus the LER = Losses Eliminated / Total Losses = 1700/4200 = 0.405.

6.5. Accident Frequency for State X is: 10,000/67,000 = 14.925%.


For 5000 cars, expect: (14.925%)(5000) = 746.3 accidents.
There were 500 claims, in other words 500 accidents of size greater than the $250 deductible.
Thus we infer: 746.3 - 500 = 246.3 small accidents.
These small accidents had average size $150, for a total of: (246.3)($150) = $36,945.
Deductible eliminates $250 for each large accident, for a total of: ($250)(500) = $125,000.
Losses eliminated = $36,945 + $125,000 = $161,945.
Total losses = losses eliminated + losses paid = $161,945 + $1,000,000 = $1,161,945.
LER at $250 = Losses Eliminated / Total Losses = $161,945 / $1,161,945 = 13.9%.
Alternately, frequency of loss = 10,000/67,000 = 14.925%.
Frequency of claims (accidents of size > 250) = 500/5000 = 10%.
S(250) = 10%/14.925% = .6700. F(250) = 1 - S(250) = .3300.
Average size of accidents that fall below the deductible = average size of small accidents =
$150 = {E[X ∧ 250] - 250S(250)}/F(250) = {E[X ∧ 250] - ($250)(.67)}/.33.
⇒ E[X ∧ 250] = (.33)($150) + (.67)($250) = $217.
Average payment per non-zero payment = $1,000,000/500 =
$2000 = (E[X] - E[X ∧ 250])/S(250) = (E[X] - E[X ∧ 250])/.67.
⇒ E[X] - E[X ∧ 250] = $1340. ⇒ E[X] = $1340 + $217 = $1557.
LER(250) = E[X ∧ 250] / E[X] = $217/$1557 = 13.9%.

Section 7, Excess Losses

The dollars of loss excess of $10,000 per loss are also of interest. These are precisely the dollars of
loss not eliminated by a deductible of size $10,000. For the ungrouped data in Section 1, the
losses excess of $10,000 are $40,647,700 - $1,255,200 = $39,392,500.

(X - d)+ ≡ 0 when X ≤ d, X - d when X > d.37

(X - d)+ is the amount paid to an insured with a deductible of d.


The insurer pays nothing if X ≤ d, and pays X - d if X > d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is (X - 1000)+?
[Solution: 0, 0, $200, $500, and $1800.]

(X - d)+ is referred to as the “left censored and shifted variable” at d.38

(X - d)+ ⇔ left censored and shifted variable at d ⇔

0 when X ≤ d, X - d when X > d ⇔ the amounts paid to insured with a deductible of d

⇔ payments per loss, including when the insured is paid nothing due to the deductible of d ⇔
amount paid per loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is E[(X - 1000)+]?
[Solution: (0 + 0 + $200 + $500 + $1800)/ 5 = $500.]

The expected losses excess of 10,000 per loss would be written for a continuous size of loss
distribution as:


E[(X - 10,000)+] = Losses Excess of 10,000 per loss = ∫_10,000^∞ (x - 10,000) f(x) dx.

Note that we only integrate over those losses greater than $10,000 in size, since smaller losses
contribute nothing to the excess losses. Also larger losses only contribute the amount by which each
exceeds $10,000.
37
The “+” refers to taking the variable X - d when it is positive, and otherwise setting the result equal to zero.
38
Censoring will be discussed in a subsequent section. See Definition 3.4 in Loss Models.

E[(X - 10,000)+] = ∫_10,000^∞ (x - 10,000) f(x) dx = ∫_10,000^∞ x f(x) dx - 10,000 ∫_10,000^∞ f(x) dx =

∫_0^∞ x f(x) dx - {∫_0^10,000 x f(x) dx + 10,000 S(10,000)} = E[X] - E[X ∧ 10,000].

Losses Excess of L per loss = E[(X - L)+] = E[X] - E[X ∧ L].

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
Show that E[(X - 1000)+] = E[X] - E[X ∧ 1000].
[Solution: E[X] - E[X ∧ 1000] = 1280 - 780 = 500 = E[(X - 1000)+].]

Exercise: For an Exponential Distribution with θ = 100, what is E[(X - 70)+]?


[Solution: E[(X - 70)+] = E[X] - E[X ∧ 70] = 100 - 100(1 - e-70/100) = 49.7.]
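A minimal Python sketch (illustrative only) of this Exponential example, using E[(X - d)+] = E[X] - E[X ∧ d]:

import math

theta, d = 100.0, 70.0
mean = theta                                  # E[X] for an Exponential
lev = theta * (1.0 - math.exp(-d / theta))    # E[X ∧ 70]
print(mean - lev)                             # E[(X - 70)+] = 100 e^(-0.7) = 49.66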

Excess Ratio:39

The Excess Ratio is the losses excess of the given limit divided by the total losses.

Excess Ratio at x = R(x) ≡ (losses excess of x)/(total losses)


= E[(X - x)+ ] / E[X] = (E[X] - E[X ∧ x]) / E[X].

Therefore, for the data in Section 1, the empirical Excess Ratio,


R(10,000) = (40,647,700 - 1,255,200) / 40,647,700 = 96.91%.
Note that: R(10,000) = 96.91% = 1 - 3.09% = 1 - LER(10,000).
R(10,000) = (losses excess of 10,000) / (total losses) = N E[(X - 10,000)+] / (N E[X]) =
(E[X] - E[X ∧ 10,000]) / E[X] = 1 - E[X ∧ 10,000] / E[X] = 1 - LER(10,000).

R(x) = 1 - LER(x) = 1 - E[X ∧ x] / E[X].

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the (empirical) excess ratio at $1000?
[Solution: R(1000) = (200 + 500 + 1800)/6400 = 39.1% = 1 - 60.9% = 1 - LER(1000)
= 1 - E[X ∧ 1000] / E[X] = 1 - 780/1280.]
39
Loss Models does not use the commonly used term Excess Ratio. However, this important concept may help you
to understand and answer questions. Since the Excess Ratio is just one minus the Loss Elimination Ratio, one can
always work with the Loss Elimination Ratio instead of the Excess Ratio.

One can also write the Excess Ratio in terms of integrals as:

R(L) = {∫_L^∞ (x - L) f(x) dx} / {∫_0^∞ x f(x) dx} = {∫_L^∞ x f(x) dx - L S(L)} / {∫_0^∞ x f(x) dx}.

However, in order to compute the excess ratio or loss elimination ratio, it is usually faster to use the
formulas in Appendix A of Loss Models for the Mean and Limited Expected Value.

Exercise: For an Exponential Distribution with θ = 100, what is R(70)?


[Solution: R(70) = 1 - E[X ∧ 70]/E[X] = 1 - 100(1 - e-70/100)/100 = 49.7%.]

Total Losses = Limited Losses + Excess Losses:

Exercise: For a loss of size 6 and a loss of size 15, list X ∧ 10, (X-10)+, and (X ∧ 10) + (X-10)+.
[Solution: X X ∧ 10 (X-10)+ (X ∧ 10) + (X-10)+
6 6 0 6
15 10 5 15]

In general, X = (X ∧ d) + (X - d)+ .

In other words, buying two policies, one with a policy limit of 1000 (and no deductible), and another
with a deductible of 1000 (and no policy limit), provides the same coverage as a single policy with
no deductible or policy limit.

A deductible of 1000 caps the policyholderʼs payments at 1000, so from his point of view the 1000
deductible acts as a limit. The policyholderʼs retained loss is: X ∧ 1000. The insurerʼs payment to
the policyholder is: (X - 1000)+ . Together they total to the loss, X.
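A minimal Python sketch (illustrative only) of the per-loss split X = (X ∧ d) + (X - d)+ for the five-loss example used throughout this section:

losses = [300, 600, 1200, 1500, 2800]
d = 1000
retained = [min(x, d) for x in losses]             # policyholder keeps X ∧ 1000
paid = [max(x - d, 0) for x in losses]             # insurer pays (X - 1000)+
assert all(r + p == x for r, p, x in zip(retained, paid, losses))
print(sum(retained) / 5, sum(paid) / 5)            # 780 and 500, which sum to the mean of 1280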

A deductible from one point of view is a policy limit from another point of view.40 Remember the
losses eliminated by a deductible of size 1000 are E[X ∧ 1000], the same expression as the
losses paid under a policy with limit of size 1000 (and no deductible).

X = (X ∧ d) + (X - d)+. ⇒ E[X] = E[X ∧ d] + E[(X - d)+]. ⇒ E[(X - d)+ ] = E[X] - E[X ∧ d].
Expected Excess = Expected Total Losses - Expected Limited Losses.
40
An insurer who buys reinsurance with a per claim deductible of 1 million, has capped its retained losses at
1 million per claim. In that sense the 1 million deductible from the point of view of the reinsurer acts as if the insurer
had sold policies with a 1 million policy limit from the point of view of the insurer.

Problems:

7.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62.
What is the Excess Ratio, excess of 30?
A. less than 0.16
B. at least 0.16 but less than 0.19
C. at least 0.19 but less than 0.22
D. at least 0.22 but less than 0.25
E. at least 0.25

7.2 (2 points) Determine the excess ratio at $200,000.


Frequency of Losses          Dollar Amount
40% $5,000
20% $10,000
15% $25,000
10% $50,000
5% $100,000
4% $250,000
3% $500,000
2% $1,000,000
1% $2,000,000

7.3 (1 point) X is 70 with probability 40% and 700 with probability 60%.
Determine E[(X - 100)+ ].
A. less than 345
B. at least 345 but less than 350
C. at least 350 but less than 355
D. at least 355 but less than 360
E. at least 360

7.4 (1 point) X is 5 with probability 80% and 25 with probability 20%.


If E[(X - d)+ ] = 3, determine d.
A. 4 B. 6 C. 8 D. 10 E. 12

7.5 (CAS9, 11/96, Q.42a) (1 point)


You are given the following empirical data from a closed claim study.

Interval # of Claims Losses


0 2,648 0
1 - 50,000 1,500 25,001,000
50,001 - 100,000 400 30,000,000
100,001 - 250,000 274 45,758,000
250,001 - 400,000 100 30,000,000
400,001 - 750,000 22 12,474,000
750,001 - 2,500,000 3 2,621,000
Over 2,500,000 1 2,586,000
Total 4,948 148,440,000

Calculate the losses excess of $250,000 as a percentage of total losses.



Solutions to Problems:

7.1. E. R(30) = (dollars excess of 30) / (total dollars) =


(5 + 32) / (3 + 8 + 13 + 22 + 35 + 62) = 37 / 143 = 0.259.

7.2. Excess Ratio = expected excess losses / expected total losses = 45000/82750 = 54.4%.
Probability Amount Product Excess of 200000 Product
0.4 $5,000 $2,000 $0 $0
0.2 $10,000 $2,000 $0 $0
0.15 $25,000 $3,750 $0 $0
0.1 $50,000 $5,000 $0 $0
0.05 $100,000 $5,000 $0 $0
0.04 $250,000 $10,000 $50,000 $2,000
0.03 $500,000 $15,000 $300,000 $9,000
0.02 $1,000,000 $20,000 $800,000 $16,000
0.01 $2,000,000 $20,000 $1,800,000 $18,000
$82,750 $45,000

7.3. E. (70 - 100)+ = 0. (700 - 100)+ = 600. E[(X - 100)+ ] = (40%)(0) + (60%)(600) = 360.

7.4. D. E[(X - 5)+ ] = (0)(80%) + (25 - 5)(20%) = 4 > 3. ⇒ d must be greater than 5.

Therefore, E[(X - d)+ ] = (.2)(25 - d) = 3. ⇒ d = 10.

7.5. # claims of size more than $250,000 is: 100 + 22 + 3 + 1 = 126


$ on claims of size more than $250,000 is:
30,000,000 + 12,474,000 + 2,621,000 + 2,586,000 = 47,681,000
excess losses are: 47,681,000 - (126)(250,000) = 16,181,000
Ratio is: 16,181,000/148,440,000 = 10.9%.

Section 8, Excess Loss Variable

The Excess Loss Variable for d is defined for X > d as X-d and is undefined for X ≤ d.41

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the Excess Loss Variable for $1000?
[Solution: undefined, undefined, $200, $500, $1,800.]

Excess Loss Variable for d ⇔ the nonzero payments excess of a deductible of d

⇔ X - d for X > d ⇔ truncated from below at d and shifted42 ⇔


amount paid per (non-zero) payment.

The Excess Loss Variable at d, which could be called the left truncated and shifted variable at d, is
similar to (X - d)+ , the left censored and shifted variable at d. However, the Excess Loss Variable at
d is undefined for X ≤ d, while in contrast (X - d)+ is zero for X ≤ d.

Excess Loss Variable ⇔ undefined X ≤ d ⇔ amount paid per (non-zero) payment.

(X - d)+ ⇔ 0 for X ≤ d ⇔ amount paid per loss.

Exercise: An insured has four losses of size: 700, 3500, 16,000 and 40,000.
What are the excess loss variable at 5000, the left censored and shifted variable at 5000, and the
limited loss variable at 5000?
[Solution: Excess Loss Variable at 5000: 11,000 and 35,000, corresponding to the last two losses.
(It is not defined for the first two losses of size less than 5000.)
Left censored and shifted variable at 5000: 0, 0, 11,000 and 35,000.
Limited Loss Variable at 5000: 700, 3500, 5000, 5000.]

41
See Definition 3.3 in Loss Models.
42
Truncation will be discussed in a subsequent section.

Mean Residual Life / Mean Excess Loss:

The mean of the excess loss variable for d =


the mean excess loss, e(d) =
(Losses Excess of d) / (number of losses > d) =
(E[X] - E[ X ∧ d])/S(d) =
the average payment per (nonzero) payment with a deductible of d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the (empirical) mean excess loss at $1000?
[Solution: e(1000) = ($200 + $500 + $1,800)/3 = $833.33.]

Note that the first step in computing e(1000) is to ignore the two losses that “died before 1000.”
Then one computes the average “lifetime beyond 1000” for the 3 remaining losses.43

In this situation, on a policy with a $1000 deductible, the insurer would make 3
(non-zero) payments totaling $2500, for an average (non-zero) payment of $833.33.

The Mean Residual Life or Mean Excess Loss at x, e(x), is defined as the average dollars of
loss above x on losses of size exceeding x.44

For the ungrouped data in Section 1, there are 122 losses of size greater than $10,000 and they
have (40647700 - 1255200) dollars of loss above $10,000.45 Therefore, e(10,000) =
$39,392,500 / 122 = $322,889. This can also be written as:

(mean - E[X ∧ 10,000]) / S(10,000) = ($312,674.6 - $9655.4) / (122/130) = $322,889.

e(x) = (E[X] - E[X ∧ x]) / S(x).

Note that the empirical mean excess loss is discontinuous. While the excess losses in the numerator
are continuous, the empirical survival function in the denominator is discontinuous. The denominator
has a jump discontinuity at every observed claim size.

43
In Life Contingencies, this is how one computes the mean residual life.
44
See Definition 3.3 in Loss Models.
45
Note that only losses which exceed the limit even enter into the computation; we ignore small losses. Thus the
denominator in this case is 122 rather than 130.

The (theoretical) mean excess loss would be written for a continuous size of loss distribution as:

e(L) = {∫_0^∞ x f(x) dx - (∫_0^L x f(x) dx + L S(L))} / S(L) = {∫_L^∞ x f(x) dx - L S(L)} / S(L).

The numerator of e(L) is the losses excess of L divided by the total number of losses; this is equal to
the excess ratio R(L) times the mean.

Thus e(x) = R(x) mean / S(x).

Specifically, for the ungrouped data in Section 1,


e(10000) = (96.91%)(312674.6 ) / (122 / 130) ≈ 322,883.

One can also write e(L) as:

e(L) = {∫_L^∞ x f(x) dx} / S(L) - L = (dollars on losses of size > L) / (# of losses of size > L) - L.

Thus, e(x) = (average size of those losses of size greater than x) - x.

Summary of Related Ideas:

Loss Elimination Ratio at x = LER(x) = E[X ∧ x] / E[X].

Excess Ratio at x = R(x) = (E[X] - E[X ∧ x]) / E[X] = 1 - E[X ∧ x] / E[X] = 1 - LER(x).

Mean Residual Life at x = Mean Excess Loss at x = e(x) = (E[X] - E[X ∧ x]) / S(x).

On the exam, one wants to avoid doing integrals if at all possible. Therefore, one should use the
formulas for the Limited Expected Value, E[X ∧ x], in Appendix A of Loss Models whenever
possible. Those who are graphically oriented may find the Section on Lee Diagrams helps them to
understand these concepts.

Exercise: For the data in Section 1, what are the Limited Expected Value, Loss Elimination Ratio, the
Excess Ratio, and the mean excess loss, all at $25,000?
[Solution: The Limited Expected Value at $25,000 is:
{(sum of losses < $25,000) + ($25,000)(# losses > $25,000) } / (# losses)
= {$232,500 + ($25,000)(109)} / 130 = $2,957,500 / 130 = $22,750.
The Loss Elimination Ratio at $25,000 is:
E[X ∧ 25000] / mean = $22,750 / $312674.6 = 7.28%.
The Excess Ratio at $25,000 = R(25000) = 1 - LER(25000) = 1 - 7.28% = 92.72%.
The mean excess loss at $25,000 = e(25000) = (mean - E[X ∧ 25000]) / S(25000)
= ($312674.6 - $22,750) / (109 / 130) = $345,782.]
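A minimal Python sketch (illustrative only; the function name is mine) of the empirical mean excess loss; applied to the five-loss example it reproduces e(1000) = 833.33:

def mean_excess(losses, d):
    # average amount by which losses larger than d exceed d
    survivors = [x for x in losses if x > d]
    return sum(x - d for x in survivors) / len(survivors)

print(mean_excess([300, 600, 1200, 1500, 2800], 1000))    # 2500/3 = 833.33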

Hazard Rate/ Failure Rate:

The failure rate, force of mortality, or hazard rate, is defined as:

h(x) = f(x)/S(x), x≥0.

For a given age x, the hazard rate is the density of the deaths, divided by the number of people still
alive at age x.

The hazard rate determines the survival (distribution) function and vice versa:

S(x) = exp[-∫_0^x h(t) dt].

h(x) = -d ln[S(x)] / dx.

As will be discussed in a subsequent section, the limit as x approaches infinity of e(x) is equal
to the limit as x approaches infinity of 1/h(x). These behaviors will be used to distinguish
the tails of distributions.
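A minimal Python sketch (illustrative only; θ = 100 and x = 250 are arbitrary test values) of the hazard rate / survival function relationship, using an Exponential, whose hazard rate is the constant 1/θ:

import math

theta, x = 100.0, 250.0
h = 1.0 / theta                               # constant hazard rate of the Exponential
s_from_hazard = math.exp(-h * x)              # exp(-integral of h from 0 to x)
print(s_from_hazard, math.exp(-x / theta))    # both 0.0821, as expected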

Exercise: For the data in Section 1, estimate the empirical hazard rate at $25,000.
[Solution: There is no unique estimate of the hazard rate at 25,000.
However, there are 109 claims greater than 25,000 and 3 claims within 5000 of 25,000.
Thus the density at 25,000 is about 3/5000, while the empirical survival function is 109/130.
Therefore, h(25000) ≅ (3/5000)/(109/130) = 0.0007.]

Problems:

8.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62.
What is the empirical mean excess loss at 20?
A. less than 16
B. at least 16 but less than 18
C. at least 18 but less than 20
D. at least 20 but less than 22
E. at least 22

8.2 (2 points) Match the concepts.


1. Limited Loss Variable at 20 a. 3, 8, 13, 20, 20, 20.
2. Excess Loss Variable at 20 b. 2, 15, 42.
3. (X - 20)+ c. 0, 0, 0, 2, 15, 42.
A. 1a, 2b, 3c B. 1a, 2c, 3b C. 1b, 2a, 3c D. 1b, 2c, 3a E. 1c, 2b, 3a

8.3 (2 points) The random variable for a loss, X, has the following characteristics:
x F(x) Limited Expected Value at x
0 0.0 0
500 0.3 360
1000 0.9 670
5000 1.0 770
Calculate the mean excess loss for a deductible of 500.
A. less than 600
B. at least 600 but less than 625
C. at least 625 but less than 650
D. at least 650 but less than 675
E. at least 675

8.4 (2 points) You are given S(60) = 0.50, S(70) = 0.40, and e(70) = 13.
Assuming the survival function is a straight line between ages 60 and 70, estimate e(67).
A. 14.8 B. 15.0 C. 15.2 D. 15.4 E. 15.6

8.5 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140.
What is the Mean Excess Loss at 50?
A. 10 B. 20 C. 30 D. 40 E. 50

8.6 (4, 5/88, Q.60) (1 point) What is the empirical mean residual life at x = 4 given the following
sample of total lifetimes:
3, 2, 5, 8, 10, 1, 6, 9.
A. Less than 1.5
B. At least 1.5, but less than 2.5
C. At least 2.5, but less than 3.5
D. 3.5 or more
E. Cannot be determined from the information given

8.7 (4B, 5/93, Q.25) (2 points) The following random sample has been observed:
2.0, 10.3, 4.8, 16.4, 21.6, 3.7, 21.4, 34.4
Calculate the value of the empirical mean excess loss function for x = 8.
A. less than 7.00
B. at least 7.00 but less than 9.00
C. at least 9.00 but less than 11.00
D. at least 11.00 but less than 13.00
E. at least 13.00

8.8 (4B, 11/94, Q.16) (1 point) A random sample of auto glass claims has yielded the following
five observed claim amounts:
100, 125, 200, 250, 300.
What is the value of the empirical mean excess loss function at x = 150?
A. 75 B. 100 C. 200 D. 225 E. 250

8.9 (3, 11/01, Q.35 & 2009 Sample Q.101) (2.5 points)
The random variable for a loss, X, has the following characteristics:
x F(x) Limited Expected Value at x
0 0.0 0
100 0.2 91
200 0.6 153
1000 1.0 331
Calculate the mean excess loss for a deductible of 100.
(A) 250 (B) 300 (C) 350 (D) 400 (E) 450

Solutions to Problems:

8.1. C. e(20) = (dollars excess of 20) / (# claims greater than 20) = (2 + 15 + 42) / 3 =
59 /3 = 19.7.

8.2. A. Limited Loss Variable at 20, limit each large loss to 20: 3, 8, 13, 20, 20, 20.
Excess Loss Variable at 20: 2, 15, 42, corresponding to 20 subtracted from each of the last 3
losses. It is not defined for the first 3 losses, each of size less than 20.
(X - 20)+ is 0 for X ≤ 20, and X - 20 for X > 20: 0, 0, 0, 2, 15, 42.

8.3. A. F(5000) = 1 ⇒ E[X] = E[X ∧ 5000] = 770.


e(500) = (E[X] - E[X ∧ 500])/S(500) = (770 - 360)/(1 - 0.3) = 586.
Comment: Similar to 3, 11/01, Q.35 (2009 Sample Q.101).

8.4. B. Years excess of 70 = S(70)e(70) = (.4)(13) = 5.2.


S(70) = 0.40, S(69) ≅ 0.41, S(68) ≅ 0.42, S(67) ≅ 0.43.

Years lived between ages 67 and 70 ≅ 0.425 + 0.415 + 0.405 = 1.245.

e(67) = (years excess of 67)/S(67) ≅ (5.2 + 1.245)/0.43 = 15.0.

8.5. E. e(50) = (20 + 40 + 90)/3 = 50.


Alternately, e(50) = (E[X] - E[X ∧ 50])/S(50) = (70 - 40)/(1 - 0.4) = 50.

8.6. D. We ignore all claims of size 4 or less. Each of the 5 claims greater than 4 contributes the
amount by which it exceeds 4. The empirical mean excess loss at x=4 is:
{(5-4) + (8-4) + (10-4) + (6-4) + (9-4)} / 5 = 18/5 = 3.6.

8.7. D. To compute the mean excess loss at 8, we only look at accidents greater than 8.
There are 5 such accidents, and we compute the average amount by which they exceed 8:
e(8) = (2.3 +8.4 + 13.6 + 13.4 + 26.4) / 5 = 64.1 / 5 = 12.82.

8.8. B. Add up the dollars excess of 150 and divide by the 3 claims of size exceeding 150.
e(150) = (50 + 100 + 150) / 3 = 100.

8.9. B. F(1000) = 1 ⇒ E[X] = E[X ∧ 1000] = 331.


e(100) = (E[X] - E[X ∧ 100])/S(100) = (331 - 91)/(1 - 0.2) = 300.

Section 9, Layers of Loss

Actuaries, particularly those working with reinsurance, often look at the losses in a layer. The following
diagram shows how the Layer of Loss between $10,000 and $25,000 relates to three specific
claims of size: $30,000, $16,000 and $7,000.

The claim of size $30,000 contributes to the layer $15,000, the width of the layer, since it is larger
than the upper boundary of the layer. The claim of size $16,000 contributes to the layer $16,000 -
$10,000 = $6,000; since it is between the two boundaries of the layer it contributes its size minus
the lower boundary of the layer. The claim of size $7,000 contributes nothing to the layer, since it is
smaller than the lower boundary of the layer.

[Diagram: a vertical scale showing the layer from $10,000 to $25,000, with the three claims of $30,000, $16,000, and $7,000 marked against it.]

For example, for the data in Section 1 the losses in the layer between $10,000 and $25,000 are
calculated in three pieces. The 8 losses smaller than $10,000 contribute nothing to this layer. The 13
losses between $10,000 and $25,000 each contribute their value minus $10,000. This sums to
$67,300. The remaining 109 losses which are bigger than the upper limit of the interval at $25,000,
each contribute the width of the interval, $25,000 - $10,000 = $15,000.

Thus the total losses in the layer between $10,000 and $25,000 are: 0 + $67,300 +
(109)($15,000) = $1,702,300. This is $1,702,300 / $40,647,700 = 4.19% of the total losses.

Exercise: For the data in Section 1, what is the percentage of total losses in the layer from $25,000
to $50,000?
[Solution: $2,583,100 / $40,647,700 = 6.35%.]

For a continuous size of loss distribution, the percentage of losses in the layer from $10,000 to
$25,000 would be written as:

{∫_10,000^25,000 (x - 10,000) f(x) dx + S(25,000) (25,000 - 10,000)} / ∫_0^∞ x f(x) dx.

The percentage of losses in a layer can be rewritten in terms of limited expected values.
The percentage of losses in the layer from $10,000 to $25,000 is:
(E[X ∧ 25000] - E[X ∧ 10000]) / mean = (22,750- 9655.4) / 312,675 = 4.19%.

This can also be written in terms of the Loss Elimination Ratios:


LER(25000) - LER(10000) = 7.28% - 3.09% = 4.19%.

This can also be written in terms of the Excess Ratios (with the order reversed):
R(10000) - R(25000) = 96.91% - 92.72% = 4.19%.

The percentage of losses in the layer from d to u =

{∫_d^u (x - d) f(x) dx + S(u) (u - d)} / ∫_0^∞ x f(x) dx = (E[X ∧ u] - E[X ∧ d]) / E[X]
= LER(u) - LER(d) = R(d) - R(u).

Layer Average Severity for the layer from d to u =


The mean losses in the layer from d to u = E[X ∧ u] - E[X ∧ d] =
{LER(u) - LER(d)} E[X] = {R(d) - R(u)} E[X].

The Layer from d to u can be thought of as either:


(Layer from 0 to u) - (Layer from 0 to d)
or (Layer from d to ∞) - (Layer from u to ∞).

For example, the Layer from 10 to 25 can be thought of as either:


(Layer from 0 to 25) - (Layer from 0 to 10)
or (Layer from 10 to ∞) - (Layer from 25 to ∞):

[Diagram: a vertical bar marked at sizes 10 and 25; LER(10) and LER(25) label the portions below 10 and below 25, while R(10) and R(25) label the portions above 10 and above 25.]

The percentage of losses in the layer from 10 to 25 is: LER(25) - LER(10) = R(10) - R(25).
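Returning to the three claims diagrammed at the start of this section, here is a minimal Python sketch (illustrative only; the function name is mine) of per-loss contributions to the layer from 10,000 to 25,000:

def layer(x, d, u):
    # contribution of a loss of size x to the layer from d to u
    return min(x, u) - min(x, d)

print([layer(x, 10000, 25000) for x in [30000, 16000, 7000]])    # [15000, 6000, 0]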

Those who are graphically oriented may find that my Section on Lee Diagrams helps them to
understand these concepts.

Problems:

9.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62.
What is the percentage of total losses in the layer from 10 to 25?
A. less than 29%
B. at least 29% but less than 30%
C. at least 30% but less than 31%
D. at least 31% but less than 32%
E. at least 32%

9.2 (1 point) Four accidents occur of sizes: $230,000, $810,000, $1,170,000, and $2,570,000.
A reinsurer is responsible for the layer of loss from $500,000 to $1,500,000
($1 million excess of 1/2 million.)
How much does the reinsurer pay as a result of these four accidents?
A. $1.7 million B. $1.8 million C. $1.9 million D. $2.0 million E. $2.1 million

Use the following information for the next two questions:


• A reinsurer expects 50 accidents per year from a certain book of business.
• Limited Expected Values for this book of business are estimated to be:
E[X ∧ $1 million] = $300,000
E[X ∧ $4 million] = $375,000
E[X ∧ $5 million] = $390,000
E[X ∧ $9 million] = $420,000
E[X ∧ $10 million] = $425,000

9.3 (1 point) If the reinsurer were responsible for the layer of loss from $1 million to $5 million
($4 million excess of $1 million), how much does the reinsurer expect to pay per year as a result of
accidents from this book of business?
A. $4.0 million B. $4.5 million C. $5.0 million D. $5.5 million E. $6.0 million

9.4 (1 point) Let A be the amount the reinsurer would expect to pay per year as a result of
accidents from this book of business, if the reinsurer were responsible for the layer of loss from
$1 million to $5 million ($4 million excess of $1 million). Let B be the amount the reinsurer would
expect to pay per year as a result of accidents from this book of business, if the reinsurer were
instead responsible for the layer of loss from $1 million to $10 million
($9 million excess of $1 million).
What is the ratio of B/A?
A. 1.30 B. 1.35 C. 1.40 D. 1.45 E. 1.50

Use the following information for the next four questions:


• A reinsurer expects 50 accidents per year from a certain book of business.
• The average size of accident from this book of business is estimated as $450,000.
• Excess Ratios (Unity minus the Loss Elimination Ratio) for this book of business are:
R($1 million) = 0.100
R($4 million) = 0.025
R($5 million) = 0.015
R($9 million) = 0.006
R($10 million) = 0.005

9.5 (1 point) What is the percentage of total losses in the layer from $1 million to $5 million?
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%

9.6 (1 point) If the reinsurer were responsible for the layer of loss from $1 million to $5 million
($4 million excess of $1 million), how much does the reinsurer expect to pay per year as a result of
accidents from this book of business?
A. less than $1 million
B. at least $1 million but less than $2 million
C. at least $2 million but less than $3 million
D. at least $3 million but less than $4 million
E. at least $4 million

9.7 (1 point) What is the percentage of total losses in the layer from $1 million to $10 million?
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%

9.8 (1 point) Let A be the amount the reinsurer would expect to pay per year as a result of
accidents from this book of business, if the reinsurer were responsible for the layer of loss from
$1 million to $5 million ($4 million excess of $1 million). Let B be the amount the reinsurer would
expect to pay per year as a result of accidents from this book of business, if the reinsurer were
instead responsible for the layer of loss from $1 million to $10 million
($9 million excess of $1 million). What is the ratio of B/A?
A. 1.1 B. 1.2 C. 1.3 D. 1.4 E. 1.5

9.9 (2 points) Assume you have a Pareto distribution with α = 5 and θ = $1000.
What percentage of total losses are represented by the layer from $500 to $2000?
A. less than 16%
B. at least 16% but less than 17%
C. at least 17% but less than 18%
D. at least 18% but less than 19%
E. at least 19%

9.10 (1 point) There are seven losses of sizes: 2, 5, 8, 11, 13, 21, 32.
What is the percentage of total losses in the layer from 5 to 15?
A. 35% B. 40% C. 45% D. 50% E. 55%

9.11. Use the following information:


• Limited Expected Values for Security Blanket Insurance are estimated to be:
E[X ∧ 100,000] = 40,000
E[X ∧ 200,000] = 50,000
E[X ∧ 300,000] = 57,000
E[X ∧ 400,000] = 61,000
E[X ∧ 500,000] = 63,000
• Security Blanket Insurance buys reinsurance from Plantagenet Reinsurance.
Let A be the amount Plantagenet would expect to pay per year as a result of accidents from
Security Blanket, if the reinsurance had a deductible of 100,000, maximum covered loss of 300,000,
and a coinsurance factor of 90%. Let B be the amount Plantagenet would expect to pay per year
as a result of accidents from Security Blanket, if the reinsurance had a deductible of 100,000, a
maximum covered loss of 400,000, and a coinsurance factor of 80%.
What is the ratio of B/A?
(A) 1.05 (B) 1.10 (C) 1.15 (D) 1.20 (E) 1.25

9.12 (CAS5, 5/07, Q.9) (1 point)


Using the table below, what is the formula for the loss elimination ratio at deductible D?
Loss Limit Number of Losses Total Loss Amount
D and Below N1 L1
Over D N2 L2
Total N1+N2 L1+L2
A. 1 - [L1 + L2 - (N1)(D)] / [L1 + L2]
B. 1 - [L1 + (N2)(D)] / [L1 + L2]
C. 1 - [L2 - (N2)(D)] / [L1 + L2]
D. [L2 + (N2)(D)] / [L1 +(N2)(D)]
E. [L1 + (N1)(D)] / [L1]

Solutions to Problems:

9.1. D. (losses in layer from 10 to 25) / total losses =


(0+0+3+12+15+15) / (3+8+13+22+35+62) = 45 / 143 = .315.

9.2. D. The accidents of sizes $230,000, $810,000, $1,170,000, and $2,570,000 contribute to
the layer of loss from $500,000 to $1,500,000:
0 + 310,000 + 670,000 + 1,000,000 = $1,980,000.

9.3. B. (50)(E[X ∧ $5 million] - E[X ∧ $1 million]) = (50)(390,000 - 300,000) = $4.5 million.

9.4. C. A = (50)(E[X ∧ $5 million] - E[X ∧ $1 million]) = (50)(390,000 - 300,000) =


$4.5 million. B = (50)(E[X ∧ $10 million] - E[X ∧ $1 million]) = (50)(425,000 - 300,000) =
$6.25 million. B/A = 6.25 / 4.5 = 1.389.
Comment: One can solve this problem without knowing that 50 accidents are expected per year,
since 50 multiplies both the numerator and denominator. The ratio between two layers of loss
depends on the severity distribution, not the frequency distribution.

9.5. D. R($1 million) - R($5 million) = 0.100 - 0.015 = 0.085.

9.6. B. The annual losses from the layer from $1 million to $5 million =
(number of accidents per year)(mean accident){R($1 million) - R($5 million)} =
(50)($450,000){R($1 million) - R($5 million)} = ($22.5 million){0.100 - 0.015} = $1,912,500.
Alternately, the total expected losses are:
(# of accidents per year)(mean accident) = (50)($450,000) = $22,500,000.
(0.085)($22,500,000) = $1,912,500.

9.7. E. R($1 million) - R($10 million) = 0.100 - 0.005 = 0.095.

9.8. A. B/A = {R($1 million) - R($10 million)} / {R($1 million) - R($5 million)}
= (0.100 - 0.005) / (0.100 - 0.015) = 0.095 / 0.085 = 1.12.
Comment: A = (number of accidents per year)(mean accident){R($1 million) - R($5 million)}.
B = (number of accidents per year)(mean accident){R($1 million) - R($10 million)}.

9.9. D. Use the formula given in Appendix A of Loss Models for the Limited Expected Value of
the Pareto, E[X ∧ x] = {θ/(α-1)}{1 - (θ/(θ+x))^(α-1)}.
Percentage of Losses in the Layer $500 to $2000 = (E[X ∧ 2000] - E[X ∧ 500]) / mean =
(246.9 - 200.6)/250 = 18.5%.
Alternately, use the formula given in a subsequent section for the Excess Ratio of the Pareto,
R(x) = {θ/(θ+x)}^(α-1). Percentage of Losses in the Layer $500 to $2000 = R(500) - R(2000) =
19.75% - 1.23% = 18.5%.

9.10. B. Loss: 2 5 8 11 13 21 32
Contribution to Layer from 5 to 15: 0 0 3 6 8 10 10
(0 + 0 + 3 + 6 + 8 + 10 + 10) / (2 + 5 + 8 + 11 + 13 + 21 + 32) = 37/92 = 40.2%.

9.11. B. A = (0.9)(E[X ∧ 300,000] - E[X ∧ 100,000]) = (.9)(57,000 - 40,000) = 15,300.


B = (.8)(E[X ∧ 400,000] - E[X ∧ 100,000]) = (.8)(61,000 - 40,000) = 16,800.
B/A = 16,800 / 15,300 = 1.098.
Comment: Both A and B have been calculated per accident. Their ratio does not depend on the
expected number of accidents.

9.12. C. The losses eliminated are: L1 + (N2)(D).


Loss Elimination Ratio is: {L1 + (N2)(D)} / (L1 + L2) = 1 - {L2 - (N2)(D)}/(L1 + L2).
Alternately, each loss of size less than D contributes nothing to the excess losses.
Each loss of size x > D, contributes x - D to the excess losses.
Therefore, the excess losses = L2 - (N2)(D).
Excess Ratio = (Excess Losses)/(Total Losses) = {L2 - (N2)(D)}/(L1 + L2).
Loss Elimination Ratio = 1 - Excess Ratio = 1 - {L2 - (N2)(D)}/(L1 + L2).

Section 10, Average Size of Losses in an Interval

One might want to know the average size of those losses between $10,000 and $25,000 in size.
For the ungrouped data in Section 1, this is calculated as:
(sum of losses of size between $10,000 & $25,000) / (# losses between $10,000 & $25,000) = $197,300 / 13 = $15,177.

Exercise: For the data in Section 1, what is the average size of loss for those losses of size from
$25,000 to $50,000?
[Solution: (29600 + 32200 + 32500 + 33700 + 34300 + 37300 + 39500 + 39900 + 41200 +
42800 + 45900 + 49200) / 12 = $458,100 / 12 = $38,175.
Comment: The answer had to be between 25,000 and 50,000.]

Note that this concept differs from a layer of loss. Here we are ignoring all losses other than those in
a certain size category. In contrast, losses of all sizes contribute to each layer.

Exercise: An insured has losses of sizes: $300, $600, $1200, $1500, and $2800.
Determine the losses in the layer from $500 to $2500.
[Solution: The loss of size 300 contributes nothing. The loss of size 600 contributes 100.
The loss of size 1200 contributes 700. The loss of size 1500 contributes 1000.
The loss of 2800 contributes the width of the layer or 2000.
0 + 100 + 700 + 1000 + 2000 = 3800.]

Exercise: An insured has losses of sizes: $300, $600, $1200, $1500, and $2800.
Determine the sum of those losses of size from $500 to $2500.
[Solution: 600 + 1200 + 1500 = 3300.
Comment: The average size of these three losses is: 3300/3 = 1100.]

For a discrete size of loss distribution, the dollars from those losses of size ≤ 10,000 is:

Σ_{xi ≤ 10,000} xi Prob[X = xi].

For a continuous size of loss distribution, the dollars from those losses of size ≤ 10,000 is:46

∫_0^10,000 x f(x) dx = E[X ∧ 10,000] - 10,000 S(10,000).

46
The limited expected value = contribution of small losses + contribution of large losses.
Therefore, contribution of small losses = limited expected value - contribution of large losses.

For a continuous size of loss distribution, the average size of loss for those losses of size less than
or equal to 10,000 is:
{∫_0^10,000 x f(x) dx} / F(10,000) = {E[X ∧ 10,000] - 10,000 S(10,000)} / F(10,000).

Exercise: For an Exponential Distribution with mean = 50,000, what is the average size of those
losses of size less than or equal to 10,000?
[Solution: E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ 10000] = 50000 (1 - e-1/5) = 9063.

S(x) = e-x/θ. S(10000) = e-1/5 = .8187. {E[X ∧ 10000] - 10000S(10000)}/F(10000) = 4832.]

For a continuous size of loss distribution the average size of loss for those losses of size between
10,000 and 25,000 would be written as:
{∫_10,000^25,000 x f(x) dx} / {F(25,000) - F(10,000)} = {∫_0^25,000 x f(x) dx - ∫_0^10,000 x f(x) dx} / {F(25,000) - F(10,000)}

= ({E[X ∧ 25,000] - 25,000 S(25,000)} - {E[X ∧ 10,000] - 10,000 S(10,000)}) / {F(25,000) - F(10,000)}.

Exercise: For an Exponential Distribution with mean = 50,000, what is the average size of those
losses of size between 10,000 and 25,000?
[Solution: E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ 10000] = 50000 (1 - e-1/5) = 9063.
E[X ∧ 25000] = 50000 (1 - e-1/2) = 19,673.
S(x) = e-x/θ. S(10000) = e-1/5 = 0.8187. S(25000) = e-1/2 = 0.6065
({E[X ∧ 25000] - 25000S(25000)} - {E[X ∧ 10000] - 10000S(10000)}) / {F(25000) - F(10000)} =
({19,673 - (25,000)(0.6065)} - {9063 - (10000)(0.8187)}) / {0.3935 - 0.1813} = 17,127.]

In general, the average size of loss for those losses of size between a and b is:

({E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}) / {F(b) - F(a)}.

The numerator is the dollars per loss contributed by the losses of size a to b =
(contribution of losses of size 0 to b) minus (contribution of losses of size 0 to a).
The denominator is the percent of losses of size a to b =
(percent of losses of size 0 to b) minus (percent of losses of size 0 to a).
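A minimal Python sketch (illustrative only) of this formula for an Exponential with θ = 50,000, reproducing the average of roughly 17,126 for the interval from 10,000 to 25,000 found above:

import math

theta, a, b = 50000.0, 10000.0, 25000.0

def S(x):
    return math.exp(-x / theta)                     # Exponential survival function

def lev(x):
    return theta * (1.0 - math.exp(-x / theta))     # Exponential E[X ∧ x]

numer = (lev(b) - b * S(b)) - (lev(a) - a * S(a))   # dollars per loss from losses of size a to b
denom = (1 - S(b)) - (1 - S(a))                     # F(b) - F(a)
print(numer / denom)                                # about 17,126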

For an Exponential with θ = 50,000, here are the average sizes for various size intervals:
Bottom          Top          E[X ∧ Top]          S(Top)          Average Size
0 10,000 9,063 81.9% 4,833
10,000 25,000 19,673 60.7% 17,126
25,000 50,000 31,606 36.8% 36,463
50,000 100,000 43,233 13.5% 70,901
100,000 250,000 49,663 0.7% 142,141
250,000 Infinity 50,000 0.0% 300,000

For a Pareto Distribution, S(x) = (θ/(θ+x))^α, and E[X ∧ x] = {θ/(α-1)}{1 - (θ/(θ+x))^(α-1)}.

A Pareto Distribution with α = 3 and θ = 100,000 has a mean of: θ/(α-1) = 50,000.

For this Pareto Distribution, here are the average sizes for various size intervals:
Bottom          Top          E[X ∧ Top]          S(Top)          Average Size
0 10,000 8,678 75.1% 4,683
10,000 25,000 18,000 51.2% 16,863
25,000 50,000 27,778 29.6% 35,989
50,000 100,000 37,500 12.5% 70,270
100,000 250,000 45,918 2.3% 148,387
250,000 Infinity 50,000 0.0% 425,000
Notice the difference between the results for the Pareto and the Exponential Distributions.

Proportion of Dollars of Loss From Losses of a Given Size:

Another quantity of interest is the percentage of the total losses from losses in a certain size interval.
The Proportion of Total Losses from Losses in the Interval [a, b] is:

{∫_a^b x f(x) dx} / E[X] = ({E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}) / E[X].

Exercise: For an Exponential Distribution with mean = 50,000, what percentage of the total dollars of
those losses come from losses of size between 10,000 and 25,000?
[Solution: E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ 10000] = 50000 (1 - e-1/5) = 9063.
E[X ∧ 25000] = 50000 (1 - e-1/2) = 19,673.
S(x) = e-x/θ. S(10000) = e-1/5 = 0.8187. S(25000) = e-1/2 = 0.6065
({E[X ∧ 25000] - 25000S(25000)} - {E[X ∧ 10000] - 10000S(10000)}) / E[X] =
({19,673 - (25,000)(0.6065)} - {9063 - (10,000)(.8187)}) / 50,000 = 7.3%.]
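The same quantities give the share of total loss dollars; a minimal Python sketch (illustrative only), reproducing the 7.3% above:

import math

theta, a, b = 50000.0, 10000.0, 25000.0

def S(x):
    return math.exp(-x / theta)

def lev(x):
    return theta * (1.0 - math.exp(-x / theta))

# E[X] = theta for the Exponential, so divide the interval's dollars per loss by theta.
share = ((lev(b) - b * S(b)) - (lev(a) - a * S(a))) / theta
print(share)                                        # about 0.073, i.e. 7.3% of total losses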

For an Exponential with θ = 50,000, here are the percentages for various size intervals:

Bottom          Top          E[X ∧ Top]          S(Top)          Percentage of Total Losses


0 10,000 9,063 81.9% 1.8%
10,000 25,000 19,673 60.7% 7.3%
25,000 50,000 31,606 36.8% 17.4%
50,000 100,000 43,233 13.5% 33.0%
100,000 250,000 49,663 0.7% 36.6%
250,000 Infinity 50,000 0.0% 4.0%

For a Pareto Distribution with α = 3 and θ = 100,000, here are the percentages for various size intervals:

Bottom          Top          E[X ∧ Top]          S(Top)          Percentage of Total Losses


0 10,000 8,678 75.1% 2.3%
10,000 25,000 18,000 51.2% 8.1%
25,000 50,000 27,778 29.6% 15.5%
50,000 100,000 37,500 12.5% 24.1%
100,000 250,000 45,918 2.3% 30.2%
250,000 Infinity 50,000 0.0% 19.8%

Notice the difference between the results for the Pareto and the Exponential Distributions.

Problems:

For each of the following three problems, assume you have a Pareto distribution with parameters
α = 5 and θ = $1000.

10.1 (2 points) What is the average size of those losses less than $500 in size?
A. less than $160
B. at least $160 but less than $170
C. at least $170 but less than $180
D. at least $180 but less than $190
E. at least $190

10.2 (2 points) What is the average size of those losses greater than $500 in size but less than
$2000?
A. less than $800
B. at least $800 but less than $825
C. at least $825 but less than $850
D. at least $850 but less than $875
E. at least $875

10.3 (2 points) Assume you expect 100 losses per year. What is the expected dollars of loss paid
on those losses greater than $500 in size but less than $2000?
A. less than $10,500
B. at least $10,500 but less than $11,000
C. at least $11,000 but less than $11,500
D. at least $11,500 but less than $12,000
E. at least $12,000

10.4 (2 points) You are given the following:


• A sample of 5,000 losses contains 1800 that are no greater than $100, 2500 that are
greater than $100 but no greater than $1000, and 700 that are greater than $1000.
• The empirical limited expected value function for this sample evaluated at $100 is $73.
• The empirical limited expected value function for this sample evaluated at $1000 is $450.
Determine the total amount of the 2500 losses that are greater than $100 but no greater than $1000.
A. Less than $1.50 million
B. At least $1.50 million, but less than $1.52 million
C. At least $1.52 million, but less than $1.54 million
D. At least $1.54 million, but less than $1.56 million
E. At least $1.56 million

10.5 (3 points) Severity is LogNormal with µ = 5 and σ = 3.


What is the average size of those losses greater than 20,000 in size but less than 35,000?
A. less than 25,000
B. at least 25,000 but less than 27,000
C. at least 27,000 but less than 29,000
D. at least 29,000 but less than 31,000
E. at least 31,000

10.6 (2 points) You are given the following:


x F(x) E[X ∧ x]
$20,000 0.75 $7050
$30,000 0.80 $9340
Determine the average size of those losses of size between $20,000 and $30,000.
A. Less than $23,500
B. At least $23,500, but less than $24,500
C. At least $24,500, but less than $25,500
D. At least $25,500, but less than $26,500
E. At least $26,500

10.7 (2 points) You are given the following:


• A sample of 3,000 losses contains 2100 that are no greater than $1,000, 830 that are
greater than $1,000 but no greater than $5,000, and 70 that are greater than $5,000.
• The total amount of the 830 losses that are greater than $1,000 but no greater than $5,000
is $1,600,000.
• The empirical limited expected value function for this sample evaluated at $1,000 is $560.
Determine the empirical limited expected value function for this sample evaluated at $5,000.
A. Less than $905
B. At least $905, but less than $915
C. At least $915, but less than $925
D. At least $925, but less than $935
E. At least $935

10.8 (2 points) The random variable for a loss, X, has the following characteristics:
x F(x) Limited Expected Value at x
0 0.0 0
100 0.2 91
200 0.6 153
1000 1.0 331
Calculate the average size of those losses of size greater than 100 but less than 200.
(A) 140 (B) 145 (C) 150 (D) 155 (E) 160

10.9 (2 points) Aggregate Losses for an insurer follow a LogNormal Distribution with µ = 15.3 and
σ = 0.8, for which:
Limit ($ million) Limited Expected Value ($ million) Distribution Function
4 3.277 0.4512
7 4.476 0.7180
Determine the average size of aggregate losses in the interval 4 to 7 million.
A. 5.3 million B. 5.4 million C. 5.5 million D. 5.6 million E. 5.7 million

10.10 (3 points) An insurance pays the following for a loss of size X:


0,                     if X < 5000
0.75X - 3750,          if 5000 ≤ X ≤ 50,000
0.9X - 11,250,         if 50,000 < X

You are given the following values:


E[X] = 40,000. E[X ∧ 3750] = 3730. E[X ∧ 5000] = 4954.
E[X ∧ 11,250] = 10,799. E[X ∧ 12,500] = 11,899. E[X ∧ 50,000] = 32,612.
Determine the average payment per loss for this insurance.

10.11 (160, 5/88, Q.5) (2.1 points) A population experiences mortality consistent with an
exponential distribution with θ = 10. Calculate the average fraction of the interval (x, x+3] lived by
those who die during the interval.
(A) (1 + e-0.1 + e-0.2 - 3e-0.3) e / {6(1 - e-0.3)}
(B) (1 + e-0.1 + e-0.2 - 3e-0.3) / {3(1 - e-0.3)}
(C) 1/3
(D) (13 - 10e-0.3) / {3(1 - e-0.3)}
(E) (10 - 13e-0.3) /{3(1 - e-0.3)}

10.12 (4B, 5/92, Q.23) (2 points) You are given the following information:
A large risk has a lognormal claim size distribution with parameters µ = 8.443 and σ = 1.239.
The insurance agent for the risk settles all claims under $5,000.
(Claims of $5,000 or more are settled by the insurer, not the agent.)
Determine the expected value of a claim settled by the insurance agent.
A. Less than 500
B. At least 500 but less than 1,000
C. At least 1,000 but less than 1,500
D. At least 1,500 but less than 2,000
E. At least 2,000

10.13 (4B, 5/93, Q.33) (3 points) The distribution for claim severity follows a Single Parameter
Pareto distribution of the following form:
f(x) = (3/1000)(x/1000)-4, x > 1000
Determine the average size of a claim between $10,000 and $100,000, given that the claim is
between $10,000 and $100,000.
A. Less than $18,000
B. At least $18,000 but less than $28,000
C. At least $28,000 but less than $38,000
D. At least $38,000 but less than $48,000
E. At least $48,000

10.14 (4B, 5/99, Q.10) (2 points) You are given the following:
• One hundred claims greater than 3,000 have been recorded as follows:
Interval Number of Claims
(3,000, 5,000] 6
(5,000, 10,000] 29
(10,000, 25,000] 39
(25,000, ∞) 26
• Claims of 3,000 or less have not been recorded.
• The null hypothesis, H0 , is that claim sizes follow a Pareto distribution,
with parameters α = 2 and θ = 25,000 .
If H0 is true, determine the expected claim size for claims in the interval (25,000, ∞).
A. 12,500 B. 25,000 C. 50,000 D. 75,000 E. 100,000

10.15 (4B, 11/99, Q.1) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with mean 2,000.
• The loss elimination ratio (LER) at a deductible of 1,000 is 0.30.
• 60 percent of the losses (in number) are less than the deductible of 1,000.
Determine the average size of a loss that is less than the deductible of 1,000.
A. Less than 350
B. At least 350, but less than 550
C. At least 550, but less than 750
D. At least 750, but less than 950
E. At least 950

Solutions to Problems:

10.1. A. The Limited Expected Value of the Pareto: E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}.
∫_0^500 x f(x) dx = E[X ∧ 500] - 500 S(500) = 200.6 - 500 (1000/1500)^5 = 200.6 - 65.8 = 134.8.
Average size of claim = 134.8 / F(500) = 134.8 / 0.869 = $155.

10.2. B. ∫_500^2000 x f(x) dx = ∫_0^2000 x f(x) dx - ∫_0^500 x f(x) dx =
E[X ∧ 2000] - 2000 S(2000) - {E[X ∧ 500] - 500 S(500)} =
{246.9 - 2000 (1000/3000)^5} - {200.6 - 500 (1000/1500)^5} = 238.7 - 134.8 = 103.9.
Average Size of Claim = 103.9 / {F(2000) - F(500)} = 103.9 / (0.996 - 0.869) = $818.

10.3. A. 100 ∫_500^2000 x f(x) dx = 100 ∫_0^2000 x f(x) dx - 100 ∫_0^500 x f(x) dx =
100 { {E[X ∧ 2000] - 2000 S(2000)} - {E[X ∧ 500] - 500 S(500)} } =
100 { {246.9 - 2000 (1000/3000)^5} - {200.6 - 500 (1000/1500)^5} } = 100 {238.7 - 134.8} = $10,390.
Alternately, one expects 100 {F(2000) - F(500)} = 100 {0.996 - 0.869} = 12.7 such claims per
year, with an average size of $818, based on the previous problem. Thus the expected dollars of
loss on these claims = (12.7)($818) = $10,389.

10.4. B. The average size of those claims of size between 100 and 1,000 equals:
({E[X ∧ 1000] - 1000 S(1000)} - {E[X ∧ 100] - 100 S(100)}) / {F(1000) - F(100)} =
{(450 - (1000)(700/5000)) - (73 - (100)(3200/5000))} / {(4300/5000) - (1800/5000)} =
(310 - 9) / 0.5 = $602. Thus these 2500 claims total: (2500)($602) = $1,505,000.
Alternately, (Losses Limited to $100) / (Number of Claims) = E[X ∧ 100] = $73.
Since there are 5000 claims, Losses Limited to $100 = ($73)(5000) = $365,000.
Now there are: 2500 + 700 = 3200 claims greater than $100 in size.
Since these claims contribute $100 each to the losses limited to $100, they contribute a total of:
(3200)($100) = $320,000.
Losses limited to $100 = (losses on Claims ≤$100) + (contribution of claims >$100).
Thus losses on Claims ≤ $100 is: $365,000 - $320,000 = $45,000.
(Losses Limited to $1000) / (Number of Claims) = E[X ∧ 1000] = $450.
Since there are 5000 claims, Losses Limited to $1000 = ($450)(5000) = $2,250,000.
Now there are 700 claims greater than $1000 in size.
Since these claims contribute $1000 each to the losses limited to $1000, they contribute a total of:
(700)($1000) = $700,000.
Losses limited to $1000 = (losses on Claims ≤$1000)+(contribution of Claims >$1000).
Thus losses on Claims ≤$1000 = $2,250,000 - $700,000 = $1,550,000.
The total amount of the claims that are greater than $100 but no greater than $1000 is:
(losses on Claims ≤$1000) - (losses on Claims ≤$100) =
$1,550,000 - $45,000 = $1,505,000.

10.5. B. F(x) = Φ[(lnx − µ)/σ]. F(20000) = Φ[1.63]. F(35000) = Φ[1.82].
E[X ∧ x] = exp(µ + σ^2/2) Φ[(lnx − µ − σ^2)/σ] + x {1 − Φ[(lnx − µ)/σ]}.
E[X ∧ x] - x S(x) = exp(µ + σ^2/2) Φ[(lnx − µ − σ^2)/σ].
E[X ∧ 20,000] - (20,000) S(20,000) = (e^9.5) Φ[(ln20000 - 5 - 3^2)/3] = 13,360 Φ[-1.37].
E[X ∧ 35,000] - (35,000) S(35,000) = (e^9.5) Φ[(ln35000 - 5 - 3^2)/3] = 13,360 Φ[-1.18].
The average size of claims of size between $20,000 and $35,000 is:
(E[X ∧ 35000] - 35000 S(35000) - {E[X ∧ 20000] - 20000 S(20000)}) / {F(35000) - F(20000)} =
13,360 {Φ[-1.18] - Φ[-1.37]} / {Φ[1.82] - Φ[1.63]} =
13,360 (0.1190 - 0.0853) / (0.9656 - 0.9484) = 26,176.
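As a numerical cross-check (my own sketch, not part of the original solution; it uses only the Python standard library and the exact normal distribution function rather than rounded table values, so it gives roughly 26,400 rather than 26,176, still within answer B):

from math import erf, exp, log, sqrt

def norm_cdf(z):
    # standard normal distribution function Phi(z)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 5.0, 3.0

def F(x):
    # LogNormal distribution function
    return norm_cdf((log(x) - mu) / sigma)

def losses_below(x):
    # E[X ^ x] - x S(x) = exp(mu + sigma^2/2) Phi[(ln x - mu - sigma^2)/sigma]
    return exp(mu + sigma**2 / 2.0) * norm_cdf((log(x) - mu - sigma**2) / sigma)

a, b = 20000.0, 35000.0
print((losses_below(b) - losses_below(a)) / (F(b) - F(a)))   # about 26,400; answer B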

10.6. D. The average size of those claims of size between 20,000 and 30,000 equals:
({E[X ∧ 30000] - 30000S(30000)} - {E[X ∧ 20000] - 20000S(20000)}) / {F(30000)-F(20000)}
= {(9340-(0.2)(30000)) - (7050-(0.25)(20000))} / (0.80-0.75) = (3340-2050)/0.05 = $25,800.

10.7. B. (Losses Limited to $1000) / (Number of Claims) = E[X ∧ 1000] = $560.


Since there are 3000 claims, Losses Limited to $1000 = ($560)(3000) = $1,680,000.
Now there are 830+70 = 900 claims greater than $1000 in size.
Since these claims contribute $1000 each to the losses limited to $1000, they contribute a total of
(900)($1000) = $900,000.
Losses limited to $1000 = (losses on Claims ≤ $1000) + (contribution of claims > $1000).
Thus the losses on claims ≤ $1000 = $1,680,000 - $900,000 = $780,000.
Now the losses on claims ≤ $5000 =
(losses on claims ≤ $1000) + (losses on claims > $1000 and ≤ $5000) =
$780,000 + ($1,600,000) = $2,380,000.
Finally, the losses limited to $5000 =
(the losses on claims ≤ $5000) + (Number of Claims > $5000)($5000) =
$2,380,000 + (70)($5000) = $2,730,000.
E[X ∧ 5000] = (Losses limited to $5000)/(Total Number of Claims) =
$2,730,000 / 3000 = $910.
Alternately, the average size of those claims of size between 1,000 and 5,000 equals :
({E[X ∧ 5000] - 5000S(5000)} - {E[X ∧ 1000] - 1000S(1000)})/{F(5000)-F(1000)}.
We are given that: S(1000) = 900/3000 = 0.30, S(5000) = 70/3000 = 0.0233, E[X ∧ 1000] = 560.
The observed average size of those claims of size 1000 to 5000 is: 1,600,000/ 830 = 1927.7
Setting the observed average size of those claims of size 1000 to 5000 equal to the above
formula for the same quantity:
1927.7 = ({E[X ∧ 5000] - 5000S(5000)} - {E[X ∧ 1000] - 1000S(1000)})/{F(5000)-F(1000)} =
({E[X ∧ 5000] - 5000(0.0233)} - {560 - 1000(0.30))/{0.9767 - 0.70} .
Solving, E[X ∧ 5000] = (1927.7)(0.2767) + 116.5 + 560 - 300 = $910.

10.8. D. Average size of losses between 100 and 200 is:


({E[X ∧ 200] - 200S(200)} - {E[X ∧ 100] - 100S(100)}) / (F(200) - F(100)) =
({153 - (200)(1 - 0.6)} - {91 - (100)(1 - 0.2)}) / (0.6 - 0.2) = (73 - 11)/0.4 = 155.
Comment: Same information as in 3, 11/01, Q.35.

10.9. A. The average size of aggregate losses in the interval 4 to 7 million is:
(E[X ∧ 7m] - (7m) S(7m) - {E[X ∧ 4m] - (4m) S(4m)}) / {F(7m) - F(4m)} =
(4.476 - (7)(1 - 0.7180) - {3.277 - (4)(1 - 0.4512)}) / (0.7180 - 0.4512) = 5.32 million.
Comment: The LogNormal Distribution and how to determine limited expected values for named
distributions are discussed in subsequent sections. Here we just use the given values.

10.10. We can divide the payments into two pieces: (0.75)(X - 5000)+,
plus for X > 50,000: 0.9X - 11,250 - (0.75)(X - 5000) = 0.15 X - 7500 = (0.15)(X - 50,000).
So the payment is: (0.75)(X - 5000)+ + (0.15)(X - 50,000)+
Thus the average payment per loss is:
(0.75)(E[X] - E[X ∧ 5000]) + (0.15)(E[X] - E[X ∧ 50,000]) =
0.9 E[X] - 0.75 E[X ∧ 5000] - 0.15 E[X ∧ 50,000] =
(0.9)(40,000) - (0.75)(4954) - (0.15)(32,612) = 27,393.
Alternately, the average payment per loss is:
∫_5000^50,000 (0.75x - 3750) f(x) dx + ∫_50,000^∞ (0.9x - 11,250) f(x) dx =
0.75 ∫_5000^50,000 x f(x) dx - 3750 {S(5000) - S(50,000)} + 0.9 ∫_50,000^∞ x f(x) dx - 11,250 S(50,000) =
0.75 {E[X ∧ 50,000] - 50,000 S(50,000) - E[X ∧ 5000] + 5000 S(5000)} - 3750 S(5000)
- 7500 S(50,000) + 0.9 {E[X] - E[X ∧ 50,000] + 50,000 S(50,000)} =
0.9 E[X] - 0.75 E[X ∧ 5000] - 0.15 E[X ∧ 50,000] =
(0.9)(40,000) - (0.75)(4954) - (0.15)(32,612) = 27,393.

10.11. E. The average size for losses of size between x and x + 3 is:
({E[X ∧ (x+3)] - (x+3) S(x+3)} - {E[X ∧ x] - x S(x)}) / {F(x+3) - F(x)} =
{10(1 - e^(-(x+3)/10)) - (x+3) e^(-(x+3)/10) - 10(1 - e^(-x/10)) + x e^(-x/10)} / {e^(-x/10) - e^(-(x+3)/10)} =
{10 e^(-x/10) - 13 e^(-(x+3)/10) - x e^(-(x+3)/10) + x e^(-x/10)} / {e^(-x/10) (1 - e^(-0.3))} =
(10 - 13 e^(-0.3) - x e^(-0.3) + x) / (1 - e^(-0.3)) = (10 - 13 e^(-0.3)) / (1 - e^(-0.3)) + x.
The average fraction of the interval (x, x+3] lived by those who die during the interval =
{(the average size for losses of size between x and x + 3) - x} / (x + 3 - x) =
(10 - 13 e^(-0.3)) / {3 (1 - e^(-0.3))}.
Alternately, the fraction of the interval lived by someone who dies at age x + t is: t/3. The average fraction is:
∫_0^3 (t/3) e^(-(x+t)/10)/10 dt / ∫_0^3 e^(-(x+t)/10)/10 dt.
Numerator: ∫_0^3 (t/3) e^(-(x+t)/10)/10 dt = (e^(-x/10)/3) [-t e^(-t/10) - 10 e^(-t/10)] from t = 0 to 3
= (e^(-x/10)/3) (10 - 13 e^(-0.3)).
Denominator: ∫_0^3 e^(-(x+t)/10)/10 dt = e^(-x/10) [-e^(-t/10)] from t = 0 to 3 = e^(-x/10) (1 - e^(-0.3)).
Thus the answer is: (10 - 13 e^(-0.3)) / {3 (1 - e^(-0.3))}.

10.12. E. One is asked for the average size of those claims of size less than 5000. This is:
∫_0^5000 x f(x) dx / F(5000) = {E[X ∧ 5000] - 5000 (1 - F(5000))} / F(5000).
For this LogNormal Distribution:
F(5000) = Φ[{ln(x) − µ} / σ] = Φ[{ln(5000) - 8.443} / 1.239] = Φ[0.06] = 0.5239.
E[X ∧ 5000] = exp(µ + σ^2/2) Φ[(lnx − µ − σ^2)/σ] + x {1 - Φ[(lnx - µ)/σ]} =
exp(8.443 + 1.239^2/2) Φ[(ln5000 - 8.443 - 1.239^2)/1.239] + 5000 {1 - 0.5239} =
10,002 Φ[-1.18] + 2381 = (10,002)(0.1190) + 2381 = 3571.
⇒ {E[X ∧ 5000] - 5000 (1 - F(5000))} / F(5000) = (3571 - 2381)/0.5239 = 2271.



10.13. A. F(x) = 1 - (x/1000)^(-3), x > 1000. S(10,000) = 0.001. S(100,000) = 0.000001.
The average size of claim between 10,000 and 100,000 is the ratio of the dollars of loss on such
claims to the number of such claims:
∫_10,000^100,000 x f(x) dx / {F(100,000) - F(10,000)} = (3 x 10^9) ∫_10,000^100,000 x^(-3) dx / 0.000999 =
(3.003 x 10^12) (1/2) (10,000^(-2) - 100,000^(-2)) = (1.5015)(10,000 - 100) = 14,865.


Comment: One can get the Distribution Function either by integrating the density function from 1000
to x or by recognizing that this is a Single Parameter Pareto Distribution. Note that in this case, as is
common for a distribution skewed to the right, the average size of claim is near the left end of the
interval rather than near the middle.

10.14. D. F(x) = 1 - (θ/(θ+x))α. S(25000) = 1-F(25000) = {25000/(25000+25000)}2 = 1/4.

E[X ∧ x] = {θ/(α−1)}{1−(θ/(θ+x))α−1}.
E[X ∧ 25000] = {25000/(2-1)} {1-(25000/(25000+25000))2-1} = 25000(1/2) = 12,500.
The expected claim size for claims in the interval (25,000, ∞ ) =
(E[X] - {E[X ∧ 25000] -25000S(25000)})/S(25000) = (25000-(12500-(1/4)(25000))/(1/4) =
(18750)(4) = 75,000.
Alternately, the average payment the insurer would make excess of 25,000, per non-zero such
payment, is {E[X] - E[X ∧ 25000]}/S(25000) = 50,000. Then the expected claim size for claims in
the interval (25,000, ∞ ) is this portion excess of 25,000 plus an additional 25,000 per large claim;
50,000 + 25,000 = 75,000.

10.15. A. LER(1000) = .30. E[X] = 2000. F(1000) = 0.60.


E[X ∧ 1000] = LER(1000)E[X] = (0.30)(2000) = 600.
The average size of those losses less than 1000 is:
{E[X ∧ 1000] - 1000S(1000)} / F(1000) = {600-(1000)(1-0.6)} / 0.6 = (600-400)/0.6 = 333.33.

Section 11, Grouped Data

Unlike the ungrouped data in Section 1, often one is called upon to work with data grouped into
intervals.47 In this example, both the number of losses in each interval and the dollars of loss on
those losses are shown. Sometimes the latter information is missing or sometimes additional
information may be available.

Interval ($000) Number of Losses Total of Losses in the Interval ($000)


0-5 2208 5,974
5 -10 2247 16,725
10-15 1701 21,071
15-20 1220 21,127
20-25 799 17,880
25-50 1481 50,115
50-75 254 15,303
75-100 57 4,893
100 - ∞ 33 4,295

SUM 10,000 157,383

The estimated mean is $15,738.

As will be seen, in some cases one has to deal with grouped data in a somewhat different manner
than ungrouped data.

With modern computing power, the actuary is usually better off working with the data in an
ungrouped format if available. The grouping process discards valuable information. The wider
the intervals, the worse is the loss of information.

47
Note that in this example, for simplicity I have not made a big deal over whether for example the 10-15 interval
includes 15 or not. In many real world applications, in which claims cluster at round numbers, that can be important.

Section 12, Working with Grouped Data

For the Grouped Data in Section 11, it is relatively easy to compute the various items discussed
previously.

The Limited Expected Value can be computed provided the limit is the upper boundary of one of
the intervals. One sums the losses for all the intervals below the limit, and adds the limit times the
number of claims greater than the limit. For example, the Limited Expected Value at $25,000 is:
{82,777 thousand + (25 thousand)(1825 claims)} / (10,000 claims) = 12.84 thousand.

The Loss Elimination Ratio, LER(x), can be computed provided x is a boundary of an interval.
LER(x) = E[X ∧ x] / mean. So LER(25000) = 12.84 / 15.74 = 81.6%.

The Excess Ratio, R(x), can be computed provided x is a boundary of an interval. The excess
losses are the sum of the losses for intervals greater than x, minus the product of x and the number
of claims greater than x. For example, the losses excess of $75 thousand, are:
(4893 + 4295) - (57 + 33)(75) = 2438 thousand.
R(75,000) = 2438 / 157383 = 1.5%. Also, R(x) = 1 - LER(x) = 1 - E[X ∧ x] / mean.

The mean excess loss, e(x), can be computed provided x is a boundary of an interval.
e(x) is the losses excess of x, divided by the number of claims greater than x.
So for example, using the excess losses computed above,
e(75,000) = 2,438,000 / (57+33) = 27.0 thousand.
e(x) can also be computed using e(x) = R(x) mean / S(x).
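These calculations are easy to automate. Below is a minimal sketch (my own, not part of the original text) in standard-library Python that computes E[X ∧ x], LER(x), R(x), and e(x) at any interval boundary for the grouped data of Section 11; amounts are in thousands.

# (lower bound, upper bound, number of claims, losses in the interval), amounts in $000
data = [(0, 5, 2208, 5974), (5, 10, 2247, 16725), (10, 15, 1701, 21071),
        (15, 20, 1220, 21127), (20, 25, 799, 17880), (25, 50, 1481, 50115),
        (50, 75, 254, 15303), (75, 100, 57, 4893), (100, None, 33, 4295)]

n = sum(c for _, _, c, _ in data)          # 10,000 claims
mean = sum(l for _, _, _, l in data) / n   # 15.74 thousand

def stats_at(x):
    # x must be the upper boundary of one of the intervals
    losses_below = sum(l for _, top, _, l in data if top is not None and top <= x)
    claims_above = sum(c for bot, _, c, _ in data if bot >= x)
    lev = (losses_below + x * claims_above) / n    # E[X ^ x]
    ler = lev / mean                               # Loss Elimination Ratio
    r = 1.0 - ler                                  # Excess Ratio R(x)
    e = (mean - lev) / (claims_above / n)          # mean excess loss e(x)
    return lev, ler, r, e

print(stats_at(25))   # about (12.84, 0.816, 0.184, 15.9)
print(stats_at(75))   # about (15.49, 0.985, 0.015, 27.1)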

Exercise: Compute the Limited Expected Values, Loss Elimination Ratios, Excess Ratios, and
Mean Residual Lives for the Grouped Data in Section 11.
[Solution:
(Dollar amounts are in thousands; LEV, LER, R, and e are evaluated at the top of each interval.)
Bottom   Top        # Claims      # Claims   Loss in     Cumulative
of Int.  of Int.    in Interval   > Top      Interval    Losses        LEV     LER      R       e
            0             0        10,000         0            0        0.0    0.0%   100.0%   15.7
   0        5         2,208         7,792     5,974        5,974        4.5   28.6%    71.4%   14.4
   5       10         2,247         5,545    16,725       22,699        7.8   49.7%    50.3%   14.3
  10       15         1,701         3,844    21,071       43,770       10.1   64.4%    35.6%   14.6
  15       20         1,220         2,624    21,127       64,897       11.7   74.6%    25.4%   15.2
  20       25           799         1,825    17,880       82,777       12.8   81.6%    18.4%   15.9
  25       50         1,481           344    50,115      132,892       15.0   95.4%     4.6%   21.2
  50       75           254            90    15,303      148,195       15.5   98.5%     1.5%   27.1
  75      100            57            33     4,893      153,088       15.6   99.4%     0.6%   30.2
 100   Infinity          33             0     4,295      157,383       15.7  100.0%     0.0%
Total                10,000                  157,383
For example, the numerator of the limited expected value at 20,000 is:
contribution of small losses + contribution of large losses =
$5,974,000 + $16,725,000 + $21,071,000 + $21,127,000 +
($20,000)(799 + 1481 + 254 + 57 + 33) = $64,897,000 + ($20,000)(2624) = $117,377,000.
E[X ∧ 20,000] = $117,377,000/10,000 = $11,738.
LER(20,000) = E[X ∧ 20,000]/E[X] = $11,738/$15,738 = 74.6%.
e(20,000) = (E[X] - E[X ∧ 20,000]) / S(20,000) = ($15,738 - $11,738) / (2624/10,000) = $15,244.]

Problems:

Use the following information to answer the next four questions:


Range($) # of claims loss
0-100 6300 $300,000
100-200 2350 $350,000
200-300 850 $200,000
300-400 320 $100,000
400-500 110 $50,000
over 500 70 $50,000

10000 $1,050,000

12.1 (1 point) What is the Loss Elimination Ratio, for a deductible of $200?
A. less than 0.83
B. at least 0.83 but less than 0.85
C. at least 0.85 but less than 0.87
D. at least 0.87 but less than 0.89
E. at least 0.89

12.2 (1 point) What is the Limited Expected Value, for a Limit of $300?
A. less than $95
B. at least $95 but less than $105
C. at least $105 but less than $115
D. at least $115 but less than $125
E. at least $125

12.3 (1 point) What is the empirical mean excess loss at $400?


A. less than $140
B. at least $140 but less than $150
C. at least $150 but less than $160
D. at least $160 but less than $170
E. at least $170

12.4 (1 point) What is the Excess Ratio, excess of $500?


A. less than 1.5%
B. at least 1.5% but less than 1.6%
C. at least 1.6% but less than 1.7%
D. at least 1.7% but less than 1.8%
E. at least 1.8%

12.5 (1 point) Calculate the loss elimination ratio for a $500 deductible.
Loss Size Number of Claims Total Amount of Loss
$0-249 5,000 $1,125,000
250-499 2,250 765,000
500-999 950 640,000
1,000-2,499 575 610,000
2500 or more 200 890,000

Total 8,975 $4,030,000


A. Less than 55.0%
B. At least 55.0%, but less than 60.0%
C. At least 60.0%, but less than 65.0%
D. At least 65.0%, but less than 70.0%
E. 70.0% or more

12.6 (2 points) An individual health insurance policy will pay:


• None of the first $500 of annual medical costs.
• 80% of the next $2500 of annual medical costs.
• 100% of annual medical costs excess of $3000.
Annual Medical Costs Frequency Average Amount
$0 20%
$1-$500 20% $300
$501-$1000 10% $800
$1001-$1500 10% $1250
$1501-$2000 10% $1700
$2001-$2500 10% $2150
$2501-$3000 10% $2600
over $3000 10% $4500
What is the average annual amount paid by this policy?
A. $800 B. $810 C. $820 D. $830 E. $840

12.7 (3 points) Use the following information:


Range($) # of claims loss($000)
0 2370 0
1-10,000 1496 4,500
10,001-25,000 365 6,437
25,001-100,000 267 13,933
100,001-300,000 99 16,488
300,001-1,000,000 15 7,207
Over 1,000,000 1 2,050

4613 50,615
Determine the loss elimination ratios at 10,000, 25,000, 100,000, 300,000, and 1 million.

12.8 (2 points) You are given the following data on sizes of loss:
Range # of claims loss
0-99 29 1000
100-199 38 6000
200-299 13 3000
300-399 9 3000
400-499 7 3000
500 or more 4 4000

100 20,000
Determine the empirical limited expected value E[X ∧ 300].
A. 145 B. 150 C. 155 D. 160 E. 165

12.9 (3 points) Use the following information:


Range($) # of claims loss($000)
0 2711 0
1-10,000 1124 3,082
10,001-50,000 372 7,851
50,001-100,000 83 5,422
100,001-300,000 51 7,607
300,001-1,000,000 5 2,050
Over 1,000,000 2 3,000

4348 29,012
Determine the loss elimination ratios at 10,000, 50,000, 100,000, 300,000, and 1 million.

12.10 (3 points) Using the following data, determine the mean excess loss at the endpoints of the
intervals.
Interval Number of claims Dollars of Loss
$1- 10,000 1,600 $16,900,000
$10,001 - 30,000 600 $14,000,000
$30,001 - 100,000 250 $12,500,000
$100,001 - 500,000 48 $5,500,000
Over $500,000 2 $1,100,000

Total 2,500 $50,000,000

12.11 (4B, 5/96, Q.22) (2 points) Forty (40) observed losses have been recorded in thousands
of dollars and are grouped as follows:
Interval Number of Total Losses
($000) Losses ($000)
(1, 4/3) 16 20
[4/3, 2) 10 15
[2, 4) 10 35
[4, ∞) 4 20
Determine the empirical limited expected value function evaluated at 2 (thousand).
A. Less than 0.5
B. At least 0.5, but less than 1.0
C. At least 1.0, but less than 1.5
D. At least 1.5, but less than 2.0
E. At least 2.0

12.12 (4B, 11/97, Q.8) (2 points) You are given the following:
• A sample of 2,000 claims contains 1,700 that are no greater than $6,000, 30 that are
greater than $6,000 but no greater than $7,000, and 270 that are greater than $7,000.
• The total amount of the 30 claims that are greater than $6,000 but no greater than $7,000 is
$200,000.
• The empirical limited expected value function for this sample evaluated at $6,000 is $1,810.
Determine the empirical limited expected value function for this sample evaluated at $7,000.
A. Less than $1,900
B. At least $1,900, but less than $1,925
C. At least $1,925, but less than $1,950
D. At least $1,950, but less than $1,975
E. At least $1,975

12.13 (Course 4 Sample Exam 2000, Q.7) Summary statistics of 100 losses are:
Interval Number Sum Sum of
of Losses Squares
(0, 2000] 39 38,065 52,170,078
(2000, 4000] 22 63,816 194,241,387
(4000, 8000] 17 96,447 572,753,313
(8000, 15000] 12 137,595 1,628,670,023
(15,000, ∞) 10 331,831 17,906,839,238
Total 100 667,754 20,354,674,039
Determine the empirical limited expected value E[X ∧ 15,000].

12.14 (CAS5, 5/00, Q.27) (1 point) Calculate the excess ratio at $100.
(The excess ratio is one minus the loss elimination ratio.)
Loss Size ($) Number of Claims Amount of Losses ($)
0-50 600 21,000
51- 100 500 37,500
101 -250 400 70,000
251 - 500 300 120,000
501-1000 200 150,000
Over 1000 100 200,000
Total 2,100 598,500
A. Less than 0.700
B. At least 0.700, but less than 0.725
C. At least 0.725, but less than 0.750
D. At least 0.750, but less than 0.775
E. At least 0.775

12.15 (CAS5, 5/02, Q.14) (1 point) Based on the following full-coverage loss experience,
calculate the excess ratio at $500. (The excess ratio is one minus the loss elimination ratio.)
Loss Size ($) Number of Claims Amount of Losses ($)
0 - 100 1,100 77,000
101 - 250 800 148,000
251 - 500 500 180,000
501 -1000 350 245,000
1001 - 2000 200 300,000
Over 2000 50 150,000
Total 3,000 1,100,000
A. Less than 0.250
B. At least 0.250, but less than 0.350
C. At least 0.350, but less than 0.450
D. At least 0.450, but less than 0.550
E. At least 0.550

12.16 (CAS3, 11/04, Q.26) (2.5 points) A sample of size 2,000 is distributed as follows:
Range Count
0≤X≤6,000 1,700
6,000<X≤7,000 30
7,000<X 270
• The sum of the 30 observations between 6,000 and 7,000 is 200,000.
• For the empirical distribution X, E(X ∧ 6,000) = 1,810.
Determine E(X ∧ 7,000).
A. Less than 1,910
B. At least 1,910, but less than 1,930
C. At least 1,930, but less than 1,950
D. At least 1,950, but less than 1,970
E. At least 1,970

12.17 (CAS5, 5/05, Q.19) (1 point)


Given the following information, calculate the loss elimination ratio at a $500 deductible.
Loss Amount Claim Count Total Loss
Below $500 150 $15,000
$500 6 $3,000
Over $500 16 $22,000
A. Less than 0.4
B. At least 0.4, but less than 0.5
C. At least 0.5, but less than 0.6
D. At least 0.6, but less than 0.7
E. At least 0.7

Solutions to Problems:

12.1. D. LER(200) = Losses Eliminated / Total Losses =


{(300,000+350,000) + (200)(850+320+110+70)} / 1,050,000 = $920,000 / $1,050,000 =
0.876.
Alternately, E[X ∧ 200] = {(300,000+350,000) + (200)(850 +320+110+70)} /10,000 = $92.
Mean = $105. Thus, LER(200) = E[X ∧ 200] / mean = $92 / $105= 0.876.

12.2. B. E[X ∧ 300] = {(300,000+350,000+200,000) + (300)(320+110+70)} / 10,000 = 100.

12.3. C. e(400) = (dollars on claims excess of 400) / (# claims greater than 400) - 400 =
{(50000 + 50000) / (110 + 70)} - 400 = 555.56 - 400 = $155.56.
Alternately, e(400) = { mean - E[X ∧ 400] } / {1 - F(400)} = ($105 - $102.2) / .018 = $155.56.

12.4. A. R(500) = (dollars excess of 500) / (total dollars) = {50000 - (70)(500)} / 1,050,000 =
0.0143.
Alternately, R(500) = 1 - E[X ∧ 500] / mean = 1 - $103.5 / $105 = 0.0143.

12.5. D. Losses eliminated = 1,125,000 + 765,000 + (500)(950 + 575 + 200) = 2,752,500.


Loss Elimination Ratio = 2,752,500/4,030,000 = 68.3%.

12.6. D. 80% of the layer from 500 to 3000 is paid, plus 100% of the layer excess of 3000.
Annual Cost      Frequency   Average Cost   Layer 500 to 3000   Layer excess of 3000   Paid
$0                  20%          $0              $0                   $0                 $0
$1 to $500          20%          $300            $0                   $0                 $0
$501 to $1000       10%          $800            $300                 $0                 $240
$1001 to $1500      10%          $1250           $750                 $0                 $600
$1501 to $2000      10%          $1700           $1200                $0                 $960
$2001 to $2500      10%          $2150           $1650                $0                 $1320
$2501 to $3000      10%          $2600           $2100                $0                 $1680
over $3000          10%          $4500           $2500                $1500              $3500
Average                                                                                  $830

12.7. The losses eliminated by a deductible of size 10,000 are (in thousands):
4500 + 10(365 + 267 + 99 + 15 + 1) = 11,970. LER(10,000) = 11,970/50,615 = 23.65%.
(Amounts in $ thousands.)
Bottom   Top        # Claims      # Claims   Loss in     Cumulative   Losses       LER
of Int.  of Int.    in Interval   > Top      Interval    Losses       Eliminated
    0        0        2,370        2,243          0            0
    1       10        1,496          747      4,500        4,500       11,970      23.65%
   10       25          365          382      6,437       10,937       20,487      40.48%
   25      100          267          115     13,933       24,870       36,370      71.86%
  100      300           99           16     16,488       41,358       46,158      91.19%
  300    1,000           15            1      7,207       48,565       49,565      97.93%
1,000  Infinity           1            0      2,050       50,615
Total                 4,613                   50,615
Comment: Data taken from AIA Closed Claim Study (1974) in Table IV of “Estimating Pure
Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976. Finger calculates excess
ratios, which are one minus the loss elimination ratios.

12.8. D. The contribution to the numerator of E[X ∧ 300] from the claims of size less than 300 is the
sum of their losses: 1000 + 6000 + 3000 = 10,000.
Each claim of size 300 or more contributes 300 to the numerator of E[X ∧ 300]; the sum of their
contributions is: (300)(9 + 7 + 4) = 6000.
E[X ∧ 300] = (10,000 + 6000)/100 = 160.

12.9. The losses eliminated by a deductible of size 100,000 are (in thousands):
3082 + 7851 + 5422 + 100(51 + 5 + 2) = 22,155. LER(100,000) = 22,155/29,012 = 76.36%.
(Amounts in $ thousands.)
Bottom   Top        # Claims      # Claims   Loss in     Cumulative   Losses       LER
of Int.  of Int.    in Interval   > Top      Interval    Losses       Eliminated
    0        0        2,711        1,637          0            0
    1       10        1,124          513      3,082        3,082        8,212      28.31%
   10       50          372          141      7,851       10,933       17,983      61.98%
   50      100           83           58      5,422       16,355       22,155      76.36%
  100      300           51            7      7,607       23,962       26,062      89.83%
  300    1,000            5            2      2,050       26,012       28,012      96.55%
1,000  Infinity           2            0      3,000       29,012
Total                 4,348                   29,012
Comment: Data taken from NAIC Closed Claim Study (1975) in Table VII of “Estimating Pure
Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976. Finger calculates excess
ratios, which are one minus the loss elimination ratios.

12.10. e(0) = E[X] = $50,000,000 / 2500 = $20,000.


e(10,000) =
($14,000,000 + $12,500,000 + $5,500,000 + $1,100,000) / (600 + 250 + 48 + 2) - 10,000 =
$26,778.
e(30,000) = ($12,500,000 + $5,500,000 + $1,100,000) / (250 + 48 + 2) - 30,000 = $33,667.
e(100,000) = ($5,500,000 + $1,100,000) / (48 + 2) - 100,000 = $32,000.
e(500,000) = $1,100,000 / 2 - 500,000 = $50,000.

12.11. D. Include in the Numerator the small losses at their reported value, while limiting the large
losses to 2 (thousand) each. The Denominator is the total number of claims.
Therefore, E[X ∧ 2] = (20+15+ (10)(2)+ (4)(2))/(16 +10+10+4) = 63/40 = 1.575.

12.12. D. (Losses Limited to $6000) / (Number of Claims) = E[X ∧ 6000] = $1810.


Since there are 2000 claims, Losses Limited to $6000 = ($1810)(2000) = $3,620,000.
Now there are 30+270 = 300 claims greater than $6000 in size.
Since these claims contribute $6000 each to the losses limited to $6000, they contribute a total of
(300)($6000) = $1,800,000.
Losses limited to $6000 = (losses on claims ≤$6000)+(contribution of claims >$6000). Thus the
losses on claims ≤ $6000 = $3,620,000 - $1,800,000 = $1,820,000.
Now the losses on claims ≤ $7000 =
(losses on claims ≤6000) + (losses on claims > $6000 and ≤ $7000) =
$1,820,000 + ($200,000) = $2,020,000. Finally, the losses limited to $7000 =
(the losses on claims ≤ $7000) + (Number of Claims > $7000)($7000) =
$2,020,000 + (270)($7000) = $3,910,000.
E[X ∧ 7000] = (Losses limited to $7000)/(Total Number of Claims) =
$3,910,000 / 2000 = $1955.
Alternately, the average size of those claims of size between 6,000 and 7,000 equals :
({E[X ∧ 7000] - 7000S(7000)} - {E[X ∧ 6000] - 6000S(6000)})/{F(7000)-F(6000)}.
We are given that: S(6000) = 300/2000 = .15, S(7000) = 270/2000 = .135,
E[X ∧ 6000] = 1810. The observed average size of those claims of size 6000 to 7000 is: 200000/
30 = 6666.7. Setting the observed average size of those claims of size 6000 to 7000 equal to the
above formula for the same quantity:
6666.7 = ({E[X ∧ 7000] - 7000S(7000)} - {E[X ∧ 6000] - 6000S(6000)})/{F(7000)-F(6000)} =
({E[X ∧ 7000] - 7000(.135)} - {1810 - 6000(.15))/{.865 - .85} .
Solving, E[X ∧ 7000] = (6666.7)(.015) + 945 + 1810 - 900 = $1955.
Comment: While Lee Diagrams to be discussed in a subsequent section are not on the syllabus,
this question may also be answered via a Lee Diagram, not to scale, as follows:
[Lee Diagram omitted. The horizontal axis is probability, with marks at 0.850, 0.865, and 1.000; the vertical axis is loss size, with horizontal lines at $6000 and $7000. Regions A, B, C, D, and E of that diagram are referenced below.]
A + B + C = E[X ∧ 6000] = $1810
D + B = (Losses on claims of size 6000 to 7000) / (total number of claims) =
$200,000 / 2000 = $100. B = ($6000)(30/2000) = ($6000)(.865 - .850) = $90.
Therefore, D = (D+B) - B = $100 - $90 = $10.
E = ($7000-$6000)(270/2000) = ($1000)(1-.865) = $135.
E[X ∧ 7000] = A + B + C + D + E = $1810 + $10 + $135 = $1955.

12.13. The limited losses are: (dollars from small losses) + (15000)(number of large losses) =
(667,754 - 331,831) + (15000)(10) = 485,923.
E[X ∧ 15,000] = losses limited to 15000)/(number of losses) = 485,923/100 = 4859.

12.14. C. E[X ∧ 100] = {(21000 + 37500) + (100)(400+ 300 + 200 + 100)} / 2100 =
158,500/2100 = 75.48. Mean = 598,500/2100 = 285.
Excess Ratio at 100: R(100) = 1 - E[X ∧ 100]/E[X] = 1 - 75.48/285 = 73.5%.
Alternately, the losses excess of 100, are contributed by the last four intervals:
70,000 - (400)(100) + 120,000 - (300)(100) + 150,000 - (200)(100) + 200,000 - (100)(100) =
30,000 + 90,000 + 130,000 + 190,000 = 440,000.
Excess Ratio at 100: Losses Excess of $100 / Total Losses = 440/598.5 = 73.5%.

12.15. C. E[X ∧ 500] = {(77000 + 148000 + 180000) + (500)(350 + 200 + 50)} / 3000 =
705000/3000 = 235. Mean = 1,100,000/3000 = 366.67.
Excess Ratio at 500: R(500) = 1 - E[X ∧ 500]/E[X] = 1 - 235/366.67 = 35.9%.
Alternately, the losses excess of 500, are contributed by the last three intervals:
245,000 - (350)(500) + 300,000 - (200)(500) + (150,000) - (50)(500) = 395,000.
Excess Ratio at 500: Losses Excess of $500 / Total Losses = 395/1100 = 35.9%.
Comments: Here is the calculation of the excess ratios at various amounts:
Bottom   Top        # Claims      # Claims   Loss in     Cumulative
of Int.  of Int.    in Interval   > Top      Interval    Losses        LEV(top)    R(top)
             0            0         3,000          0            0         0.0     100.0%
    0      100        1,100         1,900     77,000       77,000        89.0      75.7%
  100      250          800         1,100    148,000      225,000       166.7      54.5%
  250      500          500           600    180,000      405,000       235.0      35.9%
  500    1,000          350           250    245,000      650,000       300.0      18.2%
1,000    2,000          200            50    300,000      950,000       350.0       4.5%
2,000  Infinity           50            0    150,000    1,100,000       366.7       0.0%
Total                 3,000                 1,100,000
The Loss Elimination Ratio at $500 is: 1 - 35.9% = 64.1% = 235/366.67 = E[X ∧ 500]/E[X].

12.16. D. (X ∧ 7,000) - (X ∧ 6,000) = 0 for X ≤ 6,000,
X - 6,000 for 6,000 < X ≤ 7,000, and 1,000 for 7,000 < X.
For the 30 observations between 6,000 and 7,000:
∑ (xi - 6000) = ∑ xi - (30)(6000) = 200,000 - 180,000 = 20,000.
The 270 observations greater than 7,000 contribute to this difference: (270)(1000) = 270,000.
⇒ ∑ {(xi ∧ 7000) - (xi ∧ 6000)} = 20,000 + 270,000 = 290,000.
⇒ E(X ∧ 7,000) - E(X ∧ 6,000) = 290,000 / (1700 + 30 + 270) = 145.
⇒ E(X ∧ 7,000) = E(X ∧ 6,000) + 145 = 1810 + 145 = 1955.
Alternately, 1810 = E(X ∧ 6,000) = {sum of small losses + (300)(6000)}/2000.
⇒ Sum of losses of size less than 6000 is: (1810)(2000) - (300)(6000) = 1,820,000.
E(X ∧ 7,000) = {1,820,000 + 200,000 + (7000)(270)}/2000 = 1955.

12.17. D. Total Losses: 15,000 + 3000 + 22000 = 40,000.


Losses eliminated: 15,000 + (6 + 16)(500) = 26,000.
Loss Elimination Ratio: 26/40 = 0.65.

Section 13, Uniform Distribution

If losses are uniformly distributed on an interval [a, b], then it is equally likely that a loss is anywhere in
that interval. The probability of a loss being in any subinterval is proportional to the width of that
subinterval.

Exercise: Losses are uniformly distributed on the interval [3, 7].


What is the probability that a loss chosen at random will be in the interval [3.6, 3.8]?
[Solution: (3.8 - 3.6) / (7 - 3) = 0.05.]

Uniform Distribution

Support: a ≤ x ≤ b          Parameters: None

D. f.: F(x) = (x - a) / (b - a)

P. d. f.: f(x) = 1 / (b - a)

Moments: E[X^n] = (b^(n+1) - a^(n+1)) / {(b - a)(n + 1)}

Mean = (b + a)/2          Variance = (b - a)^2 / 12

Coefficient of Variation = Standard Deviation / Mean = (b - a) / {(b + a) √3}

Skewness = 0          Kurtosis = 9/5

Median = (b + a)/2

Limited Expected Value Function: E[X ∧ x] = (2xb - a^2 - x^2) / {2(b - a)}, for a ≤ x ≤ b

E[(X ∧ x)^n] = {(n + 1) x^n b - a^(n+1) - n x^(n+1)} / {(n + 1)(b - a)}, for a ≤ x ≤ b

e(x) = (b - x)/2, for a ≤ x < b

R(x) = (x - b)^2 / (b^2 - a^2), for a ≤ x ≤ b

Exercise: Assume losses are uniformly distributed on the interval [20, 25].
What are the mean, second moment, third moment, fourth moment, variance, skewness, and kurtosis?
[Solution: The mean is: (20 + 25)/2 = 22.5.
The second moment is: (b^3 - a^3) / {(b - a)(3)} = (25^3 - 20^3) / {(25 - 20)(3)} = 508.3333.
The third moment is: (b^4 - a^4) / {(b - a)(4)} = (25^4 - 20^4) / {(25 - 20)(4)} = 11,531.25.
The fourth moment is: (b^5 - a^5) / {(b - a)(5)} = (25^5 - 20^5) / {(25 - 20)(5)} = 262,625.
Therefore the variance is: 508.333 - 22.5^2 = 2.08 = (25 - 20)^2 / 12.
The skewness is: {11,531.25 - (3)(508.333)(22.5) + 2(22.5^3)} / 2.08^1.5 = 0.
The kurtosis is: {262,625 - (4)(11,531.25)(22.5) + (6)(508.3333)(22.5^2) - 3(22.5^4)} / 2.08^2 = 1.8.
Comment: The skewness of a uniform distribution is always zero, since it is symmetric.
The kurtosis of a uniform distribution is always 9/5.]
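A quick way to verify this arithmetic (my own sketch in standard-library Python, using the moment formula from the list above):

a, b = 20.0, 25.0

def moment(n):
    # E[X^n] for a uniform distribution on [a, b]
    return (b**(n + 1) - a**(n + 1)) / ((b - a) * (n + 1))

m1, m2, m3, m4 = moment(1), moment(2), moment(3), moment(4)
var = m2 - m1**2
skew = (m3 - 3*m1*m2 + 2*m1**3) / var**1.5
kurt = (m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4) / var**2
print(m1, m2, m3, m4)    # 22.5, 508.33..., 11531.25, 262625.0
print(var, skew, kurt)   # 2.083..., about 0, 1.8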

Discrete Uniform Distribution:

The uniform distribution discussed above is a continuous distribution. It is different than a distribution
uniform and discrete on integers.

For example, assume a distribution uniform and discrete on the integers from 10 to 13 inclusive:
f(10) = 1/4, f(11) = 1/4, f(12) = 1/4, and f(13) = 1/4.

It has mean of: (10 + 11 + 12 + 13)/4 = 11.5 = (10 + 13)/2.


It has variance of: {(10 - 11.5)^2 + (11 - 11.5)^2 + (12 - 11.5)^2 + (13 - 11.5)^2} / 4 = 1.25.

In general, for a distribution uniform and discrete on the integers from i to j inclusive:
Mean = (i + j)/2
Variance = {(j + 1 - i)^2 - 1} / 12.

For i = 10 and j = 13, the variance = {(13 + 1 - 10)^2 - 1}/12 = 15/12 = 1.25, matching the previous
result. Note that the variance formula is somewhat different for the discrete case than the continuous
case.48

Exercise: What is the variance of a six-sided die?
[Solution: Uniform and discrete from 1 to 6: variance = {(6 + 1 - 1)^2 - 1}/12 = 35/12.]

The variance of an S-sided die is: (S^2 - 1)/12.

48
I would not memorize the formula for the variance in the discrete case.

Problems:

Use the following information for the next 8 questions:


The size of claims is uniformly distributed on the interval from 3 to 7.

13.1 (1 point) What is the probability density function at 6?


A. 0.20 B. 0.25 C. 0.30 D. 0.35 E. 0.40

13.2 (1 point) What is the distribution function at 6?


A. 0.60 B. 0.65 C. 0.70 D. 0.75 E. 0.80

13.3 (1 point) What is the mean of the severity distribution?


A. less than 4
B. at least 4 but less than 5
C. at least 5 but less than 6
D. at least 6 but less than 7
E. at least 7

13.4 (1 point) What is the variance of the severity distribution?


A. less than 1.4
B. at least 1.4 but less than 1.5
C. at least 1.5 but less than 1.6
D. at least 1.6 but less than 1.7
E. at least 1.7

13.5 (2 points) What is the limited expected value at 6?


A. less than 4.5
B. at least 4.5 but less than 4.6
C. at least 4.6 but less than 4.7
D. at least 4.7 but less than 4.8
E. at least 4.8

13.6 (1 point) What is the excess ratio at 6?


A. less than 2.0%
B. at least 2.0% but less than 2.1%
C. at least 2.1% but less than 2.2%
D. at least 2.2% but less than 2.3%
E. at least 2.3%

13.7 (1 point) What is the skewness of the severity distribution?


(A) -1.0 (B) -0.5 (C) 0 (D) 0.5 (E) 1.0

13.8 (1 point) What is the mean excess loss at 4, e(4)?


(A) 1.0 (B) 1.5 (C) 2.0 (D) 2.5 (E) 3.0

13.9 (2 points) Losses for a coverage are uniformly distributed on the interval 0 to $10,000.
What is the Loss Elimination Ratio for a deductible of $1000?
A. less than 0.16
B. at least 0.16 but less than 0.18
C. at least 0.18 but less than 0.20
D. at least 0.20 but less than 0.22
E. at least 0.22

13.10 (3 points) X is uniformly distributed on the interval 0 to 10,000.


Determine the covariance of X ∧ 1000 and (X - 1000)+.
A. less than 200,000
B. at least 200,000 but less than 210,000
C. at least 210,000 but less than 220,000
D. at least 220,000 but less than 230,000
E. at least 230,000

13.11 (3 points) X is uniformly distributed on the interval 0 to 10,000.


Determine the correlation of X ∧ 1000 and (X - 1000)+.
A. less than 0.3
B. at least 0.3 but less than 0.4
C. at least 0.4 but less than 0.5
D. at least 0.5 but less than 0.6
E. at least 0.6

13.12 (3 points) X is uniform on [0, 20]. Y is uniform on [0, 30].


X and Y are independent.
Z is the maximum of X and Y.
Determine E[Z].
A. less than 16.0
B. at least 16.0 but less than 16.5
C. at least 16.5 but less than 17.0
D. at least 17.0 but less than 17.5
E. at least 17.5

13.13 (1 point) X and Y are independent. X has a uniform distribution on 0 to 100.


Y has a uniform distribution on 0 to ω. eY(40) = ex(40) - 5. What is ω?
A. 90 B. 95 C. 100 D. 105 E. 110

13.14 (2, 5/83, Q.13) (1.5 points) A box is to be constructed so that its height is 10 inches and its
base is X inches by X inches. If X has a uniform distribution over the interval (2, 8), then what is the
expected volume of the box in cubic inches?
A. 80.0 B. 250.0 C. 252.5 D. 255.0 E. 280.0

13.15 (160, 11/86, Q.5) (2.1 points) A population has a survival density function
f(x) = 0.01, 0 < x < 100. Determine the probability that a life now aged 60 will live longer than a life
now aged 50.
(A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5

13.16 (2, 5/88, Q.40) (1.5 points) Let X be a random variable with a uniform distribution on the
interval (1, a) where a > 1. If E(X) = 6 Var(X), then what is a?
A. 2 B. 3 C. 3 2 D. 7 E. 8

13.17 (4B, 11/95, Q.28) (2 points) Two numbers are drawn independently from a uniform
distribution on [0,1]. What is the variance of their product?
A. 1/144 B. 3/144 C. 4/144 D. 7/144 E. 9/144

13.18 (Course 160 Sample Exam #1, 1999, Q.4) (1.9 points)
A cohort of eight fruit flies is hatched at time t = 0. You are given:
(i) The survival distribution for fruit flies is known to be uniform over (0, 10].
(ii) Deaths are observed at times 1, 1, 2, 4, 5, 6, 6 and 7.
(iii) y is the number of fruit flies from the cohort observed to survive past e(0).
For any future cohort of eight fruit flies, determine the probability that exactly y will survive beyond
e(0).
(A) 0.22 (B) 0.23 (C) 0.25 (D) 0.27 (E) 0.28

13.19 (Course 1 Sample Exam, Q.35) (1.9 points) Suppose the remaining lifetimes of a
husband and wife are independent and uniformly distributed on the interval [0, 40].
An insurance company offers two products to married couples:
One which pays when the husband dies; and
One which pays when both the husband and wife have died.
Calculate the covariance of the two payment times.
A. 0.0 B. 44.4 C. 66.7 D. 200.0 E. 466.7

13.20 (1, 5/00, Q.38) (1.9 points) An insurance policy is written to cover a loss, X, where X has a
uniform distribution on [0, 1000]. At what level must a deductible be set in order for the expected
payment to be 25% of what it would be with no deductible?
(A) 250 (B) 375 (C) 500 (D) 625 (E) 750

13.21 (1, 11/01, Q.28) (1.9 points) Two insurers provide bids on an insurance policy to a large
company. The bids must be between 2000 and 2200. The company decides to accept the lower
bid if the two bids differ by 20 or more. Otherwise, the company will consider the two bids further.
Assume that the two bids are independent and are both uniformly distributed on the interval from
2000 to 2200.
Determine the probability that the company considers the two bids further.
(A) 0.10 (B) 0.19 (C) 0.20 (D) 0.41 (E) 0.60

13.22 (1, 11/01, Q.29) (1.9 points) The owner of an automobile insures it against damage by
purchasing an insurance policy with a deductible of 250. In the event that the automobile is
damaged, repair costs can be modeled by a uniform random variable on the interval
(0, 1500). Determine the standard deviation of the insurance payment in the event that the
automobile is damaged.
(A) 361 (B) 403 (C) 433 (D) 464 (E) 521

13.23 (3, 11/02, Q.33) (2.5 points) XYZ Co. has just purchased two new tools with independent
future lifetimes. Each tool has its own distinct De Moivre survival pattern.
One tool has a 10-year maximum lifetime and the other a 7-year maximum lifetime.
Calculate the expected time until both tools have failed.
(A) 5.0 (B) 5.2 (C) 5.4 (D) 5.6 (E) 5.8

13.24 (2 points) In the previous question, calculate the expected time until at least one of the two
tools has failed.
(A) 2.6 (B) 2.7 (C) 2.8 (D) 2.9 (E) 3.0

13.25 (CAS3, 11/03, Q.5) (2.5 points) Given:


i) Mortality follows De Moivre's Law.

ii) eº 20 = 30.
Calculate q20.
A. 1/60 B. 1/70 C. 1/80 D. 1/90 E. 1/100

13.26 (SOA3, 11/03, Q.39) (2.5 points) You are given:


(i) Mortality follows DeMoivreʼs law with ω = 105.
(ii) (45) and (65) have independent future lifetimes.

Calculate e° for the last-survivor status of (45) and (65), that is, the expected time until the second death.

(A) 33 (B) 34 (C) 35 (D) 36 (E) 37



Solutions to Problems:

13.1. B. f(x) = 1/(7-3) = 0.25 for 3 < x < 7.

13.2. D. F(x) = (x-3)/(7-3) = for 3 < x < 7. F(6) = (6-3)/(7-3) = 0.75.

13.3. C. Mean = (3+7)/2 = 5.

13.4. A. Second moment = ∫_3^7 x^2 (1/4) dx = [x^3 / 12] from x = 3 to 7 = (343 - 27)/12 = 26.33.
Mean = (7 + 3)/2 = 5. Thus variance = 26.33 - 5^2 = 1.33.
Alternately, one can use the general formula for the variance of a uniform distribution on [a, b]:
Variance = (b - a)^2 / 12 = (7 - 3)^2 / 12 = 1.333.

13.5. E. E[X ∧ 6] = ∫_3^6 x (1/4) dx + 6 S(6) = (36 - 9)/8 + (6)(1/4) = 4.875.
Alternately, one can use the general formula for E[X ∧ x] of a uniform distribution on [a, b]:
E[X ∧ x] = (2xb - a^2 - x^2) / {2(b - a)}. E[X ∧ 6] = {(2)(6)(7) - 3^2 - 6^2} / {(2)(7 - 3)} = 4.875.

13.6. E. The excess ratio R(6) = 1 - E[X ∧ 6] / E[X] = 1 - 4.875 / 5 = 2.5%.

13.7. C. Since the uniform distribution is symmetric, the skewness is zero.

13.8. B. The losses of size larger than 4, are uniform from 4 to 7. The amount by which they
exceed 4 is uniformly distributed from 0 to 3. e(4) = average amount by which those losses of size
greater than 4, exceed 4 = (0 +3)/2 = 1.5.
Comment: For a uniform distribution from a to b, e(x) = (b-x)/2, for a ≤ x < b.

13.9. C. The overall mean is (10,000 + 0)/2 = 5000.
E[X ∧ 1000] = ∫_0^1000 (x/10,000) dx + (1000)(9000/10,000) = 50 + 900 = 950.
LER(1000) = 950 / 5000 = 0.19.



13.10. B. E[X ∧ 1000] = (1/10)(500) + (9/10)(1000) = 950.
E[(X - 1000)+] = E[X] - E[X ∧ 1000] = 5000 - 950 = 4050.
(X ∧ 1000)(X - 1000)+ is 0 for x ≤ 1000 and (1000)(x - 1000) for x > 1000.
E[(X ∧ 1000)(X - 1000)+] = ∫_1000^10,000 (1000)(x - 1000) / 10,000 dx = 9000^2 / 20 = 4,050,000.
Cov[(X ∧ 1000), (X - 1000)+] = E[(X ∧ 1000)(X - 1000)+] - E[X ∧ 1000] E[(X - 1000)+] =
4,050,000 - (950)(4050) = 202,500.

13.11. C. E[(X ∧ 1000)^2] = ∫_0^1000 x^2 / 10,000 dx + ∫_1000^10,000 1000^2 / 10,000 dx
= 33,333 + 900,000 = 933,333. Var[X ∧ 1000] = 933,333 - 950^2 = 30,833.
E[(X - 1000)+^2] = ∫_1000^10,000 (x - 1000)^2 / 10,000 dx = 24,300,000.
Var[(X - 1000)+] = 24,300,000 - 4050^2 = 7,897,500.
Corr[(X ∧ 1000), (X - 1000)+] = Cov[(X ∧ 1000), (X - 1000)+] / √(Var[X ∧ 1000] Var[(X - 1000)+])
= 202,500 / √{(30,833)(7,897,500)} = 0.410.
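These two answers can also be sanity-checked by simulation. A rough sketch (mine, not from the text; the estimates vary with the random seed and only approximate 202,500 and 0.410):

import random

random.seed(1)
n = 1_000_000
cap, excess = [], []
for _ in range(n):
    x = random.uniform(0.0, 10000.0)
    cap.append(min(x, 1000.0))            # X ^ 1000
    excess.append(max(x - 1000.0, 0.0))   # (X - 1000)+

def mean(v):
    return sum(v) / len(v)

mc, me = mean(cap), mean(excess)
cov = mean([c * e for c, e in zip(cap, excess)]) - mc * me
var_c = mean([c * c for c in cap]) - mc * mc
var_e = mean([e * e for e in excess]) - me * me
print(cov, cov / (var_c * var_e)**0.5)   # roughly 202,500 and 0.41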

13.12. D. Prob[Z ≤ z] = Prob[X ≤ z] Prob[Y ≤ z]: (z/20)(z/30) for z ≤ 20, and z/30 for 20 ≤ z ≤ 30.
S(z) = 1 - z^2/600 for z ≤ 20, 1 - z/30 for 20 ≤ z ≤ 30, and 0 for z ≥ 30.
Mean = ∫ S(z) dz = ∫_0^20 (1 - z^2/600) dz + ∫_20^30 (1 - z/30) dz
= [z - z^3/1800] from z = 0 to 20 + [z - z^2/60] from z = 20 to 30
= 20 - 4.444 + 10 - 8.333 = 17.22.
Alternately, f(z) = z/300 for z ≤ 20, 1/30 for 20 ≤ z ≤ 30, and 0 for z ≥ 30.
Mean = ∫ z f(z) dz = ∫_0^20 z^2/300 dz + ∫_20^30 z/30 dz = 8.888 + 8.333 = 17.22.
Comment: Similar to 3, 11/02, Q.33.

13.13. A. ex(40) = (100 - 40)/2 = 30. ⇒ 30 - 5 = 25 = eY(40) = (ω - 40)/2. ⇒ ω = 90.



13.14. E. E[10X^2] = ∫_2^8 10x^2 / 6 dx = 10(8^3 - 2^3)/18 = 280.

13.15. D. The future lifetime of the life aged 60 is uniform from 0 to 40. The future lifetime of the life
aged 50 is uniform from 0 to 50. If the life aged 60 dies at time t, then the probability it lived longer
than the life aged 50 is: t/50. ∫_0^40 (t/50)/40 dt = 0.4.

13.16. B. E[X] = (1 + a)/2. Var[X] = (a - 1)^2 / 12. (1 + a)/2 = 6(a - 1)^2 / 12.
⇒ (1 + a) = (a - 1)^2. ⇒ a^2 - 3a = 0. ⇒ a = 3.

13.17. D. For X and Y independent: E[XY] = E[X] E[Y], and E[X^2 Y^2] = E[X^2] E[Y^2].
Var[XY] = E[(XY)^2] - E[XY]^2 = E[X^2 Y^2] - {E[X] E[Y]}^2 = E[X^2] E[Y^2] - E[X]^2 E[Y]^2.
For the uniform distribution on [0, 1], E[X] = 1/2, and E[X^2] = ∫_0^1 x^2 f(x) dx = 1/3.
Therefore, Var[XY] = (1/3)(1/3) - (1/4)(1/4) = 1/9 - 1/16 = (16 - 9)/144 = 7/144.

13.18. A. e(0) = mean = (0 + 10)/2 = 5. Three flies (those dying at times 6, 6, and 7) survive past 5, so y = 3.
For the uniform, the probability that exactly 3 out of 8 survive past time 5 is: {8! / (3! 5!)} 0.5^3 0.5^5 = 0.21875.

13.19. C. Let X be the time of death of the husband. Let Y be the time of death of the wife.
The first policy pays at time X. The second policy pays at time Maximum[X, Y].
E[X] = 20.
Prob[Max[X, Y] ≤ t] = Prob[X ≤ t and Y ≤ t] = Prob[X ≤ t] Prob[Y ≤ t] = (t/40)(t/40) = t^2/1600.
E[Max[X, Y]] = ∫_0^40 (1 - t^2/1600) dt = 40 - 13.333 = 26.667.
X Max[X, Y] = X^2 if X ≥ Y, and XY if X < Y.
E[X Max[X, Y] | X = x] = (x^2) Prob[Y ≤ x] + x E[Y | Y > x] Prob[Y > x] =
(x^2)(x/40) + x {(x + 40)/2}(1 - x/40) = x^3/80 + 20x.
E[X Max[X, Y]] = ∫_0^40 E[X Max[X, Y] | X = x] f(x) dx = ∫_0^40 (x^3/80 + 20x)(1/40) dx = 600.
Cov[X, Max[X, Y]] = 600 - (20)(26.667) = 66.67.

13.20. C. Expected payment to be 25% of what it would be with no deductible ⇔
75% of losses eliminated.
0.75 = LER[d] = E[X ∧ d]/E[X] = {(d/2)(d/1000) + d(1 - d/1000)}/500. ⇒ 375 = d - d^2/2000.
⇒ d^2 - 2000d + 750,000 = 0. ⇒ d = {2000 ± √(4,000,000 - 3,000,000)}/2 = 500.


Comment: The other root of 1500 is greater than 1000 and thus not a solution to the question.

13.21. B. The two bids are within 20 of each other when they fall in the diagonal strip of the
square [2000, 2200] x [2000, 2200] between the lines y = x - 20 and y = x + 20.
Area of strip = 200^2 - 180^2/2 - 180^2/2 = 7600. 7600/200^2 = 19%.
[Diagram of the square with the diagonal strip omitted.]

13.22. B. Mean payment is: (1/6)(0) + (5/6)(1250/2) = 520.83.
Second moment of payment is: (1/6)(0^2) + ∫_250^1500 (x - 250)^2 / 1500 dx = 434,028.
Variance of payment is: 434,028 - 520.83^2 = 162,764.
Standard deviation of payment is: √162,764 = 403.
Comment: I have included the probability of a payment of zero due to the deductible.

13.23. E. X is uniform on [0, 10] and Y is uniform on [0, 7].
Probability that both are dead by time t is: (t/10)(t/7) for t ≤ 7, and t/10 for 7 ≤ t ≤ 10.
The corresponding survival function is S(t) = 1 - t^2/70 for t ≤ 7, 1 - t/10 for 7 ≤ t ≤ 10, and 0 for t ≥ 10.
Mean = ∫ S(t) dt = ∫_0^7 (1 - t^2/70) dt + ∫_7^10 (1 - t/10) dt
= [t - t^3/210] from t = 0 to 7 + [t - t^2/20] from t = 7 to 10 = 5.82.
Alternately, the corresponding density function is f(t) = t/35 for t ≤ 7, 1/10 for 7 ≤ t ≤ 10, and 0 for t ≥ 10.
Mean = ∫ t f(t) dt = ∫_0^7 t^2/35 dt + ∫_7^10 t/10 dt = 3.267 + 5 - 2.45 = 5.82.
Comment: Last survivor status is discussed in Section 9.4 of Actuarial Mathematics.
One could instead use equation 9.5.4 in Actuarial Mathematics:
e° for the last-survivor status = e°x + e°y - e°xy (the joint-life status).

13.24. B. X is uniform on [0, 10] and Y is uniform on [0, 7].
Probability that X has not failed by time t is: 1 - t/10, for t ≤ 10.
Probability that Y has not failed by time t is: 1 - t/7, for t ≤ 7.
Probability that neither tool has failed by time t is: (1 - t/10)(1 - t/7) for t ≤ 7.
Corresponding survival function: S(t) = (1 - t/10)(1 - t/7) = 1 - 0.24286t + t^2/70, for t ≤ 7.
Mean = ∫ S(t) dt = ∫_0^7 (1 - 0.24286t + t^2/70) dt = [t - 0.12143t^2 + t^3/210] from t = 0 to 7 = 2.68.
One could instead use equation 9.5.4 in Actuarial Mathematics together with the solution to the previous
question: e°xy (joint-life) = e°x + e°y - e° (last survivor) = 5 + 3.5 - 5.82 = 2.68.
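A quick simulation check of these last two answers (my own sketch, not from the text; with many trials the averages approach 5.82 and 2.68):

import random

random.seed(1)
trials = 1_000_000
total_last = total_first = 0.0
for _ in range(trials):
    x = random.uniform(0.0, 10.0)   # failure time of the 10-year tool
    y = random.uniform(0.0, 7.0)    # failure time of the 7-year tool
    total_last += max(x, y)         # time until both have failed (13.23)
    total_first += min(x, y)        # time until the first failure (13.24)
print(total_last / trials, total_first / trials)   # about 5.82 and 2.68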

13.25. A. De Moivre's Law ⇒ the age of death for a life aged 20 is uniform from 20 to ω.

30 = e° 20 = the average future lifetime = (ω - 20)/2 = ω/2 - 10. ⇒ ω = 80.


q20 = {F(21) - F(20)}/S(20) = {21/80 - 20/80}/(60/80) = 1/60.

13.26. B. The life aged 45 has a future lifetime uniform from 0 to 60, so e°45 = 30.
The life aged 65 has a future lifetime uniform from 0 to 40, so e°65 = 20.
For t < 40, Prob[both alive] = S(t) = (1 - t/60)(1 - t/40) = 1 - 0.04167t + t^2/2400.
e°45:65 = ∫_0^40 S(t) dt = ∫_0^40 (1 - 0.04167t + t^2/2400) dt = [t - 0.02083t^2 + t^3/7200] from t = 0 to 40 = 15.56.
For the last-survivor status: e° = e°45 + e°65 - e°45:65 = 30 + 20 - 15.56 = 34.44.

Section 14, Statistics of Grouped Data

Since the grouped data in Section 11 displays the losses in each interval, one estimates the mean
as the total losses divided by the total number of claims, 157,383,000 / 10000 = 15,738. If the
losses were not given one could estimate the mean by assuming that each claim in an interval was at
the center of the interval49.

If one were given additional information, such as the sum of squares of the claim sizes, then one
could directly compute the second moment and variance.

Exercise: Summary statistics of 100 losses are:


Interval Number Sum Sum of
of Losses Squares
(0,2000] 39 38,065 52,170,078
(2000,4000] 22 63,816 194,241,387
(4000,8000] 17 96,447 572,753,313
(8000, 15000] 12 137,595 1,628,670,023
(15,000 ∞) 10 331,831 17,906,839,238
Total 100 667,754 20,354,674,039
Estimate the mean, second moment and variance.
[Solution: The observed mean is: 667,754/100 = 6677.54.
The observed second moment is: (sum of squared loss sizes)/(number of losses) =
20,354,674,039/100 = 203,546,740.39.
The observed variance is: 203,546,740.39 - 6677.542 = 158,957,200.
Comment: In this case, when it comes to the first two moments, we have enough information to
proceed in exactly the same manner as if we had ungrouped data.50]

More generally, we can estimate moments by assuming the losses are uniformly distributed on
each interval.

49
In the case of skewed distributions this will lead to an underestimate of the mean. Also, one would have to guess
what value to assign to the claims in a final interval [c,∞).
50
See Course 4 Sample Examination, Q.8.

Exercise: Given the following grouped data, assuming the losses in each interval are uniformly
distributed, calculate the mean, second moment, third moment and fourth moment.
0 -10 6
10-20 11
20-25 3
[Solution: For each interval [a, b], the nth moment is (b^(n+1) - a^(n+1)) / {(b - a)(n + 1)}. (Those for the
interval [20, 25] match those calculated in the previous exercise.) Then we weight together the
moments for each interval by the number of claims observed in each interval.
Lower Upper Number First Second Third Fourth
Endpoint Endpoint of Claims Moment Moment Moment Moment
0 10 6 5.00 33.33 250.00 2,000.00
10 20 11 15.00 233.33 3,750.00 62,000.00
20 25 3 22.50 508.33 11,531.25 262,625.00
20 13.12 214.58 3,867.19 74,093.75
For example, {(33.33)(6) + (233.33)(11) + (508.33)(3)} / 20 = 214.58.
Thus the estimated mean, second moment, third moment, and fourth moment are: 13.12, 214.58,
3867.19, and 74,093.75.]

As long as the final interval in which there are claims has a finite upper endpoint, this technique can be
applied to estimate the moments of any grouped data set. The estimates of second and higher
moments may be poor when the intervals are wide and/or the distribution is highly skewed.
These estimates of the moments can then be used to estimate the variance, CV, skewness and
kurtosis.

Exercise: Given the following grouped data, assuming the losses in each interval are uniformly
distributed, calculate the variance, CV, skewness and kurtosis.
0 -10 6
10-20 11
20-25 3
[Solution: From the solution to the previous exercise, the estimated mean, second moment, third
moment and fourth moment are : 13.12, 214.58, 3867.19, 74093.75. Therefore the variance is:
214.58 - 13.122 = 42.45. CV = 42.45.5 / 13.12 = .497.
The skewness is: {3867.19 - (3)(214.58)(13.12) + 2(13.123 )} / 42.451.5 = -.226.
The kurtosis is: {74,093.75- (4)(3867.19)(13.12) + (6)(214.58)(13.122 ) - 3(13.124 )} / 42.452 =
2.15.]
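The same computation can be scripted; here is a minimal sketch of it (mine, in standard-library Python) that reproduces the last two exercises, keeping the unrounded mean of 13.125:

# (lower endpoint, upper endpoint, number of claims)
groups = [(0.0, 10.0, 6), (10.0, 20.0, 11), (20.0, 25.0, 3)]
n = sum(c for _, _, c in groups)

def moment(k):
    # weight together the k-th moments of a uniform on each interval,
    # using the observed claim counts as the weights
    return sum(c * (b**(k + 1) - a**(k + 1)) / ((b - a) * (k + 1))
               for a, b, c in groups) / n

m1, m2, m3, m4 = moment(1), moment(2), moment(3), moment(4)
var = m2 - m1**2
cv = var**0.5 / m1
skew = (m3 - 3*m1*m2 + 2*m1**3) / var**1.5
kurt = (m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4) / var**2
print(m1, m2, m3, m4)        # 13.125, 214.58..., 3867.19..., 74093.75
print(var, cv, skew, kurt)   # about 42.3, 0.50, -0.22, 2.14

The small differences from the 42.45, -0.226, and 2.15 above come from the use of the rounded mean 13.12 in the hand calculation.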

Estimating Statistics of Grouped Data When Given the Losses in Each Interval:

While the syllabus does not discuss how to estimate higher moments for grouped data, when given
the losses in each interval, here is an example of how one can estimate the variance of grouped
data. First one can compute the between interval variance by assuming that all claims in an interval
are at the average. Clearly this underestimates the total variance, because it ignores the variance
within intervals. For narrow intervals this will not produce a major error.

For a uniform distribution over an interval from a to b, the variance is (b-a)2 /12. Thus one can
estimate the within interval variance by computing the weighted average squared width of the
intervals, and dividing by 12.

The total variance can be estimated by adding the between interval variance to the within interval
variance. As an illustrative example, for the Grouped Data in Section 11 the variance could be
estimated as follows:
A B C D E F G H
Interval # claims Loss Severity Square of Col. B x Square of Col. B x
Severity Col. E Interval Width Col. G
0-5 2208 5974 2.7 7.3 16163 25 55200
5 -10 2247 16725 7.4 55.4 124488 25 56175
10-15 1701 21071 12.4 153.4 261015 25 42525
15-20 1220 21127 17.3 299.9 365861 25 30500
20-25 799 17880 22.4 500.8 400118 25 19975
25-50 1481 50115 33.8 1145.1 1695823 625 925625
50-75 254 15303 60.2 3629.8 921976 625 158750
75-100 57 4893 85.8 7368.9 420025 625 35625
over 100 33 4295 130.2 16939.4 559001 10000 330000
Total: 10,000 claims; 157,383 total loss; weighted average of Col. E = 476; weighted average of Col. G = 165.

First the between variance is estimated as the average squared severity
minus the square of the average severity:
Estimated Variance Between Intervals = 476 million - 15,738² = 228 million.

Next the within variance is estimated by calculating the average squared width of interval. For the
over $100,000 interval we select an equivalent width of about $100,000. (This is based on the
average severity for this interval being $130,000, only $30,000 more than the lower bound of
the interval; therefore, this is not a very heavy-tailed distribution. For heavier-tailed distributions, the
rare large claims can contribute a significant portion of the overall mean and an even more significant
portion of the variance.)
Estimated Variance Within Intervals = 165 million / 12 = 14 million.

Then the estimated total variance is the sum of the between and within variances:
Estimated Variance = 228 million + 14 million = 242 million.
The estimated coefficient of variation is the estimated standard deviation divided by the estimated
mean. Estimated Coefficient of Variation = 15.6 / 15.7 = 0.99.

While the estimated variance, and thus the estimated coefficient of variation, are dependent to some
extent on exactly how one corrects for the grouping, particularly how one deals with the last interval,
for this grouped data the coefficient of variation is clearly close to one. (The adjustment for the within
interval variance did not have much effect due to the width of the intervals and the relatively light tail
of this loss distribution.)
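
The following is a rough sketch in Python of this between-plus-within estimate (my own illustration; the open interval is assigned an equivalent width of 100, and everything is in thousands of dollars):

data = [  # (lower, upper, number of claims, total loss), all in $000
    (0, 5, 2208, 5974), (5, 10, 2247, 16725), (10, 15, 1701, 21071),
    (15, 20, 1220, 21127), (20, 25, 799, 17880), (25, 50, 1481, 50115),
    (50, 75, 254, 15303), (75, 100, 57, 4893), (100, 200, 33, 4295),  # equivalent width of 100 assumed
]
n = sum(c for _, _, c, _ in data)
mean = sum(loss for *_, loss in data) / n                        # overall mean severity
avg_sq_sev = sum(c * (loss / c)**2 for _, _, c, loss in data) / n
between = avg_sq_sev - mean**2                                   # ignores the spread within intervals
within = sum(c * (u - l)**2 for l, u, c, _ in data) / (12 * n)   # uniform: variance = width^2 / 12
total_var = between + within
print(mean, between, within, total_var, total_var**0.5 / mean)
# about 15.7, 229, 14, 243, and a coefficient of variation of about 0.99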

Similarly, using the average values for each interval, one can estimate the third moment and thus the
skewness. Taking the weighted average of the cubes of the severities for each interval, using the
number of claims in each interval as the weight, gives an estimate for the third moment of the
grouped data in Section 11: m3 = 2.41 x 10^13.

However, by comparing, for a uniform distribution, the integral over an interval of x³ versus the cube
of the average severity, one can derive a correction term. This correction term, which should be
added to the previous estimate of the third moment, is:
(square of the width of each interval) (mean severity for each interval) / 4,
taking the weighted average over all intervals, using the number of claims as the weights.
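
Here is the algebra behind that correction term (a quick check of my own, not in the text). For a uniform
distribution on [a, b], with mean m = (a+b)/2 and width w = b - a:
E[X³] = (b⁴ - a⁴) / {4(b - a)} = (a³ + a²b + ab² + b³)/4, while m³ = (a + b)³/8.
E[X³] - m³ = (a³ - a²b - ab² + b³)/8 = (a + b)(a - b)²/8 = w² m / 4.
So using the cube of each interval's mean understates the third moment by (width²)(mean)/4 per interval.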

For the grouped data in Section 11, one gets a correction term of 0.22 x 10^13, where the last interval
has been assigned an equivalent width of 100,000.
Adding in the correction term gives an estimate of: m3 = 2.63 x 10^13.

Using the formula Skewness = {m3 - 3 m1 m2 + 2 m1³} / STDDEV³, and the estimates
m1 = 1.57 x 10^4, m2 = variance + mean² = 4.88 x 10^8, and standard deviation = 1.56 x 10^4, gives an
estimate for the skewness of: 1.10 / 0.38 = 2.9.

Exercise: What is the estimate of the skewness, if in the correction to the third moment one assumes
an equivalent width of 150,000 rather than 100,000?
[Solution: Estimated skewness is 3.2 rather than 2.9.]

The estimated skewness is even more affected than the estimated coefficient of variation by the
loss of information inherent in the grouping of data. However, it is clear that the skewness for the
grouped data in Section 11 is somewhere around three.51
51
One could use the Single Parameter Pareto Distribution in order to sharpen the estimate of the contribution of the
last interval. This can be particularly useful when dealing with more highly skewed distributions.

Problems:

Use the following information for the next 6 questions:


• You are given the following grouped data:
0 -1 16
1-5 54
5-25 23
25-100 7
• There are no reported losses of size greater than 100.
• Assume the losses in each interval are uniformly distributed.

14.1 (1 point) Estimate the mean.


A. less than 10
B. at least 10 but less than 11
C. at least 11 but less than 12
D. at least 12 but less than 13
E. at least 13

14.2 (3 points) Estimate the variance.


A. 260 B. 270 C. 280 D. 290 E. 300

14.3 (3 points) Estimate the skewness.


A. less than 3.5
B. at least 3.5 but less than 4.0
C. at least 4.0 but less than 4.5
D. at least 4.5 but less than 5.0
E. at least 5.0

14.4 (3 points) Estimate the kurtosis.


A. less than 8
B. at least 8 but less than 10
C. at least 10 but less than 12
D. at least 12 but less than 14
E. at least 14

14.5 (2 points) Estimate the limited expected value at 50, E[X ∧ 50].
A. less than 7.0
B. at least 7.0 but less than 7.5
C. at least 7.5 but less than 8.0
D. at least 8.0 but less than 8.5
E. at least 8.5

14.6 (3 points) Estimate the limited second moment at 50, E[(X ∧ 50)²].
(A) 195 (B) 200 (C) 205 (D) 210 (E) 215

14.7 (3 points) You are given the following grouped data:


Claim Size Number of Claims
0 to 10 82
10 to 25 133
25 to 50 65
50 to 100 20
There are no reported losses of size greater than 100.
Assume a uniform distribution of claim sizes within each interval.
Estimate the second raw moment of the claim size distribution.
A. less than 500
B. at least 500 but less than 600
C. at least 600 but less than 700
D. at least 700 but less than 800
E. at least 800

Use the following information for the next 3 questions


• Four hundred loss amounts are grouped into four intervals:
Interval Number of Values
0 to 1000 180
1001 to 3000 120
3001 to 5000 60
5001 to 10,000 40
• Assume that losses are uniformly distributed on each interval.

14.8 (2 points) Estimate the expected payment per loss with a policy limit of 2500.
A. 1000 B. 1100 C. 1200 D. 1300 E. 1400

14.9 (3 points) Estimate the expected payment per payment with a deductible of 500.
A. 2000 B. 2200 C. 2400 D. 2600 E. 2800

14.10 (3 points) Estimate the standard deviation.


A. 2000 B. 2200 C. 2400 D. 2600 E. 2800

14.11 (6 points)
You are given the following grouped data on alumni donations to a college in a year:
Donation Size Number of Donations
1 to 99 4300
100 to 499 1850
500 to 999 184
1000 to 2499 173
2500 to 4999 42
5000 to 9999 96
10,000 to 24,999 62
25,000 to 49,999 20
50,000 to 99,999 23
100,000 to 249,999 8
250,000 to 999,999 7

There are no donations of size 1 million or greater.


Assume a uniform distribution of claim sizes within each interval.
Estimate the coefficient of variation of the size distribution.

14.12 (3, 11/00, Q.31 & 2009 Sample Q.117) (2.5 points) For an industrywide study of patients
admitted to hospitals for treatment of cardiovascular illness in 1998, you are given:
(i) Duration In Days Number of Patients Remaining Hospitalized
0 4,386,000
5 1,461,554
10 486,739
15 161,801
20 53,488
25 17,384
30 5,349
35 1,337
40 0
(ii) Discharges from the hospital are uniformly distributed between the durations shown in the table.
Calculate the mean residual time remaining hospitalized, in days, for a patient who has been
hospitalized for 21 days.
(A) 4.4 (B) 4.9 (C) 5.3 (D) 5.8 (E) 6.3

14.13 (2 points) In the previous question, 3, 11/00, Q.31, what is the Excess Ratio at 21 days?
(Excess Ratio = 1 - Loss Elimination Ratio.)
(A) 0.5% (B) 0.7% (C) 0.9% (D) 1.1% (E) 1.3%

14.14 (4, 11/01, Q.2 & 2009 Sample Q. 58) (2.5 points) You are given:
Claim Size Number of Claims
0-25 30
25-50 32
50-100 20
100-200 8
Assume a uniform distribution of claim sizes within each interval.
Estimate the second raw moment of the claim size distribution.
(A) Less than 3300
(B) At least 3300, but less than 3500
(C) At least 3500, but less than 3700
(D) At least 3700, but less than 3900
(E) At least 3900

14.15 (4, 5/07, Q.7) (2.5 points) You are given:


(i) Claim Size Number of Claims
(0, 50] 30
(50, 100] 36
(100, 200] 18
(200, 400] 16
(ii) Claim sizes within each interval are uniformly distributed.
(iii) The second moment of the uniform distribution on (a, b] is (b³ - a³) / {3(b - a)}.
Estimate E[(X ∧ 350)²], the second moment of the claim size distribution subject to a limit of 350.
(A) 18,362 (B) 18,950 (C) 20,237 (D) 20,662 (E) 20,750

14.16 (2 points) In the previous question, estimate Var[X ∧ 350].



Solutions to Problems:

14.1. A. For each interval [a,b], the nth moment is: (b^(n+1) - a^(n+1)) / {(b-a)(n+1)}.
Then we weight together the moments for each interval by the number of claims observed in each
interval.
Lower Upper Number First Second Third Fourth
Endpoint Endpoint of Claims Moment Moment Moment Moment
0 1 16 0.50 0.33 0.25 0.20
1 5 54 3.00 10.33 39.00 156.20
5 25 23 15.00 258.33 4,875.00 97,625.00
25 100 7 62.50 4,375.00 332,031.25 26,640,625.00
100 9.525 371.30 24,384.54 1,887,381.88
For example, {(0.33)(16) + (10.33)(54) + (258.33)(23) + (4375)(7)} / 100 = 371.3.
Thus the estimated mean, second moment, third moment and fourth moment are: 9.525, 371.3,
24,384.54, and 1,887,381.88.

14.2. C. The estimated variance = 371.3 - 9.525² = 280.6.

14.3. A. The estimated skewness = {µ3′ - 3µ1′µ2′ + 2µ1′³} / Variance^1.5 =
{24,384.54 - (3)(9.525)(371.3) + (2)(9.525³)} / 280.6^1.5 = 3.30.

14.4. E. The estimated kurtosis = {µ4′ - 4µ1′µ3′ + 6µ1′²µ2′ - 3µ1′⁴} / Variance² =
{1,887,382 - (4)(9.525)(24,384.54) + (6)(9.525²)(371.3) - (3)(9.525⁴)} / 280.6² = 14.4.

14.5. D. & 14.6. E. Due to the assumption of a uniform distribution from 25 to 100, on average
1/3 of the losses in that interval are from 25 to 50. There are 7 losses in the interval 25 to 100, so
we assume 7/3 losses in the interval 25 to 50 and 14/3 losses in the interval 50 to 100.
The losses of size 50 or more contribute 50 to E[X ∧ 50].
The losses of size 50 or more contribute 50² to E[(X ∧ 50)²].
Lower Endpoint Upper Endpoint # Claims 1st Limited Moment at 50 2nd Limited Moment at 50
0 1 16 0.50 0.33
1 5 54 3.00 10.33
5 25 23 15.00 258.33
25 50 2.333 37.50 1,458.33
50 100 4.667 50 2500
100 8.358 215.74
E[X ∧ 50] = {(0.5)(16) + (3)(54) + (15)(23) + (37.5)(7/3) + (50)(14/3)} / 100 = 8.358.
E[(X ∧ 50)²] = {(0.33)(16) + (10.33)(54) + (258.33)(23) + (1458.33)(7/3) + (2500)(14/3)} / 100 =
215.74.]

14.7. E. For each interval [a, b], we assume the losses are uniformly distributed, and therefore the
nth moment is: (b^(n+1) - a^(n+1)) / {(b - a)(n + 1)}.
The second moment is: (b³ - a³) / {3(b - a)}.
For example, for the second interval: (25³ - 10³) / {(25 - 10)(3)} = 325.
Then we weight together the moments for each interval by the number of claims observed in each
interval: {(82)(33.3) + (133)(325) + (65)(1458.3) + (20)(5833.3)} / 300 = 858.
Lower Upper Number First Second
Endpoint Endpoint of Claims Moment Moment
0 10 82 5 33.3
10 25 133 18 325.0
25 50 65 38 1,458.3
50 100 20 75 5,833.3
300 22.25 858.1
Comment: Estimated variance = 858.1 - 22.25² = 363.

14.8. D. E[X ∧ 2500] =
{(180)(500) + (120)(3/4)(1000 + 2500)/2 + (2500)(120/4 + 60 + 40)} / 400 = 1431.

14.9. B. E[X ∧ 500] = {(180/2)(250) + (500)(180/2 + 120 + 60 + 40)} / 400 = 443.75.
E[X] = {(180)(500) + (120)(2000) + (60)(4000) + (40)(7500)} / 400 = 2175.
S(500) = (180/2 + 120 + 60 + 40) / 400 = 0.775.
(E[X] - E[X ∧ 500]) / S(500) = (2175 - 443.75) / 0.775 = 2234.

14.10. B. For each interval [a, b], we assume the losses are uniformly distributed, and therefore the
nth moment is: (b^(n+1) - a^(n+1)) / {(b - a)(n + 1)}. The second moment is: (b³ - a³) / {3(b - a)}.
For example, for the second interval: (3000³ - 1001³) / {(3000 - 1001)(3)} = 4,335,000.
Then we weight together the moments for each interval
by the number of claims observed in each interval.
Lower Upper Number First Second
Endpoint Endpoint of Claims Moment Moment
0 1,000 180 500.00 333,333
1,001 3,000 120 2,000.50 4,335,000
3,001 5,000 60 4,000.50 16,337,000
5,001 10,000 40 7,500.50 58,340,000
400 2,175.28 9,735,050
The estimated standard deviation is: √(9,735,050 - 2175.28²) = 2237.

14.11. For each interval [a, b], we assume the losses are uniformly distributed, and therefore the
nth moment is: (b^(n+1) - a^(n+1)) / {(b - a)(n + 1)}. The second moment is: (b³ - a³) / {3(b - a)}.
For example, for the second interval: (499³ - 100³) / {(499 - 100)(3)} = 102,967.
Then we weight together the moments for each interval
by the number of claims observed in each interval.
Lower Upper Number First Second
Endpoint Endpoint of Claims Moment Moment
0 99 4300 50 3,267
100 499 1850 300 102,967
500 999 184 750 582,500
1,000 2,499 173 1,750 3,248,000
2,500 4,999 42 3,750 14,579,167
5,000 9,999 96 7,500 58,325,000
10,000 24,999 62 17,500 324,980,000
25,000 49,999 20 37,500 1,458,291,667
50,000 99,999 23 75,000 5,833,250,000
100,000 249,999 8 175,000 32,499,800,000
250,000 999,999 7 625,000 437,499,250,000
6765 1,688 519,298,986
1 + CV² = E[X²] / E[X]² = 519,298,986 / 1688² = 182.3.
Estimated CV = √181.3 = 13.5.

14.12. A. Since discharges are uniform from 20 to 25, there are assumed to be:
(4/5)(53488 - 17384) = 28883.2 discharges from 21 to 25.
For a discharge at time t > 21, the contribution to the time excess of 21 is: t - 21.
For example, for the interval [25, 30], t is assumed uniform on [25, 30], with mean (25 + 30)/2 = 27.5.
Thus the average contribution to the excess of 21 from the interval [25, 30] is:
E[t - 21] = E[t] - 21 = 27.5 - 21 = 6.5.
If discharges are uniformly distributed on [a, b], with a > 21, then the average contribution to the time
excess of 21 from those patients discharged between a and b is: (b+a)/2 - 21.
For each interval [a, b], the contribution to the time excess of 21 is:
(# who are discharged between a and b)(average contribution to the excess).
Bottom of Top of Average Number Contribution to
Interval Interval Contribution Discharged Time Excess of 21
21 25 2 28,883.2 57,766.4
25 30 6.5 12,035.0 78,227.5
30 35 11.5 4,012.0 46,138.0
35 40 16.5 1,337.0 22,060.5
Sum 46,267.2 204,192.4
e(21) = (total time excess of 21) / (# patients staying more than 21 days) =
204192.4 days / 46267.2 = 4.4 days.
Comment: The number discharged between 30 and 35, is:
(the number remaining at time 30) - (number remaining at time 35) = 5349 - 1337 = 4012.
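
As a numerical check (my own sketch, not part of the official solution), the same calculation can be done directly from the table of patients remaining, assuming discharges are uniform within each band:

remaining = {0: 4386000, 5: 1461554, 10: 486739, 15: 161801,
             20: 53488, 25: 17384, 30: 5349, 35: 1337, 40: 0}
durations = sorted(remaining)
total_excess = 0.0   # total patient-days in excess of 21 days
count_excess = 0.0   # number of patients hospitalized more than 21 days
for a, b in zip(durations, durations[1:]):
    discharged = remaining[a] - remaining[b]   # discharges in (a, b], spread uniformly
    if b <= 21:
        continue
    lo = max(a, 21)
    frac = (b - lo) / (b - a)                  # fraction of the band's discharges after day 21
    total_excess += discharged * frac * ((lo + b) / 2 - 21)
    count_excess += discharged * frac
print(total_excess / count_excess)             # about 4.4 days, answer (A)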

14.13. C. From the previous solution, the total time excess of 21 days is 204,192 days.
Assuming uniformity on each interval, one can calculate the total number of days as 21,903,260:
Bottom of Top of Number Contribution to
Interval Interval Average Discharged Total Time
0 5 2.5 2,924,446 7,311,115
5 10 7.5 974,815 7,311,112
10 15 12.5 324,938 4,061,725
15 20 17.5 108,313 1,895,478
20 25 22.5 36,104 812,340
25 30 27.5 12,035 330,962
30 35 32.5 4,012 130,390
35 40 37.5 1,337 50,138
Sum 4,386,000 21,903,260
R(21) = (time excess of 21) / total time = 204,192 / 21,903,260 = 0.93%.

14.14. E. The second moment for a uniform distribution on [a, b] is:
∫[a, b] x² / (b - a) dx = (b³ - a³) / {3(b - a)}. Weight together the 2nd moments for each interval:
Lower Upper Number First Second
Endpoint Endpoint of Claims Moment Moment
0 25 30 12.50 208.33
25 50 32 37.50 1,458.33
50 100 20 75.00 5,833.33
100 200 8 150.00 23,333.33
90 47.500 3,958.33
{(30)(208.33) + (32)(1458.33) + (20)(5833.33) + (8)(23,333.33)}/90 = 3958.33.
Comment: The estimated variance is: 3958.33 - 47.5² = 1702.08.

14.15. E. Since we assume a uniform distribution from 200 to 400, we assume 12 of the 16 claims
in this interval are from 200 to 350, while the remaining 4 claims are from 350 to 400.
Interval          Second Moment of Uniform Distribution              Number of Claims
0 to 50           (50³ - 0³) / {(3)(50 - 0)} = 833.33                30
50 to 100         (100³ - 50³) / {(3)(100 - 50)} = 5833.33           36
100 to 200        (200³ - 100³) / {(3)(200 - 100)} = 23,333.33       18
200 to 350        (350³ - 200³) / {(3)(350 - 200)} = 77,500          12
The 4 claims of size greater than 350 each contribute 350² to the numerator of E[(X ∧ 350)²].
E[(X ∧ 350)²] = {(833.33)(30) + (5833.33)(36) + (23,333.33)(18) + (77,500)(12) + (350²)(4)} /
(30 + 36 + 18 + 12 + 4) = 2,075,000/100 = 20,750.

14.16. Again, we assume 12 of the 16 claims in the final interval are from 200 to 350, while the
remaining 4 claims are from 350 to 400.
E[X ∧ 350] = {(25)(30) + (75)(36) + (150)(18) + (275)(12) + (350)(4)} / (30 + 36 + 18 + 12 + 4)
= 10,850/100 = 108.5.
Var[X ∧ 350] = E[(X ∧ 350)²] - E[X ∧ 350]² = 20,750 - 108.5² = 8977.75.

Section 15, Policy Provisions

Insurance policies may have various provisions which determine the amount paid, such as
deductibles, maximum covered losses, and coinsurance clauses.

(Ordinary) Deductible:

An ordinary deductible is a provision which states that when the loss is less than or equal to the
deductible there is no payment, and when the loss exceeds the deductible the amount paid is the
loss less the deductible.52 Unless specifically stated otherwise, assume a deductible is ordinary.
Unless stated otherwise assume the deductible operates per loss.
In actual applications, deductibles can apply per claim, per person, per accident,
per occurrence, per event, per location, per annual aggregate, etc.53

Exercise: An insured suffers losses of size: $3000, $8000 and $17,000.


If the insured has a $5000 (ordinary) deductible, what does the insurer pay for each loss?
[Solution: Nothing, $8000 - $5000 = $3000, and $17,000 - $5000 = $12,000.]

Here is a graph of the payment under an ordinary deductible of 5000:

[Graph omitted: the payment is 0 for losses up to 5000; thereafter it equals the loss minus 5000,
reaching 20,000 at a loss of 25,000.]

52
See Definition 8.1 in Loss Models.
53
An annual aggregate deductible is discussed in the section on Stop Loss Premiums in
“Mahlerʼs Guide to Aggregate Losses”

Maximum Covered Loss:54

Maximum Covered Loss ⇔ u


⇔ size of loss above which no additional payments are made
⇔ censorship point from above.

Exercise: An insured suffers losses of size: $2,000, $13,000, $38,000.


If the insured has a $25000 maximum covered loss, what does the insurer pay for each loss?
[Solution: $2,000, $13,000, $25,000.]

Most insurance policies have a maximum covered loss or equivalent. For example, a liability policy
with a $100,000 per occurrence limit would pay at most $100,000 in losses from any single
occurrence, regardless of the total losses suffered by any claimants.
An automobile collision policy will never pay more than the total value of the covered
automobile minus any deductible, thus it has an implicit maximum covered loss.
An exception is a Workersʼ Compensation policy, which provides unlimited medical coverage
to injured workers.55

Coinsurance:

A coinsurance factor is the proportion of any loss that is paid by the insurer after any other
modifications (such as deductibles or limits) have been applied. A coinsurance is a provision
which states that a coinsurance factor is to be applied.

For example, a policy might have an 80% coinsurance factor. Then the insurer pays 80% of what it
would have paid in the absence of the coinsurance factor.

54
See Section 8.5 of Loss Models. Professor Klugman made up the term “maximum covered loss.”
55
While benefits for lost wages are frequently also unlimited, since they are based on a formula in the specific
workersʼ compensation law, which includes a maximum weekly benefit, there is an implicit maximum benefit for lost
wages, assuming a maximum possible lifetime.

Policy Limit:56

Policy Limit ⇔ maximum possible payment on a single claim.

Policy Limit = c (u - d),


where c = coinsurance factor, u = maximum covered loss, and d = deductible.

If c = 90%, d = 1000, and u = 5000, then the policy limit = (90%)(5000 - 1000) = 3600;
if a loss is of size 5000 or greater, the insurer pays 3600.

With a coinsurance factor, deductible, and Policy Limit L:


u = d + L/c.

In the above example, 1000 + 3600/.9 = 5000.

With no deductible and no coinsurance, the policy limit is the same as the maximum
covered loss.

Exercise: An insured has a policy with a $25,000 maximum covered loss, $5000 deductible, and a
80% coinsurance factor. The insured suffers losses of: $5000, $15,000, $38,000.
How much does the insurer pay?
[Solution: Nothing for the loss of $5000. (.8)(15000 - 5000) = $8000 for the loss of $15,000.
For the loss of $38,000, first the insurer limits the loss to $25,000. Then it reduces the loss by the
$5,000 deductible, $25,000 - $5,000 = $20,000. Then the 80% coinsurance factor is applied:
(80%)($20,000) = $16,000.
Comment: The maximum possible amount paid for any loss, $16,000, is the policy limit.]

If an insured with a policy with a $25,000 maximum covered loss, $5000 deductible, and a
coinsurance factor of 80%, suffers a loss of size x, then the insurer pays:

0, if x ≤ $5000
0.8(x-5000), if $5000 < x ≤ $25,000
$16,000, if x ≥ $25,000

More generally, if an insured has a policy with a maximum covered loss of u, a deductible of d, and a
coinsurance factor of c, suffers a loss of size x, then the insurer pays:

0, if x ≤ d
c(x-d), if d < x ≤ u
c(u-d), if x ≥ u
56
See Section 8.5 of Loss Models. This definition of a policy limit differs from that used by many actuaries.

If an insured has a policy with a policy limit of L, a deductible of d, and a coinsurance factor of c,
suffers a loss of size x, then the insurer pays:

0, if x ≤ d
c(x-d), if d < x ≤ d + L/c
L, if x ≥ d + L/c

Exercise: There is a deductible of $10,000, policy limit of $100,000, and a coinsurance factor of
90%. Let Xi be the individual loss amount of the ith claim and Yi be the claim payment of the ith
claim. What is the relationship between Xi and Yi?
[Solution: The maximum covered loss, u = 10000 + 100000/0.9 = $121,111.
Yi = 0, for Xi ≤ 10,000;
Yi = 0.90(Xi - 10,000), for 10,000 < Xi ≤ 121,111;
Yi = 100,000, for Xi > 121,111.]

Order of Operations:

If one has a deductible, maximum covered loss, and a coinsurance, then on this exam unless stated
otherwise, in order to determine the amount paid on a loss, the order of operations (illustrated in the sketch following the list) is:

1. Limit the size of loss to the maximum covered loss.


2. Subtract the deductible. If the result is negative, set the payment equal to zero.
3. Multiply by the coinsurance factor.
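
A minimal sketch of this order of operations in Python (my own illustration; the function name is arbitrary), checked against the earlier exercise with u = 25,000, d = 5,000, and c = 80%:

def payment(x, d=0.0, u=float("inf"), c=1.0):
    # Insurer's payment for a ground-up loss x, with ordinary deductible d,
    # maximum covered loss u, and coinsurance factor c.
    capped = min(x, u)                  # 1. limit the loss to the maximum covered loss
    after_ded = max(capped - d, 0.0)    # 2. subtract the deductible (not below zero)
    return c * after_ded                # 3. apply the coinsurance factor

print([payment(x, d=5000, u=25000, c=0.80) for x in (5000, 15000, 38000)])
# [0.0, 8000.0, 16000.0]; the largest possible payment, 16,000, is the policy limit c(u - d).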

Franchise Deductible:57

Besides an ordinary deductible, there is the “franchise deductible.” Unless specifically stated
otherwise, assume a deductible is ordinary.

Under a franchise deductible the insurer pays nothing if the loss is less than the deductible
amount, but ignores the deductible if the loss is greater than the deductible amount.

Exercise: An insured suffers losses of size: $3000, $8000 and $17,000. If the insured has a $5000
franchise deductible, what does the insurer pay for each loss?
[Solution: Nothing, $8000, and $17,000.]

57
In Definition 8.2 in Loss Models.

Under a franchise deductible with deductible amount d, if the insured has a loss of size x, then the
insurer pays:
0 x≤d
x x>d

Thus data from a policy with a franchise deductible is truncated from below at the
deductible amount.58

Therefore under a franchise deductible, the average nonzero payment is:


e(d) + d = {E[X] - E[X ∧ d]}/S(d) + d.59

The average cost per loss is: (average nonzero payment)(chance of nonzero payment) =
{(E[X] - E[X ∧ d])/S(d) + d}S(d) = (E[X] - E[X ∧ d]) + dS(d).60
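
For instance, here is a small check of these formulas on five arbitrary sample losses (an illustration of my own), comparing an ordinary and a franchise deductible of d = 1000:

losses = [300, 600, 1200, 1500, 2800]
d = 1000
ordinary = [max(x - d, 0) for x in losses]        # pay the loss minus the deductible
franchise = [x if x > d else 0 for x in losses]   # pay the full loss once it exceeds d
print(sum(ordinary) / 3, sum(franchise) / 3)      # average nonzero payments: e(d) = 833.33 versus e(d) + d = 1833.33
print(sum(franchise) / 5)                         # cost per loss: (E[X] - E[X ∧ d]) + d S(d) = 500 + (1000)(3/5) = 1100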

Here is a graph of the payment under a franchise deductible of 5000:

[Graph omitted: the payment is 0 for losses up to 5000, then jumps to the full loss amount,
rising from 5000 to 25,000 as the loss goes from 5000 to 25,000.]

58
See the next section for a discussion of truncation from below (truncation from the right.)
59
See Theorem 8.3 in Loss Models.
60
See Theorem 8.3 in Loss Models.

Definitions of Loss and Payment Random Variables:61

Name: ground-up loss62
Losses prior to the impact of any deductible or maximum covered loss;
the full economic value of the loss suffered by the insured,
regardless of how much the insurer is required to pay
in light of any deductible, maximum covered loss, coinsurance, etc.

Name: amount paid per payment
Undefined when there is no payment due to a deductible or
other policy provision. Otherwise it is the amount paid by
the insurer. Thus for example, data truncated and shifted
from below consists of the amounts paid per payment.

Name: amount paid per loss
Defined as zero when the insured suffers a loss but there is
no payment due to a deductible or other policy provision.
Otherwise it is the amount paid by the insurer.

The per loss variable is: 0 if X ≤ d, X if X > d.

The per payment variable is: undefined if X ≤ d, X if X > d.

Loss Models uses the notation YL for the per loss variable and YP for the per payment variable.

Unless stated otherwise, assume a distribution from Appendix A of Loss Models will be used to
model ground-up losses, prior to the effects of any coverage modifications. The effects on
distributions of coverage modifications will be discussed in subsequent sections.

61
See Section 8.2 of Loss Models.
62
Sometimes referred to as ground-up unlimited losses.

Problems:

Use the following information for the next 3 questions:


The ABC Bookstore has an insurance policy with a $100,000 maximum covered loss,
$20,000 per loss deductible, and a 90% coinsurance factor.
During the year, ABC Bookstore suffers three losses of sizes: $17,000, $60,000 and $234,000.

15.1 (1 point) How much does the insurer pay in total?


A. less than $95,000
B. at least $95,000 but less than $100,000
C. at least $100,000 but less than $105,000
D. at least $105,000 but less than $110,000
E. at least $110,000

15.2 (1 point) What is the amount paid per loss?


A. less than $35,000
B. at least $35,000 but less than $40,000
C. at least $40,000 but less than $45,000
D. at least $45,000 but less than $50,000
E. at least $50,000

15.3 (1 point) What is the amount paid per payment?


A. less than $35,000
B. at least $35,000 but less than $40,000
C. at least $40,000 but less than $45,000
D. at least $45,000 but less than $50,000
E. at least $50,000

15.4 (2 points) The size of loss is uniform on [0, 400].


Policy A has an ordinary deductible of 100.
Policy B has a franchise deductible of 100.
What is the ratio of the expected losses paid under Policy B to the expected losses paid under
Policy A?
A. 7/6 B. 5/4 C. 4/3 D. 3/2 E. 5/3

15.5 (1 point) An insured suffers 4 losses of size: $2500, $7700, $10,100, and $23,200.
The insured has a $10,000 franchise deductible. How much does the insurer pay in total?
A. less than 32,500
B. at least 32,500 but less than 33,000
C. at least 33,000 but less than 33,500
D. at least 33,500 but less than 34,000
E. at least 34,000

Use the following information for the next 3 questions:


An insurance policy has a deductible of 10,000, policy limit of 100,000, and a coinsurance factor of
80%. (The policy limit is the maximum possible payment by the insurer on a single loss.) During the
year, the insured suffers six losses of sizes: 3000, 8000, 14,000, 80,000, 120,000, and 200,000.

15.6 (2 points) How much does the insurer pay in total?


A. less than 235,000
B. at least 235,000 but less than 240,000
C. at least 240,000 but less than 245,000
D. at least 245,000 but less than 250,000
E. at least 250,000

15.7 (1 point) What is the amount paid per loss?


A. less than 45,000
B. at least 45,000 but less than 50,000
C. at least 50,000 but less than 55,000
D. at least 55,000 but less than 60,000
E. at least 60,000

15.8 (1 point) What is the amount paid per payment?


A. less than 45,000
B. at least 45,000 but less than 50,000
C. at least 50,000 but less than 55,000
D. at least 55,000 but less than 60,000
E. at least 60,000

Use the following size of loss distribution for the next 2 questions:
Size of Loss Probability
100 70%
1000 20%
10,000 10%

15.9 (2 points) If there is an ordinary deductible of 500, what is the coefficient of variation of the
nonzero payments?
A. less than 1.0
B. at least 1.0 but less than 1.1
C. at least 1.1 but less than 1.2
D. at least 1.2 but less than 1.3
E. at least 1.3

15.10 (2 points) If there is a franchise deductible of 500, what is the coefficient of variation of the
nonzero payments?
A. less than 1.0
B. at least 1.0 but less than 1.1
C. at least 1.1 but less than 1.2
D. at least 1.2 but less than 1.3
E. at least 1.3

Use the following information for the next four questions:


The Mockingbird Tequila Company buys insurance from the Atticus Insurance Company, with a
deductible of $5000, maximum covered loss of $250,000, and coinsurance factor of 90%.
Atticus Insurance Company buys reinsurance from the Finch Reinsurance Company.
Finch will pay Atticus for the portion of any payment in excess of $100,000.
Let X be an individual loss amount suffered by the Mockingbird Tequila Company.

15.11 (2 points) Let Y be the amount retained by the Mockingbird Tequila Company.
What is the relationship between X and Y?

15.12 (2 points) Let Y be the amount paid by the Atticus Insurance Company to the Mockingbird
Tequila Company, prior to the impact of reinsurance. What is the relationship between X and Y?

15.13 (2 points) Let Y be the payment made by the Finch Reinsurance Company to the Atticus
Insurance Company. What is the relationship between X and Y?

15.14 (2 points) Let Y be the net amount paid by the Atticus Insurance Company after the impact
of reinsurance. What is the relationship between X and Y?

15.15 (2 points) Assume a loss of size x.


Policy A calculates the payment based on limiting to a maximum of 10,000, then subtracting a
deductible of 1000, and then applying a coinsurance factor of 90%.
Policy B instead calculates the payment based on subtracting a deductible of 1000, then limiting it to
a maximum of 10,000, and then applying a coinsurance factor of 90%.
What is the difference in payments between that under Policy A and Policy B?

15.16 (CAS6, 5/94, Q.21) (1 point) Last year, an insured in a group medical plan incurred charges
of $600. This year, the same medical care resulted in a charge of $660. The group comprehensive
medical care plan provides 80% payment after a $100 deductible. Determine the increase in the
insured's retention under his or her comprehensive medical care plan.
A. Less than 7.0%
B. At least 7.0% but less than 9.0%
C. At least 9.0% but less than 11.0%
D. At least 11.0% but less than 13.0%
E. 13.0% or more

15.17 (CAS9, 11/94, Q11) (1 point) Given the following:


W = The amount paid by the insurer.
d = The insured's deductible amount.
X = The total amount of loss
Which of the following describes W, if X > d, and the insured has a franchise deductible?
A. W = X B. W = X - d C. W = (X - d) / d D. W = d E. None of A, B, C, or D

15.18 (CAS6, 5/96, Q.41) (2 points)


You are given the following full coverage experience:
Loss Size Number of Claims Amount of Loss
$ 0-99 1,400 $76,000
$100-249 400 $80,000
$250-499 200 $84,000
$500-999 100 $85,000
$1,000 or more 50 $125,000
Total 2,150 $450,000
(a) (1 point) Calculate the expected percentage reduction in losses for a $250 ordinary deductible.
(b) (1 point) Calculate the expected percentage reduction in losses for a $250 franchise deductible.

Use the following information for the next two questions:


Full Coverage Loss Data
Loss Size Number of Claims Amount of Loss
0 - 250 1,500 375,000
250 - 500 1,000 450,000
500 - 750 750 487,500
750-1,000 500 400,000
1,000-1,500 250 312,500
1,500 or more 100 300,000
Total 4,100 2,325,000

15.19 (CAS6, 5/97, Q.32a) (1 point) Calculate the percentage reduction in the loss costs for a
$500 franchise deductible compared to full coverage.

15.20 (1 point) Calculate the percentage reduction in the loss costs for a $500 ordinary deductible
compared to full coverage.

15.21 (CAS9 11/97, Q.36a) (2 points)


An insured is trying to decide which type of policy to purchase:
• A policy with a franchise deductible of $50 will cost her $8 more than a policy with
a straight deductible of $50.
• A policy with a franchise deductible of S100 will cost her $10 more than a policy with
a straight deductible of $100.
An expected ground-up claim frequency of 1.000 is assumed for each of the policies
described above.
Calculate the probability that the insured will suffer a loss between $50 and $100. Show all work.

Use the following information for the next two questions:


Loss Size Number of Claims Total Amount of Loss
$0-249 5,000 $1,125,000
250-499 2,250 765,000
500-999 950 640,000
1,000-2,499 575 610,000
2500 or more 200 890,000
Total 8,975 $4,030,000

15.22 (CAS6, 5/98, Q.7) (1 point) Calculate the percentage reduction in loss costs caused by the
introduction of a $500 franchise deductible. Assume there is no adverse selection or padding of
claims to reach the deductible.
A. Less than 25.0%
B. At least 25.0%, but less than 40.0%
C. At least 40.0%, but less than 55.0%
D. At least 55.0%, but less than 70.0%
E. 70.0% or more

15.23 (1 point) Calculate the percentage reduction in loss costs caused by the introduction of a
$500 ordinary deductible. Assume there is no adverse selection or padding of claims to reach the
deductible.

15.24 (1, 5/03, Q.25) (2.5 points) An insurance policy pays for a random loss X subject to a
deductible of C, where 0 < C < 1. The loss amount is modeled as a continuous random variable
with density function f(x) = 2x for 0 < x < 1.
Given a random loss X, the probability that the insurance payment is less than 0.5
is equal to 0.64. Calculate C.
(A) 0.1 (B) 0.3 (C) 0.4 (D) 0.6 (E) 0.8

15.25 (CAS5, 5/03, Q.9) (1 point) An insured has a catastrophic health insurance policy with a
$1,500 deductible and a 75% coinsurance clause. The policy has a $3,000 maximum retention.
If the insured incurs a $10,000 loss, what amount of the loss must the insurer pay?
Note: I have rewritten this past exam question in order to match the current syllabus.

15.26 (CAS3, 5/04, Q.35) (2.5 points) The XYZ Insurance Company sells property insurance
policies with a deductible of $5,000, policy limit of $500,000, and a coinsurance factor of 80%.
Let Xi be the individual loss amount of the ith claim and Yi be the claim payment of the ith claim.
Which of the following represents the relationship between Xi and Yi?
0 Xi ≤ 5,000
A. Yi = .80(Xi - 5,000) 5,000 < Xi ≤ 625,000
500,000 Xi > 625,000

0 Xi ≤ 4,000
B. Yi = .80(Xi - 4,000) 4,000 < Xi ≤ 500,000
500,000 Xi > 500,000

0 Xi ≤ 5,000
C. Yi = .80(Xi - 5,000) 5,000 < Xi ≤ 630,000
500,000 Xi > 630,000

0 Xi ≤ 6,250
D. Yi = .80(Xi - 6,250) 6,250 < Xi ≤ 631,250
500,000 Xi > 631,250

0 Xi ≤ 5,000
E. Yi = .80(Xi - 5,000) 5,000 < Xi ≤ 505,000
500,000 Xi > 505,000

15.27 (SOA M, 5/05, Q.32 & 2009 Sample Q.168) (2.5 points) For an insurance:
(i) Losses can be 100, 200, or 300 with respective probabilities 0.2, 0.2, and 0.6.
(ii) The insurance has an ordinary deductible of 150 per loss.
(iii) YP is the claim payment per payment random variable.
Calculate Var(YP).
(A) 1500 (B) 1875 (C) 2250 (D) 2625 (E) 3000

Solutions to Problems:

15.1. D. First the insurer limits each loss to $100,000: 17, 60, 100. Then it reduces each loss by
the $20,000 deductible: 0, 40, 80. Then the 90% coinsurance factor is applied: 0, 36, 72.
The insurer pays a total of 0 + 36 + 72 = $108 thousand.

15.2. B. There are three losses and 108,000 in total is paid: $108,000/3 = $36,000.

15.3. E. There are two (non-zero) payments and 108,000 in total is paid: $108,000/2 = $54,000.

15.4. E. Under Policy A one pays x - 100 for x > 100. 3/4 of the losses are greater than 100, and
those losses have average size (100 + 400)/2 = 250.
Thus under Policy A the expected payment per loss is: (3/4)(250 -100) = 112.5
Under Policy B, one pays x for x > 100.
Thus the expected payment per loss is: (3/4)(250) = 187.5. Ratio is: 187.5/112.5 = 5/3.

15.5. C. The insurer pays: 0 + 0 + $10,100 + $23,200 = $33,300.

15.6. D. Subtract the deductible: 0, 0, 4000, 70,000, 110,000, 190,000.


Multiply by the coinsurance factor: 0, 0, 3200, 56,000, 88,000, 152,000.
Limit each payment to 100,000: 0, 0, 3200, 56,000, 88,000, 100,000.
0 + 0 + 3200 + 56,000 + 88,000 + 100,000 = 247,200.
Alternately, the maximum covered loss is: 10000 + 100000/.8 = 135,000.
Limit each loss to the maximum covered loss: 3000, 8000, 14,000, 80,000, 120,000, 135,000.
Subtract the deductible: 0, 0, 4000, 70,000, 110,000, and 125,000.
Multiply by the coinsurance factor: 0, 0, 3200, 56,000, 88,000, and 100,000.
0 + 0 + 3200 + 56,000 + 88,000 + 100,000 = 247,200.

15.7. A. 247,200/6 = 41,200.

15.8. E. 247,200/4 = 61,800.

15.9. D. The nonzero payments are: 500@2/3 and 9500@1/3.


Mean = (2/3)(500) + (1/3)(9500) = 3500.
2nd moment = (2/3)(500²) + (1/3)(9500²) = 30,250,000.
variance = 30,250,000 - 3500² = 18,000,000.
CV = √18,000,000 / 3500 = 1.212.

15.10. B. The nonzero payments are: 1000@2/3 and 10,000@1/3.


Mean = (2/3)(1000) + (1/3)(10000) = 4000.
2nd moment = (2/3)(1000²) + (1/3)(10,000²) = 34,000,000.
variance = 34,000,000 - 4000² = 18,000,000. CV = √18,000,000 / 4000 = 1.061.

15.11. Mockingbird retains all of any loss less than $5000.


For a loss of size greater than $5000, it retains $5000 plus 10% of the portion above $5000.
Mockingbird retains the portion of any loss above the maximum covered loss of $250,000.
Y = X, for X ≤ 5000.
Y = 5000 + (0.1)(X - 5000) = 4500 + 0.1X, for 5000 ≤ X ≤ 250,000.
Y = 4500 + (0.1)(250,000) + (X - 250,000) = X - 220,500, for 250,000 ≤ X.
Comment: The maximum amount that Atticus Insurance Company retains on any loss is:
(.9)(250,000 - 5000) = 220,500. Therefore, for a loss X of size greater than 250,000, Mockingbird
retains X - 220,500.

15.12. Atticus Insurance pays nothing for a loss less than $5000. For a loss of size greater than
$5000, Atticus Insurance pays 90% of the portion above $5000.
For a loss of size 250,000, Atticus Insurance pays: (.9)(250,000 - 5000) = 220,500.
Atticus Insurance pays no more for a loss larger than the maximum covered loss of $250,000.
Y = 0, for X ≤ 5000.
Y = (0.9)(X - 5000) = 0.9X - 4500, for 5000 ≤ X ≤ 250,000.
Y = 220,500, for 250,000 ≤ X.
Comment: The amount retained by Mockingbird, plus the amount paid by Atticus to Mockingbird,
equals the total loss.

15.13. Finch Reinsurance pays something when the loss results in a payment by Atticus of more
than $100,000. Solve for the loss that results in a payment of $100,000:
100000 = (0.9)(X - 5000). ⇒ x = 116,111.
Y = 0, for X ≤ 116,111.
Y = (0.9)(X - 116,111) = 0.9X - 104,500, for 116,111 < X ≤ 250,000.
Y = 120,500, for 250,000 ≤ X.

15.14. For a loss greater than 116,111, Atticus pays 100,000 net of reinsurance.
Y = 0, for X ≤ 5000.
Y = (0.9)(X - 5000) = 0.9X - 4500, for 5000 ≤ X ≤ 116,111.
Y = 100,000, for 116,111 < X.

15.15. Policy A: (0.9) (Min[x, 10,000] - 1000)+ = (0.9) (Min[x - 1000, 9000])+
= (Min[0.9x - 900, 8100])+
Policy B: (0.9) Min[(x - 1000)+, 10,000] = Min[(0.9x - 900)+, 9000]
= (Min[0.9x - 900, 9000])+.
Policy A - Policy B = (Min[0.9x - 900, 8100])+ - (Min[0.9x - 900, 9000])+.
If x ≥ 11,000, then this difference is: 8100 - 9000 = -900.
If 11,000 > x > 10,000, then this difference is: 8100 - (0.9x - 900) = 9000 - 0.9x.
If 10,000 ≥ x > 1000, then this difference is: (0.9x - 900) - (0.9x - 900) = 0.
If x ≤ 1000, then this difference is: 0 - 0 = 0.
Comment: Policy A follows the order of operations you should follow on your exam, unless
specifically stated otherwise.
A graph of the difference in payments between that under Policy A and Policy B:

[Graph omitted: the difference is 0 for losses up to 10,000, falls linearly from 0 to -900 as the loss
goes from 10,000 to 11,000, and equals -900 for losses of 11,000 or more.]

I would attack this type of problem by just trying various values for x.
Here are some examples:
x Payment Under A Payment Under B Difference
12,000 8100 9000 -900
10,300 8100 8370 -270
8000 6300 6300 0
700 0 0 0

15.16. A. Last year, the insured gets paid: (80%)(600 - 100) = 400.
Insured retains: 600 - 400 = 200.
This year, the insured gets paid: (80%)(660 - 100) = 448.
Insured retains: 660 - 448 = 212.
Increase in the insured's retention is: 212 / 200 - 1 = 6.0%.

15.17. A. For a loss greater than the deductible amount, the franchise deducible pays the full loss.

15.18. (a) Losses eliminated: 76,000 + 80,000 + (250)(200 + 100 + 50) = $243,500.
$243,500/$450,000 = 0.541 = 54.1% reduction in expected losses.
(b) Under the franchise deductible, we pay the whole loss for a loss of size greater than 250.
Losses eliminated: 76,000 + 80,000 = $156,000.
$156,000/$450,000 = 0.347 = 34.7% reduction in expected losses.

15.19. (375,000 + 450,000) / 2,325,000 = 35.5%.

15.20. {375,000 + 450,000 + (500)(750 + 500 + 250 + 100)} / 2,325,000 = 69.9%.

15.21. When a payment is made, the $50 franchise deductible pays $50 more than the $50
straight deductible. Therefore, $8 = (1.000) S(50) ($50). ⇒ S(50) = 16%.
When a payment is made, the $100 franchise deductible pays $100 more than the $100 straight
deductible. Therefore, $10 = (1.000) S(100) ($100). ⇒ S(100) = 10%.
The probability that the insured will suffer a loss between $50 and $100 is:
S(50) - S(100) = 16% - 10% = 6%.

15.22. C. (1125000 + 765000)/4030000 = 46.9%.

15.23. {1125000 + 765000 + (500)(950 + 575 + 200)}/4030000 = 68.3%.

15.24. B. F(x) = x².
0.64 = Prob[payment < 0.5] = Prob[X - C < 0.5] = Prob[X < 0.5 + C] = (0.5 + C)².
⇒ C = 0.8 - 0.5 = 0.3.

15.25. Without the maximum retention the insurer would pay: (10,000 - 1500)(0.75) = 6375.
In that case the insured would retain: 10,000 - 6375 = 3625.
However, the insured retains at most 3000, so the insurer pays: 10,000 - 3000 = 7000.

15.26. C. The policy limit is: c(u-d), where u is the maximum covered loss.
Therefore, u = d + (policy limit)/c = 5000 + 500,000/0.8 = 630,000.
Therefore the payment is: 0 if X ≤ 5,000, 0.80(X - 5,000) if 5,000 < X ≤ 630,000,
and 500,000 if X > 630,000.
Comment: For a loss of size 3000 nothing is paid. For a loss of size 100,000, the payment is:
.8(100,000 - 5000) = 76,000. For a loss of size 700,000, the payment would be:
.8(700,000 - 5000) = 556,000, except that the maximum payment is the policy limit of 500,000.
Increasing the size of loss above the maximum covered loss of 630,000, results in no increase in
payment beyond 500,000.

15.27. B. Prob[YP = 50] = 0.2/(0.2 + 0.6) = 1/4. Prob[YP = 150] = 0.6/(0.2 + 0.6) = 3/4.
E[YP] = (50)(1/4) + (150)(3/4) = 125. E[(YP)²] = (50²)(1/4) + (150²)(3/4) = 17,500.
Var[YP] = 17,500 - 125² = 1875.

Section 16, Truncated Data

The ungrouped data in Section 1 is assumed to be ground-up (first dollar) unlimited losses, on all
loss events that occurred. By “first dollar”, I mean that we start counting from the first dollar of
economic loss, in other words as if there were no deductible. By “unlimited” I mean we count every
dollar of economic loss, as if there were no maximum covered loss.

Sometimes some of this information is not reported, most commonly due to a deductible and/or
maximum covered loss.
There are four such situations likely to come up on your exam, each of which has two names:
left truncation ⇔ truncation from below
left truncation and shifting ⇔ truncation and shifting from below
left censoring and shifting ⇔ censoring and shifting from below
right censoring ⇔ censoring from above.

In the following, ground-up, unlimited losses are assumed to have distribution function F(x).
G(x) is what one would see after the effects of either a deductible or a maximum covered loss.

Left Truncation / Truncation from Below:63

Left Truncation ⇔ Truncation from Below at d ⇔


deductible d and record size of loss for size > d.

For example, the same data as in Section 1, but left truncated or truncated from below at
$10,000, would have no information on the first eight losses, each of which resulted in less than
$10,000 of loss. The actuary would not even know that there were eight such losses.64 The same
information would be reported as shown in Section 1 on the remaining 122 large losses.

When data is truncated from below at the value d, losses of size less than d are not in the reported
data base.65 This generally occurs when there is an (ordinary) deductible of size d, and the insurer
records the amount of loss to the insured.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is left truncated / truncated from below at $1000?
[Solution: $1,200, $1,500 and $2,800 .
Comment: The two smaller losses are never reported to the insurer.]
63
The terms “left truncation” and “truncation from below” are synonymous.
64
This would commonly occur in the case of a $10,000 deductible.
65
Note that the Mean Excess Loss, e(x), is unaffected by truncation from below at d, provided x > d.

The distribution function and the probability density functions are revised as follows:
G(x) = {F(x) - F(d)} / S(d), x > d
g(x) = f(x) / S(d), x > d
x ⇔ the size of loss.

Thus the data truncated from below has a distribution function which is zero at d and 1 at infinity.
The revised probability density function has been divided by the original chance of having a loss of
size greater than d. Thus for the revised p.d.f. the probability from d to infinity integrates to unity as it
should.

Note that G(x) = {F(x) - F(d)} / S(d) = (S(d) - S(x))/S(d) = 1 - S(x)/S(d), x> d.
1 - G(x) = S(x)/S(d). The revised survival function after truncation from below is the survival function
prior to truncation divided by the survival function at the truncation point.

Both data truncated from below and the mean excess loss exclude the smaller losses. In order to
compute the mean excess loss, we would take the average of the losses greater than d, and then
subtract d. Therefore, the average size of the data truncated from below at d, is d plus the mean
excess loss at d, e(d) + d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average size of the data reported, if it is truncated from below at $1000?
[Solution: ($1,200+ $1,500 + $2,800) /3 = 1833.33 = 1000 + 833.33 = 1000 + e(1000).]
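
A small simulation sketch of these relationships (assumptions mine: an Exponential with mean 500, which is memoryless, so e(d) = 500 for any d):

import math, random

theta, d = 500.0, 1000.0
S = lambda x: math.exp(-x / theta)             # survival function prior to truncation

random.seed(1)
sample = [random.expovariate(1 / theta) for _ in range(200000)]
truncated = [x for x in sample if x > d]       # losses of size d or less never appear

print(sum(truncated) / len(truncated))         # mean of truncated data: d + e(d), about 1500
emp = sum(1 for x in truncated if x > 1200) / len(truncated)
print(emp, S(1200) / S(1000))                  # survival function after truncation is S(x)/S(d); both about 0.67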

Franchise Deductible:

Under a franchise deductible the insurer pays nothing if the loss is less than the deductible amount,
but ignores the deductible if the loss is greater than the deductible amount. If the deductible amount
is d and the insured has a loss of size x, then the insurer pays:

0 x≤d
x x>d

Thus data from a policy with a franchise deductible is truncated from below at the
deductible amount.

Left Truncation and Shifting / Truncation and Shifting from Below:

Left Truncation & Shifting at d ⇔ Truncation & Shifting from Below at d


⇔ Excess Loss Variable ⇔ deductible d and record non-zero payment.

If the data in Section 1 were truncated and shifted from below at $10,000, the data on these
remaining 122 losses would have each amount reduced by $10,000. For the ninth loss with
$10,400 of loss, $400 would be reported66. When data is truncated and shifted from below at the
value d, losses of size less than d are not in the reported data base, and larger losses have their
reported values reduced by d. This generally occurs when there is an (ordinary) deductible of size
d, and the insurer records the amount of payment to the insured.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is truncated and shifted from below at $1000?
[Solution: $200, $500 and $1,800.]

The distribution, survival, and probability density functions are revised as follows:
G(x) = {F(x + d) - F(d)} / S(d), x > 0. New survival function = 1 - G(x) = S(x+d)/S(d), x > 0

g(x) = f(x+d) / S(d), x > 0

x ⇔ the size of (non-zero) payment. x+d ⇔ the size of loss.

As discussed previously, the Excess Loss Variable for d is defined for X > d as X-d and is
undefined for X ≤ d, which is the same as the effect of truncating and shifting from below at d.

Exercise: Prior to a deductible, losses are Weibull with τ = 2 and θ = 1000.


What is the probability density function of the excess loss variable corresponding to d = 500?
[Solution: For the Weibull, F(x) = 1 - exp[-(x/θ)^τ] and f(x) = τ x^(τ-1) exp[-(x/θ)^τ] / θ^τ.
S(500) = exp[-(500/1000)²] = 0.7788. Let Y be the truncated and shifted variable.
Then, g(y) = f(500 + y)/S(500) = (y+500) exp[-((y+500)/1000)²] / 389,400, y > 0.]
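
As a quick numerical check of this density (my own sketch, under the exercise's assumptions), g integrates to 1 and its mean is e(500), about 546 for this Weibull:

import math

tau, theta, d = 2.0, 1000.0, 500.0
f = lambda x: (tau / x) * (x / theta)**tau * math.exp(-(x / theta)**tau)   # Weibull density
S = lambda x: math.exp(-(x / theta)**tau)
g = lambda y: f(y + d) / S(d)                  # density of the excess loss variable at d = 500

h = 0.5                                        # crude midpoint-rule integration out to 20,000
ys = [h * (i + 0.5) for i in range(40000)]
print(sum(g(y) for y in ys) * h)               # close to 1
print(sum(y * g(y) for y in ys) * h)           # e(500), roughly 546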

The average size of the data truncated and shifted from below at d, is the mean excess loss (mean
residual life) at d, e(d).

66
With a $10,000 deductible, the insurer would pay $400 while the insured would have to absorb $10,000.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average size of the data reported, if it is truncated and shifted from below at $1000?
[Solution: ($200+ $500 + $1,800) /3 = 833.33 = e(1000).]

Complete Expectation of Life:67

These ideas are mathematically equivalent to ideas discussed in Life Contingencies.

The expected future lifetime for an age x is e° x , the complete expectation of life.

e° x is the mean residual life (mean excess loss) at x, e(x).

e° x is the mean of the lives truncated and shifted from below at x.

Exercise: Three people die at ages: 55, 70, 80. Calculate e° 65 .


[Solution: Truncate and shift the data at 65; eliminate any ages ≤ 65 and subtract 65:
70 - 65 = 5, 80 - 65 = 15.

Average the truncated and shifted data: e° 65 = (5 + 15)/2 = 10.]

The survival function at age x + t for the data truncated and shifted at x is: S(x+t)/S(x) = tp x.

As will be discussed in a subsequent section, one can get the mean by integrating the survival
function.

e°x = mean of the data truncated and shifted at x = ∫ tpx dt, with the integral taken from 0 to ∞.68

67
See Actuarial Mathematics, not on the syllabus of this exam.
68
See equation 3.5.2 in Actuarial Mathematics.

Right Truncation / Truncation from Above:69

In the case of the data in Section 1, right truncated or truncated from above at $25,000, there
would be no information on the 109 losses larger than $25,000. Truncation from above contrasts
with data censored from above at $25,000, which would have each of the 109 large losses in
Section 1 reported as being $25,000 or more.70

When data is right truncated or truncated from above at the value L, losses of size greater than L are
not in the reported data base.71

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is truncated from above at $1000?
[Solution: $300 and $600.]

The distribution function and the probability density functions are revised as follows:

G(x) = F(x) / F(L), x ≤ L


g(x) = f(x) / F(L), x ≤ L

The average size of the data truncated from above at L, is the average size of losses from 0 to L:
{E[X ∧ L] - L S(L)} / F(L).

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average size of the data reported, if it is truncated from above at $1000?
[Solution: ($300 + $600)/2 = $450.]

Truncation from above would be appropriate where one thought there was a maximum
possible loss.72

69
See Definition 12.1 in Loss Models.
70
Under censoring from above, one would not know the total size of loss for each of these large losses. This is quite
common for data reporting when there are maximum covered losses.
71
Right truncation can happen when insufficient time has elapsed to receive all of the data. For example, one might
be doing a mortality study based on death records, which would exclude from the data anyone who has yet to die.
Right truncation can also occur when looking at claim count development. One might not have data beyond a given
“report” and fit a distribution function (truncated from above) to the available claim counts by report.
See for example, “Estimation of the Distribution of Report Lags by the Method of Maximum Likelihood,”
by Edward W. Weisner, PCAS 1978.
72
See for example, “A Note on the Upper-Truncated Pareto Distribution,” by David R. Clark,
Winter 2013 CAS E-Forum. The size of loss from any type of event has some upper limit, even if very big.

Truncation from Both Above and Below:

When data is both truncated from above at L and truncated from below at the value d, losses of size
greater than L or less than or equal to d are not in the reported data base.

Exercise: Events occur at times: 3, 6, 12, 15, 24, and 28.


How is this data reported, if it is truncated from below at 10 and truncated from above at 20?
[Solution: 12 and 15.
Comment: One starts observing at time 10 and stops observing at time 20.]

The distribution function and the probability density functions are revised as follows:

G(x) = {F(x) - F(d)} / {F(L) - F(d)}, d < x ≤ L
g(x) = f(x) / {F(L) - F(d)}, d < x ≤ L

Note that whenever we have truncation, the probability remaining after truncation is the denominator
of the altered density and distribution functions.

The average size of the data truncated from below at d and truncated from above at L, is the
average size of losses from d to L: [{E[X ∧ L] - L S(L)} - {E[X ∧ d] - d S(d)}] / {F(L) - F(d)}.

Exercise: Events occur at times: 3, 6, 12, 15, 24, and 28. What is the average size of the data
reported, if it is truncated from below at 10 and truncated from above at 20?
[Solution: (12 + 15)/2 = 13.5.]
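
A tiny sketch in Python (my own illustration) of the reporting variants for these six event times, with d = 10 and L = 20:

times = [3, 6, 12, 15, 24, 28]
d, L = 10, 20
left_truncated = [t for t in times if t > d]              # 12, 15, 24, 28
left_trunc_shifted = [t - d for t in times if t > d]      # 2, 5, 14, 18
right_truncated = [t for t in times if t <= L]            # 3, 6, 12, 15
both = [t for t in times if d < t <= L]                   # 12, 15
print(both, sum(both) / len(both))                        # average 13.5, as in the exercise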

Problems:

Use the following information for the next 4 questions:


There are 6 losses: 100, 400, 700, 800, 1200, and 2300.

16.1 (1 point) If these losses are truncated from below (left truncated) at 500, what appears in the
data base?

16.2 (1 point) If these losses are truncated and shifted from below (left truncated and shifted) at 500,
what appears in the data base?

16.3 (1 point) If these losses are truncated from above (right truncated) at 1000, what appears in
the data base?

16.4 (1 point) If these losses are truncated from below (left truncated) at 500 and truncated from
above (right truncated) at 1000, what appears in the data base.

Use the following information for the next 2 questions:


Losses are uniformly distributed from 0 to 1000.

16.5 (1 point) What is the distribution function for these losses left truncated at 100?

16.6 (1 point) What is the distribution function for these losses left truncated and shifted at 100?

16.7 (1 point) Assume that claims follow a distribution, F(x) = 1 - 1 / {1 + (x/θ)^γ}^α.
Which of the following represents the distribution function for the data truncated from below at d?
A. {(θ^γ + d^γ) / (θ^γ + x^γ)}^α, x > d
B. 1 - {(θ^γ + d^γ) / (θ^γ + x^γ)}^α, x > d
C. {θ^γ / (θ^γ + x^γ)}^α, x > d
D. 1 - {θ^γ / (θ^γ + x^γ)}^α, x > d
E. None of the above.

16.8 (1 point) The report lag for claims is assumed to be exponentially distributed:
F(x) = 1 - e^(-λx), where x is the delay in reporting.
What is the probability density function for data truncated from above at 3?
A. 3e^(-3x)   B. λe^(-3λ)   C. e^(-λx) / (1 - e^(-3λ))   D. λe^(-λx) / (1 - e^(-3λ))   E. λe^(-λ(x-3))

Use the following information for the next three questions:


• Losses follow a Distribution Function F(x) = 1 - {2000 / (2000 + x)}³.

16.9 (1 point) If the reported data is truncated from below at 400, what is the Density Function at
1000?
A. less than 0.0004
B. at least 0.0004 but less than 0.0005
C. at least 0.0005 but less than 0.0006
D. at least 0.0006 but less than 0.0007
E. at least 0.0007

16.10 (1 point) If the reported data is truncated from above at 2000, what is the Density Function at
1000?
A. less than 0.0004
B. at least 0.0004 but less than 0.0005
C. at least 0.0005 but less than 0.0006
D. at least 0.0006 but less than 0.0007
E. at least 0.0007

16.11 (1 point) If the reported data is truncated from below at 400 and truncated from above at
2000, what is the Density Function at 1000?
A. less than 0.0004
B. at least 0.0004 but less than 0.0005
C. at least 0.0005 but less than 0.0006
D. at least 0.0006 but less than 0.0007
E. at least 0.0007

Use the following information for the next 2 questions:


There are 3 losses: 800, 2500, 7000.

16.12 (1 point) If these losses are truncated from below (left truncated) at 1000, what appears in the
data base?

16.13 (1 point) If these losses are truncated and shifted from below (left truncated and shifted) at
1000, what appears in the data base?

Use the following information for the next 3 questions:


The probability density function is: f(x) = x/50, 0 ≤ x ≤ 10.

16.14 (2 points) Determine the mean of this distribution left truncated at 4.


A. 7.3 B. 7.4 C. 7.5 D. 7.6 E. 7.7

16.15 (2 points) Determine the median of this distribution left truncated at 4


A. 7.3 B. 7.4 C. 7.5 D. 7.6 E. 7.7

16.16 (2 points) Determine the variance of this distribution left truncated at 4


A. 2.8 B. 3.0 C. 3.2 D. 3.4 E. 3.6

16.17 (4, 5/85, Q.56) (2 points) Let f be the probability density function of x, and let F be the
distribution function of x. Which of the following expressions represent the probability density
function of x truncated and shifted from below at d?
A. 0 for x ≤ d; f(x) / {1 - F(d)} for x > d

B. 0 for x ≤ 0; f(x) / {1 - F(d)} for x > 0

C. 0 for x ≤ d; f(x - d) / {1 - F(d)} for x > d

D. 0 for x ≤ -d; f(x + d) / {1 - F(d)} for x > -d

E. 0 for x ≤ 0; f(x + d) / {1 - F(d)} for x > 0

16.18 (4B, 11/92, Q.3) (1 point) You are given the following:
• Based on observed data truncated from above at $10,000,
the probability of a claim exceeding $3,000 is 0.30.
• Based on the underlying distribution of losses, the
probability of a claim exceeding $10,000 is 0.02.
Determine the probability that a claim exceeds $3,000.
A. Less than 0.28
B. At least 0.28 but less than 0.30
C. At least 0.30 but less than 0.32
D. At least 0.32 but less than 0.34
E. At least 0.34

Solutions to Problems:

16.1. If these losses are truncated from below (left truncated) at 500, the two small losses do not
appear: 700, 800, 1200, 2300.

16.2. If these losses are truncated and shifted from below at 500, the two small losses do not
appear and the other losses have 500 subtracted from them: 200, 300, 700, 1800.
Comment: The (non-zero) payments with a $500 deductible.

16.3. If these losses are truncated from above (right truncated) at 1000, the two large losses do not
appear: 100, 400, 700, 800.

16.4. Neither the two small losses nor the two large losses appear: 700, 800.

16.5. Losses of size less than 100 do not appear.


G(x) = (x - 100)/900 for 100 < x < 1000.
Alternately, F(x) = x/1000, 0 < x < 1000 and G(x) = {F(x) - F(100)} / S(100) =
{(x/1000) - 100/1000)} / (1 - 100/1000) = (x - 100)/900 for 100 < x < 1000.

16.6. Losses of size less than 100 do not appear. We record the payment amount with a $100
deductible. G(x) = x/900 for 0 < x < 900.
Alternately, F(x) = x/1000, 0 < x < 1000 and G(x) = {F(x + 100) - F(100)}/S(100) =
{((x + 100)/1000) - 100/1000)}/(1 - 100/1000) = x/900 for 0 < x < 900.
Comment: A uniform distribution from 0 to 900.

16.7. B. The new distribution function is for x > d: {F(x) - F(d)} / {1 - F(d)} =
[{θγ / (θγ + dγ)}α - {θγ / (θγ + xγ)}α] / {θγ / (θγ + dγ)}α = 1- {(θγ + dγ) / (θ γ + xγ)}α.
Comment: A Burr Distribution.

16.8. D. The Distribution Function of the data truncated from above at 3 is G(x) = F(x)/F(3) =
(1 - e^(-λx)) / (1 - e^(-3λ)). The density function is g(x) = Gʼ(x) = λe^(-λx) / (1 - e^(-3λ)).

16.9. C. Prior to truncation the density function is: f(x) = (3)(2000)³ / (2000 + x)⁴.
After truncation from below at 400, the density function is: g(x) = f(x) / {1 - F(400)} =
f(x) / {2000 / (2000 + 400)}³ = f(x) / 0.5787.
f(1000) = (3)(2000)³ / (2000 + 1000)⁴ = 0.000296.
g(1000) = 0.000296 / 0.5787 = 0.00051.

16.10. A. Prior to truncation the density function is: f(x) = (3)(2000)³ / (2000 + x)⁴.
After truncation from above at 2000, the density function is:
g(x) = f(x) / F(2000) = f(x) / {1 - (2000 / (2000 + 2000))³} = f(x) / 0.875.
f(1000) = (3)(2000)³ / (2000 + 1000)⁴ = 0.000296.
g(1000) = 0.000296 / 0.875 = 0.00034.

16.11. D. Prior to truncation the density function is: f(x) = (3)(2000)³ / (2000 + x)⁴.
After truncation from below at 400 and from above at 2000, the density function is:
g(x) = f(x) / {F(2000) - F(400)}. F(2000) = 0.875. F(400) = 0.4213.
f(1000) = (3)(2000)³ / (2000 + 1000)⁴ = 0.000296.
g(1000) = 0.000296 / (0.875 - 0.4213) = 0.00065.

16.12. If these losses are truncated from below (left truncated) at 1000, then losses of size less than
or equal to 1000 do not appear. The data base is: 2500, 7000.

16.13. If these losses are truncated and shifted from below at 1000, then losses of size less than or
equal to 1000 do not appear, and the other losses have 1000 subtracted from them. The data base
is: 1500, 6000.
Comment: The (non-zero) payments with a $1000 deductible.

16.14. B. & 16.15. D. & 16.16. A. For the original distribution: F(4) = 4²/100 = 0.16.
Therefore, the density left truncated at 4 is: (x/50)/(1 - 0.16) = x/42, 4 ≤ x ≤ 10.
The mean of the truncated distribution is: ∫_4^10 (x/42) x dx = (10³ - 4³)/126 = 7.429.
For x > 4, by integrating the truncated density, the truncated distribution is: (x² - 4²)/84.
Set the truncated distribution equal to 50%: 0.5 = (x² - 4²)/84. ⇒ x = 7.616.
The second moment of the truncated distribution is: ∫_4^10 (x/42) x² dx = (10⁴ - 4⁴)/168 = 58.
The variance of the truncated distribution is: 58 - 7.429² = 2.81.
Comment: The density right truncated at 4 is: (x/50)/0.16 = x/8, 0 ≤ x ≤ 4.
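
For those who want a numerical check of 16.14 through 16.16 (not something you would do at the exam), a short Python sketch that integrates the truncated density x/42 on a fine grid:

step = 6.0 / 200000
xs = [4.0 + (i + 0.5) * step for i in range(200000)]        # fine grid on [4, 10]
g = [x / 42.0 for x in xs]                                  # the left-truncated density
mean = sum(x * gi for x, gi in zip(xs, g)) * step
second = sum(x * x * gi for x, gi in zip(xs, g)) * step
print(round(mean, 3), round(second - mean * mean, 3))       # approximately 7.429 and 2.8
cum = 0.0
for x, gi in zip(xs, g):
    cum += gi * step
    if cum >= 0.5:
        print(round(x, 3))                                  # approximately 7.616 (the median)
        break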

16.17. E. The p.d.f. is: 0 for x ≤ 0, and f(x + d) / {1 - F(d)} for 0 < x.


Comment: Choice A is the p.d.f for the data truncated from below at d, but not shifted.

16.18. C. P(x ≤ 3000 | x ≤ 10000) = P(x ≤ 3000) / P(x ≤ 10000).


Thus, 1 - 0.3 = P(x≤3000) / (1 - 0.02).
P(x ≤ 3000) = (1 - 0.3)(0.98) = 0.686.
P(x > 3000) = 1 - 0.686 = 0.314.
Alternately, let F(x) be the distribution of the untruncated losses.
Let G(x) be the distribution of the losses truncated from above at 10,000.
Then G(x) = F(x) / F(10,000), for x ≤ 10000.
We are given that 1 - G(3000) = 0.3, and that 1 - F(10,000) = 0.02.
Thus F(10,000) = 0.98.
Also, 0.7 = G(3000) = F(3000) / F(10,000) = F(3000) / 0.98.
Thus F(3000) = (0.7)(0.98) = 0.686.
1 - F(3000) = 1 - 0.686 = 0.314.

Section 17, Censored Data

Censoring is somewhat different than truncation. With truncation we do not know of the existence of
certain losses. With censoring we do not know the size of certain losses.
The most important example of censoring is due to the effect of a maximum covered loss.

Right Censored / Censored from Above:73

Right Censored at u ⇔ Censored from Above at u ⇔ X ∧ u ⇔ Min[X, u] ⇔


Maximum Covered Loss u and donʼt know exact size of loss, when ≥ u.

When data is right censored or censored from above at the value u, losses of size more than u
are recorded in the data base as u. This generally occurs when there is a maximum covered loss of
size u. When a loss (covered by insurance) is larger than the maximum covered loss, the insurer
pays the maximum covered loss (if there is no deductible) and may neither know nor care how much
bigger the loss is than the maximum covered loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is censored from above at $1000?
[Solution: $300, $600, $1000, $1000, $1000.
Comment: The values recorded as $1000 are $1000 or more. They may be shown as 1000+.]

The revised Distribution Function under censoring from above at u is:

G(x) = F(x) for x < u, and G(u) = 1.

g(x) = f(x) for x < u, with a point mass of probability S(u) at x = u.

The data censored from above at u is the limited loss variable, X ∧ u ≡ Min[X, u], discussed
previously. The average size of the data censored from above at u, is the Limited Expected Value
at u, E[X ∧ u].

73 The terms “right censored” and “censored from above” are synonymous. “From the right” refers to a graph with the size of loss along the x-axis, with the large values on the righthand side. “From above” uses similar terminology as “higher layers of loss.” “From above” is how the effect of a maximum covered loss looks in a Lee Diagram.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average size of the data reported, if it is censored from above at $1000?
[Solution: ($300+ $600+ $1000+ $1000+ $1000) /5 = $780 = E[X ∧ 1000].]

Truncation from Below and Censoring from Above:

When data is subject to both a maximum covered loss and a deductible, and one records the loss
by the insured, then the data is censored from above and truncated from below.

For example, with a deductible of $1000 and a maximum covered loss of $25,000:
Loss Size      As Recorded after Truncation from Below at 1000 and Censoring from Above at 25,000
600            Not recorded
4500           4500
37,000         25,000

For truncation from below at d and censoring from above at u,74 the data are recorded as follows:
Loss by Insured      Amount Recorded by Insurer
x ≤ d                Not recorded
d < x < u            x
u ≤ x                u

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is truncated from below at $1000 and censored from above at
$2000?
[Solution: $1200, $1500, $2000.]

The revised Distribution Function under censoring from above at u and truncation from below at d is:

G(x) = {F(x) - F(d)} / S(d) for d < x < u, and G(u) = 1.

g(x) = f(x) / S(d) for d < x < u, with a point mass of probability S(u)/S(d) at x = u.

74 For the example, u = $25,000 and d = $1000.

The total losses of the data censored from above at u and truncated from below at d, is the losses in
the layer from d to u, plus d times the number of losses in the data set. The number of losses in the
data set is S(d). Therefore, the average size of the data censored from above at u and truncated
from below at d, is:
{(E[X ∧ u] - E[X ∧ d]) + d S(d)} / S(d) = (E[X ∧ u] - E[X ∧ d]) / S(d) + d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average size of the data reported, if it is truncated from below at $1000 and censored
from above at $2000?
[Solution: ($1200+ $1500 + $2000) /3 = $1567.
Comment: (E[X ∧ 2000] - E[X ∧ 1000]) / S(1000) + 1000 = (1120 - 780)/0.6 + 1000 = $1567.]
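
A short Python sketch (illustrative only) reproducing the comment above: it computes the empirical limited expected values and survival function from the five losses, and then applies (E[X ∧ u] - E[X ∧ d]) / S(d) + d with d = 1000 and u = 2000.

losses = [300, 600, 1200, 1500, 2800]
lev = lambda u: sum(min(x, u) for x in losses) / len(losses)   # empirical E[X ^ u]
S = lambda d: sum(x > d for x in losses) / len(losses)         # empirical survival function

d, u = 1000, 2000
average_recorded = (lev(u) - lev(d)) / S(d) + d                # (1120 - 780)/0.6 + 1000
print(lev(d), lev(u), S(d), round(average_recorded))           # 780.0 1120.0 0.6 1567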

Truncation and Shifting from Below and Censoring from Above:

When data is subject to both a maximum covered loss and a deductible, and one records the
amount paid by the insurer, then the data is censored from above and truncated and shifted from
below.

For example, with a deductible of $1000 and a maximum covered loss of $25,000:
Loss Size      As Recorded after Truncation and Shifting from Below at 1000 and Censoring from Above at 25,000
600            Not recorded
4500           3500
37,000         24,000

For truncation and shifting from below at d and censoring from above at u, the data are recorded as
follows:
Loss by Insured      Amount Recorded by Insurer
x ≤ d                Not recorded
d < x ≤ u            x - d
u ≤ x                u - d

The revised Distribution Function under censoring from above at u and truncation and shifting from
below at d is:

G(x) = {F(x + d) - F(d)} / S(d) for 0 < x < u - d, and G(u - d) = 1.

g(x) = f(x + d) / S(d) for 0 < x < u - d, with a point mass of probability S(u)/S(d) at x = u - d.

x ⇔ the size of (non-zero) payment. x+d ⇔ the size of loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is truncated and shifted from below at $1000 and censored from
above at $2000?
[Solution: $200, $500, $1000.]

The total losses of the data censored from above at u and truncated and shifted from below at d, is
the losses in the layer from d to u. The number of losses in the data base is S(d). Therefore the
average size of the data censored from above at u and truncated and shifted from below at d, is:
(E[X ∧ u] - E[X ∧ d]) / S(d).

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average size of the data reported, if it is truncated and shifted from below at $1000 and
censored from above at $2000?
[Solution: ($200+ $500 + $1000) /3 = $567.
Comment: (E[X ∧ 2000] - E[X ∧ 1000])/S(1000) = (1120 - 780)/.6 = $567.]

Left Censored and Shifted / Censored and Shifted from Below:

For example, the same data as in Section 1, but left censored and shifted or censored and
shifted from below at $10,000, would have each of the 8 small losses in Section 1 reported as
resulting in no payment in the presence of a $10,000 deductible, but we would not know their exact
sizes. For the remaining 122 large losses, the payment of $10,000 less than their size would be
reported.

left censored and shifted variable75 at d ⇔ (X - d)+ ⇔

0 when X ≤ d, X - d when X > d ⇔ the amounts paid to insured with a deductible of d

⇔ payments per loss, including when the insured is paid nothing due to the deductible of d
⇔ amount paid per loss.

When data is left censored and shifted at the value d, losses of size less than d are recorded in
the data base as 0. Losses of size x > d, are recorded as x - d.
What appears in the data base is (X - d)+.

The revised Distribution Function under left censoring and shifting at d is:

G(x) = F(x + d), x ≥ 0.

g(x): a point mass of probability F(d) at x = 0, and g(x) = f(x + d) for x > 0.

x ⇔ the size of payment. x+d ⇔ the size of loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is left censored and shifted at $1000?
[Solution: $0, $0, $200, $500 and $1,800.]

The mean of the left censored at d and shifted variable =
the average payment per loss with a deductible of d = E[X] - E[X ∧ d] ⇔ Layer from d to ∞.

E[(X - d)+] = E[X] - E[X ∧ d] = ∫_d^∞ (x - d) f(x) dx.

E[{(X - d)+}^n] = ∫_d^∞ (x - d)^n f(x) dx.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average of the data reported, if it is left censored and shifted at $1000?
[Solution: (0 + 0 + $200 + $500 + $1800)/5 = $500.
E[X ] - E[X ∧ 1000] = $1280 - $780 = $500 = average payment per loss. In contrast, the average
payment per non-zero payment is: ($200 + $500 + $1800)/3 = $833.33. ]
75 Discussed previously. See Definition 3.4 in Loss Models.
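
A short Python sketch (illustrative only) contrasting the payment per loss with the payment per non-zero payment for the exercise above, with a deductible of 1000 applied to the five losses:

losses = [300, 600, 1200, 1500, 2800]
d = 1000
payments = [max(x - d, 0) for x in losses]        # (X - d)+ : 0, 0, 200, 500, 1800

per_loss = sum(payments) / len(payments)          # E[X] - E[X ^ 1000] = 500
nonzero = [p for p in payments if p > 0]
per_payment = sum(nonzero) / len(nonzero)         # e(1000) = 833.33
print(per_loss, round(per_payment, 2))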

Left Censored / Censored from Below:76

Sometimes data is censored from below, so that one only knows how many small values there are,
but does not know their exact sizes.77

For example, an actuary might have access to detailed information on all Workersʼ Compensation
losses of size greater than $2000, including the size of each such loss, but might only know how
many losses there were of size less than or equal to $2000. Such data has been censored from
below at $2000.

For example, the same data as in Section 1, but censored from below at $10,000, would have each
of the 8 small losses in Section 1 reported as being $10,000 or less. The same information would
be reported as shown in Section 1 on the remaining 122 losses. When data is censored from
below at the value d, losses of size less than d are recorded in the data base as d.

The revised Distribution Function under censoring from below at d is:

G(x) = 0 for x < d, and G(x) = F(x) for x ≥ d.

g(x): a point mass of probability F(d) at x = d, and g(x) = f(x) for x > d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
How is this data reported, if it is censored from below at $1000?
[Solution: $1000, $1000, $1,200, $1,500 and $2,800.
Comment: The values recorded as $1000 are actually $1000 or less.]

The average size of the data censored from below at d, is (E[X ] - E[X ∧ d]) + d.
The losses are those in the layer from d to ∞, plus d per loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800.
What is the average of the data reported, if it is censored from below at $1000?
[Solution: ($1000 + $1000 + $1200 + $1500 + $2800) / 5 = $1500.
Comment: (E[X ] - E[X ∧ 1000]) + 1000 = (1280-780) + 1000 = 1500. ]

76 See Definition 12.1 in Loss Models.
77 See for example, 4, 11/06, Q.5.

Problems:

Use the following information for the next 4 questions:


Losses are uniformly distributed from 0 to 1000.

17.1 (1 point) What is the distribution function for these losses left censored at 100?

17.2 (1 point) What is the distribution function for these losses left censored and shifted at 100?

17.3 (1 point) What is the distribution function for these losses right censored at 800?

17.4 (1 point) What is the distribution function for these losses left truncated and shifted at 100 and
right censored at 800?

Use the following information for the next 4 questions:


There are 6 losses: 100, 400, 700, 800, 1200, and 2300.

17.5 (1 point) If these losses are left censored (censored from below) at 500, what appears in the
data base?

17.6 (1 point) If these losses are left censored and shifted at 500, what appears in the data base?

17.7 (1 point) If these losses are censored from above (right censored) at 2000, what appears in
the data base?

17.8 (1 point) If these losses are left truncated and shifted at 500 and right censored at 2000, what
appears in the data base?

17.9 (1 point) There are five accidents with losses equal to:
$500, $2500, $4000, $6000, and $8000.
Which of the following statements are true regarding the reporting of this data?
1. If the data is censored from above at $5000, then the data is reported as:
$500, $2500, $4000.
2. If the data is truncated from below at $1000, then the data is reported as:
$2500, $4000, $6000, $8000.
3. If the data is truncated and shifted at $1000, then the data is reported as:
$1500, $3000, $5000, $7000.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C or D

17.10 (2 points) It can take many years for a Medical Malpractice claim to be reported to an insurer
and can take many more years to be closed, in other words resolved.
You are studying how long it takes Medical Malpractice claims to be reported to your insurer. You
have data on incidents that occurred two years ago and how long they took to be reported. You are
also studying how long it takes for Medical Malpractice claims to be closed once they are reported.
You have data on all incidents that were reported two years ago and how long it took to close those
that are not still open. For each of these two sets of data, state whether it is truncated and/or
censored and briefly explain why.

Solutions to Problems:

17.1. We only know that small losses are of size at most 100.
G(100) = .1; G(x) = x/1000 for 100 < x < 1000.

17.2. G(0) = .1; G(x) = (x + 100)/1000 for 0 < x < 900.

17.3. All losses are limited to 800. G(x) = x/1000 for 0 < x < 800; G(800) = 1.

17.4. Losses less than 100 do not appear. Other losses are limited to 800 and then have 100
subtracted. G(x) = x/900 for x < 700; G(700) = 1.
Alternately, F(x) = x/1000, 0 < x < 1000 and G(x) = {F(x + 100) - F(100)}/S(100) =
{((x + 100)/1000) - 100/1000)}/(1 - 100/1000) =
x/900 for 0 < x < 800 - 100 = 700; G(700) = 1.

17.5. The two smaller losses appear as 500: 500, 500, 700, 800, 1200, 2300.

17.6. (X - 500)+ = 0, 0, 200, 300, 700, 1800.


Comment: The amounts the insured receives with a $500 deductible.

17.7. The large loss is limited to 2000: 100, 400, 700, 800, 1200, 2000.
Comment: Payments with a 2000 maximum covered loss.
Right censored observations might be indicated with a plus as follows:
100, 400, 700, 800, 1200, 2000+. The 2000 corresponds to a loss of 2000 or more.

17.8. The two small losses do not appear; the other losses are limited to 2000 and then have 500
subtracted: 200, 300, 700, 1500.
Comment: Payments with a 500 deductible and 2000 maximum covered loss.
Apply the maximum covered loss first and then the deductible; therefore, apply the censorship first
and then the truncation.

17.9. C. If the data is censored by a $5000 limit, then the data is reported as: $500, $2500,
$4000, $5000, $5000. Statement 1 would be true if it referred to truncation from above rather than
censoring. Under censoring the size of large accidents is limited in the reported data to the maximum
covered loss. Under truncation from above, the large accidents do not even make it into the reported
data. Statements 2 and 3 are each true.

17.10. The data on incidents that occurred two years ago is truncated from above at two years.
Those incidents, if any, that will take more than 2 years to be reported will not be in our data base
yet. We donʼt know how many of them there may be nor how long they will take to be reported.
The data on claims that were reported two years ago is censored from above at two years. Those
claims that are still open will, we know, be closed eventually. However, while we know it will take
more than 2 years to close each of them, we donʼt know exactly how long it will take.

Section 18, Average Sizes

For each of the different types of data there are corresponding average sizes. The most important
cases involve a deductible and/or a maximum covered loss; one should know well the average
payment per loss and the average payment per (non-zero) payment.

Average Amount Paid per Loss:

Exercise: When there is a deductible of size 1000, a maximum covered loss of 25,000, and thus a
policy limit of 25,000 - 1000 = 24,000, what is the average amount paid per loss?
[Solution: The average amount paid per loss are the average losses in the layer from 1000 to
25,000: E[X ∧ 25000] - E[X ∧ 1000].]

Situation Average Amount Paid per Loss

No Maximum Covered Loss, No Deductible E[X]

Maximum Covered Loss u, No Deductible E[X ∧ u]

No Maximum Covered Loss, (ordinary) Deductible d E[X] - E[X ∧ d]

Maximum Covered Loss u, (ordinary) Deductible d E[X ∧ u] - E[X ∧ d]

Recalling that E[X ∧ ∞] = E[X] and E[X ∧ 0] = 0, we have a single formula that covers all four
situations:

Assuming the usual order of operations, with Maximum Covered Loss of u and an (ordinary)
deductible of d, the average amount paid by the insurer per loss is: E[X ∧ u] - E[X ∧ d].

Note that the average payment per loss is just the layer from d to u. As discussed previously, this
layer can also be expressed as: (layer from d to ∞) - (layer from u to ∞) =
E[(X - d)+] - E[(X - u)+] = {E[X] - E[X ∧ d]} - {E[X] - E[X ∧ u]} = E[X ∧ u] - E[X ∧ d].

Average Amount Paid per Non-Zero Payment:

Exercise: What is the average non-zero payment when there is a deductible of size 1000 and no
maximum covered loss?
[Solution: The average non-zero payment when there is a deductible of size 1000 is the ratio of the
losses excess of 1000, E[X] -E[X ∧ 1000], to the probability of a loss greater than 1000, S(1000).
Thus the expected non-zero payment is: (E[X] -E[X ∧ 1000]) / S(1000).]

With a deductible, some losses to the insured are too small to result in a payment by the insurer.
Thus there are fewer non-zero payments than losses. In order to convert the average amount paid
per loss to the average amount paid per non-zero payment, one needs to divide by S(d).

Assuming the usual order of operations, with Maximum Covered Loss of u and an (ordinary)
deductible of d, the average amount paid by the insurer per non-zero payment to the insured is:
(E[X ∧ u] - E[X ∧ d]) / S(d).
If u = ∞, in other words there is no maximum covered loss, then this is e(d).
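
A minimal Python sketch packaging the two formulas above; lev and surv stand for E[X ∧ x] and S(x) of the ground-up loss distribution and are placeholders to be supplied by the reader. The Exponential Distribution with θ = 5000, deductible of 500, and maximum covered loss of 25,000 used below are arbitrary illustrative choices.

import math

def average_payment_per_loss(lev, d, u):
    """E[X ^ u] - E[X ^ d]: the layer of loss from d to u."""
    return lev(u) - lev(d)

def average_payment_per_payment(lev, surv, d, u):
    """(E[X ^ u] - E[X ^ d]) / S(d); reduces to e(d) when u is infinite."""
    return (lev(u) - lev(d)) / surv(d)

theta = 5000.0
lev_exp = lambda x: theta * (1.0 - math.exp(-x / theta))   # E[X ^ x] for the Exponential
surv_exp = lambda x: math.exp(-x / theta)                  # S(x) for the Exponential
print(average_payment_per_loss(lev_exp, 500, 25000))                 # about 4490
print(average_payment_per_payment(lev_exp, surv_exp, 500, 25000))    # about 4963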

Coinsurance Factor:

For example, an insurance policy might have an 80% coinsurance factor.


Then the insurer pays 80% of what it would have paid in the absence of the coinsurance factor.
Thus the average payment, either per loss or per non-zero payment would be multiplied by 80%.
In general, a coinsurance factor of c, multiplies the average payment, either per loss or per non-zero
payment by c.

With Maximum Covered Loss of u, an (ordinary) deductible of d, and a coinsurance factor of c, the
average amount paid by the insurer per loss by the insured is: c (E[X ∧ u] - E[X ∧ d]).

With Maximum Covered Loss of u, an (ordinary) deductible of d, and a coinsurance factor of c, the
average amount paid by the insurer per non-zero payment to the insured is:
c (E[X ∧ u] - E[X ∧ d]) / S(d).

Exercise: Prior to the application of any coverage modifications, losses follow a Pareto Distribution,
as per Loss Models, with parameters α = 3 and θ = 20,000.
An insured has a policy with a $100,000 maximum covered loss, a $5000 deductible, and a 90%
coinsurance factor. Thus the policy limit is: (0.9)(100,000 - 5000) = 85,500.
Determine the average amount per non-zero payment.
[Solution: For the Pareto Distribution, as shown in Appendix A of Loss Models:
S(x) = {θ / (θ + x)}^α.  E[X ∧ x] = {θ / (α - 1)} {1 - (θ / (θ + x))^(α - 1)}.
S(5000) = (20/25)³ = 0.512.  E[X ∧ 5000] = 10,000 {1 - (20/25)²} = 3600.
E[X ∧ 100,000] = 10,000 {1 - (20/120)²} = 9722.
(90%) (E[X ∧ 100,000] - E[X ∧ 5000]) / S(5000) = (90%)(9722 - 3600) / 0.512 = $10,761.]
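
A short Python sketch (illustrative only) reproducing the exercise above from the Appendix A formulas for the Pareto Distribution:

alpha, theta = 3.0, 20000.0

def pareto_survival(x):
    return (theta / (theta + x)) ** alpha

def pareto_lev(x):
    # E[X ^ x] = {theta/(alpha - 1)} {1 - (theta/(theta + x))^(alpha - 1)}
    return theta / (alpha - 1.0) * (1.0 - (theta / (theta + x)) ** (alpha - 1.0))

c, d, u = 0.90, 5000.0, 100000.0
per_payment = c * (pareto_lev(u) - pareto_lev(d)) / pareto_survival(d)
print(per_payment)   # about 10,762; the $10,761 above uses rounded limited expected values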

Average Sizes for the Different Types of Data Sets:

Type of Data                                                           Average Size

Ground-up, Total Limits                                                E[X]

Censored from Above at u                                               E[X ∧ u]

Truncated from Below at d                                              e(d) + d = (E[X] - E[X ∧ d]) / S(d) + d

Truncated and Shifted from Below at d                                  e(d) = (E[X] - E[X ∧ d]) / S(d)

Truncated from Above at L                                              {E[X ∧ L] - L S(L)} / F(L)

Censored from Below at d                                               (E[X] - E[X ∧ d]) + d

Left Censored and Shifted at d                                         E[(X - d)+] = E[X] - E[X ∧ d]

Censored from Above at u and Truncated from Below at d                 (E[X ∧ u] - E[X ∧ d]) / S(d) + d

Censored from Above at u and Truncated and Shifted from Below at d     (E[X ∧ u] - E[X ∧ d]) / S(d)

Truncated from Above at L and Truncated from Below at d                ({E[X ∧ L] - L S(L)} - {E[X ∧ d] - d S(d)}) / {F(L) - F(d)}

Truncated from Above at L and Truncated and Shifted from Below at d    ({E[X ∧ L] - L S(L)} - {E[X ∧ d] - d S(d)}) / {F(L) - F(d)} - d
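
As a check on the table, the following Python sketch (mine, illustrative only) compares two of the formulas with direct averages of a small hypothetical sample of losses; the sample, d = 1000, and u = L = 5000 are arbitrary choices.

losses = [400, 900, 1500, 2700, 4200, 6100, 9800]   # a small hypothetical sample
d, u = 1000.0, 5000.0
lev = lambda x: sum(min(v, x) for v in losses) / len(losses)   # empirical E[X ^ x]
S = lambda x: sum(v > x for v in losses) / len(losses)         # empirical S(x)
F = lambda x: 1.0 - S(x)

# Censored from above at u and truncated from below at d: formula versus direct average.
formula = (lev(u) - lev(d)) / S(d) + d
direct = [min(v, u) for v in losses if v > d]
print(formula, sum(direct) / len(direct))                      # both approximately 3680

# Truncated from above at L = u and truncated from below at d.
L = u
formula = ((lev(L) - L * S(L)) - (lev(d) - d * S(d))) / (F(L) - F(d))
direct = [v for v in losses if d < v <= L]
print(formula, sum(direct) / len(direct))                      # both approximately 2800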

Problems:

Use the following information for the next 5 questions:


There are five losses of size: $2000, $6000, $12,000, $27,000, and $48,000.

18.1 (1 point) With a $5000 deductible, what is the average payment per loss?
A. less than 12,000
B. at least 12,000 but less than 13,000
C. at least 13,000 but less than 14,000
D. at least 14,000 but less than 15,000
E. at least 15,000

18.2 (1 point) With a $5000 deductible, what is the average payment per (non-zero) payment?
A. less than 17,000
B. at least 17,000 but less than 18,000
C. at least 18,000 but less than 19,000
D. at least 19,000 but less than 20,000
E. at least 20,000

18.3 (1 point) With a $25,000 policy limit, what is the average payment?
A. less than 14,000
B. at least 14,000 but less than 15,000
C. at least 15,000 but less than 16,000
D. at least 16,000 but less than 17,000
E. at least 17,000

18.4 (1 point) With a $5000 deductible and a $25,000 maximum covered loss,
what is the average payment per loss?
A. less than 10,000
B. at least 10,000 but less than 11,000
C. at least 11,000 but less than 12,000
D. at least 12,000 but less than 13,000
E. at least 13,000

18.5 (1 point) With a $5000 deductible and a $25,000 maximum covered loss,
what is the average payment per (non-zero) payment?
A. less than 11,000
B. at least 11,000 but less than 12,000
C. at least 12,000 but less than 13,000
D. at least 13,000 but less than 14,000
E. at least 14,000

Use the following information for the next 5 questions:


Loss Size (x) F(x) E[X ∧ x]
10,000 0.325 8,418
25,000 0.599 16,198
50,000 0.784 23,544
100,000 0.906 30,668
250,000 0.978 37,507
∞ 1.000 41,982

18.6 (1 point) With a 25,000 deductible, determine E[YL ].


A. less than 22,000
B. at least 22,000 but less than 23,000
C. at least 23,000 but less than 24,000
D. at least 24,000 but less than 25,000
E. at least 25,000

18.7 (1 point) With a 25,000 deductible, determine E[YP].


A. less than 62,000
B. at least 62,000 but less than 63,000
C. at least 63,000 but less than 64,000
D. at least 64,000 but less than 65,000
E. at least 65,000

18.8 (1 point) With a 100,000 policy limit, what is the average payment?
A. less than 27,000
B. at least 27,000 but less than 28,000
C. at least 28,000 but less than 29,000
D. at least 29,000 but less than 30,000
E. at least 30,000

18.9 (1 point) With a 25,000 deductible and a 100,000 maximum covered loss,
what is the average payment per loss?
A. less than 15,000
B. at least 15,000 but less than 16,000
C. at least 16,000 but less than 17,000
D. at least 17,000 but less than 18,000
E. at least 18,000

18.10 (1 point) With a 25,000 deductible and a 100,000 maximum covered loss, determine E[YP].
A. 32,000 B. 33,000 C. 34,000 D. 35,000 E. 36,000

Use the following information for the next five questions:


There are six accidents of size: $800, $2100, $3300, $4600, $6100, and $8900.

18.11 (1 point) If the reported data is truncated from below at $1000,


what is the average size of claim in the reported data?
A. 4800 B. 4900 C. 5000 D. 5100 E. 5200

18.12 (1 point) If the reported data is truncated and shifted from below at $1000,
what is the average size of claim in the reported data?
A. less than 3900
B. at least 3900 but less than 4000
C. at least 4000 but less than 4100
D. at least 4100 but less than 4200
E. at least 4200

18.13 (1 point) If the reported data is left censored and shifted at $1000,
what is the average size of claim in the reported data?
A. less than 3700
B. at least 3700 but less than 3800
C. at least 3800 but less than 3900
D. at least 3900 but less than 4000
E. at least 4000

18.14 (1 point) If the reported data is censored from above at $5000,


what is the average size of claim in the reported data?
A. less than 3400
B. at least 3400 but less than 3500
C. at least 3500 but less than 3600
D. at least 3600 but less than 3700
E. at least 3700.

18.15 (1 point) If the reported data is truncated from above at $5000,


what is the average size of claim in the reported data?
A. less than 2300
B. at least 2300 but less than 2400
C. at least 2400 but less than 2500
D. at least 2500 but less than 2600
E. at least 2600

Use the following information for the next four questions:


Losses follow a uniform distribution from 0 to 20,000.

18.16 (2 points) If there is a deductible of 1000,


what is the average payment by the insurer per loss?
A. less than 8800
B. at least 8800 but less than 8900
C. at least 8900 but less than 9000
D. at least 9000 but less than 9100
E. at least 9100

18.17 (2 points) If there is a policy limit of 15,000,


what is the average payment by the insurer per loss?
A. less than 9300
B. at least 9300 but less than 9400
C. at least 9400 but less than 9500
D. at least 9500 but less than 9600
E. at least 9600

18.18 (2 points) There is a maximum covered loss of 15,000 and a deductible of 1000.
What is the average payment by the insurer per loss?
(Include situations where the insurer pays nothing.)
A. less than 8200
B. at least 8200 but less than 8300
C. at least 8300 but less than 8400
D. at least 8400 but less than 8500
E. at least 8500

18.19 (2 points) There is a maximum covered loss of 15,000 and a deductible of 1000.
What is the average value of a non-zero payment by the insurer?
A. less than 8700
B. at least 8700 but less than 8800
C. at least 8800 but less than 8900
D. at least 8900 but less than 9000
E. at least 9000

18.20 (2 points) An insurance policy has a maximum covered loss of 2000 and a deductible of 100.
For the ground up unlimited losses: F(100) = 0.20, F(2000) = 0.97, and
∫_100^2000 x f(x) dx = 400.


What is the average payment per loss?
A. 360 B. 380 C. 400 D. 420 E. 440

18.21 (1 point) A policy has a policy limit of 50,000 and deductible of 1000.
What is the expected payment per loss?
A. E[X ∧ 49,000] - E[X ∧ 1000]
B. E[X ∧ 50,000] - E[X ∧ 1000]
C. E[X ∧ 51,000] - E[X ∧ 1000]
D. E[X ∧ 49,000] - E[X ∧ 1000] + 1000
E. E[X ∧ 51,000] - 1000

18.22 (2 points) You are given:


• In the absence of a deductible the average loss is 15,900.
• With a 10,000 deductible, the average amount paid per loss is 7,800.
• With a 10,000 deductible, the average amount paid per nonzero payment is 13,300.
What is the average of those losses of size less than 10,000?
(A) 5000 (B) 5200 (C) 5400 (D) 5600 (E) 5800

18.23 (1 point) E[(X - 1000)+] = 3500. E[(X - 25,000)+] = 500.


There is a 1000 deductible and a 25,000 maximum covered loss.
Determine the average payment per loss.

18.24 (2 points) Use the following information:


• The average payment per loss with a deductible of d is 450.
• The average payment per payment with a deductible of d is 1000.
• The average payment per loss with a franchise deductible of d is 1080.
Determine d.
A. 1000 B. 1100 C. 1200 D. 1300 E. 1400

Use the following information for the next two questions:


• Flushing Reinsurance reinsures a certain book of business.
• Limited Expected Values for this book of business are estimated to be:
E[X ∧ $1 million] = $300,000
E[X ∧ $4 million] = $375,000
E[X ∧ $5 million] = $390,000
E[X ∧ $9 million] = $420,000
E[X ∧ $10 million] = $425,000
• The survival functions, S(x) = 1 - F(x), for this book of business are estimated to be:
S($1 million) = 3.50%
S($4 million) = 1.70%
S($5 million) = 1.30%
S($9 million) = 0.55%
S($10 million) = 0.45%
• Flushing Reinsurance makes a nonzero payment, y, on this book of business.

18.25 (1 point) If Flushing Reinsurance were responsible for the layer of loss from $1 million to
$5 million ($4 million excess of $1 million), what is the expected value of y?
A. less than $1 million
B. at least $1 million but less than $2 million
C. at least $2 million but less than $3 million
D. at least $3 million but less than $4 million
E. at least $4 million

18.26 (1 point) If Flushing Reinsurance were responsible for the layer of loss from $1 million to
$10 million ($9 million excess of $1 million), what is the expected value of y?
A. less than $1 million
B. at least $1 million but less than $2 million
C. at least $2 million but less than $3 million
D. at least $3 million but less than $4 million
E. at least $4 million

18.27 (3 points) Losses are distributed uniformly from 0 to ω.


There is a deductible of size d < ω.
Determine the variance of the payment per loss.

Use the following information for the next 12 questions:


• The distribution of losses suffered by insureds is estimated to have the following
limited expected values:
E[X ∧ 5,000 ] = 3,600
E[X ∧ 20,000] = 7,500
E[X ∧ 25,000] = 8,025
E[X ∧ ∞] = 10,000
• The survival functions, S(x), for the distribution of losses suffered by insureds
is estimated to have the following values:
S(5,000) = 51.2%
S(20,000) = 12.5%
S(25,000) = 8.8%

18.28 (1 point) What is average loss suffered by the insureds?


A. 9,600 B. 9,700 C. 9,800 D. 9,900 E. 10,000

18.29 (1 point) What is the average size of data truncated from above at 25,000?
A. less than 6,300
B. at least 6,300 but less than 6,400
C. at least 6,400 but less than 6,500
D. at least 6,500 but less than 6,600
E. at least 6,600

18.30 (1 point) What is the average size of data truncated and shifted from below at 5000?
A. 12,500 B. 12,600 C. 12,700 D. 12,800 E. 12,900

18.31 (1 point) What is the average size of data censored from above at 25,000?
A. 7800 B. 7900 C. 8000 D. 8100 E. 8200

18.32 (1 point) What is the average size of data censored from below at 5,000?
A. less than 10,700
B. at least 10,700 but less than 10,800
C. at least 10,800 but less than 10,900
D. at least 10,900 but less than 11,000
E. at least 11,000

18.33 (1 point) What is the average size of data left censored and shifted at 5,000?
A. 6200 B. 6300 C. 6400 D. 6500 E. 6600

18.34 (2 points) What is the average size of data truncated from below at 5,000 and truncated from
above at 25,000?
A. less than 11,200
B. at least 11,200 but less than 11,300
C. at least 11,300 but less than 11,400
D. at least 11,400 but less than 11,500
E. at least 11,500.

18.35 (1 point) What is the average size of data truncated from below at 5,000 and censored from
above at 25,000?
A. less than 13,500
B. at least 13,500 but less than 13,600
C. at least 13,600 but less than 13,700
D. at least 13,700 but less than 13,800
E. at least 13,800

18.36 (2 points) What is the average size of data censored from below at 5,000 and censored from
above at 25,000?
A. 9100 B. 9200 C. 9300 D. 9400 E. 9500

18.37 (2 points) What is the average size of data truncated and shifted from below at 5,000 and
truncated from above at 25,000?
A. less than 6,100
B. at least 6,100 but less than 6,200
C. at least 6,200 but less than 6,300
D. at least 6,300 but less than 6,400
E. at least 6,400

18.38 (2 points) What is the average size of data truncated and shifted from below at 5,000 and
censored from above at 25,000?
A. less than 8,700
B. at least 8,700 but less than 8,800
C. at least 8,800 but less than 8,900
D. at least 8,900 but less than 9,000
E. at least 9,000

18.39 (1 point) What is the average size of data truncated from below at 5000?
A. 17,000 B. 17,500 C. 18,000 D. 18,500 E. 19,000

18.40 (2 points) The size of loss distribution has the following characteristics:
(i) S(100) = 0.65.
(ii) E[X | X > 100] = 345.
There is an ordinary deductible of 100 per loss.
Determine the average payment per loss.
(A) 160 (B) 165 (C)170 (D) 175 (E) 180

18.41 (3 points) A business has obtained two separate insurance policies that together provide full
coverage. You are given:
(i) The average ground-up loss is 27,000.
(ii) Policy B has no deductible and a maximum covered loss of 25,000.
(iii) Policy A has an ordinary deductible of 25,000 with no maximum covered loss.
(iv) Under policy A, the expected amount paid per loss is 10,000.
(v) Under policy A, the expected amount paid per payment is 22,000.
Given that a loss less than or equal to 25,000 has occurred, what is the expected payment under
policy B?
A. Less than 11,000
B. At least 11,000, but less than 12,000
C. At least 12,000, but less than 13,000
D. At least 13,000, but less than 14,000
E. At least 14,000

18.42 (2 points) X is the size of loss prior to the effects of any policy provisions.
Given the following information, calculate the average payment per loss under a policy with a 1000
deductible and a 25,000 maximum covered loss.
x e(x) F(x)
1000 30,000 72.7%
25,000 980,000 99.7%
A. 4250 B. 4500 C. 4750 D. 5000 E. 5250

18.43 (2 points) For a certain policy, in order to determine the payment on a claim, first the
deductible of 500 is applied, and then the payment is capped at 10,000.
What is the expected payment per loss?
A. E[X ∧ 10,000] - E[X ∧ 500]
B. E[X ∧ 10,500] - E[X ∧ 500]
C. E[X ∧ 10,000] - E[X ∧ 500] + 500
D. E[X ∧ 10,500] - E[X ∧ 500] + 500
E. None of A, B, C, or D

18.44 (2 points) If an ordinary deductible of 5000 is applied, then the average payment per
payment is 40,000.
If a franchise deductible of 5000 is applied, then the average payment per loss is 33,750.
If an ordinary deductible of 5000 is applied, determine the average payment per loss.
A. 28,000 B. 29,000 C. 30,000 D. 31,000 E. 32,000

18.45 (2 points) The size of losses has the following density:
f(x) = 0.007 for 0 < x ≤ 100; 0.001 for 100 < x ≤ 300; 0.0002 for 300 < x ≤ 800.

If there is an ordinary deductible of 200, determine the expected payment per loss.
A. 40 B. 45 C. 50 D. 55 E. 60

18.46 (4B, 5/92, Q.20) (1 point) Accidents for a coverage are uniformly distributed on the interval
0 to $5,000. An insurer sells a policy for the coverage which has a $500 deductible.
Determine the insurer's expected payment per loss.
A. $1,575 B. $2,000 C. $2,025 D. $2,475 E. $2,500

18.47 (4B, 5/95, Q.22) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = 1000 and α = 2.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
• For each loss that occurs, the insurer's payment is equal to the entire amount of the loss
if the loss is greater than 100.
The insurer makes no payment if the loss is less than or equal to 100.
Determine the insurer's expected annual payments.
A. Less than 8,000
B. At least 8,000, but less than 9,000
C. At least 9,000, but less than 9,500
D. At least 9,500, but less than 9,900
E. At least 9,900

18.48 (4B, 11/95, Q.13 & 4B, 5/98 Q.9) (3 points) You are given the following:
• Losses follow a uniform distribution on the interval from 0 to 50,000.
• There is a maximum covered loss of 25,000 per loss and a deductible of 5,000 per loss.
• The insurer applies the maximum covered loss prior to applying the deductible
(i.e., the insurerʼs maximum payment is 20,000 per loss).
• The insurer makes a nonzero payment p.
Determine the expected value of p.
A. Less than 15,000
B. At least 15,000, but less than 17,000
C. At least 17,000, but less than 19,000
D. At least 19,000, but less than 21,000
E. At least 21,000

18.49 (CAS9, 11/97, Q.40a) (2.5 points) You are the large accounts actuary for Pacific International
Group, and you have a risk with a $1 million limit.
The facultative underwriters from AnyRe have indicated that they are willing to reinsure
the following layers:
from $100,000 to $200,000 ($100,000 excess of $100,000)
from $200,000 to $500,000 ($300,000 excess of $200,000)
from $500,000 to $1 million ($500,000 excess of $500,000).
You have gathered the following information:
Limit E[X ∧ x] F(x)
100,000 58,175 0.603
200,000 89,629 0.748
500,000 139,699 0.885
1,000,000 179,602 0.943
Expected frequency = 100 claims.
Calculate the frequency, severity, and expected losses for each of the facultative layers.
Show all work.

18.50 (4B, 11/98, Q.12) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with
cumulative distribution function and limited expected values as follows:
Loss Size (x) F(x) E[X ∧ x]
10,000 0.60 6,000
15,000 0.70 7,700
22,500 0.80 9,500
∞ 1.00 20,000
• There is a deductible of 15,000 per loss and no maximum covered loss.
• The insurer makes a nonzero payment p.
Determine the expected value of p.
A. Less than 15,000
B. At least 15,000, but less than 30,000
C. At least 30,000, but less than 45,000
D. At least 45,000, but less than 60,000
E. At least 60,000

18.51 (4B, 5/99, Q.7) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with
cumulative distribution function and limited expected values as follows:
Loss Size (x) F(x) E[X ∧ x]
10,000 0.60 6,000
15,000 0.70 7,700
22,500 0.80 9,500
32,500 0.90 11,000
∞ 1.00 20,000
• There is a deductible of 10,000 per loss and no maximum covered loss.
• The insurer makes a payment on a loss only if the loss exceeds the deductible.
The deductible is raised so that half the number of losses exceed the new deductible compared to
the old deductible of 10,000.
Determine the percentage change in the expected size of a nonzero payment made by the insurer.
A. Less than -37.5%
B. At least -37.5%, but less than -12.5%
C. At least -12.5%, but less than 12.5%
D. At least 12.5%, but less than 37.5%
E. At least 37.5%

18.52 (Course 3 Sample Exam, Q.5) You are given the following:
• The probability density function for the amount of a single loss is
f(x) = 0.01(1 - q + 0.01qx) e^(-0.01x), x > 0.
• If an ordinary deductible of 100 is imposed, the expected payment
(given that a payment is made) is 125.
Determine the expected payment (given that a payment is made) if the deductible is increased to
200.

18.53 (4, 5/00, Q.6) (2.5 points) A jewelry store has obtained two separate insurance policies that
together provide full coverage. You are given:
(i) The average ground-up loss is 11,100.
(ii) Policy A has an ordinary deductible of 5,000 with no maximum covered loss
(iii) Under policy A, the expected amount paid per loss is 6,500.
(iv) Under policy A, the expected amount paid per payment is 10,000.
(v) Policy B has no deductible and a maximum covered loss of 5,000.
Given that a loss less than or equal to 5,000 has occurred, what is the expected payment under
policy B?
(A) Less than 2,500
(B) At least 2,500, but less than 3,000
(C) At least 3,000, but less than 3,500
(D) At least 3,500, but less than 4,000
(E) At least 4,000

18.54 (4, 11/00, Q.18) (2.5 points) A jewelry store has obtained two separate insurance policies
that together provide full coverage.
You are given:
(i) The average ground-up loss is 11,100.
(ii) Policy A has an ordinary deductible of 5,000 with no maximum covered loss.
(iii) Under policy A, the expected amount paid per loss is 6,500.
(iv) Under policy A, the expected amount paid per payment is 10,000.
(v) Policy B has no deductible and a maximum covered loss of 5,000.
Given that a loss has occurred, determine the probability that the payment under policy B is 5,000.
(A) Less than 0.3
(B) At least 0.3, but less than 0.4
(C) At least 0.4, but less than 0.5
(D) At least 0.5, but less than 0.6
(E) At least 0.6

18.55 (CAS3, 11/03, Q.22) (2.5 points) The severity distribution function of claims data for
automobile property damage coverage for Le Behemoth Insurance Company is given by an
exponential distribution, F(x).
F(x) = 1 - exp(-x/5000).
To improve the profitability of this portfolio of policies, Le Behemoth institutes the following policy
modifications:
i) It imposes a per-claim deductible of 500.
ii) It imposes a per-claim limit of 25,000.
(The maximum paid per claim is 25,000 - 500 = 24,500.)
Previously, there was no deductible and no limit.
Calculate the average savings per (old) claim if the new deductible and policy limit had been in
place.
A. 490 B. 500 C. 510 D. 520 E. 530

18.56 (SOA M, 11/05, Q.26 & 2009 Sample Q.207) (2.5 points) For an insurance:
(i) Losses have density function
fX(x) = 0.02x for 0 < x < 10, and 0 elsewhere.

(ii) The insurance has an ordinary deductible of 4 per loss.


(iii) YP is the claim payment per payment random variable.
Calculate E[YP].
(A) 2.9 (B) 3.0 (C) 3.2 (D) 3.3 (E) 3.4

18.57 (SOA M, 11/06, Q.6 & 2009 Sample Q.279) (2.5 points)
Loss amounts have the distribution function
F(x) = (x/100)² for 0 ≤ x ≤ 100, and 1 for x > 100.

An insurance pays 80% of the amount of the loss in excess of an ordinary deductible of 20,
subject to a maximum payment of 60 per loss.
Calculate the conditional expected claim payment, given that a payment has been made.
(A) 37 (B) 39 (C) 43 (D) 47 (E) 49

18.58 (CAS5, 5/07, Q.47) (2.0 points) You are given the following information:
Claim Ground-up Uncensored Loss Amount
A $250,000
B $300,000
C $450,000
D $750,000
E $1,200,000
F $2,500,000
G $4,000,000
H $7,500,000
I $9,000,000
J $15,000,000
a. (1.25 points) Calculate the ratio of the limited expected value at $5 million to
the limited expected value at $1 million
b. (0.75 points) Calculate the average payment per payment with a deductible of $1 million and a
maximum covered loss of $5 million.
Comment: I have reworded this exam question in order to match the syllabus of your exam.

Solutions to Problems:

18.1. D. The payments are: 0, 1000, 7000, 22000 and 43000.


Average payment per loss is: (0 + 1000 + 7000 + 22,000 + 43,000)/5 = 14,600.

18.2. C. Average payment per payment is: (1000 + 7000 + 22,000 + 43,000)/4 = 18,250.

18.3. B. The payments are: $2000, $6000, $12,000, $25,000, and $25,000.
Average payment is: (2000 + 6000 + 12000 + 25000 + 25000)/5 = 14,000.

18.4. A. The payments are: 0, 1000, 7000, 20000 and 20000.


Average payment per loss is: (0 + 1000 + 7000 + 20000 + 20000)/5 = 9,600.
Alternately, E[X ∧ 5000] = (2000 + 5000 + 5000 + 5000 + 5000)/5 = 4400.
E[X ∧ 25000] = (2000 + 6000 + 12,000 + 25,000 + 25,000)/5 = 14,000.
E[X ∧ 25000] - E[X ∧ 5000] = 14,000 - 4400 = 9,600.
Comment: The layer from 5000 to 25,000.

18.5. C. Average payment per payment is: (1000 + 7000 + 20000 + 20000)/4 = 12,000.
Alternately, (E[X ∧ 25000] - E[X ∧ 5000])/S(5000) = (14,000 - 4400)/0.8 = 12,000.

18.6. E. E[X] - E[X ∧ 25000] = 41,982 - 16,198 = 25,784.


Comment: Based on a LogNormal distribution with µ = 9.8 and σ = 1.3.

18.7. D. Average payment per payment is:


(E[X] - E[X ∧ 25000]) / S(25000) = (41,982 - 16,198) / (1 - 0.599) = 64,299.

18.8. E. E[X ∧ 100000] = 30,668.

18.9. A. E[X ∧ 100000] - E[X ∧ 25000] = 30,668 - 16,198 = 14,470.


Comment: The layer from 25,000 to 100,000.

18.10. E. (E[X ∧ 100000] - E[X ∧ 25000])/S(25000) = (30,668 - 16,198)/(1 - .599) = 36,085.

18.11. C. ($2100 + $3300 + $4600 + $6100 + $8900) / 5 = 5000.

18.12. C. ($1100 + $2300 + $3600 + $5100 + $7900) / 5 = 4000.


Alternately, one can subtract 1000 from the solution to the previous question.

18.13. A. ($0 + $1100 + $2300 + $3600 + $5100 + $7900) / 6 = 3333.



18.14. B. ($800 + $2100 + $3300 + $4600 + $5000 + $5000) / 6 = 3467.

18.15. E. ($800 + $2100 + $3300 + $4600 ) / 4 = 2700.

18.16. D. For this uniform distribution, f(x) = 1/20,000 for 0 ≤ x ≤ 20,000.
The payment by the insurer depends as follows on the size of loss x:
0 for x ≤ 1000; x - 1000 for 1000 ≤ x ≤ 20,000.
We need to compute the average dollars paid by the insurer per loss:
∫_1000^20,000 (x - 1000) f(x) dx = ∫_1000^20,000 {(x - 1000) / 20,000} dx
= (x - 1000)² / 40,000 evaluated from x = 1000 to x = 20,000 = 9025.

18.17. B. f(x) = 1/20,000 for 0 ≤ x ≤ 20,000.
The payment by the insurer is: x for x ≤ 15,000; 15,000 for 15,000 ≤ x ≤ 20,000.
We need to compute the average dollars paid by the insurer per loss, the sum of two terms
corresponding to 0 ≤ x ≤ 15,000 and 15,000 ≤ x ≤ 20,000:
∫_0^15,000 x f(x) dx + 15,000 S(15,000) = ∫_0^15,000 (x / 20,000) dx + (15,000)(1 - 0.75)
= x² / 40,000 evaluated from x = 0 to x = 15,000, plus 3750 = 5625 + 3750 = 9375.

18.18. D. For this uniform distribution, f(x) = 1/20,000 for 0 ≤ x ≤ 20,000. The payment by the insurer
depends as follows on the size of loss x:
0 for x ≤ 1000; x - 1000 for 1000 ≤ x ≤ 15,000; 14,000 for x ≥ 15,000.
We need to compute the average dollars paid by the insurer per loss, as the sum of two terms
corresponding to 1000 ≤ x ≤ 15,000 and 15,000 ≤ x ≤ 20,000:
∫_1000^15,000 (x - 1000) f(x) dx + 14,000 S(15,000) = ∫_1000^15,000 {(x - 1000) / 20,000} dx + 14,000 (1 - 0.75)
= (x - 1000)² / 40,000 evaluated from x = 1000 to x = 15,000, plus 3500 = 4900 + 3500 = 8400.

18.19. C. We need to compute the ratio of two quantities, the average dollars paid by the insurer
per loss and the probability that a loss will result in a non-zero payment. The latter is the chance that
x>1000, which is for the uniform distribution: 1 - (1000/20000) = .95. The former is the solution to the
previous question: 8400. Therefore, the average non-zero payment is 8400 / 0.95 = 8842.
Comment: Similar to 4B, 11/95, Q.13.

18.20. B. Average payment per loss = E[X ∧ u] - E[X ∧ d] = E[X ∧ 2000] - E[X ∧ 100] =
{∫_0^2000 x f(x) dx + 2000 S(2000)} - {∫_0^100 x f(x) dx + 100 S(100)} =
∫_100^2000 x f(x) dx + 2000 S(2000) - 100 S(100) = 400 + (2000)(1 - 0.97) - (100)(1 - 0.20) = 380.
Comment: Can be done via a Lee Diagram.

18.21. C. Policy limit = maximum covered loss - deductible.


Thus the maximum covered loss = 51,000.
Expected payment per loss = E[X ∧ u] - E[X ∧ d] = E[X ∧ 51,000] - E[X ∧ 1000].

18.22. C. E[X] = 15,900.


With a 10,000 deductible, the average amount paid per loss = E[X] - E[X ∧ 10000] = 7800.
Therefore, E[X ∧ 10000] = 15900 - 7800 = 8,100.
With a 10,000 deductible, the average amount paid per nonzero payment =
(E[X] - E[X ∧ 10000])/S(10000) = 13,300.
Therefore, S(10000) = 7800/ 13300 = 0.5865.
Average loss of size between 0 and 10,000 = ∫_0^10,000 x f(x) dx / F(10,000) =
{E[X ∧ 10,000] - 10,000 S(10,000)} / F(10,000) = {8100 - (10,000)(0.5865)} / (1 - 0.5865) = 2235/0.4135 = 5405.

18.23. E[(X - 1000)+] - E[(X - 25,000)+] = {E[X] - E[X ∧ 1000]} - {E[X] - E[X ∧ 25,000]} =
E[X ∧ 25,000] - E[X ∧ 1000] = average payment per loss.
Thus, the average payment per loss is: 3500 - 500 = 3000.

18.24. E. 450 = E[X] - E[X ∧ d].


1000 = (E[X] - E[X ∧ d]) / S(d). ⇒ S(d) = 450/1000 = 0.45.
The average payment per payment with a franchise deductible is d more than for an ordinary
deductible d: d + (E[X] - E[X ∧ d]) / S(d).
Thus the average payment per loss with a franchise deductible is:
S(d) {d + (E[X] - E[X ∧ d]) / S(d)} = d S(d) + (E[X] - E[X ∧ d]).
Thus, 1080 = d 0.45 + 450. ⇒ d = (1080 - 450) / 0.45 = 1400.

18.25. C. The reinsurer pays the dollars in the layer of loss from $1 million to
$5 million, which are: E[X ∧ $5 million] - E[X ∧ $1 million]. The number of nonzero payments is
1 - F(1 million) = S(1 million). Thus the average nonzero payment is:
(E[X ∧ $5 million] - E[X ∧ $1 million])/ S($1 million) = (390,000 - 300,000)/.035 = 2,571,429.

18.26. D. The reinsurer pays the dollars in the layer of loss from $1 million to
$10 million, which are: E[X ∧ $10 million] - E[X ∧ $1 million]. The number of nonzero payments is
1 - F(1 million) = S(1 million). Thus the average nonzero payment is:
(E[X ∧ $10 million] - E[X ∧ $1 million])/ S($1 million) = (425,000 - 300,000)/0.035 = 3,571,429.

18.27. The payment per loss of size x is: 0 for x ≤ d, and x - d for x > d.
Mean payment per loss: ∫_d^ω (x - d) (1/ω) dx = (ω - d)² / (2ω).
Second moment of the payment per loss: ∫_d^ω (x - d)² (1/ω) dx = (ω - d)³ / (3ω).
Thus, the variance of the payment per loss is:
(ω - d)³ / (3ω) - (ω - d)⁴ / (2ω)² = (ω - d)³ (ω + 3d) / (12ω²).
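A quick simulation check of this result (illustrative only), with arbitrary values ω = 20,000 and d = 1000:

import random
random.seed(7)
omega, d, n = 20000.0, 1000.0, 200000
pays = [max(random.uniform(0.0, omega) - d, 0.0) for _ in range(n)]   # payment per loss
mean = sum(pays) / n
sample_var = sum((p - mean) ** 2 for p in pays) / n
formula = (omega - d) ** 3 * (omega + 3.0 * d) / (12.0 * omega ** 2)
print(round(sample_var), round(formula))   # the two values should be close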

18.28. E. E[X] = E[X ∧ ∞] = 10,000.

18.29. B. {E[X ∧ 25000] - 25000S(25000)} / F(25000) =


{8025 - (25000)(0.088)} / (1-0.088) = 6387.

18.30. A. e(5000) = {E[X] - E[X ∧ 5000]}/S(5000) = (10,000-3600)/0.512 = 12,500.

18.31. C. E[X ∧ 25000] = 8025.

18.32. E. The small losses are each recorded at 5000. Subtracting 5000 from every recorded
loss, we would get the layer from 5000 to ∞. Thus the average loss is:
(E[X ] - E[X ∧ 5000]) + 5000 = (10000 - 3600) + 5000 = 11,400.

18.33. C. E[(X - 5000)+] = E[X ] - E[X ∧ 5000] = 10000 - 3600 = 6,400.

18.34. B. This is the average size of losses in the interval from 5000 to 25000:
({E[X ∧ 25000] - 25000 S(25000)} - {E[X ∧ 5000] - 5000 S(5000)}) / {F(25000) - F(5000)} =
[8025 - (25000)(0.088) - {3600 - (5000)(0.512)}] / (0.512 - 0.088) = (5825 - 1040)/0.424 = 11,285.

18.35. C. {E[X ∧ 25000] - E[X ∧ 5000]}/S(5000) + 5000 = (8025-3600)/.512 + 5000 = 13,643.

18.36. D. The losses are 5000 per loss plus the layer of losses from 5000 to 25,000. Thus the
average loss is: (E[X ∧ 25000] - E[X ∧ 5000]) + 5000 =
(8025 -3600) + 5000 = 9425. Alternately, the average size of loss is reduced compared to data
just censored from below, by E[X] - E[X ∧ 25000] = 10000 - 8025 = 1975. Since from a previous
solution, the average size of data censored from below at 5,000 is 11,400, the solution to this
question is: 11,400 - 1975 = 9425.

18.37. C. Using a previous solution, where there was no shifting, this is 5000 less than the average
size of data truncated from below at 5,000 and truncated from above at 25,000 = 11,285 - 5000 =
6,285. Alternately, the dollars of loss are those for the layer from 5,000 to 25,000, less the width of
the layer times the number of losses greater than 25,000:
{E[X ∧ 25000] - E[X ∧ 5000]} - (20000)S(25000). The number of losses in the database is:
F(25000) - F(5000) = S(5000) - S(25000). Thus the average size is:
{(8025-3600) - (20000)(.088)} / (0.512 - 0.088) = 6,285.

18.38. A. {E[X ∧ 25000] - E[X ∧ 5000]} / S(5000) = (8025-3600)/.512 = 8,643.

18.39. B. e(5000) + 5000 = {E[X] - E[X ∧ 5000]}/S(5000) + 5000 =


(10,000-3600)/0.512 + 5000 = 17,500.


18.40. A. E[X | X > 100] = ∫100 to ∞ x f(x) dx / S(100). ⇒
∫100 to ∞ x f(x) dx = S(100) E[X | X > 100] = (0.65)(345) = 224.25.
With a deductible of 100 per loss, the average payment per loss is:
∫100 to ∞ (x - 100) f(x) dx = ∫100 to ∞ x f(x) dx - 100 ∫100 to ∞ f(x) dx = 224.25 - (100)(0.65) = 159.25.
Alternately, average payment per loss = S(100) (average payment per payment) =
S(100) E[X - 100 | X > 100] = S(100) {E[X | X > 100] - 100} = (.65)(345 - 100) = 159.25.

18.41. A. Average ground-up loss = E[X] = 27,000.


Under policy A, average amount paid per loss = E[X] - E[X ∧ 25000] = 10000.
Therefore, E[X ∧ 25000] = 27000 - 10000 = 17000.
Under policy A, average amount paid per payment = (E[X] - E[X ∧ 25000]) / S(25000) = 22,000.
Therefore, S(25000) = 10000/22000 = 0.4545.
Given that a loss less than or equal to 25,000 has occurred, the expected payment under policy B =
average loss of size between 0 and 25000 = ∫0 to 25,000 x f(x) dx / F(25000) =
{E[X ∧ 25000] - 25000 S(25000)} / F(25000) = {17000 - (25000)(0.4545)} / (1 - 0.4545) = 10,335.
Comment: Similar to 4, 5/00, Q.6.

18.42. E. E[(X - 1000)+] = e(1000) S(1000) = (30,000)(1 - 0.727) = 8190.


E[(X - 25,000)+] = e(25,000) S(25,000) = (980,000)(1 - 0.997) = 2940.
The average payment per loss is:
E[X ∧ 25,000] - E[X ∧ 1000] = E[(X - 1000)+] - E[(X - 25,000)+] = 8190 - 2940 = 5250.

18.43. B. 10,000 = maximum payment = Policy limit = maximum covered loss - deductible.
Thus the maximum covered loss = 10,500.
Expected payment per loss = E[X ∧ u] - E[X ∧ d] = E[X ∧ 10,500] - E[X ∧ 500].
Alternately, let X be the size of loss.
If for example x = 11,000, then the payment is: Min[10,500, 10,000] = 10,000.
If for example x = 10,200, then the payment is: Min[9700, 10,000] = 9700.
If for example x = 7000, then the payment is: Min[6500, 10,000] = 6500.
If for example x = 300, then the payment is: Min[0, 10,000] = 0.
payment = 10,000 if x ≥ 10,500; x - 500 if 500 < x < 10,500; 0 if x ≤ 500.

These are the same payments as if there were a 10,500 maximum covered loss (applied first) and
a 500 deductible (applied second). Proceed as before.
Alternately, we can compute the average payment per loss:
∫500 to 10,500 (x - 500) f(x) dx + 10,000 S(10,500) =
∫500 to 10,500 x f(x) dx - 500 ∫500 to 10,500 f(x) dx + 10,000 S(10,500) =
E[X ∧ 10,500] - 10,500 S(10,500) - {E[X ∧ 500] - 500 S(500)} - 500{F(10,500) - F(500)} + 10,000 S(10,500) =
E[X ∧ 10,500] - 500 S(10,500) - E[X ∧ 500] + 500 S(500) - 500 F(10,500) + 500 F(500) =
E[X ∧ 10,500] - E[X ∧ 500] + 500{F(500) + S(500) - F(10,500) - S(10,500)} =
E[X ∧ 10,500] - E[X ∧ 500] + (500)(1 - 1) = E[X ∧ 10,500] - E[X ∧ 500].
Comment: As mentioned in the section on Policy Provisions, the default on the exam is to apply the
maximum covered loss first and then apply the deductible.
What is done in this question is mathematically the same as first applying a maximum covered loss
of 10,500 and then applying a deductible of 500.
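A quick simulation can confirm that a 500 deductible combined with a 10,000 maximum payment produces an average payment per loss of E[X ∧ 10,500] - E[X ∧ 500]. The sketch below assumes Python with NumPy, and uses an Exponential severity with mean 5000 purely for illustration; the problem itself does not specify the distribution.

import numpy as np

np.random.seed(1)
theta = 5000.0                                        # illustrative Exponential mean, an assumption
x = np.random.exponential(theta, 1_000_000)           # ground-up losses
payment = np.minimum(np.maximum(x - 500, 0), 10_000)  # deductible 500, maximum payment 10,000
lev = lambda u: theta * (1 - np.exp(-u / theta))      # E[X ∧ u] for this Exponential
print(payment.mean())              # simulated average payment per loss
print(lev(10_500) - lev(500))      # E[X ∧ 10,500] - E[X ∧ 500]; about 3912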

18.44. C. 40,000 = (E[X] - E[X ∧ 5000]) / S(5000). ⇒ E[X] - E[X ∧ 5000] = 40,000 S(5000).
When there is a positive payment with a franchise deductible then it is 5000 more than that when
there is an ordinary deductible. The probability of a positive payment is S(5000).
Thus, 33,750 = E[X] - E[X ∧ 5000] + 5000 S(5000) = 40,000 S(5000) + 5000 S(5000).
⇒ S(5000) = 33,750 / 45,000 = 0.75.
With an ordinary deductible of 5000, the average payment per loss is:
E[X] - E[X ∧ 5000] = 40,000 S(5000) = (40,000)(0.75) = 30,000.

18.45. A. Expected payment per loss is:
∫200 to 300 (x - 200) (0.001) dx + ∫300 to 800 (x - 200) (0.0002) dx = 5 + 35 = 40.

Comment: The mean size of loss is 130; the loss elimination ratio is: 1 - 40/130 = 0.692.

18.46. C. For an accident that does not exceed $500 the insurer pays nothing.
For an accident of size x > 500, the insurer pays x - 500.
The density function for x is f(x) = 1/5000 for 0 ≤ x ≤ 5000.
Thus the insurerʼs average payment per accident is:
∫500 to 5000 (x - 500) f(x) dx = ∫500 to 5000 (x - 500) (1/5000) dx = [(x - 500)² (1/10,000)] from x = 500 to 5000 = 2025.

18.47. E. Expected amount paid per loss = ∫100 to ∞ x f(x) dx = ∫0 to ∞ x f(x) dx - ∫0 to 100 x f(x) dx =
Mean - {E[X ∧ 100] - 100 S(100)}.
S(100) = {θ/(θ+100)}² = (1000/1100)² = 0.8264.
E[X ∧ 100] = {θ/(α-1)} {1 - (θ/(θ+100))^(α-1)} = {1000/(2-1)} {1 - (1000/1100)^(2-1)} = 90.90.
Mean = θ/(α-1) = 1000.
Therefore, the expected amount paid per loss is: 1000 - {90.90 - 82.64} = 991.74.
Expect 10 losses per year, so the average cost per year is: (10)(991.7) = $9917.
Alternately, the expected cost per year of 10 losses is:
10 ∫100 to ∞ x f(x) dx = (10)(2)(1000²) ∫100 to ∞ x (1000 + x)^(-3) dx =
10^7 [-x (1000 + x)^(-2)] from x = 100 to ∞ + 10^7 ∫100 to ∞ (1000 + x)^(-2) dx = 10^7 (100/1100² + 1/1100) = 9917.
Alternately, the average severity per loss > $100 is:
100 + e(100) = 100 + (θ+100)/(α -1) = 1100 + 100 = $1200.
Expected number of losses > $100 = 10S(100) = 8.2645.
Expected annual payment = $1200(8.2645) = $9917.
Comment: Almost all questions involve the ordinary deductible, in which for a loss X larger than d,
X - d is paid. For these situations the average payment per loss is: E[X] - E[X ∧ d].
Instead, here for a large loss the whole amount is paid. This is a franchise deductible, as discussed in
the section on Policy Provisions. In this case, the average payment per loss is d S(d) more than for
the ordinary deductible or: E[X] - E[X ∧ d] + d S(d).
One can compute the expected total amount paid per year by an insurer either as (average
payment insured receives per loss)(expected losses the insured has per year) or as (average
payment insurer makes per non-zero payment)(expected non-zero payments the insurer makes
per year). The former is ($991.7)(10) = $9917; the latter is ($1200)(8.2645) = $9917. Thus
whether one looks at it from the point of view of the insurer or the insured, one gets the same result.
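As an optional numerical check of the $9917 (assuming Python with SciPy is available), one can integrate x f(x) above 100 for this Pareto directly:

from scipy.integrate import quad

alpha, theta = 2.0, 1000.0
f = lambda x: alpha * theta**alpha / (theta + x)**(alpha + 1)   # Pareto density
per_loss, _ = quad(lambda x: x * f(x), 100, float('inf'))       # franchise deductible: whole loss paid when x > 100
print(per_loss)        # about 991.7
print(10 * per_loss)   # about 9917 expected annual payment for 10 expected losses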

18.48. B. For this uniform distribution, f(x) = 1/50000 for 0≤x≤50000. The payment by the insurer
depends as follows on the size of loss x:
Insurerʼs payment: 0 if x ≤ 5000; x - 5000 if 5000 ≤ x ≤ 25000; 20,000 if x ≥ 25,000.
We need to compute the ratio of two quantities, the average dollars paid by the insurer per loss and
the probability that a loss will result in a nonzero payment. The latter is the chance that x > 5000,
which is: 1 - (5000/50000) = 0.9. The former is the sum of two terms corresponding to
5000 ≤ x ≤ 25000 and x > 25000:
∫5000 to 25,000 (x - 5000) f(x) dx + 20,000 S(25,000) = ∫5000 to 25,000 (x - 5000) (1/50,000) dx + (20,000)(1 - 0.5) =
[(x - 5000)² / 100,000] from x = 5000 to 25,000 + 10,000 = 4000 + 10,000 = 14,000.

Thus the average nonzero payment by the insurer is: 14,000 / 0.9 = 15,556.
Alternately, S(x) = 1 - x/50,000, x < 50,000.
The average payment per (nonzero) payment is:
(E[X ∧ L] - E[X ∧ d]) / S(d) = (E[X ∧ 25000] - E[X ∧ 5000]) / S(5000) =
∫5000 to 25,000 S(x) dx / S(5000) = ∫5000 to 25,000 (1 - x/50,000) dx / 0.9 = (20,000 - 6250 + 250)/0.9 = 15,556.
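A simulation check of this answer (optional; a minimal sketch assuming Python with NumPy):

import numpy as np

np.random.seed(3)
x = np.random.uniform(0, 50_000, 1_000_000)               # losses uniform on (0, 50,000)
per_loss = np.minimum(np.maximum(x - 5000, 0), 20_000)    # deductible 5000, maximum covered loss 25,000
print(per_loss.mean())                    # about 14,000
print(per_loss[per_loss > 0].mean())      # about 15,556, the average nonzero payment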

18.49. For the layer from $100,000 to $200,000, the expected number of payments is:
100 S(100,000) = 39.7.
The expected losses are: (100) (E[X ∧ 200,000] - E[X ∧ 100,000]) = $3,145,400.
The average payment per payment in the layer is: 3,145,400/39.7 = $79,229.
For the layer from $200,000 to $500,000, the expected number of payments is:
100 S(200,000) = 25.2.
The expected losses are: (100) (E[X ∧ 500,000] - E[X ∧ 200,000]) = $5,007,000.
The average payment per payment in the layer is: 5,007,000/25.2 = $198,690.
For the layer from $500,000 to $1,000,000, the expected number of payments is:
100 S(500,000) = 11.5.
The expected losses are: (100) (E[X ∧ 1,000,000] - E[X ∧ 500,000]) = $3,990,300.
The average payment per payment in the layer is: 3,990,300/11.5 = $346,983.

18.50. C. The insurer pays the dollars of loss excess of $15,000, which are:
E[X] - E[X ∧ 15000] = E[X ∧ ∞] - E[X ∧ 15000]. The number of non-zero payments is
1 - F(15000). Thus the average nonzero payment is:
(E[X] - E[X ∧ 15000])/ (1 - F(15000)) = (20,000 - 7700)/(1-.7) = 12300 / 0.3 = 41,000.

18.51. E. Since 40% of the losses exceed a deductible of 10,000 and half of 40% is 20%, the
new deductible is 22,500 which is exceeded by 20% of the losses.
In other words, S(22,500) = 20% = 40%/ 2 = S(10,000) / 2.
For a deductible of size d, the expected size of a nonzero payment made by the insurer is (E[X] -
E[X ∧ d])/{ 1 - F(d) } = e(d) = the mean excess loss at d.
e(10,000) = (20,000 - 6000) / (1-.6) = 35,000.
e(22,500) = (20,000 - 9500) / (1-.8) = 52,500.
52,500 / 35,000 = 1.5 or a 50% increase.
Comment: One can do the problem without using the specific numbers in the Loss Size column.

18.52. For this density, the survival function is:
S(x) = ∫x to ∞ 0.01(1 - q + 0.01qt) e^(-0.01t) dt = [-e^(-0.01t) - 0.01qt e^(-0.01t)] from t = x to ∞
= e^(-0.01x) + 0.01qx e^(-0.01x). We will also need integrals of x f(x):
∫x to ∞ t f(t) dt = 0.01 ∫x to ∞ {(1 - q) t e^(-0.01t) + 0.01q t² e^(-0.01t)} dt =
[-0.01 e^(-0.01t) {(1 - q)(100t + 100²) + q(t² + 200t + (2)(100²))}] from t = x to ∞ =
e^(-0.01x) {(1 - q)(x + 100) + q(0.01x² + 2x + 200)} = e^(-0.01x) {(x + 100) + q(0.01x² + x + 100)}.


First, given q, calculate the average value of a non-zero payment, given a deductible of 100.
We need to compute: ∫100 to ∞ (t - 100) f(t) dt / S(100) = ∫100 to ∞ t f(t) dt / S(100) - 100 =
{e^(-1) (200 + 300q)} / {(1 + q) e^(-1)} - 100 = (200 + 300q) / (1 + q) - 100.
Setting this equal to 125, one can solve for q: {200 + 300q} /(1 + q) -100 = 125.
225(1 + q) = 200 + 300q. ⇒ q = 25/75 = 1/3.
Now the average non-zero payment, given a deductible of 200 is:

∫200 to ∞ t f(t) dt / S(200) - 200 = {e^(-2) (300 + 700/3)} / {(1 + 2/3) e^(-2)} - 200 = (1600/3)/(5/3) - 200 = 320 - 200 = 120.
Alternately, the given density is a mixture of an Exponential with θ = 100, given weight 1-q, and a
Gamma Distribution with parameters α = 2 and θ =100, given weight q.
The mean for the Exponential Distribution is 100.
The mean for this Gamma Distribution is (2)(100) = 200.
Thus, the mean for the mixed distribution is: (1-q)(100) + 200q = 100 + 100q.
For this Exponential Distribution, E[X ∧ x] = 100 (1- e-0.01x).
For this Gamma Distribution, E[X ∧ x] = 200 Γ(3; .01x) + x {1 - Γ(2; .01x)}.
Making use of Theorem A.1 in Appendix A of Loss Models,
Γ(3; 0.01x) = 1 - e-0.01x{1 + 0.01x + .00005x2 } and

Γ(2; 0.01x) = 1 - e-0.01x{1 + 0.01x}. Therefore, for this Gamma Distribution,


E[X ∧ x] = 200 - e-0.01x{200 + 2x + 0.01x2 } + e-0.01x{x + 0.01x2 } = 200 - e-0.01x(200 + x).
Thus, for the mixed distribution, E[X ∧ x] =
q{200 - e-0.01x(200 + x)} +(1-q)100(1- e-0.01x) =
100(1-e-0.01x) + q(100 - 100e-0.01x - xe-0.01x).
For this Exponential Distribution, S(x) = e-0.01x.
For this Gamma Distribution, S(x) = 1 - Γ(2; .01x) = e-0.01x{1 + .01x}.
Thus for the mixed distribution, S(x) = q{e-0.01x(1 + .01x)} +(1-q) e-0.01x =
e-0.01x + q.01x e-0.01x.
The expected non-zero payment given a deductible of size x is:
(E[X] - E[X ∧ x] )/S(x) = {100e-0.01x + q(100+x)e-0.01x}/{ e-0.01x + q.01x e-0.01x} =
{100 +q(100+x)} / {1 + 0.01xq}.
Thus for a deductible of 100, the average non-zero payment is:
(100+200q)/(1+q). Setting this equal to 125 and solving for q,
125 = (100+200q)/(1+q). ⇒ q = 25/75 = 1/3.
Thus for a deductible of 200, the average non-zero payment is:
{100+(300/3)} / (1+2/3) = 200/(5/3) = 120.
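An optional numeric check of both the q = 1/3 calibration and the final answer, assuming Python with SciPy, integrating the given density directly:

import math
from scipy.integrate import quad

q = 1/3
f = lambda t: 0.01 * (1 - q + 0.01 * q * t) * math.exp(-0.01 * t)   # the given density

def avg_nonzero(d):
    paid, _ = quad(lambda t: (t - d) * f(t), d, float('inf'))       # E[(X - d)+]
    s_d, _ = quad(f, d, float('inf'))                               # S(d)
    return paid / s_d

print(avg_nonzero(100))   # 125, reproducing the given information
print(avg_nonzero(200))   # 120, the answer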

18.53. D. Average ground-up loss = E[X] = 11,100.


Under policy A, average amount paid per loss = E[X] - E[X ∧ 5000] = 6500.
Therefore, E[X ∧ 5000] = 11100 - 6500 = 4600.
Under policy A, average amount paid per payment =
(E[X] - E[X ∧ 5000])/S(5000) = 10000.
Therefore, S(5000) = 6500/ 10000 = .65.
Given that a loss less than or equal to 5,000 has occurred, the expected payment under policy B =
average loss of size between 0 and 5000 = ∫0 to 5000 x f(x) dx / F(5000) = {E[X ∧ 5000] - 5000 S(5000)}/F(5000) =

{4600 - (5000)(0.65)} / (1 - 0.65) = 1350/.35 = 3857.


Comment: F, S, f, the mean, and the Limited Expected Value, are all for the ground-up unlimited
losses of the jewelry store, whether or not it has insurance.

18.54. E. Under policy A, with an ordinary deductible of 5,000 with no maximum covered loss, the
expected amount paid per loss is: E[X] - E[X ∧ 5000] = 6,500. Under policy A, the
expected amount paid per payment is: (E[X] - E[X ∧ 5000])/S(5000) = 10,000.
Therefore, S(5000) = 6500/10000 = 0.65. Given that a loss has occurred, the payment under
policy B, with no deductible and a policy limit of 5,000, is 5,000 if and only if the original loss is 5000
or more. The probability of this is S(5000) = 0.65.

18.55. C. An Exponential Distribution with θ = 5000. E[X ∧ x] = θ(1 - e-x/θ) = 5000(1 - e-x/5000).
E[X ∧ 500] = 5000(1 - e-0.1) = 475.8. E[X ∧ 25000] = 5000(1 - e-5) = 4966.3. E[X] = θ = 5000.
Average payment per loss before: E[X] = 5000.
Average payment per loss after: E[X ∧ 25000] - E[X ∧ 500] = 4966.3 - 475.8 = 4490.5.
Average savings per loss: 5000 - 4490.5 = 509.5.

18.56. E. By integrating f(x), F(x) = 0.01x², 0 < x < 10. S(4) = 1 - (0.01)(4²) = 0.84.
E[X] = ∫0 to 10 S(x) dx = ∫0 to 10 (1 - 0.01x²) dx = [x - 0.01x³/3] from x = 0 to 10 = 6.667.
E[X ∧ 4] = ∫0 to 4 S(x) dx = ∫0 to 4 (1 - 0.01x²) dx = [x - 0.01x³/3] from x = 0 to 4 = 3.787.
E[YP] = (E[X] - E[X ∧ 4])/S(4) = (6.667 - 3.787)/0.84 = 3.43.
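These integrals are easy to check numerically; a minimal sketch assuming Python with SciPy:

from scipy.integrate import quad

S = lambda x: 1 - 0.01 * x**2      # survival function for 0 < x < 10
ex, _ = quad(S, 0, 10)             # E[X] = 6.667
lev4, _ = quad(S, 0, 4)            # E[X ∧ 4] = 3.787
print((ex - lev4) / S(4))          # 3.43, the expected payment per payment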

18.57. B. Let m be the maximum covered loss. 60 = 0.8(m - 20). ⇒ m = 95.
The insurance pays 80% of the layer from 20 to 95.
Expected payment per loss is: 0.8 ∫20 to 95 S(x) dx = 0.8 ∫20 to 95 (1 - x²/10,000) dx = 0.8(75 - 28.31) = 37.35.
Expected payment per payment is: 37.35/S(20) = 37.35/(1 - 0.2²) = 38.91.
Alternately, f(x) = x/5000, 0 ≤ x ≤ 100.
E[X ∧ 95] = ∫0 to 95 x f(x) dx + 95 S(95) = 57.16 + (95)(1 - 0.95²) = 66.42.
E[X ∧ 20] = ∫0 to 20 x f(x) dx + 20 S(20) = 0.533 + (20)(1 - 0.20²) = 19.73.
E[YP] = 0.8 {E[X ∧ 95] - E[X ∧ 20]} / S(20) = 0.8 (66.42 - 19.73) / 0.96 = 38.91.
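A simulation check (optional; a minimal sketch assuming Python with NumPy, generating losses by inverse transform from F(x) = x²/10,000):

import numpy as np

np.random.seed(4)
u = np.random.uniform(size=1_000_000)
x = 100 * np.sqrt(u)                                     # inverse transform: F(x) = x^2 / 10,000
per_loss = 0.8 * np.minimum(np.maximum(x - 20, 0), 75)   # 80% of the layer from 20 to 95
print(per_loss.mean())                    # about 37.35
print(per_loss[per_loss > 0].mean())      # about 38.91, the payment per payment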

18.58. a. E[X ∧ 1 million] = 7,750,000/10 = $775,000.


E[X ∧ 5 million] = 24,450,000/10 = $2,445,000. $2,445,000/$775,000 = 3.155.
Claim Loss Limited to 1 million Limited to 5 million
A $250,000 $250,000 $250,000
B $300,000 $300,000 $300,000
C $450,000 $450,000 $450,000
D $750,000 $750,000 $750,000
E $1,200,000 $1,000,000 $1,200,000
F $2,500,000 $1,000,000 $2,500,000
G $4,000,000 $1,000,000 $4,000,000
H $7,500,000 $1,000,000 $5,000,000
I $9,000,000 $1,000,000 $5,000,000
J $15,000,000 $1,000,000 $5,000,000
Sum $40,950,000 $7,750,000 $24,450,000
b. With a deductible of $1 million there are 6 non-zero payments out of 10 losses.
Average payment per payment is: ($2,445,000 - $775,000)/0.6 = $2,783,333.
Alternately, the six non-zero payments are in millions: 0.2, 1.5, 3, 4, 4, 4.
(0.2 + 1.5 + 3 + 4 + 4 + 4)/6 = 16.7 million/6 = $2,783,333.
Comment: The solution to part a is one way to determine the $5 million increased limit factor
for a basic limit of $1 million.
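The limited values in part (a) and the average payment per payment in part (b) can also be computed directly from the ten claims; a minimal sketch in Python:

claims = [250_000, 300_000, 450_000, 750_000, 1_200_000,
          2_500_000, 4_000_000, 7_500_000, 9_000_000, 15_000_000]
lev = lambda u: sum(min(c, u) for c in claims) / len(claims)    # empirical limited expected value
print(lev(1_000_000))                       # 775,000
print(lev(5_000_000))                       # 2,445,000
print(lev(5_000_000) / lev(1_000_000))      # 3.155, the increased limit factor
payments = [min(c, 5_000_000) - 1_000_000 for c in claims if c > 1_000_000]
print(sum(payments) / len(payments))        # 2,783,333, the average payment per payment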

Section 19, Percentiles

The 80th percentile is the place where the distribution function is 0.80.

For a continuous distribution, the 100pth percentile is the first value at which F(x) = p.

Exercise: Let F(x) = 1 - e-x/10. Find the 75th percentile of this distribution.
[Solution: 0.75 = 1 - e-x/10. ⇒ x = -10 ln(1 - 0.75) = 13.86.
Comment: Check: 1 - e-13.86/10 = 1 - 0.250 = 0.75.]

The Value at Risk, VaRp , is defined as the 100pth percentile.78


In Appendix A of the Tables attached to the exam, there are formulas for VaRp (X) for many of the
distributions: Exponential, Pareto, Single Parameter Pareto, Weibull, Loglogistic, Inverse Pareto,
Inverse Weibull, Burr, Inverse Burr, Inverse Exponential, Paralogistic, Inverse Paralogistic.

One can use these formulas for VaRp in order to determine percentiles.

For example, for the Exponential Distribution as shown in Appendix A: VaRp (X) = -θ ln(1-p).
Thus in the previous exercise, VaR0.75 = (-10) ln(0.25) = 13.86.

The 50th percentile is the median, the place where the distribution function is 0.50.

The 25th percentile is the lower or first quartile.


The 50th percentile is the middle or second quartile.
The 75th percentile is the upper or third quartile.79

Exercise: What is the 90th percentile of a Weibull Distribution with parameters τ = 3 and θ = 1000?

[Solution: F(x) = 1 - exp[-(x/θ)^τ] = 1 - exp[-(x/1000)³]. Set F(x) = 0.90 and solve for x.
0.90 = 1 - exp[-(x/1000)³]. -ln[0.10] = (x/1000)³. x = 1000 {ln[10]}^(1/3) = 1321.
Alternately, as shown in Appendix A: VaRp(X) = θ {-ln(1-p)}^(1/τ).
VaR0.90 = (1000) {-ln(0.1)}^(1/3) = 1321.]
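These percentile formulas are easy to evaluate directly; a minimal sketch in Python:

import math

# Weibull with tau = 3, theta = 1000: VaR_p = theta * (-ln(1 - p))^(1/tau)
print(1000 * (-math.log(1 - 0.90)) ** (1/3))   # 1321
# Exponential with theta = 10: VaR_p = -theta * ln(1 - p)
print(-10 * math.log(1 - 0.75))                # 13.86, the 75th percentile from the earlier exercise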

78
As discussed in “Mahlerʼs Guide to Risk Measures.”
79
The difference between the 75th and 25th percentiles is called the interquartile range.

Percentiles of Discrete Distributions:

A more precise mathematical definition also covers situations other than the continuous loss
distributions:80 The 100pth percentile of a distribution F(x) is any number, πp such that:

F(πp -) ≤ p ≤ F(πp ), where F(y-) is the limit of F(x) as x approaches y from below.

Exercise: Let a distribution be such that there is 30% chance of a loss of $100, a 50% chance of a
loss of $200, and a 20% chance of a loss of $500.
Determine the 70th and 80th percentiles of this distribution.
[Solution: F(100) = 0.3, F(200) = 0.8. Since F(x) = 0.3 for 100 ≤ x < 200, F(200-) = 0.3.
Thus, F(200-) ≤ 0.7 ≤ F(200), so that π.70 = 200. 200 is the first value at which F(x) > 0.7.

F(x) = 0.8 for 200 ≤ x < 500, so that 200 ≤ π.80 ≤ 500.
Comment: Since there is a value at which F(x) = 0.8, there is no unique value of the 80th percentile
for this discrete distribution. For example, F(200-) =0.3 ≤ 0.8 = F(200), F(300-) = 0.8 = F(300), and
F(500-) = 0.8 ≤ 1.0 = F(500). Thus each of 200, 300 and 500 satisfy the definition of the 80th
percentile. In this case I would use 200 as the 80th percentile.]

For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p.
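This rule is simple to code; a minimal sketch in Python, using the discrete distribution from the exercise above:

def percentile(outcomes, p):
    """Return the first value at which the distribution function is at least p."""
    total = 0.0
    for x, prob in sorted(outcomes.items()):
        total += prob
        if total >= p:
            return x

losses = {100: 0.30, 200: 0.50, 500: 0.20}
print(percentile(losses, 0.70))   # 200
print(percentile(losses, 0.80))   # 200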

Quantiles:

The 95th percentile is also referred to as Q0.95, the 95% quantile.

25th percentile ⇔ Q0.25 ⇔ 25% quantile ⇔ first quartile.

50th percentile ⇔ Q0.50 ⇔ 50% quantile ⇔ median.

75th percentile ⇔ Q0.75 ⇔ 75% quantile ⇔ third quartile.

90th percentile ⇔ Q0.90 ⇔ 90% quantile.

99th percentile ⇔ Q0.99 ⇔ 99% quantile.

80
Definition 3.6 of Loss Models.

Problems:

19.1 (1 point) What is the 90th percentile of a Pareto Distribution with parameters α = 3 and
θ = 100?
A. less than 120
B. at least 120 but less than 125
C. at least 125 but less than 130
D. at least 130 but less than 135
E. at least 135

19.2 (1 point) Severity is Exponential.


What is the ratio of the 95th percentile to the median?
A. 4.1 B. 4.3 C. 4.5 D. 4.7 E. 4.9

19.3 (1 point) What is the 80th percentile of a Weibull Distribution with parameters τ = 2 and
θ = 100?
A. less than 120
B. at least 120 but less than 125
C. at least 125 but less than 130
D. at least 130 but less than 135
E. at least 135

19.4 (2, 5/85, Q.4) (1.5 points) Let the continuous random variable X have the density function f(x)
as shown in the figure below:
[Figure: the density f(x) rises in a straight line from (0, 0) to the point (a, 1/4), and is 0 elsewhere.]

What is the 25th percentile of the distribution of X ?


A. 2 B. 4 C. 8 D. 16 E. 32

19.5 (160, 11/86, Q.7) (2.1 points) You are given:


(i) F(x) = 1 - (θ/x)^a where a > 0, θ > 0, x > θ.
(ii) The 90th percentile is (2)(10^(1/4)).
(iii) The 99th percentile is (2)(100^(1/4)).
Determine the median of the distribution.
(A) 2^(1/4) (B) 2(2^(1/4)) (C) 2 (D) 2√2 (E) 4√2

19.6 (2, 5/92, Q.10) (1.7 points) Let Y be a continuous random variable with cumulative
distribution function F(y) = 1 - exp[-(y - a)²/2], y > a, where a is a constant.
What is the 75th percentile of Y?
A. F(0.75) B. a - 2√(ln(4/3)) C. a + 2√(ln(4/3)) D. a - 2√(ln(2)) E. a + 2√(ln(2))

19.7 (IOA 101, 9/01, Q.1) (1.5 points) Data were collected on 100 consecutive days for the
number of claims, x, arising from a group of policies.
This resulted in the following frequency distribution
x: 0 1 2 3 4 ≥5
f: 14 25 26 18 12 5
Calculate the median, 25th percentile, and 75th percentile for these data.

Solutions to Problems:

19.1. A. F(x) = 1 - {100/(100+x)}³. Set F(x) = 0.90 and solve for x.
0.90 = 1 - {100/(100+x)}³. ⇒ x = 100 {(1 - 0.9)^(-1/3) - 1} = 115.4.
Alternately, for the Pareto Distribution: VaRp(X) = θ [(1-p)^(-1/α) - 1].
VaR0.9 = (100) {(0.1)^(-1/3) - 1} = 115.4.
Comment: Check. F(115.4) = 1 - {100/(100 + 115.4)}³ = 0.900.

19.2. B. F(x) = 1 - e-x/θ. Set F(x) = 0.5 to find the median. median = -θln(1 - 0.5) = 0.693θ.
Set F(x) = 0.95 to find the 95th percentile. 95th percentile = -θln(1 - 0.95) = 2.996θ.
Ratio of the 95th percentile to the median = 2.996θ/.693θ = 4.32.
Alternately, for the Exponential Distribution as shown in Appendix A: VaRp (X) = -θ ln(1-p).
VaR0.95 / VaR0.5 = ln(1-0.95)/ ln(1-0.5) = 4.32.

19.3. C. F(x) = 1 - exp[-(x/100)²]. Set F(x) = 0.80 and solve for x.
0.80 = 1 - exp[-(x/100)²]. ⇒ ln(0.2) = -(x/100)². ⇒ x = 100 {-ln(0.2)}^(1/2) = 126.9.
Alternately, as shown in Appendix A: VaRp(X) = θ {-ln(1-p)}^(1/τ).
VaR0.80 = (100) {-ln(0.2)}^(1/2) = 126.9.
Comment: Check. F(126.9) = 1 - exp[-(126.9/100)²] = 0.800.

19.4. B. The area under the density must be: 1. (1/2)(a)(1/4) = 1. ⇒ a = 8.


The 25th percentile is where F(x) = 0.25, or where 1/4 of the area below the density
is to the left. This occurs at: a/2 = 8/2 = 4.

19.5. B. 0.90 = F[2(10^(1/4))] = 1 - (θ/{2(10^(1/4))})^a. ⇒ -ln10 = a lnθ - a ln2 - (a/4)ln10.
0.99 = F[2(100^(1/4))] = 1 - (θ/{2(100^(1/4))})^a. ⇒ -2ln10 = a lnθ - a ln2 - (a/2)ln10.
Subtracting the two equations: ln10 = (a/4)ln(10). ⇒ a = 4. ⇒ θ = 2.
0.5 = F(x) = 1 - (2/x)⁴. ⇒ x = 2(2^(1/4)).
Alternately, this is a Single Parameter Pareto Distribution.
As shown in Appendix A: VaRp(X) = θ (1 - p)^(-1/α).
Therefore, (2)(10^(1/4)) = θ (0.1)^(-1/a), and (2)(100^(1/4)) = θ (0.01)^(-1/a).
Dividing the second equation by the first equation: 10^(1/4) = (0.1)^(-1/a). ⇒ a = 4. ⇒ θ = 2.

19.6. E. Set F(x) = 0.75. exp[-(y - a)²/2] = 0.25. ⇒ -(y - a)²/2 = -ln(4). ⇒ (y - a)² = 4 ln(2).
⇒ y = a + 2√(ln(2)).

19.7. For a discrete distribution, the median is the first place the Distribution Function is at least 50%.
x: 0 1 2 3 4
F: 0.14 0.39 0.65 0.83 0.95
Thus the median is 2. Similarly, the 25th percentile is 1 and the 75th percentile is 3.

Section 20, Definitions

Those definitions that are not discussed elsewhere are included here.

Cumulative Distribution Function and Survival Function:81

Cumulative Distribution Function of X ⇔ cdf of X ⇔ Distribution Function of X ⇔


F(x) = Prob[X ≤ x].

The distribution function is defined on the real line and satisfies:


1. 0 ≤ F(x) ≤ 1.
2. F(x) is nondecreasing; F(x) ≤ F(y) for x < y.
3. F(x) is right continuous; the limit of F(x + ε) as ε → 0+ is F(x).
4. F(-∞) = 0 and F(∞) = 1; F(x) → 0 as x → -∞, and F(x) → 1 as x → ∞.

Most theoretical size of loss distributions, such as the Exponential, F(x) = 1 - e-x/θ, are continuous,
increasing functions, with F(0) = 0.

The survival function, S(x) = 1 - F(x) = Prob[X > x]. 0 ≤ S(x) ≤ 1.


S(x) is nonincreasing. S(x) is right continuous. S(-∞) = 1 and S(∞) = 0.
For the Exponential, S(x) = e-x/θ, is a continuous, decreasing function, with S(0) = 1.

Discrete, Continuous, and Mixed Random Variables:82

Discrete Random Variable ⇔ support is finite or countable

Continuous Random Variable ⇔ support is an interval or a union of intervals

Mixed Random Variable83 ⇔ combination of discrete and continuous random variables

Examples of Discrete Random Variables: Frequency Distributions, Made-up Loss Distributions


such as 70% chance of 100 and 30% chance of 500.
81
See Definitions 2.1 and 2.4 in Loss Models.
82
See Definition 2.3 at Loss Models. The support is the set of input values for the distribution; for a loss distribution
it is the set of possible sizes of loss. The set of integers is countable. The set of real numbers is not countable.
83
This is a different use of the term mixed than for n-point mixtures of Loss Distributions; in an n-point mixture one
weights together n individual distributions in order to create a new distribution.

Examples of Continuous Random Variables: Loss Distributions in Appendix A of Loss Models


such as the Exponential, Pareto, Gamma, LogNormal, Weibull, etc.

Examples of Mixed Random Variables: Loss Distributions censored from above (censored from
the right) by a maximum covered loss; there is a point mass of probability at the maximum covered
loss.

Probability Function ⇔ Probability Mass Function ⇔ probability density function for a discrete

distribution or a point mass of probability for a mixed random variable ⇔


p X(x) = Prob[X = x].84

Loss Events, etc.:

A loss event or claim is an incident in which someone suffers damages which result in an economic
loss. For example, an insured driver may damage his car in an accident, an insured business may
have its factory damaged by fire, someone with health insurance may enter the hospital, etc. Each of
these is a loss event.

On this exam, do not distinguish between the illness or accident that caused the insured to enter the
hospital, the entry into the hospital or the bill from the hospital; each or all of these combined could
be considered the loss event. Quite often I will refer to a loss event as an accident. However, loss
events can involve a death, illness, natural event, etc., with no accident involved.

On this exam, the term “claim” is not distinguished from a “loss event”. However, in common usage
the term claim is reserved for those situations where someone actually contacts the insurer
and asks for a payment. So for example, if an insured with a $5000 deductible suffered $1000
of otherwise covered damage to his home, he would probably not even bother to inform the
insurer. This is a loss event, but in common usage it would not be called a claim. One
common definition of a claim is: “A claim is a demand for payment by an insured or by an allegedly
injured third party under the terms and conditions of an insurance contract.” 85 The same
mathematical methods can be applied to self-insureds.

On this exam, do not distinguish between first party and third party claims. For example, if Bob
is driving his car and it hits Sueʼs car, then Sue may make a claim against Bobʼs insurer. Under
liability insurance, there is no requirement that a loss event or claim be made by an insured.
For example, a customer may slip and fall in a store and sue the store. The storeʼs insurer may
have to pay the customer.

84
See Definition 2.6 in Loss Models.
85
Foundations of Casualty Actuarial Science, Chapter 2.

The loss, size of loss, or severity, is the dollar amount of damage as a result of a loss event. The
loss may be zero. If an insured suffers $20,000 of damage and the insured has a $5,000
deductible, then the insurer would only pay the insured $15,000. However, the (size of) loss is
$20,000, the amount of damage suffered by the insured. If the insured suffered only $1000 of
damage, then the insurer would pay nothing due to the $5000 deductible, but the (size of) loss is
$1000.

A payment event is an incident in which someone receives a (non-zero) payment as a result of a


loss event covered by an insurance contract.

The amount paid is the actual dollar amount paid as a result of a loss event or a payment event. If it
is as the result of a loss event, the amount paid may be zero.

So if an insuredʼs home suffers $20,000 of damage and the insured has a $5,000 deductible, then
the insurer would pay the insured $15,000. The amount paid is $15,000. If the insured suffered
$1000 damage, then the insurer would pay nothing due to the $5000 deductible. The amount paid
is 0.
If an injured worker makes a claim for Workers Compensation benefits, but it is found that the
injury is not work related, then the amount paid may be zero. If a doctor is sued for medical
malpractice, quite often the claim is closed without payment; the amount paid can be zero.

The allocated loss adjustment expense (ALAE) is the amount of expense incurred directly as
a result of a loss event.

Loss adjustment expenses (LAE) which can be directly related to a specific claim are classified
as ALAE, while those that can not are classified as unallocated loss adjustment expense
(ULAE).86 Examples of ALAE are fees for defense attorneys, expert witnesses for the defense,
medical evaluations, court costs, laboratory and x-ray costs, etc. Quite often claims closed
without (loss) payment, will have a positive amount of ALAE, sometimes a very large amount.
Note that any loss payment is not a part of the ALAE and vice versa.

A loss distribution is the probability distribution of either the loss or the amount paid from a loss
event or of the amount paid from a payment event. The distribution may or may not exclude
payments of zero and may or may not include ALAE.

86
For specific lines of insurance the distinction between ALAE and ULAE may depend on the statistical plan or
reporting requirement.

Loss distributions can be discrete. For example, a 20% chance of $100 and an 80% chance of
$500. The loss distributions in the Appendix A of Loss Models, such as the Exponential Distribution
or the Pareto Distribution, are all continuous and all exclude the chance of a loss of size zero. Thus in
order to model losses for lines of insurance with many claims closed without payment one
would have to include a point mass of probability at zero. On Liability lines of insurance, losses
and ALAE are often reported together, so they are frequently modeled together.

Frequency, Severity, and Exposure:

The frequency is the number of losses or number of payments random variable. Its expected
value is called the mean frequency. Unless indicated otherwise the frequency is for one exposure
unit. Frequency distributions are discrete with support on all or part of the non-negative integers.
They can be “made-up”, or can be named distributions, such as the Poisson or Binomial
Distributions.87

The severity can be either the loss or amount paid random variable. Its expected value is called the
mean severity.

Severity and frequency together determine the aggregate amount paid by an insurer. The number
of claims or loss events determines how many times we take random draws from the severity
variable. Note that first we determine the number of losses from the frequency distribution and then
we determine the size of each loss. Thus frequency and severity do not enter into the aggregate
loss distribution in a symmetric manner. This will be seen for example when we calculate the variance
of the aggregate losses.88

The exposure base is the basic unit of measurement upon which premiums are determined.

For example, insuring one automobile for one year is a car-year of exposure. Insuring $100 dollars
of payroll in Workersʼ Compensation is the unit of exposure. So if the rate for carpenters in State X
were $4 per $100 of payroll, insuring the “Close to You Carpenters”, which paid its carpenters
$250,000 per year in total, would cost:
(250,000/100)($4) = $10,000 per year.89

87
See “Mahlerʼs Guide to Frequency Distributions,” and Appendix B of Loss Models.
88
See “Mahlerʼs Guide to Aggregate Distributions.”
89
This is a simplified example.

Some Definitions from Joint Principles of Actuarial Science:90

Phenomena ⇔ occurrences that can be observed

Experiment ⇔ observation of a given phenomena under specified conditions

Event ⇔ set of one or more possible outcomes

Stochastic phenomenon ⇔ more than one possible outcome

contingent event ⇔ outcome of a stochastic phenomenon; more than one possible outcome

probability ⇔ measure of the likelihood of an event, on a scale from 0 to 1

random variable ⇔ function that assigns a numerical value to every possible outcome

Data Dependent Distributions:

Data-Dependent Distributions ⇔
complexity ≥ that of the data; complexity increases as the sample size increases.91

The most important example of a data-dependent distribution is the Empirical Distribution Function.
Another example is Kernel Smoothing.

As discussed previously, the empirical model assigns probability 1/n to each of n observed
values. For example, with the following observations: 81, 157, 213, the probability function (pdf) of
the corresponding empirical model is: p(81) = 1/3, p(157) = 1/3, p(213) = 1/3.

As discussed previously, the corresponding Empirical Distribution Function is:


F(x) = 0 for x < 81, F(x) = 1/3 for 81 ≤ x < 157, F(x) = 2/3 for 157 ≤ x < 213, F(x) = 1 for 213 ≤ x.
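A minimal sketch of the Empirical Distribution Function in Python (illustration only):

def empirical_cdf(data):
    data = sorted(data)
    n = len(data)
    return lambda x: sum(1 for d in data if d <= x) / n

F = empirical_cdf([81, 157, 213])
print(F(80), F(81), F(157), F(213))   # 0.0, 0.333, 0.667, 1.0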

In a Kernel Smoothing Model, the data is smoothed using a “kernel” function.92

90
See Section 2.1 of Loss Models.
91
See Definition 4.7 in Loss Models.
92
Kernel smoothing is covered in “Mahlerʼs Guide to Fitting Loss Distributions.”

Section 21, Parameters of Distributions

The probability that a loss is less than or equal to x is given by a distribution function F(x).
For a given type of distribution, in addition to the size of loss x, F(x) depends on
what are called parameters. Each type of distribution has a set of parameters, which enter into F(x)
and its derivative f(x) in a manner particular to that type of distribution.

For example, the Pareto Distribution has a set of two parameters α and θ. For fixed α and θ, F(x) is a
function of x. For α = 2 and θ = 10, the Pareto Distribution is given by:
F(x) = 1 - {θ/(θ + x)}^α = 1 - (1 + x/10)^(-2).

The Pareto Distribution is an example of a parametric distribution.93

The parameter(s) tell you which member of the family one has. For example, if one has a Pareto
Distribution with parameters α = 2 and θ = 10, the Distribution is completely described. In addition
one needs to know the support of the distributions in the family. For example, all Pareto Distributions
have support x > 0. Finally, one needs to know the set from which the parameter or parameters
may be drawn. For the Pareto Distribution, α > 0 and
θ > 0.

It is useful to group distributions based on how many parameters they have. Those in the
Appendix Loss Models have one, two, three or even occasionally four parameters. For example,
the Exponential has one parameter, the Gamma has two parameters, while the Transformed
Gamma has three parameters.

It is also useful to divide parameters into those that are related to the scale of the distribution and
those that are related to the shape of a parameter. A scale parameter is a parameter which divides
x everywhere it appears in the distribution function. For example, θ is a scale parameter for the

Pareto distribution: F(x) = 1 - {θ/(θ+x)}α = 1 - (1 + x / θ)−α. θ normalizes the scale of x, so that one
standard distribution can fit otherwise similar data sets. Thus for example, different units of
measurement (dollars, yen, pounds, marks, etc.) or the effects of inflation can be easily
accommodated by changing the scale parameter.

A scale parameter will appear to the nth power in the formula for the nth moment of the distribution.
Thus, the nth moment of the Pareto has a factor of θn .

93
See Definition 4.1 in Loss Models.

Also, a scale parameter will not appear in the coefficient of variation, the skewness, or kurtosis. The
coefficient of variation, the skewness, and the kurtosis each measure the shape of the distribution.

Thus a change in scale should not affect the shape of the fitted Pareto; θ does not appear in the
coefficient of variation or the skewness of the Pareto. However, α does, and thus alpha is called a
shape parameter.

A parameter such as µ in the Normal distribution is referred to as a location parameter. Altering


µ shifts the whole distribution to the left or right.

Gamma Distribution, an Example:

For the Gamma Distribution, α is the shape parameter and θ is the scale parameter.
Here are graphs of four different Gamma Distributions, each with a mean of 200:

[Graphs omitted: the densities of four Gamma Distributions, each with mean 200, plotted for x from 0 to 600: α = 1 (an Exponential), α = 2, α = 4, and α = 10.]
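One way to reproduce these densities (optional; a minimal sketch assuming Python with SciPy), where for each shape α the scale is taken as θ = 200/α so that the mean αθ stays at 200:

from scipy.stats import gamma

for alpha in (1, 2, 4, 10):
    theta = 200 / alpha                        # scale chosen so the mean alpha * theta = 200
    dist = gamma(a=alpha, scale=theta)
    print(alpha, dist.mean(), dist.pdf(200))   # the mean is 200 for every alpha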

Advantages of Parametric Estimation:94

Parametric estimation has the following advantages:


1. Accuracy.
Maximum likelihood estimators of parameters have good properties.
2. Inferences can be made beyond the population that generated the data.
For example, even if the largest observed loss is 100, we can estimate the chance that the next
observed loss will have a size greater than 100.
3. Parsimony.
Distributions can be summarized using only a few parameters.
4. Allows Hypothesis Tests.95
5. Scale parameters/families.
Allow one to more easily handle the effects of (uniform) inflation.

Some Desirable Properties of Size of Loss Distributions:96

As a model for claim sizes, in order to be practical, a distribution should have the following
desirable characteristics:
1. The estimate of the mean should be efficient and reasonably easy to use.
2. A confidence interval about the mean should be calculable.
3. All moments of the distribution function should exist.
4. The characteristic function can be written in closed form.

As will be seen, some distributions such as the Pareto do not have all of their moments. While this
does not prevent their use, it does indicate some caution must be exercised. For the LogNormal,
the characteristic function cannot be written in closed form.

94
See Section 2.6 of the first edition of Loss Models, not on the syllabus.
95
For example through the use of the Chi-Square Statistic or the Kolmogorov-Smirnov Statistic, discussed in
“Mahlerʼs Guide to Fitting Loss Distributions.”
96
See “Estimating Pure Premiums by Layer - An Approach,” by Robert J. Finger, Discussion by Lee R. Steeneck,
PCAS 1976, not on the syllabus

Problems:

21.1 (1 point) Which of the following are true with respect to the application in Property/Casualty
Insurance of theoretical size of loss distributions?
1. When data is extensive, theoretical distributions are not essential.
2. Inferences can be made beyond the population that generated the data.
3. Their inconvenience restricts their use to unusual circumstances.
A. 1 B. 2 C. 3 D. 2, 3 E. None of A, B, C, or D

21.2 (4B, 5/95, Q.29) (1 point) Which of the following are reasons for the importance of theoretical
distributions?
1. They permit calculations to be made without the formulation of a model.
2. They are completely summarized by a small number of parameters.
3. Their convenience for mathematical manipulation allows for the development of useful
theoretical results.
A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

Solutions to Problems:

21.1. B. 1. F. Even when data are extensive theoretical distributions may be essential, depending
on the question to be answered. For example, theoretical distributions may be essential to estimate
the tail of the distribution. 2. T. 3. F.

21.2. D. 1. F. 2. T. 3. T.

Section 22, Exponential Distribution

This single parameter distribution is extremely simple to work with and thus appears in many exam
questions. In many practical applications, the Exponential doesnʼt provide enough flexibility to fit size
of loss data. Thus, it is much more common to use the Gamma or Weibull Distributions, of which the
exponential is a special case97. Following is a summary of the Exponential Distribution.

Exponential Distribution

Support: x > 0 Parameter: θ > 0 ( scale parameter)

D. f. : F(x) = 1 - e-x/θ F(x) = 1 - e-λx.

P. d. f. : f(x) = e-x/θ / θ f(x) = λe-λx.

Moments: E[X^n] = n! θ^n    Moment Generating Function: 1/(1 - θt), for t < 1/θ

Mean = θ Hazard rate = f(x)/S(x) = λ = 1/θ.

Variance = θ2

Coefficient of Variation = Standard Deviation / Mean = 1


Skewness = 2 Kurtosis = 9
Mode = 0 Median = θ ln(2)

Limited Expected Value Function: E[X ∧ x] = θ (1- e-x/θ)

R(x) = Excess Ratio = e-x/θ


e(x) = Mean Excess Loss = θ

Derivative of the d.f. with respect to θ: ∂F(x)/∂θ = -(x/θ²) e^(-x/θ)
Method of Moments: θ = X̄, the sample mean.

Percentile Matching: θ = -x₁ / ln(1 - p₁)

Method of Maximum Likelihood: θ = X̄, same as the method of moments.


97
For a shape parameter of unity, either the Gamma or the Weibull distributions reduce to the Exponential.
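Several of the items in this summary can be verified numerically; a minimal sketch assuming Python with SciPy, using θ = 10 purely for illustration:

from scipy.stats import expon

theta = 10.0
X = expon(scale=theta)
print(X.mean(), X.var(), X.median())     # theta, theta^2, theta ln(2) = 6.93
print(X.stats(moments='sk'))             # skewness 2; SciPy reports excess kurtosis 6, i.e. kurtosis 9
print(X.expect(lambda x: min(x, 5.0)))   # E[X ∧ 5] = theta (1 - e^(-5/theta)) = 3.93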

All Exponential Distributions have the same shape.


The Coefficient of Variation is always 1 and the skewness is always 2.

Hereʼs a graph of the density function of an Exponential. All Exponential Distributions look the same
except for the scale; in this case the mean is 10. Also note that while Iʼve only shown x ≤ 50, the
density is positive for all x > 0.

[Graph omitted: the density f(x) = e^(-x/10)/10, starting at 0.10 at x = 0 and decreasing toward 0, shown for 0 ≤ x ≤ 50.]

The Exponential Distribution has a constant Mean Excess Loss and therefore a constant
hazard rate; it is the only continuous distribution with this “memoryless” property.

Exercise: Losses prior to any deductible follow an Exponential Distribution with θ = 8.


A policy has a deductible of size 5.
What is the distribution of non-zero payments under that policy?
[Solution: After truncating and shifting by d:
G(x) = 1 - S(x + d)/S(d) = 1 - S(x + 5)/S(5) = 1 - e^(-(x + 5)/8) / e^(-5/8) = 1 - e^(-x/8).
Comment: This is an Exponential Distribution with θ = 8.]

When an Exponential Distribution is truncated and shifted from below,


one gets the same Exponential Distribution, due to its memoryless
property. On any exam question involving an Exponential Distribution, check whether its
memoryless property helps to answer the question.98
98
See for example, 3, 11/00, Q.21.
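A quick simulation illustrating the memoryless property for the exercise above (optional; a minimal sketch assuming Python with NumPy):

import numpy as np

np.random.seed(5)
theta, d = 8.0, 5.0
x = np.random.exponential(theta, 1_000_000)   # ground-up losses, Exponential with mean 8
payments = x[x > d] - d                       # non-zero payments: truncated and shifted at the deductible
print(payments.mean(), payments.std())        # both close to 8, as for the original Exponential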

Integrals Involving the Density of the Exponential Distribution:

Let f(x) = e-x/θ / θ, the density of an Exponential Distribution with mean θ.

∫0 to x t^n f(t) dt = ∫0 to x t^n e^(-t/θ)/θ dt = ∫0 to x/θ s^n θ^n e^(-s) ds = θ^n Γ(n+1; x/θ) Γ(n+1).99
Γ(n+1; x) = 1 - Σi=0 to n x^i e^(-x)/i!.  Γ(n+1) = n!.
Therefore, ∫0 to x t^n f(t) dt = θ^n {1 - Σi=0 to n (x/θ)^i e^(-x/θ)/i!} n!.

∫0 to x t f(t) dt = θ {1 - e^(-x/θ) - (x/θ)e^(-x/θ)} 1! = θ - (θ + x)e^(-x/θ).
∫0 to x t² f(t) dt = θ² {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)²e^(-x/θ)/2} 2! = 2θ² - (2θ² + 2θx + x²)e^(-x/θ).
∫0 to x t³ f(t) dt = θ³ {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)²e^(-x/θ)/2 - (x/θ)³e^(-x/θ)/6} 3!
= 6θ³ - (6θ³ + 6θ²x + 3θx² + x³)e^(-x/θ).

Exercise: For an exponential distribution with mean 10, what is ∫7 to 15 x f(x) dx?
[Solution: ∫0 to 15 x f(x) dx = (10) {1 - e^(-15/10) - (15/10)e^(-15/10)} = 4.422.
∫0 to 7 x f(x) dx = (10) {1 - e^(-7/10) - (7/10)e^(-7/10)} = 1.558. ∫7 to 15 x f(x) dx = 4.422 - 1.558 = 2.864.]
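These partial expectations are easy to confirm numerically; a minimal sketch assuming Python with SciPy:

import math
from scipy.integrate import quad

theta = 10.0
f = lambda t: math.exp(-t / theta) / theta     # Exponential density with mean 10
print(quad(lambda t: t * f(t), 0, 15)[0])      # 4.422
print(quad(lambda t: t * f(t), 0, 7)[0])       # 1.558
print(quad(lambda t: t * f(t), 7, 15)[0])      # 2.864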


99
The Incomplete Gamma Function is discussed in “Mahlerʼs Guide to Frequency Distributions.”

Inverse Exponential Distribution:

If x follows an Exponential Distribution, then 1/x follows an Inverse Exponential Distribution.

F(x) = e^(-θ/x). f(x) = θ e^(-θ/x) / x², x > 0.

The Inverse Exponential Distribution is very heavy-tailed. It fails to have a finite mean, let alone any
higher moments!

Problems:

Use the following information for the next eight questions:

Let X be an exponentially distributed random variable, the probability density function of which is:
f(x) = 8 e-8x, x ≥ 0

22.1 (1 point) Which of the following is the mean of X?


A. less than 0.06
B. at least 0.06 but less than 0.08
C. at least 0.08 but less than 0.10
D. at least 0.10 but less than 0.12
E. at least 0.12

22.2 (1 point) Which of the following is the median of X?


A. less than 0.06
B. at least 0.06 but less than 0.08
C. at least 0.08 but less than 0.10
D. at least 0.10 but less than 0.12
E. at least 0.12

22.3 (1 point) Which of the following is the mode of X?


A. less than 0.06
B. at least 0.06 but less than 0.08
C. at least 0.08 but less than 0.10
D. at least 0.10 but less than 0.12
E. at least 0.12

22.4 (1 point) What is the chance that X is greater than 0.3?


A. less than 0.06
B. at least 0.06 but less than 0.08
C. at least 0.08 but less than 0.10
D. at least 0.10 but less than 0.12
E. at least 0.12

22.5 (1 point) What is the variance of X?


A. less than 0.015
B. at least 0.015 but less than 0.016
C. at least 0.016 but less than 0.017
D. at least 0.017 but less than 0.018
E. at least 0.018

22.6 (1 point) What is the coefficient of variation of X?


A. less than 0.5
B. at least 0.5 but less than 0.7
C. at least 0.7 but less than 0.9
D. at least 0.9 but less than 1.1
E. at least 1.1

22.7 (2 points) What is the skewness of X?


A. less than 0
B. at least 0 but less than 1
C. at least 1 but less than 2
D. at least 2 but less than 3
E. at least 3

22.8 (3 points) What is the kurtosis of X?


A. less than 3
B. at least 3 but less than 5
C. at least 5 but less than 7
D. at least 7 but less than 9
E. at least 9

22.9 (1 point) Prior to the application of any deductible, losses follow an Exponential Distribution
with θ = 135. If there is a deductible of 25, what is the density of non-zero payments at 60?
A. less than 0.0045
B. at least 0.0045 but less than 0.0050
C. at least 0.0050 but less than 0.0055
D. at least 0.0055 but less than 0.0060
E. at least 0.0060

22.10 (2 points) You are given the following:


• Claim sizes follow an exponential distribution with density function
f(x) = 0.1 e-0.1x , 0 < x < ∞.
• You observe 8 claims.
• The number of claims and claim sizes are independent.
Determine the probability that the largest of these claim is less than 17.
A. less than 80%
B. at least 80% but less than 85%
C. at least 85% but less than 90%
D. at least 90% but less than 95%
E. at least 95%

22.11 (1 point) What is F(3), for an Inverse Exponential Distribution, with θ = 10?
A. less than 3%
B. at least 3% but less than 5%
C. at least 5% but less than 7%
D. at least 7% but less than 9%
E. at least 9%

22.12 (2 points) You are given:


• Future lifetimes follow an Exponential distribution with a mean of θ.
• The force of interest is δ.
• A whole life insurance policy pays 1 upon death.
What is the actuarial present value of this insurance?
(A) e−δθ
(B) 1 / (1 + δθ)

(C) e−2δθ
(D) 1 / (1 + δθ)2
(E) None of A, B, C, or D.

22.13 (1 point) Prior to the application of any deductible, losses follow an Exponential Distribution
with θ = 25. If there is a deductible of 5, what is the variance of the non-zero payments?
A. less than 600
B. at least 600 but less than 650
C. at least 650 but less than 700
D. at least 700 but less than 750
E. at least 750

22.14 (2 points) Prior to the application of any deductible, losses follow an Exponential Distribution
with θ = 31. There is a deductible of 10. What is the variance of amount paid by the insurer for one
loss, including the possibility that the amount paid is zero?
A. less than 900
B. at least 900 but less than 950
C. at least 950 but less than 1000
D. at least 1000 but less than 1050
E. at least 1050

22.15 (2 points) Size of loss is Exponential with mean θ.


Y is the minimum of N losses.
What is the distribution of Y?

22.16 (2 points) You are given:


• A claimant receives payments at a rate of 1 paid continuously while disabled.
• Payments start immediately.
• The length of disability follows an Exponential distribution with a mean of θ.
• The force of interest is δ.
At the time of disability, what is the actuarial present value of these payments?
(A) 1 / (δ + θ) (B) 1 / (1 + δθ) (C) θ / (δ + θ)
(D) θ / (1 + δθ) (E) None of A, B, C, or D.

22.17 (2 points) You are given the following graph of the density of an Exponential Distribution.
[Graph: the density starts at 0.1 at x = 0 and decreases exponentially toward 0, shown for 0 ≤ x ≤ 50.]
What is the third moment of this Exponential Distribution?
A. 1000 B. 2000 C. 4000 D. 6000 E. 8000

22.18 (3 points) Belle Chimes and Leif Blower are engaged to be married.
The cost of their wedding will be 110,000. They will receive 200 gifts at their wedding.
The size of each gift has distribution: F(x) = 1 - exp[-(x - 100)/500], x > 100.
What is the probability that the total value of the gifts will not exceed the cost of their wedding?
Use the Normal Approximation.
A. 6% B. 8% C. 10% D. 12% E. 14%

22.19 (5 points) Define the quartiles as the 25th, 50th, and 75th percentiles.
Define the interquartile range as the difference between the third and first quartiles, in other words as
the 75th percentile minus the 25th percentile.
Determine the interquartile range for an Exponential Distribution.
Define the Quartile Skewness Coefficient as:
[(3rd quartile - 2nd quartile) - (2nd quartile - 1st quartile)] / (3rd quartile - 1st quartile).
Determine the Quartile Skewness Coefficient for an Exponential Distribution,
and compare it to the skewness.

22.20 (3 points) Define the Mean Absolute Deviation as: E[ |X - E[X]| ].


Determine the Mean Absolute Deviation for an Exponential Distribution.

22.21 (1 point) The exponential distribution with a mean of 28,700 hours was used to describe the
hours to failure of a fan on diesel engines.
A diesel engine fan has gone 10,000 hours without failing.
Determine the probability of this fan lasting at least an additional 5000 hours.

22.22 (2 points) At a seismically active site, the distribution of the magnitude of earthquakes is given
by an Exponential Distribution with hazard rate ln(10).
Over the last 20 years, at that site there have been 150 earthquakes of magnitude at least 3 and
less than 5.
Estimate the probability that next year there will be an earthquake of magnitude at least 6.

22.23 (160, 11/86, Q.9) (2.1 points) X1 and X2 are independent random variables each with
Exponential distributions. The expected value of X1 is 9.5. The variance of X2 is 2.25.
Determine the probability that X1 < X2 .
(A) 2/19 (B) 3/22 (C) 3/19 (D) 3/16 (E) 2/3

22.24 (4, 5/87, Q.32) (1 point) Let X be an exponentially distributed random variable, the
probability density function of which is: f(x) = 10 exp(-10x), where x ≥ 0.
Which of the following statements regarding the mode and median of X is true?
A. The median of X is 0; the mode is 1/2.
B. The median of X is (ln 2) / 10; the mode of X is 0.
C. The median of X is 1/2; the mode of X does not exist.
D. The median of X is 1/2; the mode of X is 0.
E. The median of X is 1/10; and the mode of X is (ln 2) /10.

22.25 (2, 5/90, Q.11) (1.7 points) Let X be a continuous random variable with density function
f(x) = λe−λx for x > 0. If the median of this distribution is 1/3, then what is λ?
A. (1/3) In(1/2) B. (1/3) In(2) C. 2 In(3/2) D. 3 In(2) E. 3

22.26 (160, 5/90, Q.3) (2.1 points) You are given:


(i) Tu is the failure time random variable assuming the uniform distribution from 0 to ω.
(ii) Te is the failure time random variable assuming the exponential distribution.
(iii) Var(Tu ) = 3Var(Te ).
(iv) f(te ) / S(te ) = 0.5.
Calculate the uniform distribution parameter ω.
(A) 3 (B) 4 (C) 8 (D) 12 (E) 15

22.27 (2, 2/96, Q.40) (1.7 points) Let X1 , . . . , X100 be a random sample from an exponential
distribution with mean 1/2.
Determine the approximate value of P[X1 + X2 + ... + X100 > 57] using the Central Limit Theorem.

A. 0.08 B. 0.16 C. 0.31 D. 0.38 E. 0.46

22.28 (2, 2/96, Q.41) (1.7 points) Let X be a continuous random variable with density function
f(x) = e-x/2/2 for x > 0. Determine the 25th percentile of the distribution of X.
A. In(4/9) B. ln(16/9) C. ln(4) D. 2 E. ln(16)

22.29 (Course 151 Sample Exam #2, Q.24) (2.5 points)


An insurer's portfolio consists of a single possible claim. You are given:
(i) the claim amount is uniformly distributed over (100, 500).
(ii) the probability that the claim occurs after time t is e-0.1t, t > 0.
(iii) the claim time and amount are independent.
(iv) the insurer's initial surplus is 20.
(v) premium income is received continuously at the rate of 40 per annum.
Determine the probability of ruin (not having enough money to pay the claim.)
(A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7

22.30 (Course 160 Sample Exam #2, 1996, Q.1) (1.9 points) You are given:
(i) Two independent random variables X1 and X2 have exponential distributions with means
θ1 and θ2, respectively.
(ii) Y = X1 X2 .
Determine E[Y].
(A) 1/θ1 + 1/θ2 (B) θ1θ2 / (θ1 + θ2) (C) θ1 + θ2 (D) 1/(θ1θ2) (E) θ1θ2

22.31 (4B, 11/99, Q.17) (2 points) Claim sizes follow a distribution with density function
f(x) = e-x, 0 < x < ∞. Determine the probability that the second claim observed will be more than
twice as large as the first claim observed.
A. e-3 B. e-2 C. 1/3 D. e-1 E. 1/2

22.32 (Course 1 Sample Exam, Q.23) (1.9 points) The value, v, of an appliance is based on
the number of years since purchase, t, as follows: v(t) = e^(7 - 0.2t).
If the appliance fails within seven years of purchase, a warranty pays the owner the value of the
appliance. After seven years, the warranty pays nothing. The time until failure of the appliance has
an exponential distribution with mean 10. Calculate the expected payment from the warranty.
A. 98.70 B. 109.66 C. 270.43 D. 320.78 E. 352.16

22.33 (1, 5/00, Q.3) (1.9 points) The lifetime of a printer costing 200 is exponentially distributed
with mean 2 years. The manufacturer agrees to pay a full refund to a buyer if the printer fails during
the first year following its purchase, and a one-half refund if it fails during the second year. If the
manufacturer sells 100 printers, how much should it expect to pay in refunds?
(A) 6,321 (B) 7,358 (C) 7,869 (D) 10,256 (E) 12,642

22.34 (1, 5/00, Q.18) (1.9 points) An insurance policy reimburses dental expense, X, up to a
maximum benefit of 250. The probability density function for X is: c e-0.004x for x > 0,
where c is a constant. Calculate the median benefit for this policy.
(A) 161 (B) 165 (C) 173 (D) 182 (E) 250

22.35 (1, 11/00, Q.9) (1.9 points) An insurance company sells an auto insurance policy that covers
losses incurred by a policyholder, subject to a deductible of 100.
Losses incurred follow an exponential distribution with mean 300.
What is the 95th percentile of actual losses that exceed the deductible?
(A) 600 (B) 700 (C) 800 (D) 900 (E) 1000

22.36 (1, 11/00, Q.14) (1.9 points) A piece of equipment is being insured against early failure.
The time from purchase until failure of the equipment is exponentially distributed with mean 10
years. The insurance will pay an amount x if the equipment fails during the first year, and it will pay
0.5x if failure occurs during the second or third year. If failure occurs after the first three years, no
payment will be made. At what level must x be set if the expected payment made under this
insurance is to be 1000?
(A) 3858 (B) 4449 (C) 5382 (D) 5644 (E) 7235

22.37 (1, 5/01, Q.20) (1.9 points) A device that continuously measures and records seismic
activity is placed in a remote region. The time, T, to failure of this device is exponentially distributed
with mean 3 years. Since the device will not be monitored during its first two years of service, the
time to discovery of its failure is X = max(T, 2). Determine E[X] .
(A) 2 + e-6/3 (B) 2 - 2e-2/3 + 5e-4/3 (C) 3 (D) 2 + 3e-2/3 (E) 5

22.38 (1, 5/01, Q.32) (1.9 points) A company has two electric generators. The time until failure for
each generator follows an exponential distribution with mean 10. The company will begin using the
second generator immediately after the first one fails.
What is the variance of the total time that the generators produce electricity?
(A) 10 (B) 20 (C) 50 (D) 100 (E) 200

22.39 (1, 5/03, Q.4) (2.5 points) The time to failure of a component in an electronic device has an
exponential distribution with a median of four hours. Calculate the probability that the component will
work without failing for at least five hours.
(A) 0.07 (B) 0.29 (C) 0.38 (D) 0.42 (E) 0.57

22.40 (CAS3, 11/03, Q.17) (2.5 points) Losses have an Inverse Exponential distribution.
The mode is 10,000. Calculate the median.
A. Less than 10,000
B. At least 10,000, but less than 15,000
C. At least 15,000, but less than 20,000
D. At least 20,000, but less than 25,000
E. At least 25,000

22.41 (SOA3, 11/03, Q.34 & 2009 Sample Q.89) (2.5 points) You are given:
(i) Losses follow an exponential distribution with the same mean in all years.
(ii) The loss elimination ratio this year is 70%.
(iii) The ordinary deductible for the coming year is 4/3 of the current deductible.
Compute the loss elimination ratio for the coming year.
(A) 70% (B) 75% (C) 80% (D) 85% (E) 90%

22.42 (CAS3, 5/04, Q.20) (2.5 points)


Losses have an exponential distribution with a mean of 1,000.
There is a deductible of 500.
The insurer wants to double the loss elimination ratio.
Determine the new deductible that achieves this.
A. 219 B. 693 C. 1,046 D. 1,193 E. 1,546

22.43 (CAS3, 11/05, Q.20) (2.5 points)


Losses follow an exponential distribution with parameter θ.
For a deductible of 100, the expected payment per loss is 2,000.
Which of the following represents the expected payment per loss for a deductible of 500?
A. θ

B. θ(1 - e-500/θ)

C. 2,000 e-400/θ

D. 2,000 e-5/θ

E. 2,000 (1 - e-500/θ) / (1 - e-100/θ)

22.44 (4, 11/06, Q.26 & 2009 Sample Q.269) (2.9 points) The random variables X1 , X2 , ... , Xn ,
are independent and identically distributed with probability density function f(x) = e-x/θ/θ, x ≥ 0.
Determine E[X̄2], where X̄ denotes the sample mean.
(A) {(n+1)/n} θ2  (B) {(n+1)/n2} θ2  (C) θ2/n  (D) θ2/n2  (E) θ2

Solutions to Problems:

22.1. E. An exponential with θ = 1/8; mean = θ = 0.125.

22.2. C. An exponential with θ = 1/8; F(x) = 1 - e-x/θ = 1 - e-8x.

At the median: 0.5 = F(x) = 1 - e-8x. ⇒ x = -ln(0.5)/8 = 0.0866.

22.3. A. The mode of the exponential is always zero.


(The density 8e-8x decreases for x > 0 and thus attains its maximum at x=0.)

22.4. C. An exponential with 1/θ = 8 ; F(x) = 1 - e -x/θ = 1 - e-8x.


1- F(0.3) = e-(8)(0.3) = e-2.4 = 0.0907.

22.5. B. An exponential with 1/θ = 8 ; variance = θ2 = 0.015625.

22.6. D. An exponential always has a coefficient of variation of 1.


The C.V. = standard deviation / mean = (0.015625)0.5 / 0.125 = 1.

22.7. D. An exponential always has skewness of 2. Specifically the moments are:


µ1 = (1!) θ1 = 1/8 = 0.125. µ2 ′ = (2!) θ2 = 2 / 82 = 0.03125. µ3 ′ = (3!) θ3 = 6 / 83 = 0.01172.

Standard Deviation = (0.03125 - 0.1252 )0.5 = 0.125.


Skewness = {µ3 ′ - (3 µ1 µ2 ′) + (2 µ1 3 )} / STDDEV3 =

{0.01172 - (3)(0.125)(0.03125) + (2)(0.125)3} / 0.1253 = 0.0039075 / 0.001953 = 2.00.

22.8. E. An exponential always has kurtosis of 9. Specifically the moments are:


µ1 = (1! ) θ1 = θ. µ2 ′ = (2!) θ2 = 2θ2 . µ3 ′ = (3!) θ3 = 6θ3 . µ4 ′ = (4!) θ4 = 24 θ4 .

Standard Deviation = (2θ2 - θ2 )0.5 = θ. µ4 = µ4 ′ - (4 µ1 µ3 ′) + (6 µ1 2 µ2 ′) - 3µ1 4 =

24 θ4 - (4)(θ)(6θ3 ) + (6)(θ2 )(2θ2 ) - (3)θ4 = 9θ4 . kurtosis = µ4 / STDDEV4 = 9θ4 / θ4 = 9.

22.9. B. After truncating and shifting from below, one gets the same Exponential Distribution with
θ = 135, due to its memoryless property.
The density is: e-x/135/135, which at x = 60 is: e-60/135/135 = 0.00475.

22.10. A. For this exponential distribution, F(x) = 1 - e-0.1x. F(17) = 1- e-(0.1)(17) = 0.817.
The chance that all eight claims will be less than or equal to 17, is: F(17)8 = 0.8178 = 19.9%.
Comment: This is an example of an order statistic. The maximum of the 8 claims is less than or equal
to 17 if and only if each of the 8 claims is less than or equal to 17.

22.11. B. F(x) = e-θ/x. F(3) = e-10/3 = 0.036.

22.12. B. The probability of death at time t, is the density of the Exponential Distribution:
f(t) = e-t/θ /θ. The present value of a payment of one at time t is e−δt .
Therefore, the actuarial present value of this insurance is:
∫0^∞ e-δt e-t/θ/θ dt = (1/θ) ∫0^∞ e-(δ + 1/θ)t dt = (1/θ) / (δ + 1/θ) = 1 / (1 + δθ).

22.13. B. After truncating and shifting from below, one gets the same Exponential Distribution with
θ = 25, due to its memoryless property. The variance is θ2 = 252 = 625.

22.14. A. After truncating and shifting from below, one gets the same Exponential Distribution with
θ = 31, due to its memoryless property. Thus the nonzero payments are Exponential with θ = 31,
with mean 31 and variance 312 . The probability of a nonzero payment is the probability that a loss
is greater than the deductible of 10; S(10) = e-10/31 = 0.7243. Thus the payments of the insurer can
be thought of as an aggregate distribution, with Bernoulli frequency with mean 0.7243 and
Exponential severity with mean 31. The variance of this aggregate distribution is:
(Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) =
(0.7243)(312 ) + (31)2 {(0.7243)(1 - 0.7243)} = 888.
Comment: Similar to 3, 11/00, Q.21.
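As a cross-check on this kind of compound-variance calculation, here is a minimal Python sketch (my own illustration, not part of the original solution): it simulates losses from an Exponential with θ = 31, applies the deductible of 10, and estimates the mean and variance of the per-loss payment, which should come out near 22.45 and 888.

```python
import random

# Monte Carlo check of 22.14 (illustration only): losses are Exponential with
# theta = 31; the insurer pays max(loss - 10, 0) per loss.
random.seed(1)
theta, deductible, n = 31.0, 10.0, 1_000_000
payments = [max(random.expovariate(1.0 / theta) - deductible, 0.0) for _ in range(n)]
mean = sum(payments) / n
var = sum((p - mean) ** 2 for p in payments) / (n - 1)
print(round(mean, 2), round(var, 1))   # roughly 22.45 and 888
```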

22.15. The survival function of Y is: Prob[all N losses > y] = S(y)N = (e-y/θ)N = e-yN/θ.
The distribution of Y is Exponential with mean θ/N.

22.16. D. Given a disability of length t, the present value of an annuity certain is:
(1-e-δt)/δ. The expected present value is the average of this over all t:
∫0^∞ {(1 - e-δt)/δ} f(t) dt = ∫0^∞ {(1 - e-δt)/δ} (e-t/θ/θ) dt = (1/δ) ∫0^∞ (e-t/θ/θ - e-t(δ + 1/θ)/θ) dt
= (1/δ) {1 - (1/(δ + 1/θ))/θ} = (1/δ) {1 - 1/(1 + δθ)} = θ / (1 + δθ).

22.17. D. For an Exponential, f(x) = e-x/θ/θ. f(0) = 1/θ. Thus 1/10 = 1/θ. ⇒ θ = 10.

Third moment is: 6θ3 = 6000.

22.18. B. Let Y = X - 100. Then Y is Exponential with mean θ = 500.


E[X] = E[Y] + 100 = 500 + 100 = 600. Var[X] = Var[Y] = 5002 = 250,000.
The mean total value of gifts is: (200)(600) = 120,000.
The variance of the total value of gifts is: (200)(250,000) = 50,000,000.
Prob[gifts ≤ 110,000] ≅ Φ[(110,000 - 120,000)/√50,000,000] = Φ[-1.41] = 7.9%.
Comment: The distribution of the size of gifts is a Shifted Exponential.

22.19. 0.25 = 1 - exp[-Q0.25 / θ]. ⇒ Q0.25 = θ ln[4/3].

0.5 = 1 - exp[-Q0.5 / θ]. ⇒ Q0.5 = θ ln[2].

0.75 = 1 - exp[-Q0.75 / θ]. ⇒ Q0.75 = θ ln[4].

Interquartile range = Q0.75 - Q0.25 = θ ln[4] - θ ln[4/3] = θ ln[3] = 1.0986θ.


Quartile Skewness Coefficient = {(θ ln(4) - θ ln(2)) - (θ ln(2) - θ ln(4/3))} / {θ ln(3)} = ln(4/3) / ln(3) = 0.262.
The skewness of any Exponential Distribution is 2.
Specifically, the third central moment is: E[X3 ] - 3E[X2 ]E[X] + 2E[X]3 = 6θ3 - (3)(2θ2)(θ) + 2θ3 = 2θ3.

The variance is θ2. Thus the skewness is: 2θ3 / (θ2)3/2 = 2.


Comment: The first quartile is also called the lower quartile, while the 3rd quartile is also called the
upper quartile.
The Quartile Skewness Coefficient as applied to a small sample of data would be a robust
estimator of the skewness of the distribution from which the data was drawn; it would not be
significantly affected by unusual values in the sample, in other words by outliers.

22.20. ∫0^∞ |x - θ| f(x) dx = ∫0^θ (θ - x) e-x/θ/θ dx + ∫θ^∞ (x - θ) e-x/θ/θ dx
= θ ∫0^θ e-x/θ/θ dx - θ ∫θ^∞ e-x/θ/θ dx + ∫θ^∞ x e-x/θ/θ dx - ∫0^θ x e-x/θ/θ dx
= θ(1 - e-1) - θe-1 + (-x exp[-x/θ] - θ exp[-x/θ]) evaluated from x = θ to x = ∞ - (-x exp[-x/θ] - θ exp[-x/θ]) evaluated from x = 0 to x = θ
= θ(1 - 2e-1) + 2θe-1 + (2θe-1 - θ) = 2θ e-1 = 0.7358 θ.

22.21. Due to its memoryless property, the future lifetime follows the original Exponential.
S(5000) = e-5000/28,700 = 84.0%.

22.22. F(3) = 1 - exp[-3 ln(10)] = 1 - 10-3 = 0.999. F(5) = 1 - exp[-5 ln(10)] = 1 - 10-5 = 0.99999.
F(5) - F(3) = 0.99999 - 0.999 = 0.00099. S(6) = exp[-6 ln(10)] = 10-6.
The annual rate of earthquakes of magnitude at least 3 and less than 5 is: 150/20 = 7.5.
Thus the inferred annual rate of all earthquakes is: 7.5/0.00099.
Thus the expected number of large earthquakes next year is:
(7.5/0.00099) (10-6) = 0.76%.
Comment: Based on the Gutenberg-Richter law.
I have ignored the small probability of two large earthquakes in one year.

22.23. B. θ2² = 2.25. ⇒ θ2 = 1.5. Given X1 = t, Prob[X2 > t] = e-t/1.5.
Prob[X2 > X1] = ∫0^∞ e−t/1.5 e−t/9.5 / 9.5 dt = (1/9.5)/(1/1.5 + 1/9.5) = 3/22.
Comment: This is mathematically equivalent to two independent Poisson Processes, with
λ 1 = 1/9.5 and λ2 = 1/1.5. The probability of observing an event from the first process before the

second process is: λ1 / (λ1 + λ2) = (1/9.5) / (1/9.5 + 1/1.5) = 3/22.


See “Mahlerʼs Guide to Poisson Processes,” on CAS Exam ST.

22.24. B. The median is where F(x) = 0.5. F(x) = 1 - e-10x. Therefore solving for x,
the median = -ln(.5) / 10 = ln(2) / 10. The mode is that point where f(x) is largest. Since f(x) declines
for x ≥0, f(x) is at its maximum at x = 0. Therefore, the mode is zero.

22.25. D. F(x) = 1 - e−λx. F(1/3) = 0.5. ⇒ 0.5 = e−λ/3. ⇒ λ = 3 ln(2).

22.26. D. For the Exponential, the hazard rate is given as 0.5, and therefore θ = 1/0.5 = 2.

Variance of the Exponential is: θ2 = 4. Variance of the Uniform is: ω2/12. ω2/12 = (3)(4). ⇒ ω = 12.

22.27. A. The sum of 100 Exponential distributions has mean (100)(1/2) = 50, and variance
(100)(1/22) = 25. P[X1 + ... + X100 > 57] ≅ 1 - Φ[(57 - 50)/5] = 1 - Φ[1.4] = 0.0808.

22.28. B. 0.25 = F(x) = 1 - e-x/2. ⇒ x = -2ln(.75) = 2ln(4/3) = ln(16/9).

Comment: F(ln(16/9)) = 1 - exp[-ln(16/9)/2] = 1 - √(9/16) = 1 - 3/4 = 1/4.

22.29. C. If the loss occurs prior to t = 2, then since the insurer has less than 100 in assets, the
probability of ruin is 100%. If the loss occurs subsequent to t = 12, then since the insurer has assets
of more than 500, the probability of ruin is 0. If the loss occurs at time 2 ≤ t ≤ 12, then the insurer has
assets of 20 + 40t, and the probability of ruin is: {500 - (20 + 40t)}/400 = (12 - t)/10.
Adding up the different situations, using that the time of loss is exponentially distributed, the
probability of ruin is: F(2) + ∫2^12 f(t) (12 - t)/10 dt + 0 = (1 - e-0.2) + ∫2^12 0.1e-0.1t (12 - t)/10 dt
= 0.181 + (12/10)(-e-0.1t) evaluated from t = 2 to t = 12 - 0.01 ∫2^12 t e-0.1t dt
= 0.181 + 0.621 - 0.01(-10t e-0.1t - 100 e-0.1t) evaluated from t = 2 to t = 12 = 0.802 - 0.320 = 0.482.


Alternately, if the loss is of size 100, there is ruin if the loss occurs at time t < 2, which has probability:
1- e-0.2.
If the loss is of size 500, there is ruin if the loss occurs at time t < 12, which has probability: 1 - e-1.2.
If the loss is of size x, then there is ruin if the loss occurs prior to time (x - 20)/40,
since at t = (x - 20)/40 the assets are: 20 + 40(x-20)/40 = x.
Thus for a loss of size x, the probability of ruin is: 1 - exp[-.1(x-20)/40] = 1 - e-(x-20)/400.
The losses are uniformly distributed from 100 to 500, so the overall probability of ruin is:
∫100^500 (1 - e-(x - 20)/400) (1/400) dx = {x/400 + e-(x - 20)/400} evaluated from x = 100 to x = 500
= (1.25 - 0.25) + (e-1.2 - e-0.2) = 0.482.
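Both derivations can also be confirmed by simulation. The Python sketch below is an illustration only, under the stated assumptions: claim time with survival function e-0.1t, claim size uniform on (100, 500), and surplus 20 + 40t at time t.

```python
import random

# Monte Carlo check of 22.29 (illustration only): ruin occurs when the claim
# arrives before the surplus 20 + 40t has grown to the claim amount.
random.seed(1)
n = 1_000_000
ruins = sum(random.uniform(100, 500) > 20 + 40 * random.expovariate(0.1)
            for _ in range(n))
print(ruins / n)   # roughly 0.482
```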

22.30. E. E[X1 X2 ] = E[X1 ]E[X2 ] = θ1θ2.

22.31. C. Given that the first claim is of size x, the probability that the second will be more than
twice as large is 1 - F(2x) = S(2x) = e-2x. The overall average probability is:
∫0^∞ (Probability given x) f(x) dx = ∫0^∞ e-2x e-x dx = ∫0^∞ e-3x dx = 1/3.

22.32. D. The density of the time of failure is: f(t) = e-t/10/10. Expected payment is:
∫0^7 v(t) f(t) dt = ∫0^7 exp(7 - 0.2t) e-t/10/10 dt = 0.1 e7 ∫0^7 e-0.3t dt = 0.1 e7 (1 - e-2.1)/0.3 = 320.78.

22.33. D. Prob[fails during the first year] = F(1) = 1 - e-1/2 = 0.3935.


Prob[fails during the second year] = F(2) - F(1) = e-1/2 - e-2/2 = 0.2386.
Expected Cost = 100{(200)(0.3935) + (100)(0.2386)} = 10,256.

22.34. C. This is an Exponential with 1/θ = 0.004. θ = 250. The median of this Exponential is:
250 ln(2) = 173.3, which since it is less than 250 is also the median benefit.

22.35. E. By the memoryless property of the Exponential Distribution, the non-zero payments
excess of a deductible are also an Exponential Distribution with mean 300. Thus the 95th percentile
of the nonzero payments is: -300 ln(1 - 0.95) = 899. Adding back the 100 deductible, the 95th
percentile of the losses that exceed the deductible is: 999.

22.36. D. Prob[fails during the first year] = F(1) = 1 - e-1/10 = 0.09516.


Prob[fails during the second or third year] = F(3) - F(1) = e-1/10 - e-3/10 = 0.16402.
Expected Cost = 0.09516x + 0.16402x/2 = 1000. ⇒ x = 5644.

22.37. D. max(T, 2) + min(T, 2) = T + 2.


E[max(T, 2)] = E[T+ 2] - E[min(T, 2)] = E[T] + 2 - E[T ∧ 2] = 3 + 2 - 3(1 - e-2/3) = 2 + 3e- 2 / 3.

22.38. E. Each Exponential has variance 102 = 100.


The variances of independent variables add: 100 + 100 = 200.
Comment: The total time is Gamma with α = 2, θ = 10, and variance (2)(102 ) = 200.

22.39. D. Median = 4 ⇒ 0.5 = 1 - e-4/θ. ⇒ θ = 5.771. S(5) = e-5/5.771 = 0.421.

22.40. E. The mode of the Inverse Exponential is θ/2. θ/2 = 10000 ⇒ θ = 20000.

To get the median: F(x) = e-θ/x = e-20000/x = .5. ⇒ x = 28,854.

Comment: One could derive the mode: f(x) = θe−θ/x/x2 . fʼ(x) = -2f(x)/x + f(x)θ/x2 = 0. ⇒ x = θ/2.

22.41. C. LER(x) = E[X ∧ x]/E[X] = θ(1 - e-x/θ)/θ = 1 - e-x/θ.

LER(d) = 1 - e-d/θ = 0.7. ⇒ d = 1.204θ.

LER(4d/3) = LER(1.605θ) = 1 - e-1.605θ/θ = 1 - e-1.605 = 80.0%.

22.42. E. For the Exponential, LER[x] = E[X ∧ x]/E[X] = 1 - e-x/θ.

1 - e-500/1000 = 0.3935. We want: 1 - e-d/1000 = (2)(0.3935) = 0.7869. ⇒ d = 1546.

22.43. C. Due to the memoryless property of the Exponential, the expected payment per
payment is θ, regardless of the deductible.

Therefore, for a deductible of d, the expected payment per loss is: S(d)θ = θe-d/θ.

Thus 2000 = e-100/θ θ. ⇒ θ = 2000e100/θ.


Therefore, the expected payment per loss for a deductible of 500 is:
θe-d/θ = 2000e100/θ e-500/θ = 2,000 e-400/θ .
Alternately, the expected payment per loss is:
E[X] - E[X ∧ d] = θ - θ(1 - e-d/θ) = θe-d/θ. Proceed as before.
Comment: One could solve numerically for θ with result θ = 2098.

22.44. A. Xi is Exponential. ΣXi is Gamma with α = n and θ.
X̄ = ΣXi/n is Gamma with α = n and θ/n.
Therefore X̄ has 2nd moment: (θ/n)2 (n)(n + 1) = {(n+1)/n} θ2.
Alternately, E[X̄2] = Var[X̄] + E[X̄]2 = Var[X]/n + E[X]2 = θ2/n + θ2 = {(n+1)/n} θ2.
Alternately, for i = j, E[XiXj] = E[X2] = 2θ2. For i ≠ j, E[XiXj] = E[Xi] E[Xj] = E[X]2 = θ2.
E[X̄2] = E[(ΣXi/n)(ΣXj/n)] = ΣΣ E[XiXj]/n2 = {(n)(2θ2) + (n2 - n)θ2}/n2 = {(n+1)/n} θ2.



Section 23, Single Parameter Pareto Distribution

The Single Parameter Pareto Distribution is described in Appendix A.5.1.4 of Loss Models.
It is not the same as the Pareto distribution described in Appendix A.2.3.1 of Loss Models.100

The Single Parameter Pareto applies to a size of claim distribution above a lower limit θ > 0.101
F(x) = 1 - (θ/x)α, x > θ.

Note that F(θ) = 0.


f(x) = α θα / xα+1, x > θ.

Since this single parameter distribution is simple to work with it is very widely used by
actuaries in actual applications involving excess losses or layers of loss.102 It also has appeared
in many past exam questions.

Exercise: What is the limited expected value for the Single Parameter Pareto Distribution for α ≠ 1?
[Solution: E[X ∧ x] = ∫θ^x y f(y) dy + x S(x) = ∫θ^x y (α θα / yα+1) dy + x (θ/x)α = α θα ∫θ^x y-α dy + θα x1-α
= α θα (x1-α - θ1-α)/(1 - α) + θα x1-α = -α θα x1-α/(α - 1) + αθ/(α - 1) + θα x1-α = αθ/(α - 1) - θα/{(α - 1) xα-1}.

Comment: In Appendix A of Loss Models, E[(X ∧ x)k] = α θk/(α - k) - k θα/{(α - k) xα-k}.
For k = 1, E[X ∧ x] = αθ/(α - 1) - θα/{(α - 1) xα-1}, matching the above formula.]

100
If one takes F(x) = 1 - {(β+θ)/(x+θ)}α for x > β, then one gets a distribution function of which the “Pareto” and
“Single Parameter Pareto” are each special cases.
One gets the former “Pareto” for beta = 0 and the latter “Single Parameter Pareto” for theta = 0.
101
The Single Parameter Pareto is designed to work directly with data truncated from below at θ.
See “Mahlerʼs Guide to Fitting Loss Distributions.”
102
See “A Practical Guide to the Single Parameter Pareto Distribution,” by Stephen W. Philbrick, PCAS LXXII, 1985,
pp. 44.

Exercise: Using the formula for the Limited Expected Value, what is the mean excess loss, e(x), for
the Single Parameter Pareto Distribution?
[Solution: e(x) = {E[X] - E[X ∧ x]} / S(x) = [αθ/(α - 1) - {αθ/(α - 1) - θα/((α - 1) xα-1)}] / (θ/x)α = x / (α - 1).]

Single Parameter Pareto Distribution

Support: x > θ Parameter: α > 0 (shape parameter)

D. f.: F(x) = 1 - (θ/x)α

P. d. f.: f(x) = α θα / xα+1 = (α/θ) (θ/x)α+1

Moments: E[Xn] = α θn / (α − n), α > n

Mean = αθ/(α − 1), α > 1 Second Moment = α θ2/(α - 2), α > 2

Variance = α θ2 / {(α - 1)2 (α - 2)}, α > 2

Coefficient of Variation = 1/√(α(α − 2)), α > 2 Skewness = {2(α + 1)/(α − 3)} √((α − 2)/α), α > 3

Mode = θ Median = θ 21/α

Limited Expected Value Function: E[X ∧ x] = αθ/(α − 1) - θα/{(α - 1) xα-1}, α > 1

R(x) = Excess Ratio = (1/α) (x/θ)1−α, α > 1, x > θ

e(x) = Mean Excess Loss = x / (α - 1), α > 1
Derivatives of d.f.: ∂F(x)/∂α = -(θ/x)α ln(θ/x) = (θ/x)α ln(x/θ)
Method of Moments: α = m1 / (m1 − θ) Percentile Matching: α = -ln(1 - p1) / ln(x1 / θ)
Method of Maximum Likelihood: α = N / Σ ln[xi / θ]
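These formulas are easy to confirm numerically. The Python sketch below is an illustration only (the parameters α = 2.5 and θ = 9 are the ones used in the problems that follow); it draws Single Parameter Pareto values by inverse transform and compares the simulated mean and limited expected value at 20 with the closed forms above.

```python
import random

# Inverse transform: if U is uniform(0,1), then X = theta * U**(-1/alpha)
# has S(x) = (theta/x)^alpha for x > theta (a Single Parameter Pareto).
random.seed(1)
alpha, theta, limit, n = 2.5, 9.0, 20.0, 1_000_000
xs = [theta * random.random() ** (-1.0 / alpha) for _ in range(n)]

sim_mean = sum(xs) / n
sim_lev = sum(min(x, limit) for x in xs) / n

mean = alpha * theta / (alpha - 1)                                # 15
lev = mean - theta**alpha / ((alpha - 1) * limit**(alpha - 1))    # 13.19
print(round(sim_mean, 2), mean, round(sim_lev, 2), round(lev, 2))
```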

Probability density function of a Single Parameter Pareto with θ = 1000 and α = 2:


Why Not All Moments Exist:

Assume that we have a Single Parameter Pareto Distribution with α = 1 and θ = 1:

f(x) = α x-(α+1) θα = 1 / x2 , x > 1.

E[X] = ∫1^∞ x f(x) dx = ∫1^∞ (1/x) dx = ln(∞) - ln(1) = ∞.

Since the density goes to zero slowly as x approaches infinity, this distribution has no finite mean.

If instead α = 2, there would be a finite mean, but no finite second moment.


For α = 2, E[X2] = ∫1^∞ x2 f(x) dx = ∫1^∞ x2 (2/x3) dx = 2 ln(∞) - 2 ln(1) = ∞.

In contrast, for the Exponential Distribution, f(x) = e-x/θ / θ, which goes to zero very quickly as x
approaches infinity. Thus the Exponential Distribution has all of its moments.
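The practical symptom of an infinite mean is that sample averages never settle down. The following Python sketch is my own illustration for α = 1 and θ = 1: the running sample mean keeps drifting upward as more values are drawn.

```python
import random

# For a Single Parameter Pareto with alpha = 1, theta = 1: X = 1/U has
# S(x) = 1/x for x > 1, so E[X] is infinite and the sample mean diverges.
random.seed(1)
total = 0.0
for k in range(1, 1_000_001):
    total += 1.0 / random.random()
    if k in (1_000, 10_000, 100_000, 1_000_000):
        print(k, round(total / k, 1))   # running mean keeps growing with k
```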

Problems:

Use the following information for the next nine questions:


X has the probability density function: f(x) = 607.5 x-3.5, x ≥ 9.

23.1 (1 point) Which of the following is the mean of X?


A. less than 12
B. at least 12 but less than 14
C. at least 14 but less than 16
D. at least 16 but less than 18
E. at least 18

23.2 (1 point) Which of the following is the median of X?


A. less than 12
B. at least 12 but less than 14
C. at least 14 but less than 16
D. at least 16 but less than 18
E. at least 18

23.3 (1 point) Which of the of following is the mode of X?


A. less than 10
B. at least 10 but less than 11
C. at least 11 but less than 12
D. at least 12 but less than 13
E. at least 13

23.4 (1 point) What is the chance that X is greater than 30?


A. less than 1%
B. at least 1% but less than 2%
C. at least 2% but less than 3%
D. at least 3% but less than 4%
E. at least 4%

23.5 (2 points) What is the variance of X?


A. less than 170
B. at least 170 but less than 190
C. at least 190 but less than 210
D. at least 210 but less than 230
E. at least 230

23.6 (1 point) What is the coefficient of variation of X?


A. less than 1
B. at least 1 but less than 2
C. at least 2 but less than 3
D. at least 3
E. Can not be determined

23.7 (2 points) What is the skewness of X?


A. less than 0
B. at least 0 but less than 2
C. at least 2 but less than 4
D. at least 4
E. Can not be determined

23.8 (3 points) What is the Limited Expected Value at 20?


A. less than 8
B. at least 8 but less than 10
C. at least 10 but less than 12
D. at least 12 but less than 14
E. at least 14

23.9 (2 points) What is the Excess Ratio at 20?


A. less than 9%
B. at least 9% but less than 10%
C. at least 10% but less than 11%
D. at least 11% but less than 12%
E. at least 12%

23.10 (3 points) The large losses for Pickwick Insurance are given by X:
f(x) = 607.5 x-3.5, x ≥ 9.
Pickwick Insurance expects 65 such large losses per year.
Pickwick Insurance reinsures the layer of loss from 20 to 30 with Global Reinsurance.
How much does Global Reinsurance expect to pay per year for losses from Pickwick Insurance?
A. less than 20
B. at least 20 but less than 30
C. at least 30 but less than 40
D. at least 40 but less than 50
E. at least 50

23.11 (2 points) You are modeling the distribution of the diameters of those meteors that have
diameters greater than 1 meter and that hit the atmosphere of the Earth.
If you use a Single Parameter Pareto Distribution for the model, what are the possible reasonable
values of α and θ?

23.12 (2 points) X follows a Single Parameter Pareto Distribution.


What is the expected value of ln(X/θ)?
A. 1/(α - 1) B. 1/α C. θ/(α - 1) D. θ/α E. αθ/(α - 1)

23.13 (2 points) You are given the following graph of a Single Parameter Pareto Distribution.
[Graph: the density begins at x = 50, where it equals 0.06, and declines steadily toward zero as x increases to 200.]
What is the variance of this Single Parameter Pareto Distribution?
A. less than 1600
B. at least 1600 but less than 1800
C. at least 1800 but less than 2000
D. at least 2000 but less than 2200
E. at least 2200

23.14 (3 points) The Pareto principle, named after economist Vilfredo Pareto, states that for many
phenomena, 80% of the consequences stem from 20% of the causes.
If the size of loss follows a Single Parameter Pareto Distribution, for what value of α is it the case that
80% of the aggregate losses are expected to come from the largest 20% of the loss events?
A. less than 1.1
B. at least 1.1 but less than 1.2
C. at least 1.2 but less than 1.3
D. at least 1.3 but less than 1.4
E. at least 1.4

23.15 (3 points) F(x) = {1 - (θ/x)α} / {1 - (θ/T)α}, θ ≤ x ≤ T, α > 0, θ > 0.

Determine E[Xk], for k > 0.

23.16 (3 points) Assume a Single Parameter Pareto Distribution with α = 1.


(a) Derive the formula for the Limited Expected Value.
(b) For y > x > θ, and c > 1, compare the expected losses in the layer from x to y
to the expected losses in the layer from cx to cy.

23.17 (2 points) Losses X follow a single-parameter Pareto distribution with α = 4 and θ = 6000.
Calculate E[(X - 10,000)+].
A. 350 B. 450 C. 550 D. 650 E. 750

23.18 (4, 5/86, Q.60) (1 point) Given a Single Parameter Pareto distribution
F(x; c, α) = 1 - (c/x)α, for x > c, for a random variable x representing large losses.
Which of the distribution functions shown below represents the distribution function of x truncated
from below at d, d > c?
A. 1 - {(c - d)/x}α x>c-d B. 1 - {(c + d)/x}α x > c + d

C. 1 - (d/x)α x>d D. 1 - {(d - c)/x}α x > d - c

E. 1 - {d/(x - d)}α x>d

23.19 (4, 5/89, Q.25) (1 point) The distribution function of the random variable X is
F(x) = 1 - 1/x2 , x ≥ 1.
Which of the following are true about the mean, median, and mode of X?
A. mode < mean < median B. mode < median < mean
C. mean < mode < median D. median < mean and the mode is undefined
E. None of the above

23.20 (4, 5/90, Q.30) (1 point) Losses, denoted by T, have the probability density function:
f(T) = 4 T-5 for 1 ≤ T < ∞ .
What is the coefficient of variation of T?
A. 1/8 B. 1/4 C. √2/4 D. 3/4 E. 3√2/4

23.21 (4, 5/90, Q.31) (1 point) Losses, denoted by T, have the probability density function:
f(T) = 4 T-5 for 1 ≤ T < ∞ .
What is the coefficient of skewness of T?
A. 5 B. 5√2 C. 20/27 D. 5/9 E. 5√2/9

23.22 (4, 5/90, Q.33) (1 point) Losses, denoted by T, have the probability density function:
f(T) = 4 T-5 for 1 ≤ T < ∞.
What is the actual 95th percentile of T?
A. less than 2.25
B. at least 2.25 but less than 2.50
C. at least 2.50 but less than 2.75
D. at least 2.75 but less than 3.00
E. at least 3.00

23.23 (4B, 5/92, Q.15) (2 points) Determine the coefficient of variation of the claim severity
distribution f(x) = (5/2) x-7/2, x > 1.
A. Less than 0.70
B. At least 0.70 but less than 0.85
C. At least 0.85 but less than 1.00
D. At least 1.00 but less than 1.15
E. At least 1.15

23.24 (1, 5/00, Q.34) (1.9 points) An insurance policy reimburses a loss up to a benefit limit of 10.
The policyholderʼs loss, Y, follows a distribution with density function: f(y) = 2/y3 , y > 1.
What is the expected value of the benefit paid under the insurance policy?
(A) 1.0 (B) 1.3 (C) 1.8 (D) 1.9 (E) 2.0

23.25 (1, 11/00, Q.25) (1.9 points) A manufacturerʼs annual losses follow a distribution with density
function f(x) = 2.5 (0.62.5) / x3.5, x > 0.6.
To cover its losses, the manufacturer purchases an insurance policy with an annual
deductible of 2.
What is the mean of the manufacturerʼs annual losses not paid by the insurance policy?
(A) 0.84 (B) 0.88 (C) 0.93 (D) 0.95 (E) 1.00

23.26 (1, 5/01, Q.39) (1.9 points) An insurance company insures a large number of homes.
The insured value, X, of a randomly selected home is assumed to follow a distribution with density
function f(x) = 3x-4, x > 1. Given that a randomly selected home is insured for at least 1.5, what is
the probability that it is insured for less than 2?
(A) 0.578 (B) 0.684 (C) 0.704 (D) 0.829 (E) 0.875

23.27 (3, 11/01, Q.37 & 2009 Sample Q.103) (2.5 points)
For watches produced by a certain manufacturer:
(i) Lifetimes follow a single-parameter Pareto distribution with α > 1 and θ = 4.
(ii) The expected lifetime of a watch is 8 years.
Calculate the probability that the lifetime of a watch is at least 6 years.
(A) 0.44 (B) 0.50 (C) 0.56 (D) 0.61 (E) 0.67

23.28 (1, 5/03, Q.22) (2.5 points) An insurer's annual weather-related loss, X, is a random variable
with density function f(x) = 2.5 2002.5/x3.5, x > 200.
Calculate the difference between the 30th and 70th percentiles of X .
(A) 35 (B) 93 (C) 124 (D) 231 (E) 298

Solutions to Problems:

23.1. C. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.


Mean = {α / (α - 1)} θ = (2.5 / 1.5)(9) = 15.
Alternately, one can integrate xf(x) from 9 to ∞.

23.2. A. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.

F(x) = 1 - (x / θ)-α = 1 - (x / 9)-2.5. At the median we want F(x) = 0.5: 0.5 = (x / 9)-2.5.
Therefore, x = (9) 0.5-1/2.5 = 11.9.

23.3. A. The mode of the Single Parameter Pareto Distribution is always θ which in this case is 9.
(The density decreases for x > θ and thus attains its maximum at x=θ.)

23.4. E. F(x) = 1 - (x / θ)-α = 1 - (x / 9)-2.5. 1 - F(30) = (30/9)-2.5 = 0.049.

23.5. B. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.

Variance = (α / { (α - 2) (α − 1)2 }) θ2 = (92 )(2.5) / {(0.5)(1.52 )} = 180.


Alternately, one can compute the second moment as the integral of x2 f(x) from 9 to ∞.
∫9^∞ x2 f(x) dx = ∫9^∞ x2 607.5 x-3.5 dx = 607.5 ∫9^∞ x-1.5 dx = (-607.5/0.5) x-0.5 evaluated from x = 9 to x = ∞ = 1215/3 = 405.

Thus the variance is 405 - 152 = 405 - 225 = 180.

23.6. A. From the solutions to previous questions, the mean is 15 and the variance is 180, so the
coefficient of variation is: √180 / 15 = 0.894.
Alternately, the coefficient of variation is 1/√(α(α - 2)) = 1/√((2.5)(2.5 - 2)) = 0.894.

23.7. E. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.


The skewness only exists α > 3, therefore in this case the skewness does not exist.
(If one tries to calculate the third moment by taking the integral of x3 f(x) from 9 to ∞, one gets infinity
due to evaluating x0.5 at ∞.)

23.8. D. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.

E[X ∧ x] = θ [{α − (x/θ)1−α} / (α - 1)] .


E[X ∧ 20] = 9[ {2.5 -(20/9)-1.5}/1.5] = 13.19.
Alternately, one can compute the integral of xf(x) from θ to x, from 9 to 20:
∫9^20 x f(x) dx = ∫9^20 x 607.5 x-3.5 dx = 607.5 ∫9^20 x-2.5 dx = (-607.5/1.5) x-1.5 evaluated from x = 9 to x = 20 = -405 (0.01118 - 0.03704) = 10.473.


E[X ∧ 20] is the sum of the above integral plus 20{1 - F(20)}: 10.473 + 20(9/20)2.5 =
10.473 + 2.717 = 13.19.

23.9. E. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.

R(x) = Excess Ratio = (1/α) (x/θ)1−α. R(20) = (1/2.5) (20/9)-1.5 = 12.1%.


Alternately, one can compute the integral of 1-F(x) = S(x) from 20 to ∞:
∞ ∞
x= ∞
∫ ∫ x - 2.5 dx = (-243) x - 1.5 / 1.5 ]
- 2.5
(x / 9) dx = 92.5 = (-162) (0 - 0.01118) = 1.811.
x = 20
20 20

This integral is the losses excess of 20. R(20) is the ratio of the above integral to the mean.
Mean = (α / (α − 1))θ = (2.5 / 1.5) (9) = 15. Thus R(20) = 1.811 / 15 = 12.1%.
Alternately, using previous solutions, R(20) = 1 - E[X ∧ 20]/E[X] = 1 - 13.19/15 = 12.1%.

23.10. E. Single Parameter Pareto Distribution with θ = 9 and α = 2.5.


E[X ∧ x] = θ [{α − (x/θ)1−α} / (α − 1)] .
E[X ∧ 30] = 9[ {2.5 -(30/9)-1.5}/1.5] = 14.01.
E[X ∧ 20] = 9[ {2.5 -(20/9)-1.5}/1.5] = 13.19.
65 large losses expected per year, so that the Reinsurer expects in the layer from 30 to 20:
65{ E[X ∧ 30] - E[X ∧ 20] } = (65)(14.01-13.19) = 53.
Comment: The Reinsurer only expects to make payments on (65)S(20) = (65)(20/9)-2.5 = 8.8
losses per year. Of these (65)S(30) = (65)(30/9)-2.5 = 3.2 are expected to exceed the upper limit
of the layer; on such claims the reinsurer pays the width of the layer or 10.
So on large losses of size greater than 30, the reinsurer expects to pay a total of: (10)(3.2) = 32.
The remaining 53 - 32 = 21 is expected to be paid on losses of size 20 to 30.
On average the reinsurer pays for each such medium sized loss: 21 / (8.8 - 3.2) = 3.75.

23.11. Since the data is truncated from below at 1 meter, one takes θ = 1 meter.
The volume of a meteor is proportional to d3 . Therefore, assuming some reasonable average
density, the mass of a meteor is also proportional to d3 . The average mass of meteors hitting the
Earth should be finite. Therefore, the distribution of d should have a finite third moment.
Therefore, α > 3.
Comment: Beyond what you are likely to be asked on your exam. However, it is important to know
that the Single Parameter Pareto does not have all of its moments.


23.12. B. E[ln(X/θ)] = ∫ ln(x/θ) f(x) dx = ∫θ^∞ ln(x/θ) {α θα / xα+1} dx.
Let y = α ln(x/θ). x = θ ey/α. dx = θ ey/α dy/α.
∫θ^∞ ln(x/θ) {α θα / xα+1} dx = ∫0^∞ (y/α) {α θα / (θ ey/α)α+1} θ ey/α dy/α = (1/α) ∫0^∞ y e-y dy = Γ(2)/α = 1/α.

23.13. C. For the Single Parameter Pareto, f(x) = α θα / xα+1, x > θ.
Since the graph starts at 50, θ = 50.

f(θ) = α/θ. Therefore, 0.06 = α/θ. ⇒ α = (50)(0.06) = 3.


E[Xn] = α θn / (α − n), α > n. E[X] = (3)(50)/2 = 75. E[X2] = (3)(502)/1 = 7500.
Var[X] = 7500 - 752 = 1875.

23.14. B. The scale parameter θ will drop out of the analysis, so for convenience take θ = 1.
We want 20% of the aggregate losses to come from the smallest 80% of the loss events.
E[X] = θα/(α - 1) = α/(α - 1).

The 80th percentile is such that 0.2 = S(x) = 1/xα. ⇒ x = 51/α.

Dollars from those losses of size less than x = 51/α is:

E[X ∧ x] - xS(x) = α/(α - 1) - 51/α/{5(α - 1)} - 51/α(0.2) = {α/(α - 1)}(1 - 51/α/5).


The portion of the total dollars from those loss of size less than 80th percentile is the above divided
by the mean: 1 - 51/α/5.
Thus we want: 0.2 = 1 - 51/α/5. ⇒ 4 = 51/α. ⇒ α = 1.161.
Comment: For θ = 1 and α = 1.161, E[X] = 1.161/0.161 = 7.211,
the 80th percentile is: 51/1.161 = 4, and
4^0.161 = 1.250, so E[X ∧ 4] = 1.161/0.161 - 1/{(0.161)(1.250)} = 7.211 - 4.969 = 2.242.
Dollars from those loss of size less than 4 is: 2.242 - (0.2)(4) = 1.442. 1.442/ 7.211 = 0.20.
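For those who want to see the 80/20 split emerge from simulated data, here is a Python sketch (an illustration only, with θ = 1): it draws a large Single Parameter Pareto sample with α = 1.161 and reports the share of total dollars coming from the largest 20% of the losses.

```python
import random

# Check of 23.14: with alpha = 1.161 the largest 20% of losses should carry
# roughly 80% of the total dollars.  (The tail is heavy, so expect some
# sampling noise even with a million simulated losses.)
random.seed(1)
alpha, n = 1.161, 1_000_000
xs = sorted(random.random() ** (-1.0 / alpha) for _ in range(n))
top_share = sum(xs[int(0.8 * n):]) / sum(xs)
print(round(top_share, 3))   # roughly 0.80
```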

23.15. Differentiating, f(x) = {α θα / xα+1} / {1 - (θ/T)α}.
E[Xk] = ∫θ^T xk {α θα / xα+1} / {1 - (θ/T)α} dx = {α θα / (1 - (θ/T)α)} ∫θ^T xk-α-1 dx = {α θα / (1 - (θ/T)α)} (Tk-α - θk-α)/(k - α)
= {α θk / (α - k)} {1 - (θ/T)α-k} / {1 - (θ/T)α}, α ≠ k.

Comment: Using lʼHopitalʼs Rule, as α approaches k, {1 - (θ/T)α-k}/(α - k) approaches ln(T/θ), so E[Xk] approaches k θk ln(T/θ) / {1 - (θ/T)k}.


This is a Single Parameter Pareto Distribution truncated from above at T.
See “A Note on the Upper-Truncated Pareto Distribution,” by David R. Clark,
Winter 2013 CAS E-Forum.

23.16. (a) f(x) = θ/x2 , x > θ. S(x) = θ/x.


Therefore, for x > θ, E[X ∧ x] = ∫θ^x y f(y) dy + x S(x) = ∫θ^x (θ/y) dy + (x)(θ/x) = θ ln(x/θ) + θ.

(b) E[X ∧ y] - E[X ∧ x] = {θ ln(y/θ) + θ} - {θ ln(x/θ) + θ} = θ ln(y/x).


E[X ∧ cy] - E[X ∧ cx] = {θ ln(cy/θ) + θ} - {θ ln(cx/θ) + θ} = θ ln[(cy)/(cx)] = θ ln(y/x).
The expected losses in the two layers are equal.
Comment: The Single Parameter Pareto Distribution with α = 1 has the property that expected
losses in layers are equal if the ratio of the top and the bottom of the layers are the same.

23.17. B. E[X] = αθ / (α-1) = (4)(6000) / 3 = 8000.

E[X ∧ x] = αθ / (α-1) - θα / {(α-1)xα-1}.


E[X ∧ 10,000] = 8000 - 60004 / {(3)(100003 )} = 8000 - 432 = 7568.
E[(X - 10,000)+] = E[X] - E[X ∧ 10,000] = 8000 - 7568 = 432.

Comment: E[X] - E[X ∧ x] = θα / {(α-1)xα-1}.

23.18. C. The distribution function for the data truncated from below at d is:
G(x) = (F(x)-F(d)/(1-F(d)) for x >d. In this case G(x) = ((c/d)α - (c/x)α) / (c/d)α

= 1 - (d/x)α for x >d.

23.19. B. The density f(x) = 2 x−3, x ≥ 1. Since the density declines for all x ≥ 1, it has its maximum
at x =1. The mode is 1. The mean is the integral from 1 to ∞ of xf(x) which is
∫1^∞ 2x-2 dx = -2/x evaluated from x = 1 to x = ∞ = 2. Thus the mean = 2.

The median is such that F(x) = 0.5. Thus 0.5 = 1 - 1/x2. median = √2 = 1.414.
Comment: A Single Parameter Pareto Distribution, with α = 2 and θ = 1.

Mean = {α / (α − 1)} θ = (2/1)(1) = 2. Mode = θ = 1. Median = θ 21/α = 21/2 = 1.414.


For a continuous distribution with positive skewness, such as the Single Parameter Pareto
Distribution, typically: mean > median > mode (alphabetical order.)

23.20. C. mean = ∫1^∞ x f(x) dx = ∫1^∞ x (4x-5) dx = 4 ∫1^∞ x-4 dx = -4x-3/3 evaluated from x = 1 to x = ∞ = 4/3.
second moment = ∫1^∞ x2 f(x) dx = ∫1^∞ x2 (4x-5) dx = 4 ∫1^∞ x-3 dx = -4x-2/2 evaluated from x = 1 to x = ∞ = 2.
Thus the variance is: 2 - (4/3)2 = 2/9. The standard deviation is: √2/3.
Coefficient of Variation = Standard Deviation / Mean = (√2/3) / (4/3) = √2/4.
Comment: A Single Parameter Pareto Distribution with α = 4 and θ = 1.
The CV = 1/√(α(α - 2)) = 1/√8 = √2/4.

23.21. B. third moment = ∫1^∞ x3 f(x) dx = ∫1^∞ x3 (4x-5) dx = 4 ∫1^∞ x-2 dx = -4x-1/1 evaluated from x = 1 to x = ∞ = 4.
Thus the skewness = {µ3′ - (3 µ1′ µ2′) + (2 µ1′3)} / STDDEV3 = [4 - {(3)(4/3)(2)} + {(2)(4/3)3}] / (√2/3)3
= {128/27 - 4} / {2√2/27} = 20 / (2√2) = 5√2.

23.22. f(T) = 4 T-5 for T ≥ 1, so that taking the integral F(T) = 1 - T-4 for T ≥ 1.
At the 95th percentile 0.95 = F(T) = 1 - T-4. Therefore T = (1/.05)1/4 = 2.115.

23.23. C. The mean is: ∫1^∞ x f(x) dx = ∫1^∞ x (5/2) x-7/2 dx = -{(5/2)/(3/2)} x-3/2 evaluated from x = 1 to x = ∞ = 5/3.
The 2nd moment is: ∫1^∞ x2 (5/2) x-7/2 dx = -{(5/2)/(1/2)} x-1/2 evaluated from x = 1 to x = ∞ = 5.
Thus the variance = 5 - (5/3)2 = 2.22. The coefficient of variation is: √2.22 / (5/3) = 0.894.
Comment: This is a Single Parameter Pareto Distribution, with parameters θ = 1 and α = 5/2 = 2.5.
It has coefficient of variation equal to: 1/√(α(α - 2)) = 1/√((2.5)(2.5 - 2)) = 0.894.

23.24. D. The density of a Single Parameter Pareto Distribution is: αθα / xα+1, x > θ.
This is a Single Parameter Pareto Distribution with α = 2 and θ = 1.
E[X ∧ x] = αθ / (α-1) - θα/ {(α-1)xα−1}. E[X ∧ 10] = 2 - 1/10 = 1.9.

23.25. C. The mean annual losses not paid by the insurance policy is E[X ∧ 2].
This is a Single Parameter Pareto Distribution with α = 2.5 and θ = 0.6.
E[X ∧ x] = αθ/ (α-1) - θα/{(α-1)xα−1}. E[X ∧ 2] = (2.5)(0.6)/1.5 - (0.62.5)/{(1.5)21.5} = 0.9343.

23.26. A. F(x) = 1 - 1/x3 , x > 1.


Prob[X < 2 | X > 1.5] = (F(2) - F(1.5))/S(1.5) = (7/8 - 0.7037)/0.2963 = 0.578.

23.27. A. For the Single Parameter Pareto Distribution, E[X] = αθ/(α-1).

Therefore, 8 = 4α/(α-1) . ⇒ α = 2. S(x) = (θ/x)α. S(6) = (4/6)2 = 4/9 = 0.444.

23.28. B. F(x) = 1 - (200/x)2.5, x > 200. F(Π30) = 0.3. ⇒ Π30 = 200(1/0.7)1/2.5 = 230.7.

F(Π70) = 0.7. ⇒ Π70 = 200(1/0.3)1/2.5 = 323.7. Π70 - Π30 = 323.7 - 230.7 = 93.

Section 24, Common Two Parameter Distributions

For the exam it is important for the student to become familiar with the material in Appendix A of
Loss Models.103 Here are the four most commonly used Distributions with Two Parameters:
Pareto, Gamma, LogNormal, and Weibull.104

Pareto:

α is a shape parameter and θ is a scale parameter. Notice the factor of θn in the moments. The
Pareto is a heavy-tailed distribution. Higher moments may not exist.

The coefficient of variation (when it exists) is always greater than one; the standard deviation is
always greater than the mean.105 The skewness for the Pareto is always greater than twice
the coefficient of variation.

F(x) = 1 - {θ/(θ + x)}α = 1 - (1 + x/θ)−α. f(x) = α θα / (θ + x)α+1 = (α/θ)(1 + x/θ)−(α + 1)

Mean = θ/(α − 1), α > 1 Variance = α θ2 / {(α − 1)2 (α − 2)}, α > 2

E[X2] = 2 θ2 / {(α − 1)(α − 2)}, α > 2 E[Xn] = n! θn / {(α − 1)...(α − n)}, α > n

Coefficient of Variation = √(α/(α − 2)), α > 2 Skewness = 2 {(α + 1)/(α − 3)} √((α − 2)/α), α > 3

Mode = 0. Median = θ (21/α - 1).

E[X ∧ x] = {θ/(α − 1)} {1 - (θ/(θ + x))α−1}, α > 1 Mean Excess Loss = (θ + x)/(α − 1), α > 1

Loss Elimination Ratio = 1 - (1 + x/θ)1−α, α > 1. Excess Ratio = (1 + x/θ)1−α, α > 1


103
There are a few other distributions used by actuaries than those listed there, and the distributions are sometimes
parameterized in a different manner.
104
In my opinion. See a subsequent section for additional two parameter distributions in Loss Models.
105
This fact is also equivalent to the fact that for the Pareto E[X2 ] > 2 E[X]2 .

Hereʼs a graph of the density function of a Pareto Distribution with α = 3 and θ = 60:

Exercise: Losses prior to any deductible follow a Pareto Distribution with parameters α = 1.7 and
θ = 30. A policy has a deductible of size 10.
What is the distribution of non-zero payments under that policy?
[Solution: After truncating and shifting by d, G(x) = 1 - S(x+d)/S(d) = 1 - S(x+10)/S(10)
= 1 - {30/(30 + x + 10)}1.7 / {30/(30 + 10)}1.7 = 1 - {40/(40 + x)}1.7.
Comment: This is a Pareto Distribution with α = 1.7 and θ = 40.]

If losses prior to any deductible follow a Pareto Distribution with parameters α and θ, then after
truncating and shifting from below by a deductible of size d:
G(x) = 1 - S(x+d)/S(d) = 1 - {θ/(θ + x + d)}α / {θ/(θ + d)}α = 1 - {(θ + d)/(θ + d + x)}α.

If losses prior to any deductible follow a Pareto Distribution with parameters α and θ,
then after truncating and shifting from below by a deductible of size d, one gets another
Pareto Distribution, but with parameters α and θ + d.106
106
The form of an Exponential Distribution is also preserved under truncation and shifting from below. While for the
Exponential the parameter remains the same, for the Pareto the θ parameter becomes θ + d.

Exercise: Losses prior to any deductible follow a Pareto Distribution with parameters α = 1.7 and
θ = 30. What is the mean non-zero payment under a policy that has a deductible of size 10?
[Solution: The non-zero payments follow a Pareto Distribution with α = 1.7 and θ = 40,
with a mean of 40/(1.7-1) = 57.1. Alternately, the mean of the data truncated and shifted from below
is the mean excess loss for the original Pareto Distribution.
e(x) = (θ+x)/(α-1). e(10) = (30 + 10)/(1.7-1) = 57.1.]
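A quick simulation makes this truncate-and-shift result concrete. The Python sketch below is an illustration only: it draws Pareto losses with α = 1.7 and θ = 30 by inverse transform, keeps the non-zero payments after a deductible of 10, and checks that their survival function at 40 matches that of a Pareto with α = 1.7 and θ = 40.

```python
import random

# Inverse transform for the Pareto: X = theta * ((1 - U)**(-1/alpha) - 1).
random.seed(1)
alpha, theta, d, n = 1.7, 30.0, 10.0, 1_000_000
losses = [theta * ((1.0 - random.random()) ** (-1.0 / alpha) - 1.0) for _ in range(n)]
payments = [x - d for x in losses if x > d]

# For a Pareto with alpha = 1.7 and theta = 40, S(40) = (40/80)^1.7 = 0.3078.
frac_over_40 = sum(p > 40 for p in payments) / len(payments)
print(round(frac_over_40, 4), round(0.5 ** 1.7, 4))
```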

Gamma:107 108

α is a shape parameter and θ is a scale parameter.109 Note the factors of θ in the moments.
For α = 1, the Gamma is an Exponential Distribution.

The Gamma always has well-defined moments and is thus not as heavy-tailed as other distributions
such as the Pareto.

The sum of two independent random variables each of which follows a Gamma distribution with the
same scale parameter, is also a Gamma distribution; it has a shape parameter equal to the sum of
the two shape parameters and the same scale parameter. Specifically the sum of n independent
identically distributed variables which are Gamma with parameters α and θ is a Gamma distribution
with parameters nα and θ.

The Gamma is infinitely divisible; if X follows a Gamma, then given any n >1 we can find a
random variable Y which also follows a Gamma, such that adding up n independent version of
Y gives X. Take n independent copies of a Gamma with parameters α/n and θ. Their sum is a
Gamma with parameters α and θ.

For a positive integral shape parameter, α = m, the Gamma distribution is the sum of m
independent variables each of which follows an Exponential distribution. Thus for α = 1,
we get an Exponential. The sum of two independent, identically distributed Exponential variables
follows a Gamma Distribution with α = 2. As α approaches infinity the Gamma approaches a

Normal distribution by the Central Limit Theorem. The Gamma has variance equal to αθ2, the sum of

α identical independent exponential distributions each with variance θ2.
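Here is a small Python sketch of the additivity property just described (an illustration, not part of the text): it adds three independent Exponentials with mean θ = 10 and checks that the resulting mean and variance agree with a Gamma with α = 3 and θ = 10.

```python
import random

# Sum of alpha = 3 independent Exponentials with mean theta = 10:
# should behave like a Gamma with mean alpha*theta = 30 and
# variance alpha*theta^2 = 300.
random.seed(1)
theta, n = 10.0, 500_000
sums = [sum(random.expovariate(1.0 / theta) for _ in range(3)) for _ in range(n)]
mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / (n - 1)
print(round(mean, 2), round(var, 1))   # near 30 and 300
```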


107
The incomplete Gamma Function, which underlies the Gamma Distribution, is covered in “Mahlerʼs Guide to
Frequency Distributions.”
108
The Gamma Distribution is sometimes called a Pearson Type III Distribution.
109
In Actuarial Mathematics θ is replaced by 1/β.

F(x) = Γ(α; x/θ) f(x) = (x/θ)α e-x/θ / {x Γ(α)} = xα-1 e-x/θ / {θα Γ(α)}.

Mean = αθ Variance = αθ2 Second moment = α(α +1)θ2

E[Xk] = θk (α + k - 1) ... α, for k a positive integer. E[Xk] = θk Γ(α + k) / Γ(α), k > −α.

Mode = θ(α - 1), α > 1. Mode = 0, α ≤ 1.

Points of inflection: θ {α − 1 ± √(α − 1)}, α > 2; θ {α − 1 + √(α − 1)}, 2 ≥ α > 1.

Coefficient of Variation = 1/√α Skewness = 2/√α = 2 CV. Kurtosis = 3 + 6/α = 3 + 6 CV2.

The skewness for the Gamma distribution is always twice the coefficient of variation. Thus the
Gamma is likely to fit well only to data sets for which this is true.

The Kurtosis of a Gamma is always greater than 3, the kurtosis of a Normal Distribution.
As α goes to infinity, the Kurtosis of a Gamma goes to 3, the kurtosis of a Normal, since the Gamma
approaches a Normal. For a Gamma: 2 Kurtosis - 3 Skewness2 = 6.

Hereʼs a graph of the density function of a Gamma Distribution with α = 3 and θ = 10:


For α = 3, the Gamma is a peaked curve, skewed to the right .110

Note that while Iʼve only shown x ≤ 100, the density is positive for all x > 0.
Note that for α ≤ 1, rather than a peaked curve, we get a curve with mode of zero.111
Note that for very large alpha, one would get a curve much less skewed to the right.

Calculating the Distribution Function for a Gamma:

As mentioned previously in “Mahlerʼs Guide to Frequency,” one can write an Incomplete Gamma
function for integer α = n as the sum of Poisson probabilities for the number of events greater than or
equal to n:112
Γ(n; x) = 1 - Σi = 0 to n-1 xi e-x / i! = Σi = n to ∞ xi e-x / i!.

Now the Gamma Distribution Function is: F(x) = Γ(α; x/θ).


Thus for integer α, F(x) = 1 - Σi = 0 to α-1 (x/θ)i e-x/θ / i!.

Exercise: For a Gamma Distribution with α = 4 and θ = 10, compute F(25) and F(50).
[Solution: F(25) = 1 - e-2.5 (1 + 2.5 + 2.52 /2 + 2.53 /6) = 0.2424.
F(50) = 1 - e-5 (1 + 5 + 52 /2 + 53 /6) = 0.7350.]
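For integer α this Poisson-sum identity is easy to code. The minimal Python sketch below (the helper name is my own) simply implements the formula above and reproduces the two values in the exercise.

```python
import math

def gamma_cdf_integer_alpha(x, alpha, theta):
    """F(x) for a Gamma with integer shape alpha:
    F(x) = 1 - sum_{i=0}^{alpha-1} (x/theta)^i e^(-x/theta) / i!"""
    y = x / theta
    return 1.0 - sum(y**i * math.exp(-y) / math.factorial(i) for i in range(alpha))

print(round(gamma_cdf_integer_alpha(25, 4, 10), 4))   # 0.2424
print(round(gamma_cdf_integer_alpha(50, 4, 10), 4))   # about 0.7350
```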

The Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν /2 and
θ = 2.113 Thus for alpha integer or half integer one can use the Chi-Square Table to determine
percentiles of the Gamma Distribution.

For example, as shown in the Chi-Square Table, the 90th percentile of a Chi-Square Distribution
with 7 degrees of freedom is 12.017. Thus for a Gamma Distribution with α = 7/2 and θ = 2,
F(12.017) = 0.9. Thus for a Gamma Distribution with α = 7/2 and θ = 10, F(60.085) = 0.9.114

110
This general description applies to the densities of most Loss Distributions.
111
For alpha =1, one gets an Exponential Distribution, with mode of zero.
112
See Theorem A.1 in Appendix A of Loss Models.
113
The Chi-Square Test and Table are discussed in “Mahlerʼs Guide to Fitting Loss Distributions.” A Chi-Square
Distribution with ν degrees of freedom, is the sum of ν squares of independent unit Normal Distributions
114
(10/2)(12.017) = 60.085.

Exercise: Determine the 5th and 99th percentiles of a Gamma Distribution with α = 4.5 and θ = 100.
[Solution: A Chi-Square Distribution with 9 degrees of freedom is a Gamma Distribution with α = 4.5
and θ = 2. As shown in the Chi-Square Table, the 5th percentile is 3.325 and the 99th percentile is
21.666. Thus, for a Gamma Distribution with α = 4.5 and θ = 100,
the 5th percentile is: (100/2)(3.325) = 166.25,
and the 99th percentile is: (100/2)(21.666) = 1083.3.]

LogGamma:115

If ln[X] follows a Gamma Distribution, then X follows a LogGamma Distribution.

The distribution function of the LogGamma Γ[a ; ln(x)/θ] is just that of the Gamma Γ(α ; x/θ)
with ln(x) in place of x.

In order to derive the density of the LogGamma, one can differentiate the distribution function, but
must remember to use the chain rule and take into account the change of variables.
Let y = ln(x) . F(x) = Γ(α ; ln(x)/θ) = Γ(α ; y/θ).
Therefore, f(x) = dF/dx = (dF/dy)(dy/dx) = (density function of Gamma in terms of y)(1/x)
= {θ−α yα−1 e−y/θ / Γ(α)} / x = θ−α {ln(x)}α−1 e−ln(x)/θ / {x Γ(α)} = θ−α {ln(x)}α−1 / {x1+1/θ Γ(α)}, x > 1.

α is a shape parameter and θ is not exactly a scale parameter. For very large α the distribution
approaches a LogNormal distribution (just as the Gamma approaches the Normal distribution).
For α = 1, ln[X] follows an Exponential Distribution, and one gets a Single Parameter Pareto Distribution.

If one were to graph the size of loss distribution, but have the x-axis (the size of loss) on a
logarithmic scale, then the size of loss distribution would look much less skewed. If ln(x) followed a
Gamma, then x itself follows a LogGamma distribution. The LogGamma is much more skewed than
the Gamma distribution.

The product of independent LogGamma variables with the same θ parameter is a LogGamma with
the same θ parameter and the sum of the individual α parameters.116
115
The LogGamma is not in Appendix A of Loss Models, and is extremely unlikely to be asked about on your exam.
For more information on the LogGamma see for example Loss Distributions by Hogg & Klugman.
116
This follows from the fact that the sum of two independent random variables each of which follows a Gamma
distribution with the same scale parameter, is also a Gamma distribution; it has a shape parameter equal to the sum of
the two shape parameters and the same scale parameter.

LogNormal:

If one were to graph the size of loss distribution, but have the x-axis (the size of loss) on a
logarithmic scale, then the size of loss distribution would be much less skewed. If ln(x) follows a
(symmetric) Normal, then x itself follows a LogNormal.117

The product of a series of independent LogNormal variables is also LogNormal.118


The only condition necessary to produce a LogNormal Distribution is that the amount of an
observed value be the product of a large number of factors, each of which is independent of the
size of any other factor.119

Please note that µ is not the mean of the LogNormal nor is σ the standard deviation. Rather µ is the
mean and σ is the standard deviation of the Normal Distribution of the logs of the claim sizes. σ is a
shape parameter; note the way the CV and skewness only depend on σ.

As parameterized in Loss Models, the LogNormal Distribution does not have a scale parameter.
However, we can rewrite the Distribution Function:
F(x) = Φ[{ln(x)−µ} / σ] = Φ[{ln(x)−ln(eµ)} / σ] = Φ[{ln(x/eµ)} / σ] .

Thus since everywhere x appears in the distribution function it is divided by eµ, eµ would be
the scale parameter for the LogNormal. Thus if reparameterized in this way, the LogNormal
Distribution would have a scale parameter. Note the way that (eµ)n appears in the formula for

the moments of the LogNormal, another sign that eµ would be the scale parameter, if one
parameterized the distribution differently.
The LogNormal Distribution can also be used to model stock prices.120

F(x) = Φ[{ln(x) − µ} / σ] f(x) = exp[-(ln(x) − µ)2 / (2σ2)] / {x σ √(2π)}

Mean = exp[µ + σ2/2]

Second moment = exp[2µ + 2σ2] E[Xn ] = exp[nµ + n2 σ2/2] .


117
The LogNormal is less skewed than the LogGamma distribution, (because the Normal distribution is less skewed
than the Gamma distribution.)
118
Since the sum of independent Normal variables is also a Normal.
119
Quoted from “Sampling Theory in Casualty Insurance,” by Arthur L. Bailey, PCAS 1942 and 1943.
120
See Derivative Markets by McDonald, not on the syllabus of this exam.

Variance = exp(2µ + σ2) {exp(σ2) - 1} Coefficient of Variation = √(exp(σ2) - 1)

Skewness = {exp(3σ2) - 3 exp(σ2) + 2} / {exp(σ2) - 1}1.5 = (3 + CV2) CV.

Mode = exp(µ - σ2) Median = exp(µ)

The relationships between the Gamma, Normal, LogGamma, and LogNormal Distributions are
shown below:121
α →∞
Gamma ⇒ Normal

y = ln(x)
⇓ ⇓ y = ln(x)

LogGamma ⇒ LogNormal
α →∞

Exercise: A LogNormal Distribution has parameters µ = 5 and σ = 1.5.


What are the mean and variance?
[Solution: Mean = exp[µ + σ2/2] = exp(5 +(1.52 )/2) = 457.145.

Second Moment = exp[2µ + 2σ2] = exp[10 + (2)(1.52 )] = 1,982,759.


Variance = 1,982,759 - 457.1452 = 1,773,777.]

The formula for the moments of a LogNormal Distribution follows from the formula for its mean.
If X is LogNormal with parameters µ and σ, then ln(X) is Normal with the same parameters.
Therefore, n ln(X) is Normal with parameters nµ and nσ.
Therefore, exp[n ln(X)] = Xn is LogNormal with parameters nµ and nσ.

Therefore, E[Xn ] = exp[nµ + (nσ)2 /2] = exp[nµ + n2 σ2/2].

Exercise: For a LogNormal Distribution with µ = 8.0064 and σ = 0.6368, what are the mean, median
and mode?
[Solution: Mean = exp[µ + σ2/2] = 3674. Median = exp(µ) = 3000. Mode = exp(µ − σ2) = 2000.]

121
A summary of the Normal Distribution appears in “Mahlerʼs Guide to Frequency Distributions.”

Here is a graph of this LogNormal Distribution, with µ = 8.0064 and σ = 0.6368:

[Figure: the density rises to its peak at the mode, 2000, with the median, 3000, and the mean, 3674, marked successively further to the right.]

The LogNormal is a heavy-tailed distribution, yet all of its (positive) moments exist.122
Its mode (place where the density is largest) is less than it median (place where the distribution
function is 50%), which in turn is less than its mean (average).
As σ increases, the LogNormal gets heavier-tailed.

For a LogNormal Distribution with σ = 2, what is the ratio of the mean to the median?

[Solution: Mean / Median = exp(µ + σ2/2) / exp(µ) = exp(σ2/2) = e2 = 7.39.]

For a LogNormal Distribution with σ = 2, what is the ratio of the median to the mode?

[Solution: Median / Mode = exp(µ) / exp(µ − σ2) = exp(σ2) = e4 = 54.6.]

For a LogNormal Distribution with σ = 2, what is the probability of a loss of size less than the mean?

[Solution: Mean = exp(µ + 0.5 σ2). F(mean) = Φ[(ln(mean) - µ)/σ] = Φ[(µ + 0.5 σ2 - µ)/σ] =
Φ[0.5 σ] = Φ[1] = 84.13%.]
122
See the section on tails of distributions.
The LogNormal is the distribution with the heaviest tail such that all of its moments exist.
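The LogNormal facts above can be checked in a few lines of Python. The sketch below is an illustration only (Φ is built from the error function): it recomputes the mean, median, and mode for µ = 8.0064 and σ = 0.6368, and then the σ = 2 ratios and F(mean) just derived.

```python
import math

def Phi(z):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 8.0064, 0.6368
print(round(math.exp(mu + sigma**2 / 2)),   # mean, about 3674
      round(math.exp(mu)),                  # median, about 3000
      round(math.exp(mu - sigma**2)))       # mode, about 2000

# For sigma = 2: mean/median = e^2, median/mode = e^4, and F(mean) = Phi(1).
print(round(math.exp(2.0), 2), round(math.exp(4.0), 1), round(Phi(1.0), 4))
```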

Weibull:123

τ is a shape parameter, while θ is a scale parameter.


τ = 1 is the Exponential Distribution.

F(x) = 1 - exp[-(x/θ)τ] f(x) = τ (x/θ)τ exp[-(x/θ)τ] / x = (τ/θτ) xτ-1 exp[-(x/θ)τ].

The mode of a Weibull is: θ {(τ - 1)/τ}1/τ for τ > 1, and 0 for τ ≤ 1.

The Weibull Distribution is a generalization of the Exponential. One applies a “power transformation”
to the size of loss and gets a new more general distribution with two parameters from the
Exponential Distribution with one parameter. So where x/θ appears in the Exponential Distribution,

(x/θ)τ appears in the Weibull Distribution.

Note that F(θ) = 1 - e-1 = 0.632.


Thus for any Weibull, including the special case of an Exponential, θ is the 63.2nd percentile.

S(x) = exp[-(x/θ)τ]. Note that for large τ, the righthand tail can decrease very quickly since x is taken to a power in the exponential.

123
This distribution is named after Ernst Hjalmar Waloddi Weibull.

Here is a graph of a Weibull Distribution with θ = 100 and τ = 2 (solid), compared to an Exponential
Distribution with θ = 100 (dashed):


For τ sufficiently large the Weibull has a negative skewness.

For τ less than 1 the Mean Excess Loss increases, but for τ greater than 1 the Mean Excess Loss
decreases. For τ = 1 you get the Exponential Distribution, with constant Mean Excess Loss.

For large x the Mean Excess Loss is proportional to x1−τ.

The mean of the Weibull is: θ Γ(1+ 1/τ). For τ = 1, we have an Exponential with mean θ Γ(2) = θ.
For example, for τ = 1/3, the mean is: θ Γ(4) = 6θ.
For example, for τ = 2, the mean is: θ Γ(3/2) = θ √π / 2 = 0.8862 θ.

Here is a graph of the mean divided by θ, as a function of the shape parameter τ of the Weibull Distribution:

[Figure: mean/θ = Γ(1 + 1/τ) plotted against the shape parameter τ.]

The τth moment of a Weibull with shape parameter τ is: θτ Γ(1 + τ/τ) = θτ Γ(2) = θτ.

For example, the third moment of a Weibull Distribution with τ = 3 is θ3.
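The moment formula E[Xk] = θk Γ(1 + k/τ) is easy to check with math.gamma; a short sketch (mine, with illustrative parameter values):

    from math import gamma

    def weibull_moment(k, theta, tau):
        # E[X^k] = theta^k * Gamma(1 + k/tau)
        return theta ** k * gamma(1 + k / tau)

    print(round(weibull_moment(1, theta=1, tau=2), 4))  # Gamma(1.5) = 0.8862, the mean for tau = 2, theta = 1
    print(weibull_moment(3, theta=10, tau=3))           # 1000.0; the tau-th moment is theta^tau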



Problems:

24.1 (1 point) Which of the following statements are true?


1. The mean of a LogNormal distribution exists for all values of σ > 0.
2. The mean of a Pareto distribution exists for all values of α > 1.
3. The variance of a Weibull distribution exists for all values of τ > 1.
A. 1 B. 2 C. 1, 2 D. 2, 3 E. 1, 2, 3

24.2 (1 point) Given a Gamma Distribution with a coefficient of variation of 0.5,


what is the value of the parameter α?
A. 1 B. 2 C. 3 D. 4 E. Cannot be determined.

24.3 (2 points) Given a Gamma Distribution with a coefficient of variation of 0.5 and skewness of 1,
what is the value of the parameter θ?
A. 1 B. 2 C. 3 D. 4 E. Cannot be determined.

24.4 (1 point) Given a Weibull Distribution with parameters θ = 100,000 and τ = 0.2,
what is the survival function at 25,000?
A. 43% B. 44% C. 45% D. 46% E. 47%

24.5 (1 point) Given a Weibull Distribution with parameters θ = 100,000 and τ = 0.2,
what is the mean?
A. less than 10 million
B. at least 10 million but less than 15 million
C. at least 15 million but less than 20 million
D. at least 20 million but less than 25 million
E. at least 25 million

24.6 (1 point) Given a Weibull Distribution with parameters θ = 100,000 and τ = 0.2,
what is the median?
A. less than 10 thousand
B. at least 10 thousand but less than 15 thousand
C. at least 15 thousand but less than 20 thousand
D. at least 20 thousand but less than 25 thousand
E. at least 25 thousand

24.7 (3 points) X follows a Gamma Distribution with α = 4 and θ = 10.


Y follows a Pareto Distribution with α = 3 and θ = 10.
X and Y are independent and Z = XY. What is the variance of Z?
A. 120,000 B. 140,000 C. 160,000 D. 180,000 E. 200,000

24.8 (1 point) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million,
what is the distribution function at 20 million?
A. less than 60%
B. at least 60% but less than 65%
C. at least 65% but less than 70%
D. at least 70% but less than 75%
E. at least 75%

24.9 (1 point) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million,
what is the mean?
A. less than 10 million
B. at least 10 million but less than 15 million
C. at least 15 million but less than 20 million
D. at least 20 million but less than 25 million
E. at least 25 million

24.10 (1 point) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million,
what is the median?
A. less than 10 million
B. at least 10 million but less than 15 million
C. at least 15 million but less than 20 million
D. at least 20 million but less than 25 million
E. at least 25 million

24.11 (2 points) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million,
what is the standard deviation?
A. less than 50 million
B. at least 50 million but less than 55 million
C. at least 55 million but less than 60 million
D. at least 60 million but less than 65 million
E. at least 65 million

24.12 (2 points) The times of reporting of claims follow a Weibull Distribution with τ = 1.5 and
θ = 4.
If 172 claims have been reported by time 5, estimate how many additional claims will be reported in
the future.
A. 56 B. 58 C. 60 D. 62 E. 64

24.13 (1 point) X1 , X2 , X3 , X4 are a sample of size four from an Exponential Distribution with mean
100. What is the mode of X1 + X2 + X3 + X4 ?
A. 0 B. 100 C. 200 D. 300 E. 400

24.14 (2 points) You are given the following:


• V is distributed according to a Gamma Distribution with parameters α = 3 and θ = 50.

• W is distributed according to a Gamma Distribution with parameters α = 5 and θ = 50.

• X is distributed according to a Gamma Distribution with parameters α = 9 and θ = 50.

• V, W, and X are independent.


Which of the following is the distribution of Y = V + W + X?
A. Gamma with α = 17 and θ = 50 B. Gamma with α = 17 and θ = 150
C. Gamma with α = 135 and θ = 50 D. Gamma with α = 135 and θ = 150
E. None of the above.

24.15 (3 points) A large sample of claims has an observed average claim size of $2,000 with a
variance of 5 million. Assuming the claim severity distribution to be LogNormal, estimate the
probability that a particular claim exceeds $3,500.
A. less than 0.14
B. at least 0.14 but less than 0.18
C. at least 0.18 but less than 0.22
D. at least 0.22 but less than 0.26
E. at least 0.26

24.16 (1 point) Which of the following are Exponential Distributions?


1. The Gamma Distribution as α approaches infinity.
2. The Gamma Distribution with α = 1.
3. The Weibull Distribution with τ = 1.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D

24.17 (2 points) Claims are assumed to follow a Gamma Distribution, with α = 5 and θ = 1000.
What is the probability that a claim exceeds 8,000? (Use the Normal Approximation.)
A. less than 4%
B. at least 4% but less than 6%
C. at least 6% but less than 8%
D. at least 8% but less than 10%
E. at least 10%

24.18 (1 point) What is the mode of a Pareto Distribution, with α = 2 and θ = 800?
A. less than 700
B. at least 700 but less than 800
C. at least 800 but less than 900
D. at least 900 but less than 1000
E. at least 1000

24.19 (2 points) The claim sizes at first report follow a LogNormal Distribution, with µ = 10 and
σ = 2.5.
The amount by which a claim develops from first report to ultimate also follows a LogNormal
Distribution, but with µ = 0.1 and σ = 0.5.
Assume that there are no new claims reported after first report, and that the distribution of
development factors is independent of size of claim.
What is the probability that a claim chosen at random is more than 1 million at ultimate?
A. less than 4%
B. at least 4% but less than 6%
C. at least 6% but less than 8%
D. at least 8% but less than 10%
E. at least 10%

24.20 (2 points) X follows a Gamma Distribution, with α = 5 and θ = 1/1000.


What is the expected value of 1/X?
A. less than 200
B. at least 200 but less than 220
C. at least 220 but less than 240
D. at least 240 but less than 260
E. at least 260

24.21 (2 points) X follows a LogNormal Distribution, with µ = 6 and σ = 2.5.


What is the expected value of 1/X2 ?
A. less than 1.4
B. at least 1.4 but less than 1.6
C. at least 1.6 but less than 1.8
D. at least 1.8 but less than 2.0
E. at least 2.0

24.22 (1 point) What is the mean of a Pareto Distribution, with α = 4 and θ = 3000?
A. less than 700
B. at least 700 but less than 800
C. at least 800 but less than 900
D. at least 900 but less than 1000
E. at least 1000

24.23 (2 points) What is the coefficient of variation of a Pareto Distribution, with α = 4 and θ = 3000?
A. less than 1.4
B. at least 1.4 but less than 1.6
C. at least 1.6 but less than 1.8
D. at least 1.8 but less than 2.0
E. at least 2.0

24.24 (2 points) What is the coefficient of skewness of a Pareto Distribution, with α = 4 and
θ = 3000?
A. less than 1
B. at least 1 but less than 3
C. at least 3 but less than 5
D. at least 5 but less than 7
E. at least 7

24.25 (3 points) Y is the sum of 90 independent values drawn from a Weibull Distribution
with τ = 1/2. Using the Normal Approximation, estimate the probability that Y > 1.2 E[Y].
A. 5% B. 10% C. 15% D. 20% E. 25%

24.26 (2 points) For a LogNormal Distribution, the ratio of the 99th percentile to the 95th percentile
is 3.4. Determine the σ parameter.
A. 1.8 B. 1.9 C. 2.0 D. 2.1 E. 2.2

24.27 (3 points) You are given:


• A claimant receives payments at a rate of 1 paid continuously while disabled.
• Payments start immediately.
• The length of disability follows a Gamma distribution with parameters α and θ.
• The force of interest is δ.
At the time of disability, what is the actuarial present value of these payments?
(A) {1 - (δ + θ)-α} / δ   (B) {1 - (1 + δθ)-α} / δ   (C) θα / (1 + δθ)   (D) θ / (1 + δθ)α
(E) None of A, B, C, or D.

24.28 (1 point) What is the median of a LogNormal Distribution, with µ = 4.2 and σ = 1.8?
A. less than 70
B. at least 70 but less than 75
C. at least 75 but less than 80
D. at least 80 but less than 85
E. at least 85

24.29 (2 points) For a LogNormal Distribution, with µ = 4.2 and σ = 1.8, what is the probability that a
claim is less than the mean?
A. less than 70%
B. at least 70% but less than 75%
C. at least 75% but less than 80%
D. at least 80% but less than 85%
E. at least 85%

24.30 (3 points) Calculate the fourth central moment of a Gamma Distribution with parameters α
and θ.

A. θ4   B. 6α θ4   C. θ4 {3α2 + 6α}   D. θ4 {α3 + 3α2 + 6α}   E. θ4 {α4 - 3α3 + 3α2 + 6α}

24.31 (1 point) What is the kurtosis of a Gamma Distribution with parameters α and θ?
Hint: Use the solution to the previous question.
A. 3 B. 3 + 6/α C. 3 + 6θ/α D. 3 + 6θ/α + 3θ2/α2 E. None of A, B, C, or D.

24.32 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the mean?
A. 18 B. 19 C. 20 D. 21 E. 22

24.33 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the median?
A. less than 18
B. at least 18 but less than 19
C. at least 19 but less than 20
D. at least 20 but less than 21
E. at least 21

24.34 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the mode?
A. less than 18
B. at least 18 but less than 19
C. at least 19 but less than 20
D. at least 20 but less than 21
E. at least 21

24.35 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the survival function at
50?
A. 1.1% B. 1.2% C. 1.3% D. 1.4% E. 1.5%

24.36 (2 points) You are given:


• Future lifetimes follow a Gamma distribution with parameters α and θ.
• The force of interest is δ.
• A whole life insurance policy pays 1 upon death.
What is the actuarial present value of this insurance?
(A) 1 / (δ + θ)α   (B) δ / (1 + δθ)α   (C) δ / (δ + θ)α   (D) θ / (1 + δθ)α
(E) None of A, B, C, or D.

24.37 (2 points) Prior to the application of any deductible, losses follow a Pareto Distribution with
α = 3.2 and θ = 135. If there is a deductible of 25, what is the density of non-zero payments at 60?
A. less than 0.0045
B. at least 0.0045 but less than 0.0050
C. at least 0.0050 but less than 0.0055
D. at least 0.0055 but less than 0.0060
E. at least 0.0060

Use the following information for the next three questions:


X is Normally distributed with mean 4 and standard deviation 0.8.

24.38 (1 point) What is the mean of eX?


A. 69 B. 71 C. 73 D. 75 E. 77

24.39 (2 points) What is the standard deviation of eX?


A. 69 B. 71 C. 73 D. 75 E. 77

24.40 (2 points) What is the 10th percentile of eX?


A. 20 B. 25 C. 30 D. 35 E. 40

24.41 (2 points) A company has two electric generators.


The time until failure for each generator follows an exponential distribution with mean 10.
The company will begin using the second generator immediately after the first one fails.
What is the probability that both generators have failed by time 30?
A. 70% B. 75% C. 80% D. 85% E. 90%

24.42 (3 points) For a LogNormal Distribution, with parameters µ and σ, what is the value of the
ratio (mean - mode) / (mean - median)?
A. {1 - exp(-σ2)} / {1 - exp(-0.5σ2)}   B. {1 - exp(-1.5σ2)} / {1 - exp(-0.5σ2)}   C. {1 - exp(-1.5σ2)} / {1 - exp(-σ2)}
D. {1 - exp(-0.5σ2)} / {1 - exp(-σ2)}   E. {1 - exp(-σ2)} / {exp(-σ2) - exp(-0.5σ2)}

24.43 (3 points) You are given:


• Future lifetimes follow a Weibull distribution with τ = 2 and θ = 30.
• The force of interest is 0.04.
• A whole life insurance policy pays $1 million upon death.
Calculate the actuarial present value of this insurance.

Hint: the integral from b to ∞ of exp(-x2) dx is: √π {1 - Φ(b√2)}.

(A) $325,000 (B) $350,000 (C) $375,000 (D) $400,000 (E) $425,000

24.44 (3 points) The following three LogNormal Distributions have been fit to three different sets of
claim size data for professional liability insurance:
Physicians µ = 7.8616 σ2 = 3.1311

Surgeons µ = 8.0562 σ2 = 2.8601

Hospitals µ = 7.4799 σ2 = 3.1988


Compare their means and coefficients of variation.

24.45 (2 points) The data represented by the following histogram is most likely to follow which of
the following distributions?
A. Normal B. Exponential C. Gamma, α > 1 D. Pareto E. Single Parameter Pareto

24.46 (3 points) Prior to the application of any deductible, losses follow a Pareto Distribution with
α = 2.5 and θ = 47. There is a deductible of 10. What is the variance of amount paid by the insurer
for one loss, including the possibility that the amount paid is zero?
A. 4400 B. 4500 C. 4600 D. 4700 E. 4800

24.47 (3 points) Size of loss is Exponential with mean 4. Three losses occur.
What is the probability that the sum of these three losses is greater than 20?
A. less than 8%
B. at least 8% but less than 10%
C. at least 10% but less than 12%
D. at least 12% but less than 14%
E. at least 14%

Use the following information for the next 4 questions:


Prior to the effect of any deductible, the losses follow a Pareto Distribution with parameters
α = 2.5 and θ = 24.

24.48 (2 points) If the insured has an ordinary deductible of size 15, what is the average payment
by the insurer per non-zero payment?
A. 18 B. 20 C. 22 D. 24 E. 26

24.49 (2 points) If the insured has a franchise deductible of size 15, what is the average payment
by the insurer per non-zero payment?
A. 37 B. 39 C. 41 D. 43 E. 45

24.50 (2 points) If the insured has an ordinary deductible of size 10, what is the average payment
by the insurer per loss?
A. 9.5 B. 10.0 C. 10.5 D. 11.0 E. 11.5

24.51 (2 points) If the insured has a franchise deductible of size 10, and there are 73 losses
expected per year, what are the insurer's expected annual payments?
A. 850 B. 900 C. 950 D. 1000 E. 1050

24.52 (1 point) Given a Weibull Distribution with parameters θ = 1000 and τ = 1.5,
what is the mode?
A. less than 470
B. at least 470 but less than 480
C. at least 480 but less than 490
D. at least 490 but less than 500
E. at least 500

24.53 (2 points) Size of loss is LogNormal with µ = 7 and σ = 1.6.


One has a sample of 10 independent losses: X1 , X2 , ..., X10.
Let Y be their geometric average, Y = (X1 X2 ... X10)1/10.
Determine the expected value of Y.
A. 1100 B. 1150 C. 1200 D. 1250 E. 1300

24.54 (2 points) Determine the variance of a Weibull Distribution with parameters θ = 9 and τ = 4.
Some values of the gamma function are: Γ(0.25) = 3.62561, Γ( 0.5) = 1.77245,
Γ( 0.75) = 1.22542, Γ(1) = 1, Γ(1.25) = 0.90640, Γ(1.5) = 0.88623, Γ(1.75) = 0.91906, Γ(2) = 1.
A. 3 B. 5 C. 7 D. 9 E. 11

24.55 (4 points) Demonstrate that for a Gamma Distribution with α ≥ 1,


(mean - mode)/(standard dev.) = (skewness)(kurtosis + 3)/(10 kurtosis - 12 skewness2 - 18).

24.56 (4 points) For a three-year term insurance on a randomly chosen member of a population:
(i) 1/4 of the population are smokers and 3/4 are nonsmokers.
(ii) The future lifetimes follow a Weibull distribution with:
τ = 2 and θ = 15 for smokers
τ = 2 and θ = 20 for nonsmokers
(iii) The death benefit is 100,000 payable at the end of the year of death.
(iv) i = 0.06
Calculate the actuarial present value of this insurance.
(A) 2000 (B) 2100 (C) 2200 (D) 2300 (E) 2400

24.57 (2 points) X1 , X2 , X3 is a sample of size three from a Gamma Distribution with α = 5 and
mean of 100. What is the mode of X1 + X2 + X3 ?
A. 200 B. 220 C. 240 D. 260 E. 280

24.58 (3 points) X follows a LogNormal Distribution, with µ = 3 and σ2 = 2.

Y also follows a LogNormal Distribution, but with µ = 4 and σ2 = 1.5.


X and Y are independent. Z = XY. What is the standard deviation of Z?
A. 35,000 B. 36,000 C. 37,000 D. 38,000 E. 39,000

24.59 (2 points) You are given the following:


• The Tan Teen Insurance Company is set up solely to jointly insure 7 independent lives.
• Each life has a future lifetime which follows a Weibull Distribution with τ = 2 and θ = 30.
• Tan Teen starts with assets of 5, which grow continuously at 10% per year.
Thus the assets at time t are: 5(1.10)t.
• Tan Teen pays 50 upon the death of the last survivor.
Calculate the probability of ruin of Tan Teen.
A. Less than 0.3%
B. At least 0.3% but less than 0.4%
C. At least 0.4% but less than 0.5%
D. At least 0.5% but less than 0.6%
E. At least 0.6%

24.60 (3 points) For medical malpractice insurance, the size of economic damages follows a
LogNormal Distribution with a coefficient of variation of 5.75.
What is the probability that an economic damage exceeds the mean?
A. 11% B. 13% C. 15% D. 17% E. 19%

24.61 (2 points) You are given:


(i) The frequency distribution for the number of losses for a policy with no deductible is
Binomial with m = 50 and q = 0.3.

(ii) Loss amounts for this policy follow the Pareto distribution with θ = 2000 and α = 4.

Determine the expected number of payments when a deductible of 500 is applied.


(A) Less than 5
(B) At least 5, but less than 7
(C) At least 7, but less than 9
(D) At least 9, but less than 11
(E) At least 11

24.62 (2 points) The ratio of the median to the mode of a LogNormal Distribution is 5.4.
What is the second parameter, σ, of this LogNormal?
A. 0.9 B. 1.0 C. 1.1 D. 1.2 E. 1.3

24.63 (2 points) Define the quartiles as the 25th, 50th, and 75th percentiles.
Define the interquartile range as the difference between the third and first quartiles,
in other words as the 75th percentile minus the 25th percentile.
Determine the interquartile range for a Pareto Distribution with α = 2 and θ = 1000.
(A) Less than 825
(B) At least 825, but less than 850
(C) At least 850, but less than 875
(D) At least 875, but less than 900
(E) At least 900

24.64 (3 points) A random sample of size 20 is drawn from a LogNormal Distribution with µ = 3 and
σ = 2. Determine E[X̄2], the expected value of the square of the sample mean.

(A) 50,000 (B) 60,000 (C) 70,000 (D) 80,000 (E) 90,000

24.65 (3 points) Determine for a Gamma Distribution with α = 4.2 and θ = 10 each of the following.
(a) Mode
(b) Mean
(c) Second Moment
(d) Third Moment

24.66 (4 points) The probability of a major league baseball team scoring n runs in a game is:
F(n+1) - F(n), where F is a Weibull Distribution with θ = 5.4 and τ = 1.8.
With the aid of a computer, graph the probability of n runs for n = 0 , 1, ..., 15.

24.67 (1 point) The life (in thousands of miles) of a certain type of electronic control for locomotives
follows a Lognormal Distribution with µ = 5.149 and σ = 0.737.
Determine the standard deviation (in thousands of miles) of this lifetime.
A. 170 B. 180 C. 190 D. 200 E. 210

24.68 (1 point) The life in years of a type of generator field winding is given by
a Weibull distribution with τ = 2 and θ = 13.
Determine how long it takes for 10% of the field windings to have failed.
A. 3.6 B. 3.8 C. 4.0 D. 4.2 E. 4.4

24.69 (3 points) For a Gamma Distribution with α = 5 and θ = 200, compute F(600) and F(1200).
Hint: For alpha integer, Γ(α ; x) = 1 - Σi=0 to α-1 xi e-x / i!.

24.70 (3 points) A lognormal life distribution for an electrical insulation has


µ = 1.3355 and σ = 0.2265.
Calculate the following.
(a) Median life.
(b) Most likely (modal) life.
(c) Mean.

24.71 (2 points) A Weibull distribution for engine fan life has θ = 26,710 hours and τ = 1.053.
Calculate the following.
(a) Median life.
(b) Most likely (modal) life.

24.72 (4 points) For a Gamma Distribution with α = 2 and θ = 50 determine the 99th percentile.
(a) Use the Normal Approximation.
(b) Use the Chi-Square Table.

24.73 (1 point) For a given line of insurance, the percentage of ultimate losses paid for an accident
year follows a Weibull Distribution with τ = 1.5 and θ = 20, where x is the number of months from the
beginning of the accident year to the present.
Determine the expected percentage of ultimate losses remaining to be paid at 36 months from the
beginning of an accident year.
A. 6% B. 7% C. 8% D. 9% E. 10%

24.74 (2 points) In the land of Normark, income follows a Pareto Distribution with α = 1.8.
A person is defined as “middle class” if their income is at least 2/3 of the median income and at most
twice the median income.
In Normark, what portion of the people are middle class?
A. 31% B. 32% C. 33% D. 34% E. 35%

24.75 (2, 5/83, Q.7) (1.5 points) W has a LogNormal Distribution with parameters
µ = 1 and σ = 2. Which of the following random variables has a uniform distribution on [0, 1]?
A. ln[(Φ[W] - 1)/4] B. Φ[(ln[W] - 1)/4] C. Φ[(ln[W] - 1)/2] D. Φ[ln[(W - 1)/2]] E. ln[(Φ[W] - 1)/2]

24.76 (2, 5/85, Q.46) (1.5 point) Let X be a continuous random variable with density function
f(x) = ax2 e-bx for x ≥ 0, where a > 0 and b > 0. What is the mode of X?
A. 0 B. 2 C. 2/b D. b/2 E. ∞

24.77 (4, 5/86, Q.49) (1 point) Which of the following statements are true?
1. If X is normally distributed, then ln(X) is lognormally distributed.
2. The tail of a Pareto distribution does not approach zero as fast as does the tail of
a lognormal distribution.
3. The mean of a Pareto distribution exists for all values of its parameters.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

24.78 (4, 5/87, Q.48) (1 point) Which of the following are true?
1. The sum of 25 independent identically distributed random variables from
a markedly skewed distribution has an approximate normal distribution.
2. The sum of two independent normal random variables is a normal random variable.
3. A random variable X is lognormally distributed if X = ln(Z) where Z is normally distributed.
A. 1 B. 2 C. 3 D. 2, 3 E. None of the above.

24.79 (160, 5/88, Q.4) (2.1 points) A survival function is defined by:
f(t) = k (t/β2) e-t/β; t > 0, β > 0. Determine k.

(A) 1 / β4 (B) 1 / β2 (C) 1 (D) β2 (E) β4

24.80 (4, 5/88, Q.47) (1 point) Which of the following statements are true?
1. The LogNormal distribution is symmetric.
2. The tail of the Pareto distribution fades to zero more slowly than does that of a LogNormal,
for large enough x.
3. An important application of the binomial distribution is in connection with the distribution
of claim frequencies when the risks are not homogeneous.
A. 1 B. 2 C. 1, 2 D. 2, 3 E. 1, 2, 3

24.81 (4, 5/89, Q.42) (1 point) Which of the following are true?
1. If the independent random variables X and Y are Poisson variables,
then Z = X + Y is a Poisson random variable.
2. If X1 , ..., Xn are n independent unit normal variables, then the sum Z = X1 +...+ Xn is
normally distributed with mean µ = n and standard deviation σ = n.
3. If X is normally distributed then Y = lnX has a LogNormal distribution.
A. 1 B. 3 C. 1, 2 D. 1, 3 E. 1, 2, 3

24.82 (4, 5/89, Q.44) (2 points) The severities of individual claims are Pareto distributed, with
parameters α = 8/3 and θ = 8,000. Using the Central Limit Theorem, what is the probability that the
sum of 100 independent claims will exceed 600,000?
A. Less than 0.025
B. At least 0.025, but less than 0.075
C. At least 0.075, but less than 0.125
D. At least 0.125, but less than 0.175
E. 0.175 or more

24.83 (4, 5/89, Q.56) (1 point) The random variable X has a Pareto distribution
F(x) = 1 - {100 / (100+x)}2 , for x > 0. Which of the following distribution functions represents the
distribution function of X truncated from below at 100?
A. 1 - {200 / (100+x)}2 , x > 100
B. 1 - (100/x)2 , x > 100
C. 1 - 1 / (x - 100)2 , x > 100
D. 1 - {200 / (200+x)}2 , x > 0
E. 1 - {100 / (x - 100)}2 , x > 0

24.84 (4, 5/90, Q.50) (2 points) The underlying claim severity distribution for the ABC Insurance
Company is lognormal with parameters µ = 7 and σ2 = 10. The company only records losses that
are less than $50,000. Let X be the random variable representing all losses with cumulative
distribution function FX(x) and Y be the random variable representing the company's recorded
losses with cumulative distribution function FY(x).
Then for x ≤ $50,000 FY(x) = A FX(x) where A is the necessary adjustment factor.
In what range does A fall?
A. A < 0.70
B. 0.70 ≤ A < 0.90
C. 0.90 ≤ A < 1.10
D. 1.10 ≤ A < 1.30
E. 1.30 ≤ A

24.85 (4B, 11/92, Q.17) (1 point) You are given the following:
• X1 and X2 are independent, identically distributed random variables.
• X = X1 + X2
Which of the following are true?
1. If X1 , X2 have Poisson distributions with mean µ, then X has a Poisson distribution with mean 2µ.

2. If X1 , X2 have gamma distributions with parameters α and θ, then X has a gamma distribution

with parameters 2α and 2θ.


3. If X1 , X2 have standard normal distributions, then X has a normal distribution with mean 0 and
variance 2.
A. 1 only B. 2 only C. 1, 3 only D. 2, 3 only E. 1, 2, 3

24.86 (4B, 11/92, Q.31) (2 points) The severity distribution of individual claims is gamma with
parameters α = 5 and θ = 1000. Use the Central Limit Theorem to determine the probability that
the sum of 100 independent claims exceeds $525,000.
A. Less than 0.05
B. At least 0.05 but less than 0.10
C. At least 0.10 but less than 0.15
D. At least 0.15 but less than 0.20
E. At least 0.20

24.87 (4B, 11/93, Q.19) (2 points) A random variable X is distributed lognormally with parameters
µ = 0 and σ = 1. Determine the probability that X lies within one standard deviation of the mean.
A. Less than 0.65
B. At least 0.65, but less than 0.75
C. At least 0.75, but less than 0.85
D. At least 0.85, but less than 0.95
E. At least 0.95

24.88 (4B, 11/93, Q.28) (3 points) You are given the following:
• X1 , X2 , and X3 are random variables representing the amount of an individual claim.
• The first and second moments for X1 , X2 , and X3 are:
E[X1 ] = 1. E[X1 2 ] = 1.5. E[X2 ] = 0.5. E[X2 2 ] = 0.5. E[X3 ] = 0.5. E[X3 2 ] = 1.5.
For which of the random variables X1 , X2 , and X3 is it appropriate to use a Pareto distribution?
A. X1 B. X2 C. X3 D. X1 , X3 E. None of A, B, C or D

24.89 (4B, 5/94, Q.6) (2 points) You are given the following:
• Losses follow a Weibull distribution with parameters θ = 20 and τ = 1.0.
• A random sample of losses is collected, but the sample data is truncated from below by
a deductible of 10.
Determine the probability that an observation from the sample data is at most 25.
A. Less than 0.50
B. At least 0.50, but less than 0.60
C. At least 0.60, but less than 0.70
D. At least 0.70, but less than 0.80
E. At least 0.80

24.90 (4B, 11/94, Q.1) (1 point) You are given the following:
X is a random variable representing size of loss.
Y = ln(X) is a random variable having a normal distribution with a mean of 6.503 and standard
deviation of 1.500. Determine the probability that X is greater than $1,000.
A. Less than 0.300
B. At least 0.300, but less than 0.325
C. At least 0.325, but less than 0.350
D. At least 0.350, but less than 0.375
E. At least 0.375

24.91 (4B, 11/94, Q.10) (2 points) You are given the following:
A portfolio consists of 10 independent risks.
The distribution of annual losses (x is in $ millions) for each risk is given by a Gamma Distribution
f(x) = θ−αxα−1 e−x/θ / Γ(α), x > 0, with θ = 1 and α = 0.1.
Determine the probability that the portfolio has aggregate annual losses greater than $1.0 million.
A. Less than 20%
B. At least 20%, but less than 40%
C. At least 40%, but less than 60%
D. At least 60%, but less than 80%
E. At least 80%

24.92 (4B, 11/94, Q.17) (2 points) You are given the following:
Losses follow a Weibull distribution with parameters θ = 20 and τ = 1.0. For each loss that occurs,
the insurerʼs payment is equal to the amount of the loss truncated and shifted by a deductible of 10.
If the insurer makes a payment, what is the probability that an insurerʼs payment is less than or equal
to 25?
A. Less than 0.65
B. At least 0.65, but less than 0.70
C. At least 0.70, but less than 0.75
D. At least 0.75, but less than 0.80
E. At least 0.80

24.93 (4B, 11/95, Q.8) (2 points) Losses follow a Weibull distribution, with parameters
θ (unknown) and τ = 0.5.
Determine the ratio of the mean to the median.
A. Less than 2.0
B. At least 2.0, but less than 3.0
C. At least 3.0, but less than 4.0
D. At least 4.0
E. Cannot be determined from the given information.

24.94 (4B, 11/96, Q.8) (1 point) The random variable Y is the sum of two independent and
identically distributed random variables, X1 and X2 . Which of the following statements are true?
1. If X1 and X2 have Poisson distributions, then Y must have a Poisson distribution.
2. If X1 and X2 have gamma distributions, then Y must have a gamma distribution.
3. If X1 and X2 have lognormal distributions, then Y must have a lognormal distribution.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

Use the following information for the next two questions:


• A portfolio consists of 16 independent risks.
• For each risk, losses follow a Gamma distribution, with parameters θ = 250 and α = 1.

24.95 (4B, 5/97, Q.6) (2 points) Without using the Central Limit Theorem, determine the
probability that the aggregate losses for the entire portfolio will exceed 6,000.
A. 1 - Γ(1; 1) B. 1 - Γ(1; 24) C. 1 - Γ(1; 384) D. 1 - Γ(16; 24) E. 1 - Γ(16; 384)

24.96 (4B, 5/97, Q.7) (2 points) Using the Central Limit Theorem, determine the approximate
probability that the aggregate losses for the entire portfolio will exceed 6,000.
A. Less than 0.0125
B. At least 0.0125, but less than 0.0250
C. At least 0.0250, but less than 0.0375
D. At least 0.0375, but less than 0.0500
E. At least 0.0500

24.97 (4B, 5/97, Q.16) (1 point) You are given the following:
• The random variable X has a Weibull distribution, with parameters θ = 625 and τ = 0.5.
• Z is defined to be 0.25X.
Determine the correlation coefficient of X and Z.
A. 0.00 B. 0.25 C. 0.50 D. 0.75 E. 1.00

24.98 (4B, 11/98, Q.27) (2 points) Determine the skewness of a gamma distribution with a
coefficient of variation of 1. Hint: The skewness of a distribution is defined to be the third central
moment divided by the cube of the standard deviation.
A. 0 B. 1 C. 2 D. 4 E. 6

24.99 (4B, 5/99. Q.1) (1 point) Which of the following inequalities is true for a Pareto distribution
with a finite mean?
A. Mean < Median < Mode B. Mean < Mode < Median C. Median < Mode < Mean
D. Mode < Mean < Median E. Mode < Median < Mean

24.100 (Course 160 Sample Exam #1, 1999, Q.1) (1.9 points)
For a laser operated gene splicer, you are given:
(i) It has a Weibull survival model with parameters θ = 2 and τ = 2.
(ii) It was operational at time t = 1.
(iii) It failed prior to time t = 4.
Calculate the probability that the splicer failed between times t = 2 and t = 3.
(A) 0.2046 (B) 0.2047 (C) 0.2048 (D) 0.2049 (E) 0.2050

24.101 (1, 5/00, Q.7) (1.9 points) An insurance companyʼs monthly claims are modeled by a
continuous, positive random variable X, whose probability density function is proportional to
(1 + x)-4, where 0 < x < ∞.
Determine the companyʼs expected monthly claims.
(A) 1/6 (B) 1/3 (C) 1/2 (D) 1 (E) 3

24.102 (3, 5/00, Q.8) (2.5 points) For a two-year term insurance on a randomly chosen member
of a population:
(i) 1/3 of the population are smokers and 2/3 are nonsmokers.
(ii) The future lifetimes follow a Weibull distribution with:
τ = 2 and θ = 1.5 for smokers
τ = 2 and θ = 2.0 for nonsmokers
(iii) The death benefit is 100,000 payable at the end of the year of death.
(iv) i = 0.05
Calculate the actuarial present value of this insurance.
(A) 64,100 (B) 64,300 (C) 64,600 (D) 64,900 (E) 65,100

24.103 (3, 5/01, Q.24) (2.5 points) For a disability insurance claim:
(i) The claimant will receive payments at the rate of 20,000 per year, payable
continuously as long as she remains disabled.
(ii) The length of the payment period in years is a random variable with the gamma
distribution with parameters α = 2 and θ = 1.
(iii) Payments begin immediately.
(iv) δ = 0.05
Calculate the actuarial present value of the disability payments at the time of disability.
(A) 36,400 (B) 37,200 (C) 38,100 (D) 39,200 (E) 40,000

24.104 (1, 11/01, Q.35) (1.9 points) Auto claim amounts, in thousands, are modeled by a random
variable with density function f(x) = xe-x for x ≥ 0.
The company expects to pay 100 claims if there is no deductible.
How many claims does the company expect to pay if the company decides to introduce
a deductible of 1000?
(A) 26 (B) 37 (C) 50 (D) 63 (E) 74

24.105 (CAS3, 5/04, Q.21) (2.5 points) Auto liability losses for a group of insureds
(Group R) follow a Pareto distribution with α = 2 and θ = 2,000.
Losses from a second group (Group S) follow a Pareto distribution with α = 2 and θ = 3,000.
Group R has an ordinary deductible of 500, while Group S has a franchise deductible of 200.
Calculate the amount that the expected cost per payment for Group S exceeds that for
Group R.
A. Less than 350
B. At least 350, but less than 650
C. At least 650, but less than 950
D. At least 950, but less than 1,250
E. At least 1,250

24.106 (CAS3, 11/04, Q.25) (2.5 points)


Let X be the random variable representing the aggregate losses for an insured.
X follows a gamma distribution with mean of $1 million and coefficient of variation 1.
An insurance policy pays for aggregate losses that exceed twice the expected value of X.
Calculate the expected loss for the policy.
A. Less than $100,000
B. At least $100,000, but less than $200,000
C. At least $200,000, but less than $300,000
D. At least $300,000, but less than $400,000
E. At least $400,000

24.107 (CAS3, 5/05, Q.35) (2.5 points) An insurance company offers two types of policies,
Type Q and Type R. Type Q has no deductible, but a policy limit of 3,000.
Type R has no limit, but an ordinary deductible of d.
Losses follow a Pareto distribution with θ = 2,000 and α = 3.
Calculate the deductible, d, such that both policy types have the same expected cost per loss.
A. Less than 50
B. At least 50, but less than 100
C. At least 100, but less than 150
D. At least 150, but less than 200
E. 200 or more

24.108 (SOA M, 11/05, Q.8) (2.5 points) A Mars probe has two batteries. Once a battery is
activated, its future lifetime is exponential with mean 1 year. The first battery is activated when the
probe lands on Mars. The second battery is activated when the first fails. Battery lifetimes after
activation are independent. The probe transmits data until both batteries have failed.
Calculate the probability that the probe is transmitting data three years after landing.
(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

24.109 (CAS3, 5/06, Q.25) (2.5 points)


Calculate the skewness of a Pareto distribution with α = 4 and θ = 1,000.
A. Less than 2
B. At least 2, but less than 4
C. At least 4, but less than 6
D. At least 6, but less than 8
E. At least 8

24.110 (CAS3, 5/06, Q.36) (2.5 points)


The following information is available for a collective risk model:
• X is a random variable representing the size of each loss.
• X follows a Gamma Distribution with α = 2 and θ = 100.
• N is a random variable representing the number of claims.
• S is a random variable representing aggregate losses.
• S = X1 + ... + XN.
Calculate the mode of S when N = 5.
A. Less than 950
B. At least 950 but less than 1050
C. At least 1050 but less than 1150
D. At least 1150 but less than 1250
E. At least 1250

24.111 (4, 5/07, Q.39) (2.5 points) You are given:


(i) The frequency distribution for the number of losses for a policy with no deductible is
negative binomial with r = 3 and β = 5.
(ii) Loss amounts for this policy follow the Weibull distribution with θ = 1000 and τ = 0.3.
Determine the expected number of payments when a deductible of 200 is applied.
(A) Less than 5
(B) At least 5, but less than 7
(C) At least 7, but less than 9
(D) At least 9, but less than 11
(E) At least 11

24.112 (2 points) In the previous question, 4, 5/07, Q.39, determine the variance of the number of
payments when a deductible of 200 is applied.
A. 20 B. 30 C. 40 D. 50 E. 60

Solutions to Problems:

24.1. E. 1. True. All of the moments of a LogNormal exist. 2. True. (For the Pareto, the nth
moment exists if α > n.) 3. True. All of the moments of a Weibull exist.

24.2. D. For the Gamma Distribution, the mean is αθ, while the variance is αθ2 .
Thus the coefficient of variation is: √variance / mean = √(αθ2) / (αθ) = 1/√α.

Thus for the Gamma Distribution, α = 1/CV2 . Thus α = 1 / (0.5)2 = 4.

24.3. E. Both the coefficient of variation and the skewness do not depend on the scale parameter
θ. Therefore θ can not be determined from the given information.
For the Gamma Distribution, the coefficient of variation = 1/√α and Skewness = 2/√α.

24.4. E. S(25,000) = exp[-(25,000/100,000)0.2] = 0.469.

24.5. B. The mean of the Weibull is θΓ(1+ 1 /τ) = (100,000)Γ(1+ 1/0.2) = (100,000)Γ(6) = 5!
(100,000) = (120) (100,000) = 12 million.

24.6. C. The median of the Weibull is such that .5 = F(m) = 1 - exp(-(m/θ)τ).

Thus, -(m/θ)τ = ln 0.5. m = θ(-ln 0.5)1/τ = θ(0.693147)1/τ = (100,000) (0.693147)5 = 16,000.


Comment: Note that the median for this Weibull is much smaller than the mean, a symptom of a
distribution skewed to the right (positively skewed.)

24.7. C. In general, for X and Y two independent variables:
E[XY] = E[X]E[Y] and E[X2 Y2 ] = E[X2 ]E[Y2 ].
VAR[XY] = E[(XY)2 ] - E[XY]2 = E[X2 Y2 ] - {E[X]E[Y]}2 = E[X2 ]E[Y2 ] - E[X]2 E[Y]2 .
For the Pareto Distribution the moments are: E[Xn ] = n! θn / {(α-1)(α-2)...(α-n)}, α > n.
Therefore, E[Y] = θ/(α-1) = 10/2 = 5, and E[Y2 ] = 2θ2 /{(α−1)(α-2)} = (2)(102 )/{(2)(1)} = 100.
For the Gamma Distribution the moments are: E[Xn ] = θn α(α+1)...(α+n-1).
Therefore, E[X] = αθ = 40, and E[X2 ] = α(α+1)θ2 = (4)(5)(102 ) = 2000.
Therefore, VAR[Z] = (2000)(100) - {(40)(5)}2 = 160,000.
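The same arithmetic, written out as a few lines of Python (my own check of the answer; the moment formulas are those quoted above):

    # X is Gamma with alpha = 4, theta = 10; Y is Pareto with alpha = 3, theta = 10
    EX, EX2 = 4 * 10, 4 * 5 * 10**2          # alpha*theta and alpha*(alpha+1)*theta^2
    EY, EY2 = 10 / 2, 2 * 10**2 / (2 * 1)    # theta/(alpha-1) and 2*theta^2/{(alpha-1)(alpha-2)}
    print(EX2 * EY2 - EX**2 * EY**2)         # 160000.0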

24.8. C. F(20 million) = 1 - {34/(34 + 20)}2.5 = 68.5%.

24.9. D. The mean of the Pareto is θ / (α-1) = 34 million / 1.5 = 22.667 million.

24.10. B. The median of the Pareto is such that .5 = F(m) = 1 - {θ/(θ+m)}α.

Thus (θ+m)/θ = 0.5-1/α, and m = θ{0.5-1/α - 1} = 34 million{0.5-1/2.5 - 1} = 10.86 million.


Comment: Note the connection to Simulation by the Method of Inversion and to fitting via Percentile
Matching.

24.11. B. For the Pareto Distribution the moments are: E[Xn ] = n! θn / {(α-1)(α-2)...(α-n)}, α > n.
Therefore, putting everything in units of 1 million, E[X] = θ/(α−1) = 34/1.5 = 22.67 and
E[X2 ] = 2θ2 /{(α−1)(α−2)} = (2)(342 ) / {(1.5)(0.5)} = 3083. Thus VAR[X] = 3083 - (22.67)2 = 2569.
Therefore, the standard deviation of X is √2569 = 50.7 (million).
Alternately, for the Pareto the variance = θ2α / {(α−2)(α−1)2 } = (342 )(1012)(2.5) / {(0.5)(1.52 )} = 2569 x 1012.
Therefore, the standard deviation is: √(2569 x 1012) = 50.7 million.
Comment: Note that the coefficient of variation = standard deviation / mean
= 50.7 / 22.67 = 2.24 = √5 = √(2.5 / 0.5) = √{α / (α - 2)}.

24.12. A. F(5) = 1 - exp[-(5/4)1.5] = .753. Expect 172/.753 = 228 claims ultimately reported.
Expect 228 - 172 = 56 claims reported in the future.
Comment: We are dealing with time rather than size of loss.
F(5) = the expected percentage of claims reported by time 5.
F(5) (expected total number of claims) = expected number of claims reported by time 5.
expected total number of claims ≅ (number of claims reported by time 5)/F(5).

24.13. D. X1 + X2 + X3 + X4 is a Gamma Distribution with α = 4 and θ = 100.

As shown in the Appendix attached to the exam, for α > 1 its mode is: θ(α - 1) = 300.

24.14. A. The sum of independent random variables each of which follows a Gamma distribution
with the same scale parameter, is also a Gamma distribution; it has a shape parameter equal to the
sum of the shape parameters and the same scale parameter.
Thus Y is Gamma with α = 3 + 5 + 9 = 17, and θ = 50.

24.15. B. Mean of LogNormal is exp(µ + 0.5σ2 ). Second Moment of the LogNormal is


exp(2µ + 2σ2 ). Therefore the variance is: exp(2µ + 2σ2 ) - exp(2µ + σ2 ) =
exp(2µ + σ2 ) {exp(σ2 ) - 1}. CV2 = Variance / Mean2 = exp(σ2 ) - 1. σ = {ln(1 + CV2 )}0.5 =
{ln(1 + 5/4)}0.5 = 0.9005. µ = ln(Mean) - 0.5σ2 = ln(2000) - 0.5(.90052 ) = 7.1955.
Chance that a claim exceeds $3,500 is 1 -F(3500) = 1 - Φ[{ln(3500) - µ} / σ] =
1 - Φ[(8.1605 - 7.1955) / 0.9005] = 1 - Φ[1.07] = 1 - 0.8577 = 0.1423.
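Here is the whole chain of this solution in one short sketch (my own check, using only the Python standard library; statistics.NormalDist supplies Φ):

    from math import log, sqrt
    from statistics import NormalDist

    mean, var = 2000, 5_000_000
    sigma = sqrt(log(1 + var / mean**2))      # sigma = sqrt(ln(1 + CV^2)) = 0.9005
    mu = log(mean) - 0.5 * sigma**2           # mu = 7.1955
    p = 1 - NormalDist().cdf((log(3500) - mu) / sigma)
    print(round(p, 3))                        # about 0.142, answer B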

24.16. C. The Gamma Distribution as α approaches infinity is a Normal Distribution.


The Gamma Distribution with α = 1 is an Exponential Distribution.
The Weibull Distribution with τ = 1 is an Exponential Distribution.

24.17. D. For the Gamma Distribution: Mean = αθ = 5000, Variance = αθ2 = 5 million.

Thus the Standard Deviation is √(5 million) = 2236. Thus the chance of a claim exceeding 8,000 is
approximately: 1 - Φ[(8000 - 5000)/2236] = 1 - Φ(1.34) = 1 - 0.9099 = 9.01%.
Comment: When applying the Normal Approximation to a continuous distribution, there is no
“continuity correction” such as is applied as when approximating a discrete distribution.
(In any case, here it would make no significant difference.) The Gamma approaches a Normal as α
approaches infinity. In this case, the exact answer is given via an incomplete Gamma Function:
1 - Γ(α; x/θ) = 1 - Γ(5; 8) = 1 - 0.900368 = 9.9632%, gotten via computer.
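A sketch of both calculations (mine): the Normal Approximation, and the exact value via the series for the incomplete Gamma function with integer alpha given in the hint to problem 24.69:

    from math import exp, factorial, sqrt
    from statistics import NormalDist

    alpha, theta = 5, 1000
    mean, sd = alpha * theta, sqrt(alpha) * theta
    approx = 1 - NormalDist().cdf((8000 - mean) / sd)                     # Normal Approximation
    exact = sum(8.0**i * exp(-8.0) / factorial(i) for i in range(alpha))  # 1 - Gamma(5; 8)
    print(round(approx, 4), round(exact, 4))    # about 0.090 and 0.0996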

24.18. A. The mode of any Pareto Distribution is 0.

24.19. C. Let X be the losses at first report, let Y be the loss development factor and let
Z = XY be the losses at ultimate. Ln(X) and ln(Y) are two independent Normal variables.
Therefore, ln(Z) = ln(XY) = ln(X) + ln(Y) is a Normal variable. Therefore, Z is a LogNormal variable.
ln(Z) = ln(X) + ln(Y) has mean equal to the sum of the means of ln(X) and ln(Y) and variance equal to
the sum of the variances of ln(X) and ln(Y). Therefore ln(Z) has parameters µ = 10 + 0.1 = 10.1 and
σ2 = 2.52 + 0.52 = 6.5, and therefore so does Z.

For a LogNormal Distribution F(z) = Φ[{ln(z) − µ} / σ]. In this case with µ = 10.1 and σ = 2.55,
F(1,000,000) = Φ[{ln(1,000,000) - 10.1} / 2.55] = Φ[1.46] = .9279.
1 - F(1,000,000) = 1 - 0.9279 = 0.0721.
Comment: The product of two independent LogNormal variables is also LogNormal.
Note that the variances add, not the σ parameters themselves.

24.20. D. E[1/X] = ∫0∞ f(x)/x dx = {θ-α / Γ(α)} ∫0∞ xα - 2 e- x / θ dx =
{θ-α / Γ(α)} / {θ-(α−1) / Γ(α-1)} = 1/ {θ(α-1)} = 1000/(5-1) = 250.


Alternately, the moments of the Gamma Distribution are E[Xn ] = θn Γ(α+n) / Γ(α).
This formula works for n positive or negative. Therefore for n = -1, α = 5 and θ = 1/1000:
E[1/X] = 1000 Γ(4) / Γ(5) = 1000 (3!)/(4!) = 1000 / 4 = 250.
Alternately, if X follows a Gamma, then Z = 1/X has Distribution F(z) = 1 - Γ(α; 1/ θz) =
1 - Γ(5; 1000/ z), which is an Inverse Gamma, with scale parameter 1000 and α = 5.
The Inverse Gamma has Mean = θ/(α-1) = 1000 / (5-1) = 250.
Comment: Note that theta for the Gamma of 1/1000 becomes one over theta for the Inverse
Gamma, which has theta equal to 1000.

24.21. C. The moments of the LogNormal Distribution are E[Xn ] = exp[nµ + 0.5 n2 σ2].

Therefore for n = -2, with µ = 6 and σ = 2.5, E[1/X2 ] = exp[-12 + (2) 2.52] = e0.5 = 1.65.
Alternately, if lnX follows a Normal, then if Z = 1/X, lnZ = - lnX also follows a Normal ( but with mean
of -6 and standard deviation of 2.5.) Therefore, Z follows a LogNormal with µ = -6 and σ = 2.5.
Then one can apply the formula for moments of the LogNormal in order to get the second moment
of Z: E[1/X2 ] = E[Z2 ] = exp[-12 + (2) 2.52 ] = e0.5 = 1.65.
Alternately, if Y = 1/X2 , then lnY = -2lnX also follows a Normal but with mean = (-2)(6) = -12 and
standard deviation equal to ( |-2| )(2.5) = (2)(2.5) = 5. Thus Y follows a LogNormal with
µ = -12 and σ = 5. Thus E[1/X2 ] = E[Y] = exp[µ + 0.5σ2] = exp[-12 + (1/2) 52 ] = e0.5 = 1.65.

24.22. E. For a Pareto, mean = θ/(α-1) = 3000 / (4-1) = 1000.

24.23. B. For a Pareto, coefficient of variation = √{α / (α - 2)} = √2 = 1.414.
Alternately, for the Pareto Distribution the moments are: E[Xn ] = n! θn / {(α-1)(α-2)...(α-n)}, α > n.
E[X] = θ/(α-1) = 1000. E[X2 ] = 2θ2/{(α-1)(α-2)} = 3,000,000. σ = √(E[X2] - E[X]2) = 1414.2.
coefficient of variation = σ/E[X] = 1414.2 / 1000 = 1.414.
Comment: The coefficient of variation does not depend on the scale parameter θ.

24.24. E. For a Pareto, skewness = 2{(α+1)/(α-3)} √{(α - 2) / α} = 2(5)/√2 = 7.071.
Alternately, for the Pareto Distribution the moments are: E[Xn ] = n! θn / {(α-1)(α-2)...(α-n)}, α > n.
E[X] = θ/(α−1) = 1000. E[X2 ] = 2θ2/{(α−1)(α−2)} = 3,000,000. σ = √(E[X2] - E[X]2) = 1414.2.
E[X3 ] = 6θ3/{(α−1)(α−2)(α-3)} = 27,000,000,000.
skewness = {E[X3 ] - 3E[X]E[X2 ] + 2E[X]3 } / σ3
= {27,000,000,000 - (3)(1000)(3,000,000) + (2)(10003 )} / 1414.23 = 7.07.
Comment: The skewness does not depend on the scale parameter θ. Large positive skewness is typical
for a heavier-tailed distribution such as the Pareto, when its skewness exists (for α > 3).

24.25. D. The mean of the Weibull is: θ Γ(1 + 1/τ) = θ Γ(1 + 2) = θ Γ(3) = θ 2! = 2θ.
The second moment of the Weibull is: θ2 Γ(1 + 2/τ) = (θ2)Γ(1 + 4) = (θ2)Γ(5) = (θ2)4! = 24θ2.
The variance of the Weibull is: 24θ2 - (2θ)2 = 20θ2.
Y has a mean of: (90)(2θ) = 180θ, and a variance of: (90)(20θ2) = 1800θ2.
Prob[Y > 1.2 E[Y]] = 1 - Φ[(216θ - 180θ) / √(1800θ2)] = 1 - Φ[0.85] = 19.77%.

24.26. A. The 99th percentile is: exp[µ + 2.326σ]. The 95th percentile is: exp[µ + 1.645σ].
3.4 = exp[µ + 2.326σ] / exp[µ + 1.645σ] = exp[0.681σ]. ⇒ σ = 1.797.
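The same computation, with the Normal percentiles computed rather than read from a table (my own sketch):

    from math import log
    from statistics import NormalDist

    z99 = NormalDist().inv_cdf(0.99)   # 2.3263
    z95 = NormalDist().inv_cdf(0.95)   # 1.6449
    print(round(log(3.4) / (z99 - z95), 3))   # about 1.80, answer A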

24.27. B. Given a disability of length t, the present value of an annuity certain is: (1 - e-δt)/δ.
The expected present value is the average of this over all t:
∫0∞ {(1 - e-δt)/δ} f(t) dt = 1/δ - {1/(δ θα Γ(α))} ∫0∞ e-δt e-t/θ tα-1 dt = 1/δ - {1/(δ θα Γ(α))} ∫0∞ e-t(δ + 1/θ) tα-1 dt
= 1/δ - {1/(δ θα Γ(α))} Γ(α)/(δ + 1/θ)α = {1 - (1 + δθ)-α} / δ.
Comment: Similar to 3, 5/01, Q. 24.
I used the fact that ∫0∞ tα-1 e-t/θ dt = Γ(α) θα, i.e., a Gamma density integrates to 1 over its support.

24.28. A. The median is where the Distribution Function is 0.5. Φ[ {ln(x)−µ} / σ] = 0.5.

Therefore, {ln(x)−µ} / σ = 0. x = eµ = e4.2 = 66.7.



24.29. D. The mean is exp[µ + 0.5σ2]. The Distribution Function at the mean is:

Φ[{ln(exp[µ + 0.5σ2]) - µ} / σ] = Φ[{(µ + 0.5σ2)) - µ} / σ] = Φ[σ/2] = Φ[0.9] = 0.8159.


Comment: For a heavier-tailed distribution, thereʼs only a small chance that a claim is greater than the
mean; a few large claims contribute a great deal to the mean.
The mean is: exp[µ + 0.5σ2] = exp[4.2 + (1/2)(1.82 )] = exp[5.82] = 336.972.
F(336.972) = Φ[{ln(336.972) - 4.2}/1.8] = Φ[0.9] = 0.8159.

24.30. C. For the Gamma Distribution, the moments about the origin are:
E[Xn ] = θn Γ(α+n)/Γ(α). E[X] = θα. E[X2 ] = θ2α(α+1). E[X3 ] = θ3α(α+1)(α+2).

E[X4 ] = θ4α(α+1)(α+2)(α+3).
Fourth Central Moment = E[X4 ] - 4E[X]E[X3 ] + 6E[X]2 E[X2 ] - 3E[X]4 =
θ4α {(α+1)(α+2)(α+3) - 4α(α+1)(α+2) + 6αα(α+1) - 3α3} =

θ4α {α3 + 6α2 +11α + 6 - 4α3 -12α2 - 8α + 6α3 + 6α2 - 3α3} = θ4 {3α 2 + 6α}.

24.31. B. From the previous solution, for the Gamma Distribution,


Fourth Central Moment = θ4{3α 2 + 6α}. Variance = θ2α.
Kurtosis = Fourth Central Moment/ Variance2 = 3 + 6/α.
Comment: Note that the scale parameter, θ, does not appear in the kurtosis, which is a
dimensionless quantity. Also note that the kurtosis of a Gamma is always larger than that of a Normal
Distribution, which is 3. The Gamma has a heavier tail than the Normal.

24.32. E. For the LogNormal, the mean is: exp(µ + 0.5 σ2) = exp(3.08) = 21.8.

24.33. D. The median is that point where F(x) = 0.5.


Thus Φ[{ln(x) - µ} / σ] = 0.5. ⇒ 0 = {ln(x) - µ} / σ.

Thus ln(x) = µ, or the median = eµ = e3 = 20.1.

24.34. A. The mode is that point where f(x) is a maximum. For the LogNormal:
f(x) = exp[-0.5 ({ln(x) − µ} / σ)2] / {xσ√(2π)}. fʼ(x) = -f(x)/x - f(x) ({ln(x) − µ} / σ) / (xσ).

Thus fʼ(x) = 0 for ({ln(x) − µ} / σ2) = -1. ⇒ mode = exp(µ - σ2) = exp(2.84) = 17.1.
Comment: The formula for the mode of a LogNormal is in the Tables attached to your exam.

24.35. A. S(50) = 1 - Φ[(ln(50) - 3)/0.4] = 1 - Φ[2.28] = 1 - 0.9887 = 1.13%.

24.36. E. The probability of death at time t is the density of the Gamma Distribution:
f(t) = e-t/θ tα−1 / {θα Γ(α)}. The present value of a payment of one at time t is e−δt.
Therefore, the actuarial present value of this insurance is:
∫0∞ e-δt e-t/θ tα-1 / {θα Γ(α)} dt = {1/(θα Γ(α))} ∫0∞ e-t(δ + 1/θ) tα-1 dt = {1/(θα Γ(α))} Γ(α)/(δ + 1/θ)α = 1/(1 + δθ)α.

Comment: The Gamma Distribution is too heavy-tailed to be a good model of future lifetimes.

24.37. C. After truncating and shifting from below by a deductible of size d, one gets another
Pareto Distribution, but with parameters α and θ + d, in this case 3.2 and

135 + 25 = 160. This has density of: αθα (θ + x)-(α + 1) = (3.2)(1603.2)(160+x)-4.2.


Plugging in x = 60 one gets: (3.2)(1603.2)(160+60)-4.2 = 0.00525.
Alternately, after truncating and shifting by 25, G(x) = 1 - S(x+25)/S(25) =
1 - {(135/(135 + x + 25))3.2} / {(135/(135 +25))3.2} = 1 - (160/(160 + x))3.2.
This is a Pareto Distribution with α = 3.2 and θ = 160. Proceed as before.
Alternately, after truncating and shifting by 25, g(x) = f(x+25)/S(25) =
(3.2)(1353.2)(135 + x + 25) -4.2 / {(135/(135 +25))3.2} = (3.2)(1603.2)(160+x)-4.2.
g(60) = (3.2)(1603.2)(220)-4.2 = 0.00525.

24.38. D. & 24.39. B. eX is LogNormal with µ = 4 and σ = 0.8.

Mean of LogNormal = E[eX] = exp[µ + σ2/2] = exp[4 + 0.82 /2] = 75.189.

E[(eX)2 ] = E[e2X] = Second moment of LogNormal = exp[2µ + 2σ2] = exp[(2)(4 + 0.82 )] =


10,721.4.
Variance of LogNormal = 10,721.4 - 75.1892 = 5068.0.
Standard Deviation of LogNormal = √5068.0 = 71.2.
Alternately, for the LogNormal, 1 + CV2 = E[X2 ]/E[X]2 = exp[σ2] = exp[0.82 ] = 1.8965.

⇒ CV = 0.9468. ⇒ Standard Deviation of LogNormal = (0.9468)(75.189) = 71.2.



24.40. A. eX is LogNormal with µ = 4 and σ = 0.8.

0.1 = Φ[{ln(x) - 4}/0.8]. ⇒ -1.282 = {ln(x) - 4} / 0.8. ⇒ x = 19.58.


Alternately, the 10th percentile of the Normal Distribution is: 4 - (1.282)(0.8) = 2.9744.
Thus, the 10th percentile of eX is: e2.9744 = 19.58.

24.41. C. The sum of two independent, identically distributed Exponential Distributions is a Gamma
Distribution with α = 2. The density of a Gamma Distribution with α = 2 and θ = 10 is:
f(t) = (t/10)2 e-t/10 / {t Γ(2)} = 0.01 t e-t/10.
Prob[T ≤ 30] = ∫030 0.01 t e-t/10 dt = [-(t/10) e-t/10 - e-t/10] from t = 0 to t = 30 = 1 - 4e-3 = 80.1%.
Alternately, Prob[T ≤ 30] = Γ[2; 30/10] = 1 - (30/10) e-30/10 - e-30/10 = 80.1%.
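A brief simulation check of this answer (my own sketch; the seed and number of trials are arbitrary):

    import random
    from math import exp

    random.seed(1)
    trials = 200_000
    hits = sum(random.expovariate(1/10) + random.expovariate(1/10) <= 30 for _ in range(trials))
    print(round(hits / trials, 3), round(1 - 4 * exp(-3), 3))   # both close to 0.801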

24.42. B. For the LogNormal: mean is exp[µ + 0.5 σ2], median = eµ, mode = exp(µ-σ2).

(mean - mode)/(mean - median) = {exp(µ + 0.5 σ2) - exp(µ-σ2)} / {exp(µ + 0.5 σ2) - eµ} =

{1 - exp(-1.5σ2)} / {1 - exp(-0.5σ2)}.
Comment: Note that for the LogNormal, mean > median > mode (alphabetical order) . This is
typical for a continuous distribution with positive skewness. (The situation is reversed for negative
skewness.) Also note that the median is closer to the mean than to the mode (just as it is in the
dictionary.) Also, note that as σ goes to zero, this ratio goes to 1.5/.5 = 3.
(For curves with “mild” skewness, it is reasonable to approximate this ratio by 3, according to
Kendallʼs Advanced Theory of Statistics.)

24.43. D. The probability of death at time t is the density of the Weibull Distribution:
f(t) = τ(t/θ)τ exp(-(t/θ)τ) / t = 2t exp(-(t/30)2 ) / 302.
The present value of a payment of one at time t is e−δt = e-0.04t.
Therefore, the actuarial present value of this insurance, in millions of dollars, is:
∫0∞ e-0.04t t exp[-(t/30)2] / 450 dt = (1/450) ∫0∞ t exp[-{(t/30)2 + 0.04t + 0.36 - 0.36}] dt
= (1/450) e0.36 ∫0∞ t exp[-(t/30 + 0.6)2] dt = (1/450) e0.36 ∫0.6∞ (30x - 18) exp(-x2) 30 dx
= e0.36 ∫0.6∞ 2x exp(-x2) dx - (6/5) e0.36 ∫0.6∞ exp(-x2) dx
= e0.36 e-0.36 - e0.36 (6/5) √π {1 - Φ(0.6√2)} = 1 - e0.36 (6/5) √π {1 - Φ(0.8485)}
= 1 - (1.433)(1.2)(1.772)(1 - 0.8019) = 1 - 0.604 = 0.396.
Comment: Iʼve used the change of variables x = (t/30) + 0.6 and made use of the hint.
Thus the actuarial present value of this insurance is: ($1 million)(0.396) = $396,000.
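Since this integral is long, a simulation check may be reassuring; a minimal sketch (mine; note that random.weibullvariate takes the scale parameter first and the shape parameter second):

    import random
    from math import exp
    from statistics import mean

    random.seed(1)
    # T is Weibull with theta = 30 and tau = 2; the APV per dollar of benefit is E[exp(-0.04 T)]
    apv = mean(exp(-0.04 * random.weibullvariate(30, 2)) for _ in range(200_000))
    print(round(1_000_000 * apv))   # roughly 396,000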

24.44. Mean = exp[µ + σ2/2]. Physicians: exp[7.8616 + 3.1311/2] = 12,421.


Surgeons: exp[8.0562 + 2.8601/2] = 13,177. Hospitals: exp[ 7.4799 + 3.1988/2] = 8772.
Hospitals have the smallest mean, while Surgeons have the largest mean. 1 + CV2 = E[X2 ] / E[X]2 .
CV = √{E[X2] / E[X]2 - 1} = √{exp[2µ + 2σ2] / exp[µ + σ2/2]2 - 1} = √{exp[σ2] - 1}.
Physicians: √{exp[3.1311] - 1} = 4.680. Surgeons: √{exp[2.8601] - 1} = 4.057.
Hospitals: √{exp[3.1988] - 1} = 4.848.


Surgeons have the smallest CV, while Hospitals have the largest CV.
Comment: Taken from Table 4 of Sheldon Rosenbergʼs discussion of “On the Theory of Increased
Limits and Excess of Loss Pricing”, PCAS 1977. Based on data from Policy Year 1972.

24.45. C. The Normal distribution is symmetric. The Exponential and Pareto each have a mode of
zero. The single Parameter Pareto has support x > θ > 0 and mode = θ.
If it were a Single Parameter Pareto, then the biggest density (the mode) would be where the
support of the density starts.
This is not the case for the given histogram.
Thus none of these are similar to the histogram.
The Gamma for α > 1 has a mode > 0.
Comment: This histogram is from a Gamma Distribution with α = 4.

24.46. E. After truncating and shifting from below, one gets a Pareto Distribution with
α = 2.5 and θ = 47 + 10 = 57. Thus the nonzero payments are Pareto with α = 2.5 and θ = 57.

This has mean: θ/(α - 1) = 57/1.5 = 38, second moment: 2θ2 / {(α - 1)(α - 2)} = 8664, and variance:
8664 - 382 = 7220. The probability of a nonzero payment is the probability that a loss is greater
than the deductible of 10; for the original Pareto, S(10) = {47/(47+10)}2.5 = 0.617.
Thus the payments of the insurer can be thought of as an aggregate distribution, with Bernoulli
frequency with mean 0.617 and Pareto severity with α = 2.5 and θ = 57.
The variance of this aggregate distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) =
(0.617)(7220) + (38)2 {(0.617)(1 - 0.617)} = 4796.
Comment: Similar to 3, 11/00, Q.21.

24.47. D. The sum of three independent, identically distributed Exponential Distributions is a Gamma
Distribution with α = 3. The density of a Gamma Distribution with α = 3 and θ = 4 is:
f(x) = (x/4)^3 e^(-x/4) / {x Γ(3)} = x^2 e^(-x/4) / 128.

Prob[X > 20] = ∫_20^∞ x^2 e^(-x/4) / 128 dx = [-(x/4)^2 e^(-x/4)/2 - (x/4) e^(-x/4) - e^(-x/4)] from x = 20 to x = ∞

= 18.5 e^(-5) = 12.5%.
Alternately, Prob[X > 20] = 1 - Γ[3; 20/4] = (5^2/2) e^(-5) + 5 e^(-5) + e^(-5) = 18.5 e^(-5) = 12.5%.
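
For those who like to check such results numerically, a short Python sketch of this tail probability, assuming scipy is available:

# Check of 24.47: the closed form 18.5 e^(-5) should match the Gamma survival function
# with alpha = 3, theta = 4 at x = 20.
from scipy.stats import gamma
import math

exact = 18.5 * math.exp(-5)
sf = gamma.sf(20, a=3, scale=4)   # S(20) for the Gamma distribution
print(exact, sf)                  # both approximately 0.125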

24.48. E. For an ordinary deductible, the average payment per non-zero payment is:
{E[X] - E[X ∧ d]}/ S(d) = {E[X] - E[X ∧ 15]}/ S(15) =
{(24/(2.5-1)) - (24/(2.5-1))(1 - {1 + 15/24}-1.5)} {1 + 15/24}2.5 = 26.

24.49. C. For a franchise deductible, one has data truncated from below, and the average payment
per non-zero payment is: {E[X] - E[X ∧ d] + S(d)d} / S(d) = e(d) + d =
(15+24)/(2.5-1) + 15 = 41.
Alternately, the numerator is the integral from 15 to infinity of x f(x), which is:
E[X] - E[X ∧ 15] + S(15)15 = 16 - 8.276 + (0.2971)(15) = 12.180.
The denominator is S(15) = 0.2971.
Thus the average payment per non-zero payment is: 12.180/0.2971 = 41.00.
Alternately, each non-zero payment is 15 more than with an ordinary deductible, thus using the
previous solution: 15 + 26 = 41.
Comment: For the Pareto Distribution e(d) = (d+θ)/(α-1).

24.50. A. For an ordinary deductible, the average payment per loss is: E[X] - E[X ∧ d] =
E[X] - E[X ∧ 10] = {(24/(2.5-1)) - (24/(2.5-1))(1 - {1 + 10/24}-1.5) = 16{1 + 10/24}-1.5 = 9.49.

24.51. D. For a franchise deductible, the average payment per loss is the same as that for an
ordinary deductible, except d is added to each nonzero payment.
Average payment per loss = E[X] - E[X ∧ 10] + (10)(Probability of nonzero payment) =
9.49 + (10){24/(24 + 10)}2.5 = 13.68. (13.68)(73) = 999.

24.52. C. As shown in the Appendix attached to the exam, for τ > 1 its mode is:
θ {(τ - 1)/τ}^(1/τ) = (1000) (0.5/1.5)^(1/1.5) = 481.

24.53. D. lnY = (X1 + X2 + ... + X10)/10, which is the average of 10 independent, identically
distributed Normals, which is therefore another Normal with the same mean of 7 and a standard
deviation of 1.6/√10. Therefore Y is LogNormal with µ = 7 and σ = 1.6/√10.
Using the formula for the mean of a LogNormal, E[Y] = exp[7 + (1.6/√10)^2 / 2] = e^7.128 = 1246.
Comment: In general, the expected value of the geometric average of n independent, identically
distributed LogNormals is: exp[µ + σ^2/(2n)].
As n → ∞, this approaches the median of e^µ.
Note that while the expected value of lnY is µ, it is not true that E[Y] = exp[E[lnY]] = e^µ.
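
A short Python sketch, assuming numpy is available, confirming this expected value by simulation:

# Check of 24.53: geometric average of 10 independent LogNormals with mu = 7, sigma = 1.6;
# its mean should be exp(mu + sigma^2/(2n)) = exp(7.128) = 1246.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 7.0, 1.6, 10
formula = np.exp(mu + sigma**2 / (2 * n))
sims = np.exp(rng.normal(mu, sigma, size=(200_000, n)).mean(axis=1))
print(formula, sims.mean())   # both close to 1246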

24.54. B. E[X] = (9)Γ[1 + 1/4] = (9)(0.90640) = 8.1576.


E[X2 ] = (92 )Γ[1 + 2/4] = (92 )(0.88623) = 71.7846.
Variance = 71.7846 - 8.15762 = 5.238.

24.55. mean = αθ. mode = (α−1)θ. standard deviation = θ√α.
(mean - mode)/(standard deviation) = {αθ − (α−1)θ}/(θ√α) = 1/√α.
Skewness = 2/√α. Kurtosis = 3 + 6/α.
(skewness)(kurtosis + 3) / (10 kurtosis - 12 skewness^2 - 18) =
(2/√α)(3 + 6/α + 3)/(30 + 60/α - 48/α - 18) = (2/√α)(6 + 6/α)/(12 + 12/α) = 1/√α.
Comment: This is true in general for any of the members of the Pearson family of distributions.
See equation 6.6 in Volume I of Kendallʼs Advanced Theory of Statistics.

24.56. D. The survival function for the Weibull is: S(t) = exp[-(t/θ)τ].
τ = 2 and θ = 15 for smokers: S(t) = exp[-(t/15)2 ].
S(1) = 0.9956. S(2) = 0.9824. S(3) = 0.9608.
τ = 2 and θ = 20 for nonsmokers: S(t) = exp(-(t/20)2 ).
S(1) = 0.9975. S(2) = 0.9900. S(3) = 0.9778.
Assume for example a total of 400,000 people alive initially.
Then, one fourth or 100,000 are smokers, and three fourths or 300,000 are nonsmokers.
Year   Smoker Prob.   # Smokers   # Smoker   Non-Smoker Prob.   # Non-Smokers   # Non-Smoker   Total #
       of Survival    Surviving   Deaths     of Survival        Surviving       Deaths         Deaths
 0        1            100,000                  1                 300,000
 1        0.9956        99,557       443        0.9975            299,251            749         1,193
 2        0.9824        98,238     1,319        0.9900            297,015          2,236         3,555
 3        0.9608        96,079     2,159        0.9778            293,325          3,690         5,849
For example, the number of smokers who survive through year one is 99,557, while the number
who survive through year two is 98,238.
Therefore, 99,557 - 98,238 = 1319 smokers are expected to die during year two.
For 400,000 insureds, the actuarial present value of the payments is:
(100,000) {1193/1.06 + 3555/1.062 + 5849/1.063 } = 920,034,223.
The actuarial present value of this insurance is: 920,034,223 / 400,000 = 2300.
Comment: Similar to 3, 5/00, Q.8. This question is less likely to be asked about on your exam,
since it involves actuarial present values and term insurance.
The survival function of the mixture is the mixture of the survival functions.
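
A short Python check of this actuarial present value, using the same assumptions as the solution (benefit of 100,000 paid at the end of the year of death, 6% interest, 1/4 smokers with θ = 15 and 3/4 nonsmokers with θ = 20):

# Check of 24.56: three-year term insurance APV per insured under Weibull mortality.
import math

def S(t, theta, tau=2.0):
    return math.exp(-(t / theta) ** tau)

apv = 0.0
for year in (1, 2, 3):
    # probability of death in this year, mixing smokers and nonsmokers
    q = 0.25 * (S(year - 1, 15) - S(year, 15)) + 0.75 * (S(year - 1, 20) - S(year, 20))
    apv += 100_000 * q / 1.06 ** year
print(apv)   # approximately 2300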

24.57. E. θ = Mean/α = 100/5 = 20. The sum is Gamma with α = (5)(3) = 15 and θ = 20.
Mode = θ(α - 1) = (20)(14) = 280.
Comment: Similar to CAS3, 5/06, Q.36.

24.58. B. ln(X) and ln(Y) are two independent Normal variables.
Therefore, ln(Z) = ln(XY) = ln(X) + ln(Y) is a Normal variable. Therefore, Z is a LogNormal variable.
ln(Z) = ln(X) + ln(Y) has mean equal to the sum of the means of ln(X) and ln(Y) and variance equal to
the sum of the variances of ln(X) and ln(Y). Therefore, ln(Z) has parameters
µ = 3 + 4 = 7 and σ^2 = 2 + 1.5 = 3.5, and therefore so does Z. For the LogNormal Distribution,
variance = exp(2µ + σ^2) {exp(σ^2) - 1}. Thus Var[Z] = exp(14 + 3.5) {exp(3.5) - 1} =
(39.82 million)(32.12) = 1.279 billion. ⇒ Standard deviation of Z is √(1.279 billion) = 35,763.
Alternately, in general for X and Y two independent variables: E[XY] = E[X]E[Y] and
E[X^2 Y^2] = E[X^2]E[Y^2]. Var[XY] = E[(XY)^2] - E[XY]^2 = E[X^2 Y^2] - {E[X]E[Y]}^2 = E[X^2]E[Y^2] - E[X]^2 E[Y]^2.
E[X] = exp(µ + 0.5σ^2) = e^4, E[Y] = e^4.75, E[X^2] = exp(2µ + 2σ^2) = e^10, E[Y^2] = e^11.
Therefore, Var[Z] = Var[XY] = (e^10)(e^11) - (e^4)^2 (e^4.75)^2 = e^21 - e^17.5 = 1.279 billion.
Therefore, the standard deviation of Z is: √(1.279 billion) = 35,763.
Comment: In general the product of independent LogNormal variables is a LogNormal with the sum
of the individual µ and σ2 .
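
A short Python check of this standard deviation, using the LogNormal variance formula with µ = 7 and σ^2 = 3.5:

# Check of 24.58: standard deviation of Z = XY for the two independent LogNormals.
import math

mu, sig2 = 3 + 4, 2 + 1.5
var_z = math.exp(2 * mu + sig2) * (math.exp(sig2) - 1)
print(math.sqrt(var_z))   # approximately 35,763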

24.59. D. If the payment is made before the assets have grown to 50, then there is ruin.
50 = 5(1.10)^t implies t = 24.16. The probability each person has died by time 24.16 is given by
the Weibull Distribution: 1 - exp[-(24.16/30)^2] = 0.477.
The probability that all 7 persons are dead by time 24.16 is: 0.477^7 = 0.56%.
Comment: Similar to 3, 5/00, Q.6.

24.60. D. 1 + CV2 = E[X2 ] / E[X]2 = exp[2µ + 2σ2] / exp[µ + σ2/2]2

= exp[2µ + 2σ2] / exp[2µ + σ2] = exp[(2µ + 2σ2) - (2µ + σ2)] = exp[σ2].

Thus, 1 + 5.75^2 = exp[σ^2]. ⇒ σ = 1.88.


For a LogNormal Distribution, the probability that a value is greater than the mean is:
1 - F[exp[µ + σ2/2]] = 1 - Φ[(ln[exp[µ + σ2/2]] - µ) / σ] = 1 - Φ[σ/2] = 1 - Φ[0.94] = 17.36%.

24.61. B. For the Pareto, S(500) = {2000/(2000 + 500)}4 = 0.4096.

The mean number of losses is: mq = (50)(0.3) = 15.


The expected number of (non-zero) payments is: (0.4096)(15) = 6.144.
Comment: Similar to 4, 5/07, Q. 39. As discussed in “Mahlerʼs Guide to Frequency Distributions,”
the number of (non-zero) payments is Binomial with m = 50 and q = (0.4096)(0.3) = 0.12288.
Therefore, the variance of the number of payments is: (50)(0.12288)(1 - 0.12288) = 5.39.

24.62. E. The Median is where the distribution function is 0.5.

0.5 = Φ[(ln(x) - µ)/σ]. ⇒ 0 = (ln(x) - µ)/σ. ⇒ x = exp[µ].

As shown in Appendix A attached to the exam, Mode = exp[µ − σ2].

Therefore, Median/Mode = exp[µ]/exp[µ − σ2] = exp[σ2] = 5.4. ⇒ σ = 1.30.

24.63. B. For the Pareto Distribution, VaRp [X] = θ {(1-p)-1/α - 1}.

Q 0.25 = VaR0.25[X] = θ {(0.75)-1/α - 1} = (1000) {(0.75)-1/2 - 1} = 154.7.

Q 0.75 = VaR0.75[X] = θ {(0.25)-1/α - 1} = (1000) {(0.25)-1/2 - 1} = 1000.


Interquartile range = Q0.75 - Q0.25 = 1000 - 154.7 = 845.3.
Comment: For a Pareto Distribution with θ = 1000, the interquartile range decreases as α increases.
[Figure omitted: graph of the interquartile range (vertical axis, up to 4000) as a function of alpha, for alpha from 1 to 5.]
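
A short Python check of these quantiles for problem 24.63:

# Check of 24.63: Pareto VaR and interquartile range, theta = 1000, alpha = 2.
theta, alpha = 1000.0, 2.0

def var_p(p):
    return theta * ((1 - p) ** (-1 / alpha) - 1)

q25, q75 = var_p(0.25), var_p(0.75)
print(q25, q75, q75 - q25)   # about 154.7, 1000, 845.3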

24.64. D. E[X] = exp[3 + 2^2/2] = 148.41. E[X^2] = exp[(2)(3) + (2)(2^2)] = 1,202,604.
Var[X] = 1,202,604 - 148.41^2 = 1,180,579.
For the sample mean X̄ of 20 values: E[X̄] = E[X] = 148.41. Var[X̄] = Var[X]/20 = 1,180,579/20 = 59,029.
E[X̄^2] = Var[X̄] + E[X̄]^2 = 59,029 + 148.41^2 = 81,054.
Comment: Similar to 4, 11/06, Q.26.

24.65. (a) Since α > 1, the mode is: θ (α - 1) = (10)(4.2 - 1) = 32.


(b) Mean = αθ = (4.2)(10) = 42.

(c) Second Moment = α(α+1)θ2 = (4.2)(5.2)(102 ) = 2184.

(d) Third Moment = α(α+1)(α+2)θ^3 = (4.2)(5.2)(6.2)(10^3) = 135,408.


Comment: For α ≤ 1, the mode would be zero.

Variance = 2184 - 422 = 420 = (4.2)(102 ) = αθ2.



24.66. For example, Prob[N = 1] = F(2) - F(1) = S(1) - S(2) =


exp[-(1/5.4)1.8] - exp[-(2/5.4)1.8] = 10.7%.
Prob[N = 10] = S(10) - S(11) = exp[-(10/5.4)1.8] - exp[-(11/5.4)1.8] = 2.1%.

[Figure omitted: bar chart of Prob[N = n] (vertical axis, up to about 0.14) against n, for n = 0 to 15.]

The probability of more than 15 runs is: Prob[16] + Prob[17] + Prob[18] + ... =
{S(16) - S(17)} + {S(17) - S(18)} + {S(18) - S(19)} + ... = S(16) = exp[-(16/5.4)^1.8] = 0.085%.
Comment: The Weibull with τ approximately 1.8 is a realistic model.
See “A Derivation of the Pythagorean Won-Loss Formula in Baseball”, by Steven J. Miller,
Chance Magazine 20 (2007), no. 1.
For θ = 5.4, the mean number of runs per game scored by one team is 4.3.
Here we have approximated a discrete distribution by a continuous distribution. This is the reverse
of the Method of Rounding, discussed in “Mahlerʼs Guide to Aggregate Distributions.”
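
A short Python check of these discretized probabilities for 24.66:

# Check of 24.66: Prob[N = n] = S(n) - S(n+1) for a Weibull with theta = 5.4, tau = 1.8.
import math

def S(t, theta=5.4, tau=1.8):
    return math.exp(-(t / theta) ** tau)

print(S(1) - S(2))    # about 0.107
print(S(10) - S(11))  # about 0.021
print(S(16))          # about 0.00085, i.e. 0.085%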

24.67. C. E[X] = exp[5.149 + (0.7372 )/2] = 226.01.


E[X2 ] = exp[(2)(5.149) + (2)(0.7372 )] = 87,934.
Var[X] = 87,934 - 226.01^2 = 36,853. StdDev[X] = √36,853 = 192.

24.68. D. For the Weibull, VaRp [X] = θ {-ln(1-p)}1/τ .

Thus the 10th percentile is: (13) {-ln(0.9)}1/2 = 4.22.



24.69. For the Gamma Distribution: F(x) = Γ(α; x/θ).

Thus for integer α, F(x) = 1 - Σ_{i=0}^{α-1} (x/θ)^i e^(-x/θ) / i!.

F(600) = 1 - e^(-3) (1 + 3 + 3^2/2 + 3^3/6 + 3^4/24) = 0.1847.
F(1200) = 1 - e^(-6) (1 + 6 + 6^2/2 + 6^3/6 + 6^4/24) = 0.7149.
Comment: The given hint is Theorem A.1 in Loss Models.
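
A short Python check, assuming scipy is available, that this Poisson-sum formula matches the Gamma CDF; α = 5 and θ = 200 are taken to be consistent with the solution above:

# Check of 24.69: Poisson-sum formula versus scipy's Gamma CDF.
import math
from scipy.stats import gamma

alpha, theta = 5, 200

def F(x):
    y = x / theta
    return 1 - sum(y**i * math.exp(-y) / math.factorial(i) for i in range(alpha))

print(F(600), gamma.cdf(600, a=alpha, scale=theta))     # both about 0.1847
print(F(1200), gamma.cdf(1200, a=alpha, scale=theta))   # both about 0.7149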

24.70. (a) 0.5 = Φ[(ln(x) - 1.3355) / 0.2265]. ⇒ 0 = (ln(x) - 1.3355) / 0.2265.
⇒ x = e^1.3355 = 3.802.
(b) mode = exp[µ - σ^2] = exp[1.3355 - 0.2265^2] = 3.612.
(c) mean = exp[µ + σ^2/2] = exp[1.3355 + 0.2265^2/2] = 3.901.


Comment: For the LogNormal: mode < median < mean.

24.71. (a) For the Weibull, VaRp [X] = θ {-ln(1-p)}1/τ .

Thus the 50th percentile is: (26,710) {-ln(0.5)}1/1.053 = 18,859.


(b) For the Weibull the mode is: θ {(τ - 1)/τ}^(1/τ) for τ > 1, otherwise zero.

Thus the mode is: (26,710) (0.053/1.053)^(1/1.053) = 1563.

Comment: The mean is: θ Γ[1 + 1/τ] = 26,710 Γ[1 + 1/1.053]. Using a computer, mean = 26,169.

24.72. (a) Mean = αθ = (2)(50) = 100.

Second Moment = α(α+1)θ2 = (2)(3)(502 ) = 15,000.


Variance = 15,000 - 1002 = 5000.
Φ[2.326] = 0.99.
Thus the estimate of the 99th percentile is: 100 + 2.326 √5000 = 264.5.
Alternately, Variance = αθ^2 = (2)(50^2) = 5000. Proceed as before.
(b) The Chi-Square Distribution with 4 degrees of freedom is a Gamma Distribution with
α = 4/2 = 2, and θ = 2. From the Table, the 99th percentile is 13.277.
Thus the 99th percentile for a Gamma Distribution with α = 2 and θ = 50 is: (50/2)(13.277) = 331.9.
Comment: Part (b) is beyond what you are likely to be asked on your exam.
For a Gamma Distribution with α = 2, the Normal Approximation is not particularly good.
As alpha approaches infinity, the Gamma Distribution approaches a Normal Distribution.
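
A short Python comparison, assuming scipy is available, of the Normal approximation with the exact 99th percentile:

# Check of 24.72: Gamma with alpha = 2, theta = 50.
import math
from scipy.stats import gamma, norm

alpha, theta = 2, 50
mean, sd = alpha * theta, theta * math.sqrt(alpha)
print(mean + norm.ppf(0.99) * sd)             # about 264.5, the Normal approximation
print(gamma.ppf(0.99, a=alpha, scale=theta))  # about 331.9, the exact value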

24.73. D. S(36) = exp[-(36/20)1.5] = 8.9%.

24.74. A. Let m be the median. 0.5 = {θ/(θ+m)}^1.8. ⇒ m = θ(2^(1/1.8) - 1) = 0.4697 θ.
S(2m/3) = S(0.3131 θ) = 1/1.3131^1.8 = 0.6124.
S(2m) = S(0.9394 θ) = 1/1.9394^1.8 = 0.3035.
S(2m/3) - S(2m) = 0.6124 - 0.3035 = 30.9%.
Comment: There is no standard definition of middle class.

24.75. C. ln(W) has a Normal Distribution with parameters µ = 1 and σ = 2.


(ln(W) - 1)/2 has a Normal Distribution with parameters µ = 0 and σ = 1; in other words,
(ln(W) - 1)/2 follows a standard unit Normal Distribution, with distribution function Φ.
Therefore, Φ[(ln(W) - 1))/2] is uniform on [0, 1].
Comment: This forms the basis of simulating a LogNormal Distribution.
If X follows the distribution F, then F(X) is uniform on [0, 1].

24.76. C. Find where the density is a maximum. 0 = fʼ(x) = 2ax e-bx - bax2 e-bx.
⇒ 2 = bx. ⇒ x = 2/b.
Comment: A Gamma Distribution with α = 3 and θ = 1/b. For α > 1, Mode = θ(α - 1) = 2/b.
In order to integrate to one, the density must go to zero as x goes to infinity.
Therefore, the mode is never infinity.

24.77. B. 1. False. A random variable Y is lognormally distributed if ln(Y) is normally distributed. If


X is normally distributed, then exp(X) is lognormally distributed. 2. True. The LogNormal has a
lighter tail than the Pareto. One way to see this is that all the moments of the LogNormal exist, while
the moments of the Pareto only exist for n > α. Another is that the mean residual life of the
LogNormal goes to infinity less than linearly, while that of the Pareto increases linearly.
3. False. The mean of a Pareto only exists for α > 1.

24.78. B. 1. Assume one is summing n independent identically distributed random variables.


According to the Central Limit Theorem as n approaches ∞, this sum approaches a Normal
Distribution. Precisely how large n must be in order for the Normal Approximation to be reasonable
depends on the shape of the distribution and the definition of what is reasonable. Precisely how
large n must be would depend on details about the distribution, however a large skewness would
require a larger n. Statement 1 is not true. 2. True. 3. False. A random variable X is lognormally
distributed if ln(X) = Z, where Z is normally distributed.

24.79. C. This is Gamma Distribution with α = 2 and θ = β. k = 1/Γ(α) = 1/Γ(2) = 1/1! = 1.

24.80. B. 1. False. The Normal is symmetric while the LogNormal is skewed to the right.
2. True. The LogNormal has a lighter tail than the Pareto. One way to see this is that the mean
residual life of the LogNormal goes to infinity less than linearly, while that of the Pareto increases
linearly. 3. False. This is an important application of the Negative Binomial Distribution. The Negative
Binomial is the mixed distribution for the Gamma-Poisson process.

24.81. A. 1. T. 2. F. While the sum of independent Normal distributions is also Normal, Statement
#2 is false for two reasons. First, since each unit Normal has variance of 1, the sum of n independent
unit Normals has a variance of n and standard deviation of √n. Second, since each unit Normal has
mean of 0, the sum of n independent unit Normals has a mean of 0. 3. F. If ln(Y) is normally
distributed, then Y has a LogNormal Distribution. Equivalently, exp(X) is LogNormally distributed if
X is Normally distributed.

24.82. C. The variance (for a single claim) of the Pareto Distribution is:
θ^2 α / {(α−2)(α−1)^2} = 92,160,000. The sum of 100 independent claims has 100 times this variance
or 9,216,000,000. The standard deviation is therefore √9,216,000,000 = 96,000.


The mean of a single claim from the Pareto is θ/(α-1) = 4800; for 100 claims the mean is 480,000.
Thus standardizing the variable 600,000 corresponds to:
(600,000 - 480,000)/96,000 = 1.25. Thus the chance of the sum being greater than 600,000 is
approximately: 1 - Φ(1.25) = 1 - 0.8944 = 0.1056.

Comment: The second moment is: 2θ2 / { (α−2)(α-1) } = 115,200,000.

Thus the variance = 115,200,000 - 48002 = 92,160,000.
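
A short Python check of this Normal approximation, assuming scipy is available:

# Check of 24.82: sum of 100 independent claims, per-claim mean 4,800 and variance 92,160,000.
import math
from scipy.stats import norm

n = 100
mean_sum = n * 4800
sd_sum = math.sqrt(n * 92_160_000)
print(sd_sum)                                   # 96,000
print(norm.sf((600_000 - mean_sum) / sd_sum))   # about 0.1056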

24.83. A. If G(x) is the distribution function truncated from below at d, then


G(x) = {F(x) - F(d)}/S(d) for x > d. In this case, G(x) = {F(x) - F(100)}/S(100), for x > 100.
G(x) = [(100/200)^2 - {100/(100 + x)}^2] / (100/200)^2 = 1 - {200/(100+x)}^2, for x > 100.

24.84. D. For Truncation from above at a limit L, FY(x) = FX(x) / FX(L), for x ≤ L.
Thus in this case with L = 50000, A = 1/FX(50000) = 1/Φ[{ln(50000) - µ} / σ] =
1/Φ[(10.82 - 7)/√10] = 1/Φ[1.208] = 1/0.8865 = 1.128.

24.85. C. 1. True. 2. False. X has a Gamma Distribution with parameters 2α and θ.


3. True. The Standard Normal has a mean of 0 and a variance of 1. The means add and variances
add. Thus X has mean of 0 + 0 = 0 and variance 1 + 1 = 2.

24.86. C. For the Gamma Distribution the mean is αθ and the variance is αθ2 . Thus we have per
claim a mean of 5000 and a variance of 5,000,000. For the sum of 100 independent claims, one has
100 times the mean and variance. Thus the sum has a mean of 500,000 and a variance of 500
million. Thus the sum has a standard deviation of 22,361. $525,000 exceeds this mean by
25000 / 22,361 = 1.118 standard deviations. Thus the chance of the sum exceeding 525,000 is
approximately: 1 - Φ(1.12) = 1 - 0.8686 = 0.1314.
Comment: Alternately one can use the fact that the sum of 100 identical independent Gamma
Distributions has a Gamma Distribution with parameters 100α and θ.
Note that when applying the Normal Approximation to a continuous distribution such as the Gamma,
there is no need for the “continuity correction” that is applied in the case of a discrete frequency
distribution.

24.87. D. For this LogNormal Distribution the moments are E[X] = exp(µ + 0.5σ2) = e0.5 = 1.649.

E[X^2] = exp(2µ + 2σ^2) = e^2. Thus the standard deviation is (e^2 - e)^0.5 = 2.161. Thus the interval
within one standard deviation of the mean is 1.649 ± 2.161. But for the LogNormal distribution x > 0,
so we are interested in the probability of x < (1.649 + 2.161) = 3.810.
F(3.810) = Φ((ln(3.810) - µ ) / σ) = Φ(1.34) = 0.9099.
Comment: Note that this differs from the probability of being within one standard deviation of the
mean for the normal distribution, which is .682. Also note that the parameter σ is not the standard
deviation of the LogNormal Distribution. Finally, note that the LogNormal Distribution is not
symmetric, and thus one has to compute the distribution function separately at each of the two
points. F(1.649 - 2.161) = 0 in this case, since the support of the LogNormal Distribution is x >0, so
that F(x) = 0 for x ≤0. If instead we were asked the probability of being within a half of a standard
deviation of the mean, this would be:
F(1.649 + 1.080) - F(1.649 - 1.080) = Φ(1.00) - Φ(-0.56) = 0.8413 - (1 - 0.7123) = 0.5536.

24.88. C. When α > 2 and therefore the first two moments exist for the Pareto:
E[X2 ] / E[X]2 = {2θ2 /(α-1)(α-2)} / {θ/(α-1)}2 = 2(α-1) / (α-2) > 2. Thus we check the ratio
E[X2 ] / E[X]2 , to see whether it is greater than 2. For X1 this ratio is 1.5. For X2 this ratio is 2.

For X3 this ratio is: 1.5 / 0.5^2 = 6.


Thus since this ratio is only greater than 2 for X3 , we conclude only for X3 could one “use a Pareto.”
Comment: This is an initial test that is useful in some real world applications.
Note that 1 + CV2 = E[X2 ] / E[X]2 . Thus E[X2 ] / E[X]2 > 2 is equivalent to CV2 > 1.
Thus this fact is equivalent to the fact that for the Pareto distribution, the coefficient of variation (when it
exists) is always greater than 1; the standard deviation is always greater than the mean.

24.89. B. The original distribution function: F(x) = 1 - e-0.05x.


The data truncated from below at 10, has distribution function:
{F(x) - F(10)} / S(10) = {e-0.5 - e-0.05x} / e-0.5 = 1 - e0.5 - 0.05x , x > 10
For x = 25, this has a value of 1 - e0.5 -1.25 = 0.53.
Comment: The Weibull Distribution for a shape parameter of 1 is an Exponential.

24.90. E. X > 1000 corresponds to Y > 6.908. Converting to the Standard Normal distribution:
(6.908 - 6.503) / 1.5 = 0.270. Using the Standard Normal Table, the chance of being less than or
equal to 0.27 is 0.6064, so the chance of being more is: 1 - 0.6064 = 0.3936.
Comment: S(1000) for a LogNormal Distribution, with µ = 6.503 and σ = 1.500.

24.91. B. The sum of 10 independent identically distributed Gammas is a Gamma, with the same
scale parameter and 10 times the original shape parameter. Thus the new Gamma has shape
parameter, α = (0.1)(10) = 1, while θ = 1. A Gamma with a shape parameter of 1 is an exponential
distribution. F(x) = 1 - exp(-x). 1 - F(1) = exp(-1) = 36.8%.

24.92. C. If the insurer makes a payment, the probability that an insurerʼs payment is less than or
equal to 25, is in terms of the original Weibull Distribution:
{F(35) - F(10)} / S(10) = {(1 - e-1.75) - (1 - e-0.5)} / e-0.5 = 0.713.

24.93. D. The mean of the Weibull for τ = 0.5 is: θ Γ(1+ 1 /.5) = θ Γ(3) = θ (2!) = 2θ.
The median of the Weibull for τ = 0.5 is such that 0.5 = F(m) = 1 - exp(-(m/θ)0.5).

Thus, -(m/θ)0.5 = ln0.5 ⇒ m = θ (-ln.5)2 = 0.4805 θ. ⇒ mean / median = 2 / 0.4805 = 4.162.


Comment: The ratio is independent of θ, since both the median and the mean are multiplied by the
scale parameter θ.

24.94. D. 1. True. 2. True. 3. False.


Comment: The sum of independent Normal Distributions is a Normal. The product of independent
LogNormal Distributions is a LogNormal.

24.95. D. The sum of 16 independent risks each with a Gamma Distribution is again a Gamma
Distribution, with the same scale parameter θ and new shape parameter 16α.
In this case, the aggregate losses have a Gamma Distribution with θ = 250 and α = (16)(1) = 16.
Thus the tail probability at 6000 is:
1 - F(6000) = 1 - Γ(α; 6000/θ) = 1 - Γ(16; 6000/ 250) = 1 - Γ(16; 24).
Comment: 1 - Γ(16; 24) = 0.0344.

24.96. B. The aggregate losses have a Gamma Distribution with θ = 250 and α = 16. The mean is

αθ = 4000 and variance is αθ2 = 1,000,000. Alternately, each risk has a mean of 1/0.004 = 250, and
a variance of 1/0.0042 = 62,500. The means and variances add for the sum of independent risks.
Thus the aggregate losses have a mean of (16)(250) = 4000
and a variance of: (16)(62,500) = 1,000,000.
In any case, the standard deviation is 1000 and the survival function at 6000 is approximately:
1- Φ[(6000-4000)/1000] = 1 - Φ(2) = 1 - 0.9773 = 0.0227.
Comment: Note that in approximating the continuous Gamma Distribution no continuity correction is
required. Note that the result from using the Normal Approximation here is less than the exact result
of .0344. Due to the skewness of the Gamma Distribution, the Normal Approximation
underestimated the tail probability; the Gamma has a heavier tail than the Normal.
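
A short Python comparison, assuming scipy is available, of the exact tail probability in 24.95 with the Normal approximation in 24.96:

# Aggregate losses are Gamma with alpha = 16, theta = 250.
import math
from scipy.stats import gamma, norm

alpha, theta = 16, 250
mean, sd = alpha * theta, theta * math.sqrt(alpha)
print(gamma.sf(6000, a=alpha, scale=theta))   # about 0.0344, the exact tail
print(norm.sf((6000 - mean) / sd))            # about 0.0227, the Normal approximation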

24.97. E. Var[Z] = Var[0.25X] = 0.25^2 Var[X].
Covar[X, Z] = Covar[X, 0.25X] = 0.25 Covar[X, X] = 0.25 Var[X].
Therefore, Corr[X, Z] = 0.25 Var[X] / √(Var[X] 0.25^2 Var[X]) = 1.
Comments: Two variables that are proportional with a positive proportionality constant are perfectly
correlated and have a correlation of one.

24.98. C. The Gamma Distribution has skewness twice its coefficient of variation.
Thus the skewness is (2)(1) = 2.
Comment: The given Gamma is an Exponential with α = 1, CV = 1, and skewness = 2.
The skewness would be calculated as follows.
For the Gamma, E[X] = αθ, E[X^2] = α(α+1)θ^2, E[X^3] = α(α+1)(α+2)θ^3.
The variance is: E[X^2] - E[X]^2 = α(α+1)θ^2 - α^2θ^2 = αθ^2.
Thus the CV = √(αθ^2) / (αθ) = 1/√α.
The third central moment = E[X^3] - 3 E[X] E[X^2] + 2 E[X]^3 =
α(α+1)(α+2)θ^3 - 3{α(α+1)θ^2}αθ + 2α^3θ^3 = 2αθ^3.
Thus the skewness = 2αθ^3 / {αθ^2}^1.5 = 2/√α = twice the CV.



24.99. E. The mean of the Pareto is θ/(α−1), for α > 1. The mode is zero, since
f(x) = αθ^α (θ + x)^-(α+1), x > 0, which decreases as x increases, so that the density is largest at zero.
The median is where F(x) = 0.5. Therefore, 0.5 = 1 - {θ/(θ+x)}^α, thus median = θ(2^(1/α) - 1).
Mean: θ/(α−1).  Median: θ(2^(1/α) - 1).  Mode: 0.
Thus the mode is smallest. On the exam, just pick values for α and θ and see what happens.
For example, for α = 3 and θ = 10, the mean is 10/(3-1) = 5 while the median is smaller at
10(2^(1/3) - 1) = 2.6. One can show that for the Pareto the mean is greater than the median, since
1/(α-1) > 2^(1/α) - 1, for α > 1. This inequality is equivalent to:
1 + 1/(α-1) = α/(α-1) > 2^(1/α). ⇔ (α-1)/α = 1 − 1/α < 2^(-1/α).
Let β = 1/α; then this inequality is equivalent to: 1 < β + (1/2)^β, for 1 > β > 0.
At β = 0, the right hand expression is 1; its derivative is: 1 + ln(1/2) (1/2)^β > 0.
Thus the right hand expression is indeed greater than 1 for β > 0.
This in turn implies that the mean is greater than the median.
Comment: For a continuous distribution with positive skewness typically: mean > median > mode
(alphabetical order.) Since the mean is an average of the claim sizes, it is more heavily impacted by
the rare large claim than the median; therefore, the Pareto with a heavy tail, has its mean greater than
it median.

24.100. D. S(t) = exp[-(t/√2)^2] = exp[-t^2/2].
Prob[2 < t < 3 | 1 < t < 4] = {S(2) - S(3)} / {S(1) - S(4)} = (e^-2 - e^-4.5) / (e^-0.5 - e^-8) = 0.205.

24.101. C. The density of a Pareto Distribution is f(x) = (αθα)(θ + x)-(α + 1), 0 < x < ∞.
Thus this density is a Pareto Distribution with α = 3 and θ = 1.
It has mean: θ/(α-1) = 1/(3 - 1) = 1/2.

24.102. C. The survival function for the Weibull is: S(t) = exp(-(t/θ)τ).
τ = 2 and θ = 1.5 for smokers: S(t) = exp(-(t/1.5)2 ).
S(1) = e-0.4444 = .6412. S(2) = e-1.7778 = .1690.
τ = 2 and θ = 2.0 for nonsmokers: S(t) = exp(-(t/2)2 ).
S(1) = e-0.25 = 0.7788. S(2) = e-1 = 0.3679.
Assume for example a total of 30,000 people alive initially.
Then, one third or 10,000 are smokers and two thirds or 20,000 are nonsmokers.
Year   Smoker Prob.   # Smokers   # Smoker   Non-Smoker Prob.   # Non-Smokers   # Non-Smoker   Total #
       of Survival    Surviving   Deaths     of Survival        Surviving       Deaths         Deaths
 0        1             10,000                  1                  20,000
 1        0.6412         6,412      3,588       0.7788             15,576          4,424          8,012
 2        0.1690         1,690      4,722       0.3679              7,358          8,218         12,940
For example, the number of nonsmokers who survive through year one is 15,576, while the number
who survive through year two is 7358.
Therefore, 15,576 - 7,358 = 8,218 nonsmokers are expected to die during year two.
For 30,000 insureds, the actuarial present value of the payments is:
(100,000) {8012/1.05 + 12940/1.052 } = 1,936,743,764.
Therefore, the actuarial present value of this insurance is: 1,936,743,764 / 30,000 = 64,558.
Comment: Overall: S(1) = (1/3)(0.6412) + (2/3)(0.7788) = 0.7329 = (6412 + 15576)/30000.

24.103. B. Assuming the payments are made for a period of time t, the present value is an annuity
certain: 20000(1 - e^(-δt))/δ = 20000(1 - e^(-0.05t))/0.05.
For the Gamma Distribution, f(t) = θ^(-α) t^(α-1) e^(-t/θ) / Γ(α) = 1^(-2) t^(2-1) e^(-t/1) / Γ(2) = t e^(-t).
The actuarial present value is:

∫ f(t) (Present value | t) dt = 20,000 ∫ f(t) (1 - e^(-0.05t)) / 0.05 dt = 400,000 {1 - ∫ f(t) e^(-0.05t) dt}

= 400,000 {1 - ∫_0^∞ t e^(-t) e^(-0.05t) dt} = 400,000 {1 - ∫_0^∞ t e^(-1.05t) dt} = 400,000 (1 - 1.05^(-2)) = 37,188.

Comment: I have used the fact about “Gamma type integrals”:

∫_0^∞ t^(α-1) e^(-t/θ) dt = Γ(α) θ^α, for α = 2 and θ = 1/1.05.

For a length of disability with a Gamma Distribution and rate of payment of B, the actuarial present
value is: B{1 - (1 + θδ)^(-α)}/δ. In this case, B = 20,000, α = 2, θ = 1, and δ = 0.05, and the actuarial
present value is: (20000)(1 - {1 + (1)(0.05)}^(-2))/0.05 = 37,188.
The mean length of disability is: αθ = 2. 20000 times an annuity certain of length 2 has present
value: 20000(1 - e^(-(0.05)(2)))/0.05 = 38,065 ≠ 37,188.

24.104. E. S(1) = ∫_1^∞ x e^(-x) dx = [-x e^(-x) - e^(-x)] from x = 1 to x = ∞ = 2 e^(-1) = 0.7358.
After the introduction of the deductible, expect to pay: (100)(0.7358) = 74 claims.
Comment: A Gamma Distribution with α = 2 and θ = 1. F(1) = Γ[2; 1/1] = 1 - e^(-1) - 1e^(-1) = 0.2642.

24.105. C. For an ordinary deductible d, the expected payment per payment is: (E[X] - E[X ∧ d]) / S(d) =
[{θ/(α-1)} - {θ/(α-1)}{1 - (θ/(θ+d))^(α-1)}] / {θ/(θ+d)}^α = {θ/(α-1)} (θ/(θ+d))^(α-1) / {θ/(θ+d)}^α
= (θ+d)/(α-1) = θ + d, since here α = 2.
For Group R, the expected payment per payment is: 2000 + 500 = 2500.
For Group S, the expected payment per payment with an ordinary deductible of 200 would be:
3000 + 200 = 3200. However, with a franchise deductible each payment is 200 more, for an
average payment per payment of: 3200 + 200 = 3400. 3400 - 2500 = 900.
Comment: For an ordinary deductible d, the expected payment per payment is e(d).
For the Pareto distribution, e(x) = (θ + x)/(α - 1). In this case, e(d) = (θ + d)/(2 - 1) = θ + d.

24.106. B. For a Gamma Distribution, CV = √(αθ^2) / (αθ) = 1/√α.  CV = 1 ⇒ α = 1.


Therefore, this is an Exponential Distribution, with θ = 1 million.
E[(X - 2 million)+] = E[X] - E[X ∧ 2 million] = 1 million - (1 million)(1 - exp[-(2/1)]) = $135,335.
Comment: One can also compute E[(X - 2 million)+] by integrating the survival function from

2 million to infinity, or by remembering that for an Exponential Distribution, R(x) = e-x/θ.


The fact that this is an aggregate distribution does not change the mathematics of using an
Exponential Distribution. It would be uncommon for aggregate losses to follow an Exponential.

24.107. D. Expected cost per loss for policy Q is:


E[X ∧ 3000] = {2000/(3 - 1)}{1 - (2000/(2000 + 3000))2 } = 840.
Expected cost per loss for policy R is: E[X] - E[X ∧ d] =
2000/(3 - 1) - {2000/(3 - 1)} {1 - (2000/(2000 + d))2 } = 1000{2000/(2000 + d)}2 .
Set the two expected costs equal:
1000 {2000/(2000 + d)}^2 = 840. ⇒ (2000 + d) = 2000 √(1000/840) = 2182. ⇒ d = 182.

24.108. D. The sum of two independent, identically distributed Exponentials is a Gamma
Distribution, with α = 2 and the same θ. Thus the time the probe continues to transmit has a Gamma
Distribution with α = 2 and θ = 1. This Gamma has density f(t) = t e^(-t).

S(t) = ∫_t^∞ s e^(-s) ds = e^(-t) + t e^(-t). S(3) = 4 e^(-3) = 0.199.

Alternately, independent, identically distributed exponential interarrival times implies a Poisson
Process. Therefore, if we had an infinite number of batteries, over three years the number of failures
is Poisson with mean 3. We are interested in the probability of fewer than 2 failures, which is:
e^(-3) + 3e^(-3) = 4e^(-3) = 0.199.
Comment: One could work out the convolution from first principles by doing the appropriate integral:
f*f(t) = ∫_0^t f(x) f(t - x) dx. Alternately, the distribution function is ∫_0^t F(x) f(t - x) dx.

24.109. D. E[X] = θ/(α - 1) = 333.333. E[X^2] = 2θ^2/{(α - 1)(α - 2)} = 333,333.
Var(X) = 333,333 - 333.333^2 = 222,222.
E[X^3] = 6θ^3/{(α - 1)(α - 2)(α - 3)} = 1,000,000,000.
Third Central Moment is: E[X^3] - 3E[X]E[X^2] + 2E[X]^3 =
1,000,000,000 - (3)(333.333)(333,333) + 2(333.333^3) = 740,741,185.
Skewness = 740,741,185 / 222,222^1.5 = 7.071.
Alternately, for the Pareto, skewness is: 2{(α+1)/(α-3)} √{(α - 2)/α} = {(2)(5)/1} √(2/4) = 7.071.
Comment: Since the skewness does not depend on the scale parameter, one could take θ = 1.
E[X] = 1/3. E[X^2] = 1/3. Var(X) = 2/9. E[X^3] = 1.
Skewness = {1 - 3(1/3)(1/3) + 2(1/3)^3} / (2/9)^1.5 = 7.071.

24.110. A. The sum of 5 independent, identically distributed Gamma Distributions is another


Gamma, with the same θ = 100, and α = (5)(2) = 10. Mode = θ(α - 1) = (100)(10 - 1) = 900.

24.111. C. For the Weibull, S(200) = exp[-(200/1000)^0.3] = 0.5395.


The mean number of losses is: rβ = (3)(5) = 15.
The expected number of (non-zero) payments is: (0.5395)(15) = 8.09.

24.112. B. The number of (non-zero) payments is Negative Binomial with r = 3


and β = (0.5395)(5) = 2.6975.
Therefore, the variance of the number of payments is: (3)(2.6975)(1 + 2.6975) = 29.92.
Comment: See “Mahlerʼs Guide to Frequency Distributions.”

Section 25, Other Two Parameter Distributions124

In Loss Models, there are other Distributions with 2 parameters, which are much less important to
know than the common distributions discussed previously.
The Inverse Gamma is the most important of these other distributions.
With the exception of the Inverse Gaussian, all of the remaining distributions have scale parameter θ,
one shape parameter, and are heavy-tailed, with only some moments that exist.125

Just as with the Pareto Distribution discussed previously, the LogLogistic, Inverse Pareto,
ParaLogistic and Inverse ParaLogistic, are special cases of the 4 parameter Transformed Beta
Distribution, discussed subsequently.
The Inverse Gamma and Inverse Weibull are special cases of the 3 parameter Inverse
Transformed Gamma, discussed subsequently.

LogLogistic:126

The LogLogistic Distribution is a special case of a Burr Distribution, for α = 1.


It has scale parameter θ and shape parameter γ.

F(x) = (x/θ)^γ / {1 + (x/θ)^γ} = 1 / {1 + (θ/x)^γ}.    f(x) = γ x^(γ-1) / [θ^γ {1 + (x/θ)^γ}^2].

The nth moment only exists for n < γ.

Inverse Pareto:127

If X follows a Pareto with parameters α and 1 then θ/X follows an Inverse Pareto with
parameters τ = α and θ. The Inverse Pareto is so heavy-tailed that it has no (finite) mean nor higher
moments. It has scale parameter θ and shape parameter τ.

F(x) = {x/(x + θ)}^τ = (1 + θ/x)^(-τ).    f(x) = τ θ x^(τ-1) / (x + θ)^(τ+1).
124 See the previous section for what I believe are more commonly used two parameter distributions.
125 While their names sound similar, the Inverse Gaussian and Inverse Gamma are completely different distributions.
126 See Appendix A of Loss Models.
127 See Appendix A of Loss Models.

Inverse Gaussian:

The Inverse Gaussian distribution has a tail behavior not very different from that of a Gamma
Distribution. It has two parameters µ and θ, neither of which is exactly either a shape or a scale parameter.

f(x) = √{θ/(2π x^3)} exp[-θ(x/µ - 1)^2 / (2x)] = √{θ/(2π x^3)} exp[-θx/(2µ^2) + θ/µ - θ/(2x)].

F(x) = Φ[(x/µ - 1) √(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1) √(θ/x)]
     = Φ[√θ (√x/µ - 1/√x)] + e^(2θ/µ) Φ[-√θ (√x/µ + 1/√x)].

Mean = µ.        Variance = µ^3/θ.        Coefficient of Variation = √(µ/θ).

Skewness = 3 √(µ/θ) = 3 CV.        Kurtosis = 3 + 15µ/θ = 3 + 15 CV^2.

Thus the skewness for the Inverse Gaussian distribution is always three times the coefficient of
variation.128 Thus the Inverse Gaussian is likely to fit well only to data sets for which this is true.

Multiplying a variable that follows an Inverse Gaussian by a constant gives another Inverse
Gaussian.129

The Inverse Gaussian is infinitely divisible.130 If X follows an Inverse Gaussian, then given any
n > 1 we can find a random variable Y which also follows an Inverse Gaussian, such that
adding up n independent versions of Y gives X.
128 For the Gamma, the skewness was twice the coefficient of variation.
129 Thus the Inverse Gaussian is preserved under Uniform Inflation. It is a “scale distribution”.
    See Loss Models, Definition 4.2.
130 The Gamma Distribution is also infinitely divisible.

If X follows an Inverse Gaussian with parameters µ1 and θ1, and Y follows an Inverse Gaussian
with parameters µ2 and θ2, and if µ1^2/θ1 = µ2^2/θ2, then for X and Y independent, X+Y also
follows an Inverse Gaussian, but with parameters µ3 = µ1 + µ2 and θ3 such that
µ3^2/θ3 = µ1^2/θ1 = µ2^2/θ2. 131

Thus an Inverse Gaussian can be thought of as a sum of other independent identically
distributed Inverse Gaussian distributions. Therefore, keeping β = µ^2/θ fixed, as µ gets larger
and larger, the Inverse Gaussian is the sum of more and more identical copies of an Inverse
Gaussian with parameters 1 and 1/β. Thus keeping µ^2/θ fixed and letting µ go to infinity, an
Inverse Gaussian approaches a Normal distribution.

It can be shown that if X follows an Inverse Gaussian with parameters µ and θ, and if
Y = θ(X-µ)2 / (µ2 X), then Y follows a Chi-Square Distribution with one degree of freedom.132
In other words, Y has the same distribution as the square of a unit Normal Distribution.

If X(t) is a Brownian Motion with positive drift ν and volatility σ, then the time that the process

first reaches α > 0 follows an Inverse Gaussian Distribution with µ = α/ν and θ = α2/σ2.

Density of the Inverse Gaussian Distribution:

Exercise: Determine the derivative with respect to x of: Φ[√θ (x^(1/2)/µ - x^(-1/2))].
[Solution: Φ[y] is the cumulative distribution function of a Standard Normal.
Its derivative with respect to y is the density of a Standard Normal.
dΦ[y]/dy = φ(y) = {1/√(2π)} exp[-y^2/2].
Therefore, dΦ[y]/dx = (dy/dx) {1/√(2π)} exp[-y^2/2].
d Φ[√θ (x^(1/2)/µ - x^(-1/2))] / dx =
√θ {(1/2)x^(-1/2)/µ + (1/2)x^(-3/2)} {1/√(2π)} exp[-θ(x^(1/2)/µ - x^(-1/2))^2/2] =
(1/2) {1/√(2π)} √θ (x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)].]
131 One can verify that the means and the variances add, as they must for the sum of any two independent variables.
    See Insurance Risk Models by Panjer & Willmot, page 115. This is a somewhat more complicated version of the similar
    result for a Gamma. The sum of two independent Gammas with the same scale parameters, is a Gamma with the same
    scale parameter and the sum of the shape parameters.
132 See for example page 116 of Risk Models by Panjer and Willmot or page 412 of Volume 1 of Kendallʼs Advanced
    Theory of Statistics, by Stuart and Ord.

Exercise: Determine the derivative with respect to x of: e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))].
[Solution: dΦ[y]/dx = (dy/dx) {1/√(2π)} exp[-y^2/2].
d e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))] / dx =
e^(2θ/µ) √θ {(-1/2)x^(-1/2)/µ + (1/2)x^(-3/2)} {1/√(2π)} exp[-θ(x^(1/2)/µ + x^(-1/2))^2/2] =
(1/2) {1/√(2π)} √θ (-x^(-1/2)/µ + x^(-3/2)) e^(2θ/µ) exp[-(θ/2)(x/µ^2 + 2/µ + 1/x)] =
(1/2) {1/√(2π)} √θ (-x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)].]

Exercise: Given the Distribution Function
F(x) = Φ[√θ (x^(1/2)/µ - x^(-1/2))] + e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))], determine the density function f(x).
[Solution: dF/dx = (1/2) {1/√(2π)} √θ (x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)] +
(1/2) {1/√(2π)} √θ (-x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)] =
{1/√(2π)} √θ x^(-3/2) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)] = √{θ/(2π x^3)} exp[-θx/(2µ^2) + θ/µ - θ/(2x)].]

Thus we have been able to confirm that the stated Distribution Function of an Inverse Gaussian
Distribution does in fact correspond to the stated density function of an Inverse Gaussian.
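
As a numerical cross-check of this agreement, assuming scipy is available, one can integrate the density and compare to the distribution function; the values µ = 7, θ = 4, x = 9 below are illustrative (they also appear in problem 25.15):

import math
from scipy.integrate import quad
from scipy.stats import norm

mu, theta = 7.0, 4.0

def f(x):
    # Inverse Gaussian density as stated above
    return math.sqrt(theta / (2 * math.pi * x**3)) * math.exp(-theta * (x / mu - 1) ** 2 / (2 * x))

def F(x):
    # Inverse Gaussian distribution function as stated above
    a = math.sqrt(theta / x)
    return norm.cdf((x / mu - 1) * a) + math.exp(2 * theta / mu) * norm.cdf(-(x / mu + 1) * a)

print(quad(f, 0, 9)[0], F(9))   # both approximately 0.777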

Moments of the Inverse Gaussian Distribution:

Exercise: For an Inverse Gaussian, set up the integral in order to compute the nth moment.
[Solution: E[X^n] = ∫_0^∞ f(x) x^n dx = ∫_0^∞ √{θ/(2π)} x^(n - 1.5) exp[-θx/(2µ^2) + θ/µ - θ/(2x)] dx

= e^(θ/µ) √{θ/(2π)} ∫_0^∞ x^(n - 1.5) exp[-θx/(2µ^2) - θ/(2x)] dx.]

This integral is in a form that gives a modified Bessel Function of the third kind.133

∫_0^∞ x^(ν-1) exp[-β x^p - γ x^(-p)] dx = (2/p) (γ/β)^(ν/(2p)) K_(ν/p)(2√(βγ)).

133 See Insurance Risk Models by Panjer and Willmot, or formula 3.478.4 in Table of Integrals, Series, and Products,
    by Gradshteyn and Ryzhik.

Therefore, using the above formula with p = 1, ν = n - 1/2, β = θ/(2µ^2), and γ = θ/2,

E[X^n] = e^(θ/µ) √{θ/(2π)} (2) (µ^2)^((n - 0.5)/2) K_(n - 1/2)(θ/µ) = e^(θ/µ) µ^n √{2θ/(µπ)} K_(n - 1/2)(θ/µ). 134

Exercise: What is the 4th moment of an Inverse Gaussian with parameters µ = 5 and θ = 7?
[Solution: E[X^4] = e^(θ/µ) µ^4 √{2θ/(µπ)} K_3.5(θ/µ) = e^1.4 (5^4) √(2.8/π) K_3.5(1.4) =
(4.0552)(0.94407)(625)(4.80757) = 11503.3.
Comment: Where one has to look up in a table or use a software package to compute the value of
K_3.5(1.4) = 4.80757, the modified Bessel Function of the third kind.135]

ParaLogistic:136

The ParaLogistic Distribution is a special case of the Burr Distribution, with its two shape parameters
equal, for α = γ. This is unlike most of the other named special cases, in which a parameter is set equal
to one. This general idea of setting two shape parameters equal can be used to produce additional
special cases. The only other time this is done in Loss Models is to produce the Inverse
ParaLogistic Distribution.

The ParaLogistic Distribution has scale parameter θ and shape parameter α.

F(x) = 1 - {1 / (1 + (x/θ)^α)}^α.

f(x) = α^2 x^(α-1) / [θ^α {1 + (x/θ)^α}^(α+1)].

The nth moment only exists for n < α^2. This follows from the fact that moments of the
Transformed Beta Distribution only exist for n < αγ, with in this case α = γ.

134
The formula for the cumulants of the Inverse Gaussian Distribution is more tractable than that for the moments.
The nth cumulant for n ≥ 2 is: µ2n-1 (2n-3)! / {θn-1(n-2)! 2n-2}.
See for example, Kendallʼs Advanced Theory of Statistics, Volume I, by Stuart and Ord. Using this formula for the
cumulants, one can obtain the formula for the skewness and the kurtosis listed above.
135
I used Mathematica.
136
See Appendix A of Loss Models.

Inverse ParaLogistic:137

The Inverse ParaLogistic Distribution is a special case of the Inverse Burr Distribution, with its two
shape parameters equal, for τ = γ.
If X follows a ParaLogistic with parameters α and 1, then θ/X follows an Inverse ParaLogistic with
parameters τ = α and θ.
The Inverse ParaLogistic Distribution has scale parameter θ and shape parameter τ.

F(x) = {(x/θ)^τ / (1 + (x/θ)^τ)}^τ = {1 + (θ/x)^τ}^(-τ).

f(x) = τ^2 (x/θ)^(τ^2) / [x {1 + (x/θ)^τ}^(τ+1)].

The nth moment only exists for n < τ.

Inverse Weibull:138

If X follows a Weibull Distribution with parameters 1 and τ, then θ/X follows an Inverse Weibull with
parameters θ and τ. The Inverse Weibull is heavier-tailed than the Weibull; the moments of the
Inverse Weibull only exist for n < τ, while the Weibull has all of its (positive) moments exist. The
Inverse Weibull has scale parameter θ and shape parameter τ. The Inverse Weibull Distribution is
a special case of the Inverse Transformed Gamma Distribution with α = 1.

F(x) = exp[-(θ/x)^τ].

f(x) = τ θ^τ exp[-(θ/x)^τ] / x^(τ+1).

137 See Appendix A of Loss Models.
138 See Appendix A in Loss Models.

Inverse Gamma:139 140

If X follows a Gamma Distribution with parameters α and 1, then θ/X follows an Inverse Gamma
Distribution with parameters α and θ. Thus this distribution is no more complicated conceptually than
the Gamma Distribution. α is the shape parameter and θ is the scale parameter. The Inverse
Gamma Distribution is a special case of the Inverse Transformed Gamma Distribution with τ = 1.

The Distribution Function is: F(x) = 1 - Γ(α; θ/x), and the density function is: f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]}.

If X follows an Inverse Gamma Distribution with parameters α and θ, then 1/X has

distribution function Γ(α ; θx), which is a Gamma Distribution with parameters α and 1/θ.

Note that the density has a negative power of x times an exponential of 1/x.
This is how one recognizes an Inverse Gamma density.141
The scale parameter θ is divided by x in the exponential.
The negative power of x has an absolute value one more than the shape parameter α.

Exercise: A probability density function is proportional to e-11/x x-2.5.


What distribution is this?
[Solution: This is an Inverse Gamma Distribution with α = 1.5 and θ = 11.
Comment: The proportionality constant in front of the density is 111.5 / Γ(1.5) = 36.48/ 0.8862 =
41.16. There is no requirement that α be an integer. If α is non-integral then one needs access to a
software package that computes the (complete) Gamma Function.]

The Distribution Function is related to that of a Gamma Distribution: Γ(α ; x/θ).


If x/θ follows an Inverse Gamma Distribution with scale parameter of one, then θ/x follows a Gamma
Distribution with a scale parameter of one.

The Inverse Gamma is heavy-tailed, as can be seen by the lack of the existence of certain
moments.142 The nth moment of an Inverse Gamma only exists for n < α.

139 See Appendix A of Loss Models.
140 An Inverse Gamma Distribution is sometimes called a Pearson Type V Distribution.
141 The Gamma density has an exponential of x times x to a power.
142 In the extreme tail its behavior is similar to that of a Pareto distribution with the same shape parameter α.

Note that the Inverse Gamma density function integrates to unity from zero to infinity.143

∫_0^∞ e^(-θ/x) / x^(α+1) dx = Γ(α) / θ^α, α > 0.

This fact will be useful for working with the Inverse Gamma Distribution. For example, one can
compute the moments of the Inverse Gamma Distribution:

E[X^n] = ∫_0^∞ x^n f(x) dx = ∫_0^∞ x^n θ^α e^(-θ/x) / {Γ(α) x^(α+1)} dx = {θ^α/Γ(α)} ∫_0^∞ e^(-θ/x) x^(-(α+1-n)) dx

= {θ^α/Γ(α)} Γ(α - n) / θ^(α-n) = θ^n Γ(α - n) / Γ(α), α - n > 0.

Alternately, the moments of the Inverse Gamma also follow from the moments of the Gamma
Distribution which are E[Xn ] = θn Γ(α+n) / Γ(α).144 If X follows a Gamma with unity scale parameter,
then Z = θ/X has an Inverse Gamma Distribution, with parameters α and θ.
Thus the Inverse Gamma has moments: E[Zn ] = E[(θ/X)n ] = θn E[X-n] = θn Γ(α-n) / Γ(α), α > n.
Specifically, the mean of the Inverse Gamma = E[θ/X] = θ Γ(α-1) / Γ(α) = θ /(α−1).

As will be discussed in a subsequent section, the Inverse Gamma Distribution can be used to mix
together Exponential Distributions.

For the Inverse Gamma Distribution:


{Skewness(Kurtosis + 3)}2 = 4 (4 Kurtosis - 3 Skewness2 ) (2 Kurtosis - 3 Skewness2 - 6).145

143 This follows from substituting y = 1/x in the definition of the Gamma Function. Remember it via the fact that all
    probability density functions integrate to unity over their support.
144 This formula works for n positive or negative, provided n > −α.
145 See page 216 of Volume I of Kendallʼs Advanced Theory of Statistics.
    Type V of the Pearson system of distributions is the Inverse Gamma.

Inverse Gamma Distribution

Support: x > 0 Parameters: α > 0 (shape parameter), θ > 0 (scale parameter)

D. f. : F(x) = 1 - Γ(α ; θ/x)

P. d. f. : f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]} = (θ/x)^α e^(-θ/x) / {x Γ[α]}.

Moments: E[X^k] = θ^k / {(α − 1)(α − 2)...(α − k)}, α > k.

Mean = θ/(α − 1), α > 1.        Mode = θ/(α + 1).

Second Moment = θ^2 / {(α − 1)(α − 2)}, α > 2.

Variance = θ^2 / {(α − 1)^2 (α − 2)}, α > 2.

Coefficient of Variation = Standard Deviation / Mean = 1/√(α − 2), α > 2.

Skewness = 4 √(α − 2) / (α − 3), α > 3.        Kurtosis = 3 (α − 2)(α + 5) / {(α − 3)(α − 4)}, α > 4.

Limited Expected Value: E[X ∧ x] = {θ/(α − 1)} {1 − Γ[α − 1; θ/x]} + x Γ[α; θ/x].

R(x) = Excess Ratio = Γ[α − 1; θ/x] − (α − 1)(x/θ) Γ[α; θ/x].

e(x) = Mean Excess Loss = {θ/(α − 1)} Γ[α − 1; θ/x] / Γ[α; θ/x] − x, α > 1.

X ∼ Gamma(α, 1) ⇔ θ/X ∼ Inverse Gamma(α, θ).



Calculating the Distribution Function for an Inverse Gamma:

As mentioned previously in “Mahlerʼs Guide to Frequency,” one can write an Incomplete Gamma
function for integer α = n as the sum of Poisson probabilities for the number of events greater than or
equal to n:146

Γ(n; x) = 1 - Σ_{i=0}^{n-1} x^i e^(-x) / i! = Σ_{i=n}^∞ x^i e^(-x) / i!.

Now the Inverse Gamma Distribution Function is: F(x) = 1 - Γ(α; θ/x).

Thus for integer α, F(x) = Σ_{i=0}^{α-1} (θ/x)^i e^(-θ/x) / i!.

Exercise: For an Inverse Gamma Distribution with α = 5 and θ = 100, compute F(25) and F(50).
[Solution: F(25) = e^(-4) (1 + 4 + 4^2/2 + 4^3/6 + 4^4/24) = 0.6288.
F(50) = e^(-2) (1 + 2 + 2^2/2 + 2^3/6 + 2^4/24) = 0.9473.]
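
A short Python check of this exercise, assuming scipy is available:

# F(x) = 1 - Gamma(alpha; theta/x) for the Inverse Gamma with alpha = 5, theta = 100,
# computed via the Poisson sum and via the Gamma CDF.
import math
from scipy.stats import gamma

alpha, theta = 5, 100

def F(x):
    y = theta / x
    return sum(y**i * math.exp(-y) / math.factorial(i) for i in range(alpha))

for x in (25, 50):
    print(F(x), 1 - gamma.cdf(theta / x, a=alpha))   # 0.6288 and 0.9473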

146 See Theorem A.1 in Appendix A of Loss Models.

Problems:

Use the following information for the next 4 questions:


You have an Inverse Gamma Distribution with parameters α = 5 and θ = 10.

25.1 (1 point) What is the density function at x = 0.7?


A. less than 0.01
B. at least 0.01 but less than 0.02
C. at least 0.02 but less than 0.03
D. at least 0.03 but less than 0.04
E. at least 0.04

25.2 (1 point) What is the mean?


A. less than 2.0
B. at least 2.0 but less than 2.2
C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6
E. at least 2.6

25.3 (1 point) What is the variance?


A. less than 2.0
B. at least 2.0 but less than 2.2
C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6
E. at least 2.6

25.4 (1 point) What is the mode?

25.5 (1 point) What is the integral from zero to infinity of 1,000,000 e-15/x x-7?
A. less than 8
B. at least 8 but less than 9
C. at least 9 but less than 10
D. at least 10 but less than 11
E. at least 11

25.6 (1 point) Sizes of loss are assumed to follow a LogLogistic Distribution, with γ = 4 and
θ = 1000. What is the probability that a loss exceeds 1500?
A. less than 13%
B. at least 13% but less than 14%
C. at least 14% but less than 15%
D. at least 15% but less than 16%
E. at least 16%

25.7 (1 point) Losses follow a LogLogistic Distribution with γ = 4 and θ = 1000. What is the mode?

25.8 (1 point) For an Inverse Pareto Distribution, with τ = 3 and θ = 10,


what is the probability density function at 7?
A. less than 1.7%
B. at least 1.7% but less than 1.8%
C. at least 1.8% but less than 1.9%
D. at least 1.9% but less than 2.0%
E. at least 2.0%

25.9 (1 point) For an Inverse Pareto Distribution, with τ = 6 and θ = 80, what is the mode?

25.10 (2 points) For a ParaLogistic Distribution, with α = 4 and θ = 100, what is the mean?
You may use the facts that: Γ(1/4) = 3.6256, Γ(1/2) = 1.7725, and Γ(3/4) = 1.2254.
A. 59 B. 61 C. 63 D. 65 E. 67

25.11 (1 point) For an Inverse ParaLogistic Distribution, with τ = 2.3 and θ = 720, what is F(2000)?
A. less than 82%
B. at least 82% but less than 83%
C. at least 83% but less than 84%
D. at least 84% but less than 85%
E. at least 85%

25.12 (2 points) For an Inverse Weibull Distribution, with τ = 5 and θ = 20, what is the variance?
You may use: Γ(0.2) = 4.59084 , Γ(0.4) = 2.21816 , Γ(0.6) = 1.48919 , and Γ(0.8) = 1.16423.
A. less than 55
B. at least 55 but less than 60
C. at least 60 but less than 65
D. at least 65 but less than 70
E. at least 70

25.13 (1 point) For an Inverse Weibull Distribution, with τ = 5 and θ = 20, what is the mode?

25.14 (2 points) f(x) = 12,500 e^(-10/x) / (3 x^6), x > 0.
You take a sample of size 20.
Using the Normal Approximation, determine the probability that the sample mean is greater than 3.
A. 2% B. 3% C.4% D. 5% E. 6%

25.15 (2 points) What is the Distribution Function at 9 of an Inverse Gaussian Distribution,


with parameters µ = 7 and θ = 4?
A. Less than 70%
B. At least 70% but less than 75%
C. At least 75% but less than 80%
D. At least 80% but less than 85%
E. At least 85%

25.16 (2 points) What is the Probability Density Function at 16 of an Inverse Gaussian Distribution,
with parameters µ = 5 and θ = 7?
A. Less than 0.0040
B. At least 0.0040 but less than 0.0045
C. At least 0.0045 but less than 0.0050
D. At least 0.0050 but less than 0.0055
E. At least 0.0055

25.17 (2 points) Accident year paid losses are the total amount paid for insured accidents that occur
during a given year. Since it takes time to report and settle claims, the accident year paid losses
develop over time until they reach their ultimate value.
For a certain line of insurance, the average percentage of ultimate losses for a given accident year
paid by a given time follows a LogLogistic Distribution with γ = 2.5 and θ = 24, where x is the time in
months from the middle of the accident year to the evaluation date of reported losses.
For Accident Year 2013, what is the expected percentage of ultimate losses that are not paid by
December 31, 2015?
A. 28% B. 30% C. 32% D. 34% E. 36%

25.18 (3 points) For an Inverse Gamma Distribution with α = 4 and θ = 500,
compute F(100) and F(200).
Hint: For integer α, Γ(α; x) = 1 - Σ_{i=0}^{α-1} x^i e^(-x) / i!.

Use the following information for the next two questions.


Let W = 0.7X + 0.3Y, where X and Y are independent random variables.
X follows a Gamma distribution with α = 3 and θ = 10.
Y follows a Inverse Gamma distribution with α = 12 and θ = 300.

25.19 (2 points) What is the mean of W?


A. less than 25
B. at least 26 but less than 27
C. at least 27 but less than 28
D. at least 28 but less than 29
E. at least 29

25.20 (3 points) What is the variance of W?


A. less than 145
B. at least 145 but less than 150
C. at least 150 but less than 155
D. at least 155 but less than 160
E. at least 160

25.21 (2 points) Accident year losses are the total dollars of loss reported for insured accidents that
occur during a given year. Since it takes time to report and settle claims, the accident year losses
develop over time until they reach their ultimate value.
For a certain line of insurance, the average percentage of ultimate losses for a given accident year
reported by a given time follows a LogLogistic Distribution with γ = 2 and θ = 15, where x is the time
in months from the middle of the accident year to the evaluation date of reported losses.
For Accident Year 2008, what is the expected percentage of ultimate losses that are reported by
December 31, 2010?
A. 78% B. 80% C. 82% D. 84% E. 86%

Solutions to Problems:

25.1. C. f(x) = θ^α e^(-θ/x) / {Γ(α) x^(α+1)}. f(0.7) = 10^5 e^(-10/0.7) / {Γ(5) 0.7^6} = 0.0221.
Comment: Using the formulas in Appendix A of Loss Models, f(x) = (θ/x)^α e^(-θ/x) / {x Γ(α)}.
f(0.7) = (10/0.7)^5 e^(-10/0.7) / {0.7 Γ(5)} = 0.0221.

25.2. D. Mean = θ / (α−1) = 10 / (5-1) = 2.5.

25.3. B. Variance = θ2 / {(α−1)2 (α-2)} = (102 ) / {42 3} = 2.0833.


Alternately, the second moment is: θ2 / {(α−1) (α−2)} = (102 ) / {(4)(3)} = 8.333.
Thus the variance = 8.333 - 2.52 = 2.083.

25.4. mode = θ/(α+1) = 10/6 = 1.67.


25.5. D. ∫0 x - (α + 1) e - θ / x dx = Γ(α) / θα.
Letting θ = 15 and α = 6, the integral from zero to infinity of e-15/x x-7 is:
Γ(6) / 156 = 120 / 113909625 = 0.0000105. Thus the integral of 1 million times that is: 10.5.
Comment: e-15/x x-7 is proportional to the density of an Inverse Gamma Distribution with θ = 15 and
α = 6. Thus its integral from zero to infinity is the inverse of the constant in front of the Inverse
Gamma Density, since the density itself must integrate to unity.
Alternately, one could let y = 15/x and convert the integral to a complete Gamma Function.

25.6. E. F(x) = (x/θ)γ / {1+ (x/θ)γ}. S(x) = 1 / {1 + (x/θ)γ} = 1/(1 + 1.54 ) = 0.165.

25.7. For γ > 1, mode = θ {(γ-1)/(γ+1)}1/γ = 1000 (3/5)1/4 = 880.

25.8. B. f(x) = τθ x τ−1 /(x+θ)τ+1. f(7) = (3)(10)(72 )/(7+10)4 = 0.0176.

25.9. For τ > 1, mode = θ (τ - 1)/2 = 80(5/2) = 200.



25.10. E. Γ(1.25) = (1/4)Γ(.25) = (1/4)(3.6256) = .9064.


Γ(3.75) = (2.75)(1.75)(.75)Γ(.75) = (3.6094)(1.2254) = 4.423.
E[X] = θ Γ(1+1/α)Γ(α−1/α) / Γ(α) = 100 Γ(1.25) Γ(3.75) / Γ(4). = (100)(.9064)(4.423)/6 = 66.8.

25.11. A. F(x) = ((x /θ)τ/(1+(x /θ)τ))τ = (1+(θ /x )τ)−τ. F(2000) = (1+(720/2000)2.3)-2.3 = 0.811.

25.12. A. E[X] = θ Γ(1 - 1/τ) = 20 Γ(.8) = (20)(1.16423) = 23.285.


E[X2 ] = θ2 Γ(1−2/τ) = 400 Γ(.6) = (400)(1.48919) = 595.68.
variance = 595.68 - 23.2852 = 53.5.

25.13. mode = θ {τ/(τ+1)}1/τ = (20) (5/6)1/5 = 19.3.

25.14. E. The given density is that of an Inverse Gamma, with α = 5 and θ = 10.
Mean = θ/(α-1) = 10/4 = 2.5.
Second Moment = θ^2 / {(α − 1)(α − 2)} = 100/12. Variance = 25/3 - 2.5^2 = 2.0833.
Thus the variance of the sample mean is: 2.0833 / 20 = 0.10417.
S(3) ≅ 1 - Φ[(3 - 2.5)/√0.10417] = 1 - Φ[1.55] = 6.06%.

25.15. C. For the Inverse Gaussian, F(x) = Φ[√θ (x^(1/2)/µ - x^(-1/2))] + e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))].
With µ = 7 and θ = 4, F(9) = Φ[2(3/7 - 1/3)] + e^(8/7) Φ[-2(3/7 + 1/3)] =
Φ[0.19] + (3.1357)Φ[-1.52] = 0.5753 + (3.1357)(0.0643) = 0.777.
Comment: Using the formulas in Appendix A of Loss Models, z = (x-µ)/µ = (9-7)/7 = 2/7.
y = (x+µ)/µ = (9+7)/7 = 16/7. F(x) = Φ[z √(θ/x)] + e^(2θ/µ) Φ[-y √(θ/x)].
F(9) = Φ[(2/7)√(4/9)] + e^(8/7) Φ[-(16/7)√(4/9)] = Φ[0.19] + (3.1357)Φ[-1.52] = 0.777.

25.16. E. For the Inverse Gaussian, f(x) = √(θ/(2πx^3)) exp[-θ(x/µ - 1)^2 / (2x)].
With µ = 5 and θ = 7, f(16) = √(7/(2π)) 16^(-1.5) exp[-7(11/5)^2 / 32] = (1.0556)(1/64)e^(-1.0588) = 0.0057.
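The Inverse Gaussian values in 25.15 and 25.16 can be checked with scipy; mapping the Loss Models parameters (µ, θ) to scipy's invgauss(mu = µ/θ, scale = θ) is my assumption about scipy's parameterization, not something stated in the text:

    from scipy.stats import invgauss

    # 25.15: mu = 7, theta = 4;  25.16: mu = 5, theta = 7
    print(invgauss(mu=7/4, scale=4).cdf(9))    # about 0.777
    print(invgauss(mu=5/7, scale=7).pdf(16))   # about 0.0057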

25.17. E. For this LogLogistic, F(x) = (x/24)2.5 / {1 + (x/24)2.5}.


The middle of Accident Year 2013 is July 1, 2013.
There are 30 months between July 1, 2013 and December 31, 2015.
Thus we want: F(30) = (30/24)2.5 / {1 + (30/24)2.5} = 63.6%. S(30) = 1 - 63.6% = 36.4%.
Comment: Loosely based on CAS Exam 7, 5/14, Q. 5a.

25.18. For the Inverse Gamma Distribution: F(x) = 1 - Γ(α; θ/x).
Thus for integer α, F(x) = Σ_{i=0}^{α-1} (θ/x)^i e^(-θ/x) / i!.
F(100) = e^(-5) (1 + 5 + 5^2/2 + 5^3/6) = 0.2650.
F(200) = e^(-2.5) (1 + 2.5 + 2.5^2/2 + 2.5^3/6) = 0.7576.
Comment: The given hint is Theorem A.1 in Loss Models.
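A numerical check of 25.18; the parameters α = 4 and θ = 500 are inferred from the solution (θ/x = 5 at x = 100), and treating scipy's invgamma(α, scale = θ) as the Loss Models Inverse Gamma is an assumption:

    from scipy.stats import invgamma

    dist = invgamma(4, scale=500)        # assumed to match the Loss Models Inverse Gamma
    print(dist.cdf(100), dist.cdf(200))  # about 0.2650 and 0.7576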

25.19. E. Mean of Gamma is αθ = 30. Mean of Inverse Gamma = θ/(α−1) = 300/11 = 27.3.
Mean of W = (0.7)(30) + (0.3)(27.3) = 29.2.

25.20. C. Second moment for Gamma is α(α+1)θ^2 = 1200.
Variance of Gamma = 1200 - 30^2 = 300 = αθ^2.
Second moment for Inverse Gamma is θ^2 / {(α-1)(α-2)} = 818.2.
Variance of Inverse Gamma = 818.2 - 27.3^2 = 72.9.
Therefore, the Variance of W = (0.7^2)(300) + (0.3^2)(72.9) = 153.6.
Comment: Note that this is a linear combination rather than a mixture.

25.21. B. For this LogLogistic, F(x) = (x/15)^2 / {1 + (x/15)^2}.
The middle of Accident Year 2008 is July 1, 2008.
There are 30 months between July 1, 2008 and December 31, 2010.
Thus we want: F(30) = (30/15)^2 / {1 + (30/15)^2} = 80%.
Comment: Loosely based on CAS Exam 7, 5/11, Q. 2a.
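The arithmetic in 25.21 is easy to verify directly; a minimal sketch:

    # LogLogistic F(x) = (x/theta)^gamma / {1 + (x/theta)^gamma}, with gamma = 2, theta = 15
    gamma_, theta = 2, 15
    x = 30                      # months from July 1, 2008 to December 31, 2010
    u = (x / theta) ** gamma_
    print(u / (1 + u))          # 0.80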

Section 26, Three Parameter Distributions

There are five Three Parameter Distributions in Loss Models: Transformed Gamma, Inverse
Transformed Gamma, Burr, Inverse Burr, and Generalized Pareto, each a generalization of one or
more of the two parameter distributions. The extra parameter provides extra flexibility, which
potentially allows a closer fit to data.147 You are unlikely to be asked questions involving the 3
parameter distributions.

Transformed Gamma Distribution:148

Transformed Gamma with τ = 1 is the Gamma.


Transformed Gamma with α = 1 is the Weibull.

F(x) = Γ[α; (x/θ)^τ].    f(x) = τ x^(τα-1) exp[-(x/θ)^τ] / {θ^(τα) Γ(α)}.

θ is the scale parameter for the Transformed Gamma. α is a shape parameter in the same way it is
for the Gamma. τ is a shape parameter, as for the Weibull.

The relationships between the Exponential, Weibull, Gamma, and Transformed Gamma
Distributions are shown below:

                        power transformation
    Exponential        ⇒        Weibull
        α = 1 ⇑                     ⇑ α = 1
    Gamma              ⇒        Transformed Gamma
                        power transformation

Mean = θ Γ(α + 1/τ) / Γ(α). Variance = θ^2 {Γ(α)Γ(α + 2/τ) - Γ^2(α + 1/τ)} / Γ(α)^2.

147
The potential additional accuracy comes at the cost of extra complexity.
148
See Appendix A and Figure 5.3 in Loss Models.

Therefore: 1 + CV^2 = Γ(α)Γ(α + 2/τ) / Γ^2(α + 1/τ).
It turns out that: (Skewness)(CV^3) + 3CV^2 + 1 = Γ^2(α)Γ(α + 3/τ) / Γ^3(α + 1/τ).149

The Transformed Gamma Distribution is defined in terms of the Incomplete Gamma Function,
F(x) = Γ[α; (x/θ)^τ]. Thus using the change of variables y = (x/θ)^τ, the density of a Transformed
Gamma Distribution can be derived from that for a Gamma Distribution.

Exercise: Derive the density of the Transformed Gamma Distribution from that of the Gamma Distribution.
[Solution: Let y = (x/θ)^τ. If y follows a Gamma Distribution with parameters α and 1, then x follows a
Transformed Gamma Distribution with parameters α, θ, and τ.
If y follows a Gamma Distribution with parameters α and 1, then f(y) = y^(α-1) e^(-y) / Γ(α).
Then the density of x is given by: f(y)(dy/dx) = {((x/θ)^τ)^(α-1) exp[-(x/θ)^τ] / Γ(α)} {τ θ^(-τ) x^(τ-1)}
= τ θ^(-τα) x^(τα-1) exp[-(x/θ)^τ] / Γ(α), as shown in Appendix A of Loss Models.]

Exercise: What is the mean of a Transformed Gamma Distribution?
[Solution: ∫_0^∞ x f(x) dx = ∫_0^∞ x τ x^(τα-1) exp[-(x/θ)^τ] / {θ^(τα) Γ[α]} dx = ∫_0^∞ τ x^(τα) exp[-(x/θ)^τ] / {θ^(τα) Γ[α]} dx.
Let y = (x/θ)^τ, and thus x = θ y^(1/τ), dx = θ y^(1/τ - 1) dy / τ; then the integral for the first moment is:
∫_0^∞ {τ y^α θ^(τα) exp[-y] / (θ^(τα) Γ[α])} {θ y^(1/τ - 1) / τ} dy = ∫_0^∞ θ y^(α + 1/τ - 1) exp[-y] dy / Γ(α) = θ Γ(α + 1/τ) / Γ(α).]

149
See Venter “Transformed Beta and Gamma Distributions and Aggregate Losses,” PCAS 1983. These relations
can be used to apply the method of moments to the Transformed Gamma Distribution. Also an appropriate mixing of
Transformed Gammas via a Gamma produces a Transformed Beta Distribution. The mixing of Exponentials via a
Gamma to produce a Pareto is just a special case of this more general result.

Exercise: What is the nth moment of a Transformed Gamma Distribution?
[Solution: ∫_0^∞ x^n f(x) dx = ∫_0^∞ x^n τ x^(τα-1) exp[-(x/θ)^τ] / {θ^(τα) Γ[α]} dx = ∫_0^∞ τ x^(τα + n - 1) exp[-(x/θ)^τ] / {θ^(τα) Γ[α]} dx.
Let y = (x/θ)^τ, and thus x = θ y^(1/τ), dx = θ y^(1/τ - 1) dy / τ; then the integral for the nth moment is:
∫_0^∞ {τ y^(α + (n-1)/τ) θ^(τα + n - 1) exp[-y] / (θ^(τα) Γ[α])} {θ y^(1/τ - 1) / τ} dy = ∫_0^∞ θ^n y^(α + n/τ - 1) exp[-y] dy / Γ(α)
= θ^n Γ(α + n/τ) / Γ(α). This is the formula shown in Appendix A of Loss Models.]

Exercise: What is the 3rd moment of a Transformed Gamma Distribution with α = 5, θ = 2.5, and τ = 1.5?
[Solution: θ^n Γ(α + n/τ)/Γ(α) = 2.5^3 Γ(5 + 3/1.5)/Γ(5) = 2.5^3 {Γ(7)/Γ(5)} = 2.5^3 (6)(5) = 468.75.]
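As a cross-check of the moment formula, the sketch below computes the 3rd moment both from the closed form and from scipy's gengamma distribution; identifying gengamma(a = α, c = τ, scale = θ) with the Transformed Gamma is my assumption about scipy's parameterization:

    from scipy.special import gamma as G
    from scipy.stats import gengamma

    alpha, theta, tau = 5, 2.5, 1.5
    print(theta**3 * G(alpha + 3/tau) / G(alpha))            # closed form: 468.75
    print(gengamma(a=alpha, c=tau, scale=theta).moment(3))   # numerical check, about 468.75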

Limit of Transformed Gamma Distributions:

One can obtain a LogNormal Distribution as an appropriate limit of Transformed Gamma
Distributions.150 First note that Γ[α; y] is a Gamma Distribution with scale parameter of 1, and thus
mean and variance of α. As α gets large, this Gamma Distribution approaches a Normal
Distribution. Thus for large α, Γ[α; y] ≅ Φ[(y - α)/√α].

Now take a limit of Transformed Gamma Distributions as τ goes to zero, while
α = (1 + τµ)/(τ^2 σ^2) and θ^τ = τ^2 σ^2, where µ and σ are selected constants.151
As τ goes to zero, both α and θ go to infinity.

For each Transformed Gamma we have F(x) = Γ[α; y], with y = (x/θ)^τ = x^τ/(τ^2 σ^2).

(y - α)/√α = {x^τ/(τ^2 σ^2) - (1 + τµ)/(τ^2 σ^2)} / {√(1 + τµ)/(τσ)} = {x^τ/√(1 + τµ) - √(1 + τµ)} / (τσ).

As tau goes to zero, both the numerator and denominator go to zero. To get the limit we use LʼHospitalʼs rule and
differentiate both the numerator and denominator with respect to τ:

limit_{τ→0} {x^τ/√(1 + τµ) - √(1 + τµ)}/(τσ) = limit_{τ→0} {ln(x) x^τ/√(1 + τµ) - µx^τ/{2(1 + τµ)^1.5} - µ/{2√(1 + τµ)}}/σ

= {ln(x) - µ/2 - µ/2}/σ = {ln(x) - µ}/σ.

150
See Section 5.3.3 of Loss Models.
151
Mu and sigma will turn out to be the parameters of the limiting LogNormal Distribution.

Thus as alpha gets big and tau gets small, we have
Γ[α; y] ≅ Φ[(y - α)/√α] ≅ Φ[{ln(x) - µ}/σ],
which is a LogNormal Distribution. Thus one can obtain a LogNormal Distribution as an appropriate
limit of Transformed Gamma Distributions.152

Inverse Transformed Gamma:153

If x follows a Transformed Gamma Distribution, then 1/x follows an Inverse


Transformed Gamma Distribution.

Inverse Transformed Gamma with τ = 1 is the Inverse Gamma.


Inverse Transformed Gamma with α = 1 is the Inverse Weibull.

F(x) = 1 - Γ[α; (θ/x)^τ].    f(x) = τ θ^(τα) exp[-(θ/x)^τ] / {x^(τα+1) Γ[α]}.

θ is the scale parameter for the Inverse Transformed Gamma. α is a shape parameter in the same
way it is for the Inverse Gamma. τ is a shape parameter, as for the Inverse Weibull.

The relationships between the Inverse Exponential, Inverse Weibull, Inverse Gamma, and Inverse
Transformed Gamma Distributions are shown below:

                              power transformation
    Inverse Exponential      ⇒      Inverse Weibull
          α = 1 ⇑                        ⇑ α = 1
    Inverse Gamma            ⇒      Inverse Transformed Gamma
                              power transformation

The Inverse Transformed Gamma, and its special cases, are heavy-tailed distributions.
The nth moment exists only if n < ατ.
152
One can also obtain a LogNormal Distribution as an appropriate limit of Inverse Transformed Gamma Distributions.
153
See Appendix A and Figure 5.3 of Loss Models.

The mean excess loss increases approximately linearly for large x.

In the same way that the Transformed Gamma distribution is a generalization of the Gamma
Distribution, the Inverse Transformed Gamma is a Generalization of the Inverse Gamma.

                         y = x^τ
    Gamma               ⇒      Transformed Gamma
    y = 1/x ⇓                       ⇓ y = 1/x
    Inverse Gamma       ⇒      Inverse Transformed Gamma
                         y = x^τ

Generalized Pareto:154

Generalized Pareto with τ = 1 is the Pareto.

Generalized Pareto with α = 1 is the Inverse Pareto.

F(x) = β[τ, α; x/(θ+x)].    f(x) = {Γ[α+τ] / (Γ[α] Γ[τ])} θ^α x^(τ-1) / (θ + x)^(α+τ).

θ is the scale parameter for the Generalized Pareto. α is a shape parameter in the same way it is for
the Pareto. τ is an additional shape parameter. While the Pareto may be obtained by mixing
Exponential Distributions via a Gamma Distribution, the Generalized Pareto can be obtained by
mixing Gamma Distributions via a Gamma Distribution. Thus in the same way that a Gamma is a
generalization of the Exponential, so is the Generalized Pareto of the Pareto.

If x follows a Generalized Pareto, then so does 1/x.155


Specifically, using the fact that β(a,b; x) = 1 - β(b,a; 1-x), 1 - F(1/x) = 1 - β[τ, α ; (1/x)/(θ+1/x)] =

1 - β[τ, α ; (1/θ)/(1/θ +x)] = β[α, τ; 1- (1/θ)/(1/θ +x)] = β[α, τ; x/(1/θ +x)].


Thus if x follows a Generalized Pareto with parameters α, θ, τ, then 1/x follows
a Generalized Pareto, but with parameters τ, 1/θ, α.

154
As discussed in “Mahlerʼs Guide to Statistics”, for CAS Exam ST, the F-Distribution is a special case of the
Generalized Pareto. The Generalized Pareto is sometimes called a Pearson Type VI Distribution.
155
Which is why there is not an “Inverse Generalized Pareto Distribution.”

Burr Distribution:

Burr with γ = 1 is the Pareto.

Burr with α = 1 is the LogLogistic. Burr with α = γ is the ParaLogistic.

F(x) = 1 - {1/(1 + (x/θ)^γ)}^α.    f(x) = {α γ x^(γ-1)/θ^γ} {1/(1 + (x/θ)^γ)}^(α+1).

θ is the scale parameter for the Burr distribution. α is a shape parameter in the same way it is for the
Pareto. γ is an additional shape parameter. The Burr is obtained from the Pareto by introducing a

power transformation; if xγ follows a Pareto Distribution, then x follows a Burr Distribution. If x follows

a Burr Distribution with parameters α, θ, and γ, then (x/θ)γ follows a Pareto Distribution with shape
parameter of α and scale parameter of 1.

While the Pareto may be obtained by mixing Exponential Distributions via a Gamma Distribution,
the Burr can be obtained by mixing Weibull Distributions via a Gamma Distribution. Thus in the
same way that a Weibull is a generalization of the Exponential, so is the Burr of the Pareto.

Inverse Burr:

If x follows a Burr Distribution, then 1/x follows an Inverse Burr Distribution.

Inverse Burr with γ = 1 is the Inverse Pareto.
Inverse Burr with τ = 1 is the LogLogistic. Inverse Burr with τ = γ is the Inverse ParaLogistic.

F(x) = {(x/θ)^γ / (1 + (x/θ)^γ)}^τ = {1 + (θ/x)^γ}^(-τ).    f(x) = {τ γ x^(γτ-1)/θ^(γτ)} {1/(1 + (x/θ)^γ)}^(τ+1).

θ is the scale parameter for the Inverse Burr distribution. τ is a shape parameter in the same way it is for the
Inverse Pareto. γ is an additional shape parameter. The Inverse Burr is obtained from the Inverse
Pareto by introducing a power transformation; if x^γ follows an Inverse Pareto Distribution, then x
follows an Inverse Burr Distribution.

While the Inverse Pareto may be obtained by mixing Inverse Exponential Distributions via a
Gamma Distribution, the Inverse Burr can be obtained by mixing Inverse Weibull Distributions via a
Gamma Distribution. Thus in the same way that an Inverse Weibull is a generalization of the Inverse
Exponential, so is the Inverse Burr of the Inverse Pareto.

Log-t Distribution:156

The log-t distribution has the same relationship to the t distribution, as does the LogNormal
Distribution to the Standard Normal Distribution.157

If Y has a t distribution with r degrees of freedom, then exp(σY + µ) has a log-t distribution, with
parameters r, µ and σ.
In other words, if X has a log-t distribution with parameters r, µ and σ, then (ln(X) - µ)/σ has a
t distribution, with r degrees of freedom.

Exercise: What is the distribution function at 1.725 for a t distribution with 20 degrees of freedom?
[Solution: Using a t table, at 1.725 there is a total of 10% in the left and right hand tails. Therefore, at
1.725 the distribution function is 95%.]

Exercise: What is the distribution function at 11.59 for a log-t distribution with parameters
r = 20, µ = -1, and σ = 2?
[Solution: {ln(11.59) - (-1)}/2 = 1.725. Thus we want the distribution function at 1.725 of a
t distribution with 20 degrees of freedom. This is 95%.]
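Since the log-t distribution function is just the t distribution function evaluated at (ln(x) − µ)/σ, the exercise can be checked with scipy's t distribution:

    from math import log
    from scipy.stats import t

    r, mu, sigma = 20, -1, 2
    # X is log-t iff (ln X - mu)/sigma is t with r degrees of freedom
    print(t.cdf((log(11.59) - mu) / sigma, df=r))   # about 0.95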

As with the t-distribution, the distribution function of the log-t distribution can be written in terms
of incomplete beta functions, while the density involves complete gamma functions.

Since the t-distribution is heavier tailed than the Normal Distribution, the log-t distribution is heavier
tailed than the LogNormal Distribution. In fact, none of the moments of the log-t distribution exist!

156
See Appendix A in Loss Models.
157
The (Studentʼs) t distribution is discussed briefly in the next section.

Problems:

26.1 (1 point) The Generalized Pareto Distribution is a generalization of which of the following
Distributions?
A. Pareto B. Gamma C. Weibull D. LogNormal E. Inverse Gaussian

26.2 (2 points) For f(x) = 10^(-10) x^9 exp[-0.01x^2] / 12,
what is the value of the integral from zero to infinity of x^2 f(x) dx?
Hint: For the Transformed Gamma distribution, f(x) = τ(x/θ)^(τα) exp[-(x/θ)^τ] / {x Γ(α)}, and
E[X^n] = θ^n Γ(α + n/τ) / Γ(α).
A. less than 450
B. at least 450 but less than 470
C. at least 470 but less than 490
D. at least 490 but less than 510
E. at least 510

26.3 (1 point) Match the Distributions.


1. Transformed Gamma with α = 1. a. Pareto
2. Transformed Gamma with τ = 1. b. Weibull
3. Burr with γ = 1. c. Gamma
A. 1a, 2b, 3c B. 1a, 2c, 3b C. 1b, 2a, 3c D. 1b, 2c, 3a E. 1c, 2b, 3a

26.4 (2 points) You are given the following:


• Claim sizes follow a Burr Distribution with parameters α = 3, θ = 32, and γ = 2.
• You observe 11 claims.
• The number of claims and claim sizes are independent.
Determine the probability that the smallest of these claims is greater than 7.
A. less than 20%
B. at least 20% but less than 25%
C. at least 25% but less than 30%
D. at least 30% but less than 35%
E. at least 35%

26.5 (2 points) Which of the following is an expression for the variance of a Generalized Pareto
Distribution, with α > 2?
A. θ^2 α / {(α-τ)^2 (α-2)}
B. θ^2 τ (α+τ-1) / {(α-1)^2 (α-2)}
C. θ^2 (α+τ-1) / {(α-τ)(α-1)(α-2)}
D. θ^2 τ α / {(α-τ)^2 (α-(τ+1))}
E. None of the above.

26.6 (1 point) Claim sizes follow an Inverse Burr Distribution with parameters τ = 3, θ = 100, and
γ = 4. Determine F(183).
A. less than 60%
B. at least 60% but less than 65%
C. at least 65% but less than 70%
D. at least 70% but less than 75%
E. at least 75%

26.7 (1 point) What is the second moment of an Inverse Transformed Gamma Distribution with
parameters α = 5, θ = 15, τ = 2?
Hint: E[Xk] = θk Γ[α - k/τ] / Γ[α], k < ατ.
A. less than 20
B. at least 20 but less than 30
C. at least 30 but less than 40
D. at least 40 but less than 50
E. at least 50

26.8 (1 point) Match the Distributions.


1. Inverse Transformed Gamma with α = 1. a. LogLogistic
2. Generalized Pareto with α = 1. b. Inverse Weibull
3. Inverse Burr with τ = 1. c. Inverse Pareto
A. 1a, 2b, 3c B. 1a, 2c, 3b C. 1b, 2a, 3c D. 1b, 2c, 3a E. 1c, 2b, 3a

26.9 (1 point) What is the mode of a Transformed Gamma distribution, with α = 4.85, θ = 813, and
τ = 0.301?
A. less than 2000
B. at least 2000 but less than 2500
C. at least 2500 but less than 3000
D. at least 3000 but less than 3500
E. at least 3500

26.10 (1 point) What is the mode of a Generalized Pareto distribution, with α = 2.5, θ = 100, and
τ = 1.3?

26.11 (3 points) X follows a Burr distribution with parameters α1, θ, γ.

Y follows a Burr distribution with parameters α2, θ, γ,


where the last two parameters are the same as for X.
X and Y are independent.
Determine the probability that Y > X.

26.12 (1 point) What is the mode of an Inverse Burr distribution, with τ = 0.8, θ = 1000, and
γ = 1.5?

26.13 (1 point) What is the mode of an Inverse Transformed Gamma distribution, with α = 3.7,
θ = 200, and τ = 0.6?

26.14 (1 point) What is the median of a Burr distribution, with α = 1.85, θ = 273,700, and γ = 0.97?
A. less than 115,000
B. at least 115,000 but less than 120,000
C. at least 120,000 but less than 125,000
D. at least 125,000 but less than 130,000
E. at least 130,000

26.15 (2 points) X follows a Burr Distribution with parameters α = 6, θ = 20, and γ = 0.5.
Let X be the average of a random sample of size 200.
Use the Central Limit Theorem to find c such that Prob( X ≤ c) = 0.90.
A. 2.4 B. 2.5 C. 2.6 D. 2.7 E. 2.8

26.16 (4B, 5/97, Q.24) (2 points) The random variable X has the density function
f(x) = 4x / (1+x2 )3 , 0 < x < ∞ . Determine the mode of X.
A. 0
B. Greater than 0, but less than 0.25
C. At least 0.25, but less than 0.50
D. At least 0.50, but less than 0.75
E. At least 0.75

Solutions to Problems:

26.1. A. The Generalized Pareto Distribution is a generalization of the Pareto Distribution, with three
rather than two parameters.
Comment: Questions canʼt get any easier than this! For τ = 1, the Generalized Pareto is a Pareto.
For α = 1, the Generalized Pareto is an Inverse Pareto.

26.2. D. This is a Transformed Gamma distribution, with α = 5, θ = 10, and τ = 2.


The integral is the second moment, which for the Transformed Gamma is:
θ2Γ(α + 2/τ) / Γ(α) = 100 Γ(5 + 1) / Γ(5) = (100)(5) = 500.

26.3. D. 1. Transformed Gamma with α = 1 is a Weibull. 2. Transformed Gamma with


τ = 1 is a Gamma. 3. Burr with γ = 1 is a Pareto.

26.4. B. S(7) = 1 - F(7) = {1/(1 + (7/32)^2)}^3 = 0.8692.
The smallest of these 11 claims is greater than 7 if and only if all 11 claims are greater than 7.
The chance of this is: S(7)^11 = 0.8692^11 = 21.4%.
Comment: This is an example of an order statistic.
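A numerical check of 26.4; identifying scipy's burr12(c = γ, d = α, scale = θ) with the Loss Models Burr is my assumption about scipy's parameterization:

    from scipy.stats import burr12

    alpha, theta, gamma_ = 3, 32, 2
    S7 = burr12(c=gamma_, d=alpha, scale=theta).sf(7)   # S(7) = {1 + (7/32)^2}^(-3)
    print(S7, S7**11)                                   # about 0.8692 and 0.214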

26.5. B. The Generalized Pareto Distribution has moments, for α > n: E[X^n] = θ^n Γ(α-n)Γ(τ+n) / {Γ(α)Γ(τ)}.
The first moment is: θ Γ(α-1)Γ(τ+1) / {Γ(α)Γ(τ)} = θτ/(α-1).
The second moment is: θ^2 Γ(α-2)Γ(τ+2) / {Γ(α)Γ(τ)} = θ^2 τ(τ+1) / {(α-1)(α-2)}.
Thus the variance is: θ^2 τ(τ+1) / {(α-1)(α-2)} - {θτ/(α-1)}^2 =
{θ^2 τ / {(α-1)^2 (α-2)}} {(τ+1)(α-1) - τ(α-2)} = θ^2 τ (α+τ-1) / {(α-1)^2 (α-2)}.
Comment: For τ = 1, this is a Pareto, with variance: θ^2 α / {(α-2)(α-1)^2}, for α > 2.

26.6. E. F(x) = {1 + (θ/x)γ}−τ . F(183) = {1 + (0.5464)4 }-3 = 0.774.

26.7. E. E[X2 ] = θ2 Γ(α - 2/τ) / Γ(α) = 225 Γ(4) / Γ(5) = 225/4 = 56.25.

26.8. D. 1. b, 2.c, 3. a.

26.9. D. For τα = 1.46 > 1, the mode = θ{(ατ - 1)/τ}^(1/τ) = (813)(0.45985/0.301)^(1/0.301) = 3323.
Comment: f(x) = τ(x/θ)^(τα) exp[-(x/θ)^τ] / {x Γ(α)}.
ln f(x) = ln(τ) + (τα - 1)ln(x) - τα ln(θ) - (x/θ)^τ - ln(Γ(α)).
The mode is where the density is largest, and therefore where ln f(x) is largest.
d ln f(x)/dx = (τα - 1)/x - τx^(τ-1)/θ^τ. Setting d ln f(x)/dx = 0: (τα - 1)/x = τx^(τ-1)/θ^τ.
Therefore, x = θ{(τα - 1)/τ}^(1/τ) = (813)(1.528)^(3.322) = 3323.

26.10. For τ > 1, mode = θ(τ - 1)/(α+1) = (100)(0.3)/3.5 = 8.57.


Comment: The formula for the mode is shown in Appendix A attached to the exam.

26.11. For the Burr, S(x) = {1/(1 + (x/θ)^γ)}^α, and f(x) = {α γ x^(γ-1)/θ^γ} {1/(1 + (x/θ)^γ)}^(α+1).
Prob[Y > X] = ∫ Prob[Y > x] f_X(x) dx = ∫ S_Y(x) f_X(x) dx
= ∫_0^∞ {1/(1 + (x/θ)^γ)}^(α2) {α1 γ x^(γ-1)/θ^γ} {1/(1 + (x/θ)^γ)}^(α1 + 1) dx
= (α1 γ/θ^γ) ∫_0^∞ x^(γ-1) {1/(1 + (x/θ)^γ)}^(α1 + α2 + 1) dx.
Now since the density of the Burr Distribution integrates to one:
∫_0^∞ x^(γ-1) {1/(1 + (x/θ)^γ)}^(α+1) dx = θ^γ/(αγ).
Thus, Prob[Y > X] = (α1 γ/θ^γ) θ^γ/{(α1 + α2)γ} = α1/(α1 + α2).
Comment: If for example α1 = 3 and α2 = 5, then X has a heavier righthand tail than Y,
and Prob[Y > X] = 3/8.
For γ = 1, the Burr is a Pareto Distribution.

26.12. For τγ > 1, mode = θ {(τγ - 1) / (γ+1)}1/γ = (1000)(0.2/2.5)1/1.5 = 185.7.

26.13. Mode = θ {τ / (ατ + 1)}1/τ = (200)(0.6/3.22)1/0.6 = 12.16.



26.14. C. F(x) = 1 - {1/(1 + (x/θ)^γ)}^α. Therefore, 0.5 = 1 - {1/(1 + (x/273,700)^0.97)}^1.85.
1.455 = 1 + x^0.97/188,000. Therefore, x = 121.5 thousand.

26.15. E. E[X] = θ Γ[1 + 1/γ] Γ[α - 1/γ] / Γ[α] = (20) Γ[3] Γ[4] / Γ[6] = (20)(2!)(3!)/(5!) = 2.
E[X^2] = θ^2 Γ[1 + 2/γ] Γ[α - 2/γ] / Γ[α] = (20^2) Γ[5] Γ[2] / Γ[6] = (20^2)(4!)(1!)/(5!) = 80.
Var[X] = 80 - 2^2 = 76. Var[X̄] = 76/200 = 0.38. E[X̄] = E[X] = 2.
Using the Normal Approximation, the 90th percentile is at: 2 + 1.282 √0.38 = 2.790.

26.16. C. The mode is where the density is at a maximum. We check the end points, but
f(0) = 0 and f(∞) = 0, so the maximum is in between, where f′(x) = 0.
f′(x) = {4(1+x^2)^3 - (4x)(6x)(1+x^2)^2} / (1+x^2)^6. Setting the derivative equal to zero,
4(1+x^2)^3 - (4x)(6x)(1+x^2)^2 = 0. Therefore 4(1+x^2) = 24x^2. x^2 = 1/5. x = 1/√5 = 0.447.
Comment: One can confirm that the mode is approximately 0.45 numerically:
x f(x)
0 0.000
0.05 0.199
0.1 0.388
0.15 0.561
0.2 0.711
0.25 0.834
0.3 0.927
0.35 0.990
0.4 1.025
0.45 1.035
0.5 1.024
0.55 0.996
0.6 0.954
This is a Burr Distribution with α = 2, θ = 1, and γ = 2. As shown in Appendix A of Loss Models,
the mode is, for γ > 1: θ {(γ-1)/(αγ + 1)}^(1/γ) = (1/5)^(1/2).



Section 27, Beta Function and Distribution

The quantity x^(a-1) (1-x)^(b-1), for a > 0, b > 0, has a finite integral from 0 to 1. This integral is called the
(complete) Beta Function. The value of this integral clearly depends on the choices of the
parameters a and b.158 This integral is: (a-1)! (b-1)! / (a+b-1)! = Γ(a) Γ(b) / Γ(a+b).
The Complete Beta Function is a combination of three Complete Gamma Functions:
β[a, b] = ∫_0^1 x^(a-1) (1-x)^(b-1) dx = (a-1)! (b-1)! / (a+b-1)! = Γ(a) Γ(b) / Γ(a+b).

Note that β(a, b) = β(b, a).

Exercise: What is the integral from zero to 1 of x^3 (1-x)^6?
[Solution: β(4, 7) = Γ(4)Γ(7) / Γ(4+7) = 3! 6! / 10! = 1/840 = 0.001190.]
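For example, scipy's complete Beta function returns the same value:

    from scipy.special import beta

    # integral of x^3 (1-x)^6 over [0, 1] equals beta(4, 7)
    print(beta(4, 7))   # 1/840, about 0.0011905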

One can turn the complete Beta Function into a distribution on the interval [0,1] in a manner similar to
how the Gamma Distribution was created from the (complete) Gamma Function on [0, ∞). Let:

F(x) = {(a+b-1)! / [(a-1)! (b-1)!]} ∫_0^x t^(a-1) (1-t)^(b-1) dt = {Γ(a+b) / [Γ(a) Γ(b)]} ∫_0^x t^(a-1) (1-t)^(b-1) dt
     = ∫_0^x t^(a-1) (1-t)^(b-1) dt / β[a, b] ≡ β(a, b; x).

f(x) = {(a+b-1)! / [(a-1)! (b-1)!]} x^(a-1) (1-x)^(b-1), 0 ≤ x ≤ 1.

Then the distribution function is zero at x = 0 and one at x = 1. The latter follows from the fact that we
have divided by the value of the integral from 0 to 1 of t^(a-1) (1-t)^(b-1).

This corresponds to the form of the incomplete Beta function shown in the Appendix A of Loss
Models. F(x) = β(a, b; x), 0 ≤ x ≤ 1.159 This two parameter distribution is a special case of what Loss
Models calls the Beta distribution, for θ = 1; the Beta Distribution in Loss Models is
F(x) = β(a, b; x/θ), 0 ≤ x ≤ θ.
158
The results have been tabulated and this function is widely used in many applications.
See for example the Handbook of Mathematical Functions, by Abramowitz, et. al.
159
This distribution is sometimes called a Beta Distribution of the first kind, or a Pearson Type I Distribution.

The following relationship is sometimes useful: β(a, b; x) = 1 - β(b, a; 1 - x).

β(a,b;x) has mean a/(a+b), second moment a(a+1)/{(a+b)(a+b+1)}, and variance ab/{(a+b)^2 (a+b+1)}.
The mean is between zero and one; for b < a the mean is greater than 0.5.

For a fixed ratio of a/b, the mean is constant and for a and b large β(a,b; x) approaches a Normal
Distribution. As a or b get larger the variance decreases. For either a or b extremely large, virtually all
the probability is concentrated at the mean.

Here are various Beta Distributions (with θ = 1):

[Density plots for (a = 1, b = 5), (a = 2, b = 4), (a = 5, b = 1), and (a = 4, b = 2).]

For a > b the Beta Distribution is skewed to the left. For a < b it is skewed to the right.
For a = b it is symmetric. For a ≤ 1, the Mode = 0. For b ≤ 1, the Mode = 1.

β(a,b; x), the Beta distribution for θ = 1 is closely connected to the Binomial Distribution. The
Binomial parameter q varies from zero to one, the same domain as the Incomplete Beta Function.

The Beta density is proportional to the chance of success to the power a-1, times the chance of
failure to the power b-1. The constant in front of the Beta density is (a+b-1) times the binomial
coefficient for (a+b-2) and a-1. The Incomplete Beta Function is a conjugate prior distribution for the
Binomial. The Incomplete Beta Function for integer parameters can be used to compute the
sum of terms from the Binomial Distribution.160

Summary of Beta Distribution:

Support: 0 ≤ x ≤ θ.  Parameters: a > 0 (shape parameter), b > 0 (shape parameter),
θ > 0 (similar to a scale parameter; it determines the support).

F(x) = β(a, b; x/θ) ≡ {(a+b-1)! / [(a-1)! (b-1)!]} ∫_0^(x/θ) t^(a-1) (1-t)^(b-1) dt.

f(x) = {1/β(a, b)} (x/θ)^a (1 - x/θ)^(b-1) / x = {Γ(a+b) / [Γ(a) Γ(b)]} (x/θ)^a (1 - x/θ)^(b-1) / x
     = {(a+b-1)! / [(a-1)! (b-1)!]} (x/θ)^(a-1) (1 - x/θ)^(b-1) / θ, 0 ≤ x ≤ θ.

For a = 1, b = 1, the Beta Distribution is the uniform distribution on [0, θ].

E[X^n] = θ^n Γ(a+b) Γ(a+n) / {Γ(a+b+n) Γ(a)} = θ^n (a+b-1)! (a+n-1)! / {(a+b+n-1)! (a-1)!}
       = θ^n a(a+1)...(a+n-1) / {(a+b)(a+b+1)...(a+b+n-1)}.

Mean = θ a/(a+b).    E[X^2] = θ^2 a(a+1) / {(a+b)(a+b+1)}.    Variance = θ^2 ab / {(a+b)^2 (a+b+1)}.

Coefficient of Variation = Standard Deviation / Mean = {b / [a(a+b+1)]}^0.5.

Skewness = 2(b - a) √(a+b+1) / {(a+b+2) √(ab)}.

Mode = θ (a-1)/(a+b-2), for a > 1 and b > 1.

Limited Expected Value = E[X ∧ x] = θ {a/(a+b)} β(a+1, b; x/θ) + x {1 - β(a, b; x/θ)}.
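As a numerical check of the moment formulas above, here is a sketch using scipy's beta distribution for the a = 3, b = 8, θ = 1 example used in the problems below; treating scipy's scale argument as θ is an assumption:

    from scipy.stats import beta

    a, b, theta = 3, 8, 1.0
    dist = beta(a, b, scale=theta)
    print(dist.mean())      # theta a/(a+b) = 0.2727
    print(dist.moment(2))   # theta^2 a(a+1)/{(a+b)(a+b+1)} = 0.0909
    print(dist.var())       # theta^2 ab/{(a+b)^2 (a+b+1)} = 0.0165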


160
See ”Mahlerʼs Guide to Frequency Distributions.” On the exam you should either compute the sum of binomial
terms directly or via the Normal Approximation. Note that the use of the Beta Distribution is an exact result, not an
approximation. See for example the Handbook of Mathematical Functions, by Abramowitz, et. al.

Beta Distribution for a = 3, b = 3, and θ = 1:

[density plot omitted]

Since the density of the Beta Distribution integrates to one over its support:
∫_0^θ (t/θ)^(a-1) (1 - t/θ)^(b-1) dt = θ β(a, b). This is sometimes called a Beta type integral.

Uniform Distribution:

The Uniform Distribution from 0 to θ is a Beta Distribution with a = 1 and b = 1.


Specifically, DeMoivreʼs Law is a Beta Distribution with a = 1, b = 1, and θ = ω.
The future lifetime of a life aged x under DeMoivreʼs Law is a Beta Distribution with a = 1,
b = 1, and θ = ω - x.

Modified DeMoivreʼs Law:

The Modified DeMoivreʼs Law has S(x) = (1 - x/ω)α, 0 ≤ x ≤ ω. α = 1 is DeMoivreʼs Law.


The Modified DeMoivreʼs Law is a Beta Distribution with a = 1, b = α, and θ = ω.
The future lifetime of a life aged x under the Modified DeMoivreʼs Law is a Beta Distribution with
a = 1, b = α, and θ = ω - x.

Generalized Beta Distribution:

Appendix A of Loss Models also shows the Generalized Beta Distribution, which is obtained from
the Beta Distribution via a power transformation. F(x) = β(a, b ; (x/θ)τ). For τ = 1, the Generalized
Beta Distribution reduces to the Beta Distribution.

Studentʼs-t Distribution:

The Studentʼs-t Distribution from Statistics can be written in terms of the Incomplete Beta Function.
If U is a Unit Normal variable and χ^2 follows a chi-square distribution with ν degrees of freedom, then
U/√(χ^2/ν) follows a Studentʼs-t distribution with ν degrees of freedom.161

For parameter ν, the density is:
f(x) = {Γ[(ν+1)/2] / [Γ(ν/2) √(πν)]} (1 + x^2/ν)^(-(ν+1)/2) = (1 + x^2/ν)^(-(ν+1)/2) / {√ν β(ν/2, 1/2)}, -∞ < x < ∞.

The Distribution Function is:162
F(x) = β[ν/2, 1/2; ν/(ν+x^2)] / 2 for x ≤ 0,
F(x) = 1 - β[ν/2, 1/2; ν/(ν+x^2)] / 2 for x ≥ 0.

F-Distribution:

The F-Distribution (variance ratio distribution) from Statistics can be written in terms of the
Incomplete Beta Function.163 If χ1^2 follows a chi-square distribution with ν1 degrees of freedom and
χ2^2 follows a chi-square distribution with ν2 degrees of freedom, then
(χ1^2/ν1) / (χ2^2/ν2) follows an F-distribution, with ν1 and ν2 degrees of freedom.164

The Distribution Function is:165
F(x) = β[ν1/2, ν2/2; ν1 x/(ν2 + ν1 x)] = 1 - β[ν2/2, ν1/2; ν2/(ν2 + ν1 x)], x > 0.

Conversely one can get the Incomplete Beta Function from the F-Distribution, provided 2a and 2b
are integral:
β(a,b; x) = F[bx/ a(1-x)], where F is the F-Distribution with 2a and 2b degrees of freedom.
161
A chi-square distribution with ν degrees of freedom is the sum of ν squares of independent Unit Normals.
162
See Section 26.7 of the Handbook of Mathematical Functions by Abramowitz, et. al.
163
The F-Distribution is a form of what is sometimes called a Beta Distribution of the Second Kind.
164
For example, χ1 2 could be the estimated variance from a sample of ν1 drawn from a Normal Distribution and χ2 2
could be the estimated variance from an independent sample of ν2 drawn from a Normal Distribution with the same
variance as the first. The variance-ratio test uses the F-Distribution to test the hypothesis that the two Normal
Distributions have the same variance. See the syllabus of CAS Exam 3L.
165
See Section 26.6 of the Handbook of Mathematical Functions by Abramowitz, et. al.

Relation of the Beta to the Gamma Distribution:

The Complete Beta Function is a combination of three Complete Gamma Functions:
β[a, b] = ∫_0^1 x^(a-1) (1-x)^(b-1) dx = (a-1)! (b-1)! / (a+b-1)! = Γ(a) Γ(b) / Γ(a+b).

In addition, there are other connections between the Gamma and Beta. If X is a random draw
from a Gamma Distribution with shape parameter α and scale parameter θ, and Y is a random
draw from a Gamma Distribution with shape parameter β and same scale parameter θ, then
Z = X / (X+Y) is a random draw from a Beta Distribution with parameters α, β, and
scale parameter 1.
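A quick Monte Carlo sketch of this fact; the parameter values, sample size, and seed are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta_, theta, n = 3.0, 5.0, 2.0, 100_000
    x = rng.gamma(alpha, theta, n)             # Gamma with shape alpha, scale theta
    y = rng.gamma(beta_, theta, n)             # Gamma with shape beta, same scale
    z = x / (x + y)                            # should be Beta(alpha, beta) on [0, 1]
    print(z.mean(), alpha / (alpha + beta_))   # both about 0.375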

In addition, the Gamma Distribution can be obtained as the limit of an appropriately chosen
sequence of Beta Distributions.

In order to demonstrate this relationship weʼll use Stirlingʼs formula for the Gamma of large
arguments, as well as the fact that the limit as z → ∞ of (1 + c/z)^z is e^c.166

Let β(a,b; x) be a Beta Distribution. Let y be a chosen constant. Let x go to zero such that at the
same time the second parameter b of the Beta Distribution goes to infinity while the
relationship b = 1 + y/x holds. Then the integral that enters into the definition of β(a,b; x) is:
∫_0^x t^(a-1) (1-t)^(b-1) dt = ∫_0^x t^(a-1) (1-t)^(y/x) dt.

Change variables by taking s = ty/x; then since t = sx/y, the above integral is:
∫_0^y (sx/y)^(a-1) (1 - sx/y)^(y/x) (x/y) ds = (x/y)^a ∫_0^y s^(a-1) {1 - s/(y/x)}^(y/x) ds ≅ (x/y)^a ∫_0^y s^(a-1) e^(-s) ds,
since for small x, y/x is large and therefore {1 - s/(y/x)}^(y/x) ≅ e^(-s).

166
The latter fact follows from taking the limit of ln{(1 + c/z)^z} = z ln(1 + c/z) ≅ z {c/z - (c/z)^2/2} = c - c^2/(2z) ≅ c.

Meanwhile, the constant in front of the integral that enters into the definition of β(a,b; x) is:
Γ(a+b) / {Γ(a)Γ(b)} = {Γ(a + 1 + y/x) / Γ(1 + y/x)} / Γ(a).

For very small x the argument of the Complete Gamma Function is very large.
Stirlingʼs Formula says for large z: Γ(z) ≅ e^(-z) z^(z-0.5) √(2π).

Thus for small x:
Γ(a + 1 + y/x) / Γ(1 + y/x) ≅ e^(-(a+1+y/x)) (a + 1 + y/x)^(a + 0.5 + y/x) √(2π) / {e^(-(1+y/x)) (1 + y/x)^(0.5 + y/x) √(2π)}
= e^(-a) (a + 1 + y/x)^a {(a + 1 + y/x)/(1 + y/x)}^(0.5 + y/x) ≅ e^(-a) (y/x)^a {1 + ax/y}^(y/x) ≅ e^(-a) (y/x)^a e^a = (y/x)^a.

Thus for large b, Γ(a+b) / {Γ(a)Γ(b)} ≅ (y/x)^a / Γ(a).

Putting the two pieces of the Incomplete Beta Function together:
β(a,b; x) = {Γ(a+b) / [Γ(a) Γ(b)]} ∫_0^x t^(a-1) (1-t)^(b-1) dt ≅ {(y/x)^a / Γ(a)} (x/y)^a ∫_0^y s^(a-1) e^(-s) ds
= {1/Γ(a)} ∫_0^y s^(a-1) e^(-s) ds = Γ(a; y).

Thus the Incomplete Gamma Function Γ(a; y) has been obtained as a limit of an appropriate
sequence of Incomplete Beta Functions β(a, b; x), with b = 1 + y/x, as x goes to zero.

Problems:

Use the following information for the next four questions:


X follows a Beta Distribution with a = 3, b = 8, and θ = 1

27.1 (1 point) What is the density function at x = 0.6?


A. 0.05 B. 0.1 C. 0.2 D. 0.3 E. 0.4

27.2 (1 point) What is the mode?


A. less than 0.20
B. at least 0.20 but less than 0.25
C. at least 0.25 but less than 0.30
D. at least 0.30 but less than 0.35
E. at least 0.35

27.3 (1 point) What is the mean?


A. less than 0.20
B. at least 0.20 but less than 0.25
C. at least 0.25 but less than 0.30
D. at least 0.30 but less than 0.35
E. at least 0.35

27.4 (1 point) What is the variance?


A. less than 0.02
B. at least 0.02 but less than 0.03
C. at least 0.03 but less than 0.04
D. at least 0 .04 but less than 0.05
E. at least 0.05

27.5 (2 points) For a Beta Distribution with parameters a = 0.5, b = 0.5, and θ = 1, what is the
mode?
A. 1/4 B. 1/3 C. 1/5 D. 3/4 E. None of A, B, C, or D

27.6 (2 points) For a Generalized Beta Distribution with τ = 2, determine the second moment.
A. θ^2 a/(a+b)   B. θ^2 b/(a+b)   C. θ^2 ab/{(a+b)(a+b+1)}
D. θ^2 a(a+1)/{(a+b)(a+b+1)}   E. θ^2 b(b+1)/{(a+b)(a+b+1)}

Use the following information for the next 2 questions:


• While partially disabled, an injured workerʼs “lost earnings capacity” is defined,
based on the wage he could now earn, as:
(preinjury wage - postinjury wage)/(preinjury wage) = lost wages / preinjury wage.

• Assume an injured workerʼs “lost earnings capacity” is distributed via a Beta Distribution
with parameters a = 3, b = 2, and θ = 1.
• While partially disabled, Workersʼ Compensation weekly benefits are 70% times
the workerʼs lost wages, limited to 56% of the workerʼs preinjury wage.

27.7 (1 point) What is the average lost earnings capacity of injured workers?
A. less than 0.53
B. at least 0.53 but less than 0.56
C. at least 0.56 but less than 0.59
D. at least 0.59 but less than 0.62
E. at least 0.62

27.8 (3 points) Where β(a,b; x) is the Incomplete Beta Function as defined in Appendix A of Loss
Models, what is the average ratio of weekly partial disability benefits to preinjury weekly wage?
Hint: Use the formula for E[X ∧ x] for the Beta Distribution, in Appendix A of Loss Models.
A. 0.42β(4, 2; 0.2) + 0.56β(2, 3; 0.8)
B. 0.56β(4, 2; 0.8) + 0.42β(2, 3; 0.2)
C. 0.42β(4, 2; 0.8) + 0.56β(2, 3; 0.2)
D. 0.56β(4, 2; 0.2) + 0.42β(2, 3; 0.8)
E. None of the above.

27.9 (2 points) On an exam, the grades of students are distributed via a Beta Distribution with
a = 4, b = 1, and θ = 100.
A grade of 70 or more passes.
What is the average grade of a student who passes this exam?
A. 82 B. 84 C. 86 D. 88 E. 90

27.10 (2, 5/83, Q.12) (1.5 points) Let X have the density function
f(x) = Γ(α + β) xα−1 (1 - x)β−1 / {Γ(α) Γ(β)}, for 0 < x < 1, where α > 0 and β > 0.
If β = 6 and α = 5, what is the expected value of (1 - X)-4?
A. 42 B. 63 C. 210 D. 252 E. 315

Solutions to Problems:

27.1. C. f(x) = {(a+b-1)! / [(a-1)! (b-1)!]} (x/θ)^(a-1) {1 - (x/θ)}^(b-1) / θ = {10! / (2! 7!)} (0.6^2)(0.4^7) = 0.212.

27.2. B. f(x) = {10! / (2! 7!)} x^2 (1-x)^7 = 360 x^2 (1-x)^7. fʼ(x) = 720x(1-x)^7 - 2520x^2 (1-x)^6.
0 = fʼ(x) = 720x(1-x)^7 - 2520x^2 (1-x)^6. ⇒ 720(1 - x) = 2520x. ⇒ x = 0.222.
Comment: For a > 1 and b > 1, the mode is: θ(a - 1) / (a + b - 2) = (1)(3 - 1) / (3 + 8 - 2) = 2/9.
A graph of the density of this Beta Distribution: [plot omitted]

27.3. C. Mean = θa/(a+b) = 3 / 11 = 0.2727.

27.4. A. Second moment = θ^2 a(a+1) / {(a+b)(a+b+1)} = (3)(4) / {(11)(12)} = 0.09091.
Variance = 0.09091 - 0.2727^2 = 0.0165.
2016-C-2, Loss Distributions, §27 Beta Function HCM 10/21/15, Page 377

27.5. E. f(x) is proportional to: x^(-1/2) (1 - x)^(-1/2).
fʼ(x) is proportional to: (-1/2)x^(-3/2) (1 - x)^(-1/2) + (1/2)x^(-1/2) (1 - x)^(-3/2).
Setting this equal to zero and solving, x = 1/2. At x = 1/2, x^(-1/2) (1 - x)^(-1/2) = 2.
As x → 0, x^(-1/2) (1 - x)^(-1/2) → ∞. As x → 1, x^(-1/2) (1 - x)^(-1/2) → ∞.
Thus checking the endpoints of the support, the modes are at 0 and 1.
Comment: 1/2 is a minimum for this density,
f(x) = {Γ(1/2 + 1/2) / [Γ(1/2)Γ(1/2)]} x^(-1/2) (1 - x)^(-1/2) = x^(-1/2) (1 - x)^(-1/2) / π, 0 < x < 1. [plot omitted]

27.6. A. E[X^k] = θ^k Γ[a+b] Γ[a + k/τ] / {Γ[a] Γ[a+b+k/τ]}.
E[X^2] = θ^2 Γ[a+b] Γ[a + 2/2] / {Γ[a] Γ[a+b+2/2]} = θ^2 Γ[a+b] Γ[a+1] / {Γ[a] Γ[a+b+1]} = θ^2 a/(a+b).

27.7. D. The mean of a Beta Distribution is: θa/(a+b) = 3/(3+2) = 0.6.



27.8. C. Let y be the injured workerʼs lost earnings capacity. Let w be the workerʼs
pre-injury wage. Then the lost wages are yw. Thus the benefits are 0.7yw, limited to 0.56w.
Thus the ratio of weekly partial disability benefits to pre-injury weekly wage, r, is 0.7y limited to
0.56. For y ≥ 0.56/0.7 = 0.8, r = 0.56; for y ≤ 0.8, r = 0.7y. To get the average r, we need to
integrate with respect to f(y)dy for y = 0 to 1, dividing into two pieces depending on whether y is
less than or greater than 0.8:
∫_0^0.8 (0.7y) f(y) dy + ∫_0.8^1 (0.56) f(y) dy = 0.7 {∫_0^0.8 y f(y) dy + 0.8 S(0.8)} = 0.7 E[Y ∧ 0.8].
Thus the average ratio r has been written in terms of the Limited Expected Value; for the Beta
Distribution, E[X ∧ x] = θ{a/(a+b)} β(a+1, b; x/θ) + x{1 - β(a, b; x/θ)}.
Thus for θ = 1, a = 3 and b = 2: 0.7 E[X ∧ 0.8] = 0.7{(3/(3+2)) β(4, 2; 0.8) + 0.8(1 - β(3, 2; 0.8))} =
0.42 β(4, 2; 0.8) + 0.56 β(2, 3; 0.2).
Comments: 0.42 β(4, 2; 0.8) + 0.56 β(2, 3; 0.2) = (0.42)(0.7373) + (0.56)(0.1808) = 0.411.
Since β(a, b; 1-x) = 1 - β(b, a; x), the solution could also be written as:
0.42 β(4, 2; 0.8) + 0.56{1 - β(3, 2; 0.8)}.

27.9. D. f(x) = x^3 / 25,000,000.
S(70) = ∫_70^100 x^3 dx / 25,000,000 = 0.7599.
∫_70^100 x f(x) dx = ∫_70^100 x^4 dx / 25,000,000 = 66.5544.
The average grade of a student who passes this exam is: 66.5544 / 0.7599 = 87.58.
Comment: The average grade of all students is: (100)(4)/(4 + 1) = 80.

27.10. A. E[(1 - X)^(-4)] = {Γ(11) / [Γ(5)Γ(6)]} ∫_0^1 x^4 (1-x)^5 (1-x)^(-4) dx = {Γ(11) / [Γ(5)Γ(6)]} ∫_0^1 x^4 (1-x) dx
= {Γ(11) / [Γ(5)Γ(6)]} Γ(5)Γ(2)/Γ(7) = (10)(9)(8)(7) / {(5)(4)(3)(2)} = 42.

Alternately, let y = 1 - x. Then y has a Beta Distribution with a = 6, b = 5, and θ = 1.
E[(1 - X)^(-4)] = E[Y^(-4)] = Γ(6+5)Γ(6-4) / {Γ(6)Γ(6+5-4)} = Γ(11)Γ(2) / {Γ(6)Γ(7)} = (10!)(1!) / {(5!)(6!)} = (10)(9)(8)(7) / {(5)(4)(3)(2)} = 42.

Comment: X has a Beta Distribution with a = 5, b = 6, and θ = 1.


2016-C-2, Loss Distributions, §28 Transformed Beta HCM 10/21/15, Page 380

Section 28, Transformed Beta Distribution

You are extremely unlikely to be asked questions on your exam involving the 4 parameter
Transformed Beta Distribution.

Define the Transformed Beta Distribution on (0, ∞) in terms of the Incomplete Beta Function:

F(x) = β[τ, α; x^γ/(x^γ + θ^γ)] = 1 - β[α, τ; θ^γ/(x^γ + θ^γ)].

f(x) = {Γ(α+τ) / [Γ(τ)Γ(α)]} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^(-(α+τ)).

As shown in Figure 5.4 and Appendix A of Loss Models, the Pareto, Generalized Pareto, Burr,
LogLogistic, and other distributions are special cases of the Transformed Beta Distribution.

In this parameterization, θ acts as a scale parameter, since everywhere x appears in F(x) it is divided
by θ. γ is a power transformation parameter as in the Loglogistic and Burr Distributions.
α is a shape parameter as in the Pareto. τ is another shape parameter as in the Generalized Pareto
or Inverse Pareto. With four parameters the Transformed Beta Distribution has a lot of flexibility to fit
different data sets.

For 0 < x < ∞, xγ / (xγ + θγ) is between 0 and 1, the domain of the Incomplete Beta Function.
Thus this transformation allows one to get a size of loss distribution from the Incomplete Beta
Function, which only has a domain from zero to one.

The moments of the Transformed Beta Distribution are:
E[X^n] = θ^n Γ(α - n/γ) Γ(τ + n/γ) / {Γ(α)Γ(τ)}, for -τγ < n < αγ.
Only some moments exist; if αγ > 1 the mean exists. If αγ > 2 the second moment exists, etc.

Mean = θ Γ(α - 1/γ) Γ(τ + 1/γ) / {Γ(α)Γ(τ)}, αγ > 1.

Provided αγ > 1, the mean excess loss exists and increases to infinity approximately linearly;

for large x, e(x) ≅ x / (αγ - 1). This tail behavior carries over to special cases such as: the Burr,
Generalized Pareto, Pareto, LogLogistic, ParaLogistic, Inverse Burr, Inverse Pareto, and Inverse
ParaLogistic. All have mean excess losses, when they exist, that increase approximately linearly for
large x.167
167
The mean excess loss of a Pareto, when it exists, is linear in x. e(x) = (x+θ)/(α-1).

Special Cases of the Transformed Beta:168

By setting any one of the three shape parameters equal to one, the Transformed Beta becomes a
three parameter distribution.169
For τ = 1, the Transformed Beta is a Burr.
For γ = 1, the Transformed Beta is a Generalized Pareto.
For α = 1, the Transformed Beta is an Inverse Burr.

In turn, one can fix one of the remaining shape parameters in one of these three parameter
distributions and obtain a two parameter distribution, each of which is also a special case of the
Transformed Beta Distribution.

For γ = 1, the Burr is a Pareto.


For α = 1, the Burr is a LogLogistic.
For α = γ, the Burr is a ParaLogistic.

For τ = 1, the Generalized Pareto is a Pareto.


For α = 1, the Generalized Pareto is an Inverse Pareto.

For γ = 1, the Inverse Burr is an Inverse Pareto.


For τ = 1, the Inverse Burr is a LogLogistic.
For τ = γ, the Inverse Burr is an Inverse ParaLogistic.

Limiting Cases of the Transformed Beta Distribution:170

By taking appropriate limits one can obtain additional distributions from the Transformed Beta
Distribution. Examples include the Transformed Gamma, Inverse Transformed Gamma and the
LogNormal Distributions.

The first example is that the Transformed Gamma Distribution is a limiting case of the Transformed
Beta. In order to demonstrate this relationship weʼll use Stirlingʼs formula for the Gamma of large
arguments, which says that for large z: Γ(z) ≅ e^(-z) z^(z-0.5) √(2π).
168
See Figures 5.2 and 5.4 in Loss Models.
169
One can fix any of the parameters at any positive value, but for example the three parameter distribution that
results from fixing α = 2 does not have a name, because it does not come up as often in applications. Fixing the
scale parameter is much less common for practical applications than fixing one or more shape parameters.
170
See Figure 5.4 in Loss Models.

Weʼll also use the fact that the limit as z → ∞ of (1 + c/z)^z is e^c.
The latter fact follows from taking the limit of ln{(1 + c/z)^z} = z ln(1 + c/z) ≅ z {c/z - (c/z)^2/2} = c - c^2/(2z) ≅ c.

Exercise: Use Stirlingʼs Formula to approximate Γ(a+b)/{Γ(a)Γ(b)}, for very large a.
[Solution: Stirlingʼs Formula says for large z: Γ(z) ≅ e^(-z) z^(z-0.5) √(2π). Thus for large a:
Γ(a+b)/Γ(a) ≅ e^(-(a+b)) (a+b)^(a+b-0.5) √(2π) / {e^(-a) a^(a-0.5) √(2π)} = e^(-b) (1+b/a)^a (a+b)^b √(a/(a+b))
≅ e^(-b) e^b a^b ≅ a^b. Thus for large a, Γ(a+b)/{Γ(a)Γ(b)} ≅ a^b / Γ(b).]

A Transformed Beta Distribution with parameters α, θ, γ and τ has density

f(x) = {Γ(α+τ)/Γ(τ)Γ(α)} γ θ−γτ xγτ−1 (1+(x/θ)γ)−(α + τ). Weʼll take limits of the Gamma Functions in front
and the rest of the density as two separate pieces and then put the results together.
Set q = θ/α1/γ and let α go to infinity, while holding q constant.171

Given the chosen constant q, then θ = qα1/γ.

Then the density of the Transformed Beta, other than the Gamma Functions is:
γ θ−γτ xγτ−1 (1+(x/θ)γ)−(α + τ) = γ q−γτα −τ xγτ−1 (1 + (x/q)γ /α)−(α + τ) =

γ q−γτα −τ xγτ−1 (1 + (x/q)γ /α)−τ /(1 + (x/q)γ /α)α ≅ γ q−γτα −τ xγτ−1/exp((x/q)γ) =

γ q−γτα −τ xγτ−1 exp(-(x/q)γ).

Where Iʼve used the fact that the limit as z → ∞ of (1+c/z)z is ec, with z = α and c = (x/q)γ.

Meanwhile, the Gammas in front of density are: Γ(α+τ)/{Γ(τ)Γ(α)}.

As shown in the above exercise, for large α, Γ(α+τ)/{Γ(τ)Γ(α)} ≅ ατ / Γ(τ).

Putting the two pieces of the density of the Transformed Beta Distribution together:
f(x) = {Γ(α+τ)/Γ(τ)Γ(α)} γ θ−γτ xγτ−1 (1+(x/θ)γ)−(α + τ) ≅

{ α τ /Γ(τ) }γ q−γτα −τ xγτ−1 exp[-(x/q)γ] = γ q−γτ xγτ−1 exp[-(x/q)γ] / Γ(τ).

This is the density of a Transformed Gamma Distribution, with scale parameter q, with what is
normally the α parameter given as τ, and what is normally the τ parameter given as γ.
171
q will turn out to be the scale parameter of the limiting Transformed Gamma Distribution.

Thus the Transformed Gamma Distribution has been obtained as a limit of a series of Transformed
Beta Distributions, with q = θ/α1/γ and letting α go to infinity, while holding q constant. Then q is the
scale parameter of the limiting Transformed Gamma. The τ parameter of the limiting Transformed
Gamma is the γ parameter of the Transformed Beta. The α parameter of the limiting Transformed
Gamma is the τ parameter of the Transformed Beta.172

In terms of Distribution Functions:


lim β[τ, α ; xγ / (xγ + α qγ)] = Γ[τ ; (x/q)γ].
α →∞

For the Transformed Beta Distribution, and its special cases, as alpha approaches infinity, in the limit
we get a Transformed Gamma, and its special cases:
Transformed Beta → Transformed Gamma
Generalized Pareto → Gamma
Burr → Weibull
Pareto → Exponential

Note that in each case since we taken the limit as alpha approaches infinity, the limiting distribution
has one fewer shape parameter than the distribution whose limit we are taking.

Exercise: What is the limit of a Pareto Distribution, as α goes to infinity while θ = 100α?
[Solution: This is special case of the limit of a Transformed Beta. Using the above result, in this case,
the limit is an Exponential Distribution with scale parameter 100.
Alternately, for the Pareto S(x) = 1/{1+ (x/θ)}α = 1/{1 + (x/100)/α}α. As α approaches infinity,
S(x) approaches 1/exp(x/100) = e-x/100. This is an Exponential Distribution with mean 100.]
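The limit in this exercise can be illustrated numerically; the values of α and x below are arbitrary choices:

    import numpy as np

    # Pareto with theta = 100*alpha approaches an Exponential with mean 100 as alpha grows
    x = np.array([50.0, 100.0, 300.0])
    for alpha in (10, 100, 10_000):
        theta = 100 * alpha
        print(alpha, (theta / (theta + x)) ** alpha)
    print(np.exp(-x / 100))               # the limiting Exponential survival function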

Exercise: What is the limit of a Burr Distribution, with γ = 3, as α goes to infinity while θ = 25α1/3?
[Solution: This is special case of the limit of a Transformed Beta. Using the above result, in this case,
the limit is an Weibull Distribution with scale parameter 25 and τ = 3. Alternately, for the Burr with

γ = 3, S(x) = 1/{1+ (x/θ)3 }α = 1/{1+ (x/25)3 /α}α. As α approaches infinity, S(x) approaches
1/exp[(x/25)3 ] = exp[-(x/25)3 ]. This is an Weibull Distribution with scale parameter 25 and τ = 3.]

172
The fact that the tau parameter of the limiting Transformed Gamma Distribution is not the same tau as that of the
Transformed Beta Distribution is due to the manner in which Loss Models has chosen to parametrize both
distributions.

Similarly, for the Transformed Beta Distribution, and its special cases, as tau approaches infinity,
in the limit we get a Inverse Transformed Gamma, and its special cases:
Transformed Beta → Inverse Transformed Gamma
Generalized Pareto → Inverse Gamma
Inverse Burr → Inverse Weibull
Inverse Pareto → Inverse Exponential

A Transformed Beta Distribution with parameters α, θ, γ and τ has density

f(x) = {Γ(α+τ)/(Γ(τ)Γ(α))} γ θ−γτ xγτ−1 (1+(x/θ)γ)−(α + τ). Set q = θτ1/γ and let τ go to infinity, while

holding q constant. Given the chosen constant q, then θ = q/τ1/γ.

Then the density of the Transformed Beta, other than the Gamma Functions, is:
γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^(-(α+τ)) = γ θ^(-γτ) x^(γτ-1) (x/θ)^(-γ(α+τ)) {1 + (θ/x)^γ}^(-(α+τ))
= γ θ^(γα) x^(-(γα+1)) {1 + (θ/x)^γ}^(-τ) {1 + (θ/x)^γ}^(-α) = γ q^(γα) τ^(-α) x^(-(γα+1)) {1 + (q/x)^γ/τ}^(-τ) {1 + (q/x)^γ/τ}^(-α)
≅ γ q^(γα) τ^(-α) x^(-(γα+1)) exp[-(q/x)^γ] (1) ≅ γ q^(γα) τ^(-α) x^(-(γα+1)) exp[-(q/x)^γ].173

As shown previously in an exercise, for large τ, Γ(α+τ)/{Γ(τ)Γ(α)} ≅ τ^α / Γ(α).

Putting the two pieces of the density of the Transformed Beta Distribution together:
f(x) = {Γ(α+τ)/[Γ(τ)Γ(α)]} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^(-(α+τ)) ≅ {τ^α/Γ(α)} γ q^(γα) τ^(-α) x^(-(γα+1)) exp[-(q/x)^γ]
= γ q^(γα) x^(-(γα+1)) exp[-(q/x)^γ] / Γ(α).
This is the density of an Inverse Transformed Gamma Distribution, with scale parameter q, with the
usual α parameter, and what is normally the τ parameter given as γ.

Thus the Inverse Transformed Gamma Distribution has been obtained as a limit of a series of
Transformed Beta Distributions, with q = θτ1/γ and let τ go to infinity, while holding q constant. Then q
is the scale parameter of the limiting Inverse Transformed Gamma. The τ parameter of the limiting
Inverse Transformed Gamma is the γ parameter of the Transformed Beta. The α parameter of the
limiting Inverse Transformed Gamma is the α parameter of the Transformed Beta.

In terms of Distribution Functions: lim β[τ, α ; xγ / (xγ + q γ /τ)] = 1 - Γ[α ; (q/x)γ].


τ→∞

173
Where Iʼve used the fact that the limit as z → ∞ of (1 + c/z)^(-z) is e^(-c), with z = τ and c = (q/x)^γ.

Note that this result could have been obtained from the previous one:
lim_{α→∞} β[τ, α; x^γ/(x^γ + α q^γ)] = Γ[τ; (x/q)^γ]. Since β[a, b; x] = 1 - β[b, a; 1-x],
lim_{τ→∞} β[τ, α; x^γ/(x^γ + q^γ/τ)] = lim_{τ→∞} 1 - β[α, τ; (q^γ/τ)/(x^γ + q^γ/τ)]
= lim_{τ→∞} 1 - β[α, τ; q^γ/(q^γ + τ x^γ)] = 1 - Γ[α; (q/x)^γ].

Exercise: What is the limit of a Generalized Pareto Distribution, with α = 7, as τ goes to infinity while
θ = 33/τ?
[Solution: This is special case of the limit of a Transformed Beta. Using the above result, in this case,
the limit is an Inverse Gamma Distribution with scale parameter 33 and α = 7.
Alternately, for the Generalized Pareto Distribution with α = 7,
Γ(7 + τ) −τ τ−1 Γ(7 + τ)
f(x) = θ x {1+(x/θ)}−(7 + τ) = 33−τττ xτ−1 {1+τ(x/33)}−(7 + τ) =
Γ(τ) Γ(7) Γ(τ) Γ(7)

Γ(7 + τ)
33−τ ττ xτ−1 {τ(x/33)}−τ {1 + (33/x)/τ}−τ {1+τ(x/33)}-7 =
Γ(τ) Γ(7)

Γ(7 + τ) -1
x {( 1 + (33/x)/τ}−τ {1+τ(x/33)}-7.
Γ(τ) Γ(7)

As τ approaches infinity, this approaches:

{τ7/Γ(7)} x-1 exp(-33/x) {τ(x/33)}-7 = 337 x-8 exp(-33/x) /Γ(7).


This is the density of an Inverse Gamma Distribution with scale parameter 33 and α = 7.]

Exercise: What is the limit of an Inverse Burr Distribution, with γ = 4, as τ goes to infinity while

θ = 13/τ1/4 ?
[Solution: This is special case of the limit of a Transformed Beta. Using the above result, in this case,
the limit is an Inverse Weibull Distribution with scale parameter 13 and τ = 4.

Alternately, for the Inverse Burr with γ = 4, S(x) = 1/{1 + (θ /x)4 }τ = 1/{1 + (13/x)4 /τ}τ.
As τ approaches infinity, S(x) approaches 1/exp[(13/x)4 ] = exp[-(13/x)4 ].
This is an Inverse Weibull Distribution with scale parameter 13 and τ = 4.]

Problems:

28.1 (1 point) Which of the following are special cases of the Transformed Beta Distribution?
1. Beta Distribution
2. ParaLogistic Distribution
3. Inverse Gaussian Distribution
A. None of 1, 2 or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D

28.2 (1 point) Which of the following can be obtained as limits of Transformed Beta Distributions?
1. Weibull Distribution
2. Inverse Gamma Distribution
3. Single Parameter Pareto Distribution
A. None of 1, 2 or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D

28.3 (3 points) Determine the limit of Pareto Distributions, as θ and α go to infinity while θ = 10α.

28.4 (2 points) Calculate the density function at 14, f(14), for a Transformed Beta Distribution with
α = 3, θ = 10, τ = 4, and γ = 6.
Hint: f(x) = {Γ(α+τ) / [Γ(τ)Γ(α)]} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^(-(α+τ)).
A. less than 0.02
B. at least 0.02 but less than 0.03
C. at least 0.03 but less than 0.04
D. at least 0.04 but less than 0.05
E. at least 0.05

28.5 (1 point) Match the Distributions:


1. Pareto a. Transformed Beta with α = 1 and τ = 1
2. LogLogistic b. Transformed Beta with τ = 1 and γ = 1
3. Inverse Pareto c. Transformed Beta with α = 1 and γ = 1
A. 1a, 2b, 3c B. 1a, 2c, 3b C. 1b, 2a, 3c
D. 1b, 2c, 3a E. 1c, 2b, 3a

28.6 (2 points) What is the limit of Inverse Pareto Distributions, as τ goes to infinity while θ = 7/τ?
A. An Exponential Distribution, with scale parameter 7.
B. An Exponential Distribution, with scale parameter 1/7.
C. An Inverse Exponential Distribution, with scale parameter 7.
D. An Inverse Exponential Distribution, with scale parameter 1/7.
E. None of the above.

Solutions to Problems:

28.1. C. 1. While the Transformed Beta Distribution can be derived from the Beta Distribution,
the Beta has support [0,1], while the Transformed Beta Distribution has support x > 0.
The Beta is not a special case of a Transformed Beta.
2. The ParaLogistic Distribution is a special case of a Transformed Beta, with τ = 1 and γ = α.
3. The Inverse Gaussian Distribution is not a special case of a Transformed Beta Distribution.

28.2. E. 1. Yes. By taking appropriate limits of Burr Distributions, Transformed Beta Distributions
with τ = 1, one can obtain a Weibull Distribution.
2. Yes. By taking appropriate limits of Generalized Pareto Distributions, Transformed Beta
Distributions with γ = 1, one can obtain an Inverse Gamma Distribution
3. No. The Single Parameter Pareto has support x > θ, while the Transformed Beta Distribution has
support x > 0.

28.3. S(x) = θ^α/(θ + x)^α = (10α)^α/(10α + x)^α = 1/{1 + x/(10α)}^α.
ln S(x) = -α ln[1 + x/(10α)].
Now for α big, x/(10α) is small, and ln[1 + x/(10α)] ≅ x/(10α).
Therefore, for alpha big, ln S(x) ≅ -αx/(10α) = -x/10.
Thus for alpha large, S(x) ≅ exp[-x/10].
The limit is an Exponential Distribution with mean 10.

28.4. B. f(x) = {Γ(α+τ) / [Γ(τ)Γ(α)]} γ (x/θ)^(γτ) {1 + (x/θ)^γ}^(-(α+τ)) / x.
f(14) = {Γ(7) / [Γ(4)Γ(3)]} (6) (1.4)^24 {1 + (1.4)^6}^(-7) / 14 = (60)(6)(3214.2) / {(3,284,565)(14)} = 0.025.

28.5. C. 1b, 2a, 3c.



28.6. C. This is special case of the limit of a Transformed Beta.


The limit is an Inverse Transformed Gamma with α = 1 and γ = 1 and scale parameter 7.
That is an Inverse Exponential with scale parameter 7.
Alternately, for the Inverse Pareto, F(x) = {x/(x+θ)}τ = (1+θ/x)−τ = {1+(7/x)/τ}−τ.
As τ approaches infinity F(x) approaches 1/exp(7/x) = exp(-7/x).
This is an Inverse Exponential with scale parameter 7.

Section 29, Producing Additional Distributions

Given a light-tailed distribution, one can produce a more heavy-tailed distribution by looking at the
inverse of x. Let G(x) = 1 - F(1/x).174 For example, if F is a Gamma Distribution, then G is an
Inverse Gamma Distribution. For a Gamma Distribution with θ = 1, F(x) = Γ[α; x].
Letting y = θ/x, G(y) = 1 - F(x) = 1 - F(θ/y) = 1 - Γ[α; θ/y], the Inverse Gamma Distribution.

Given a Distribution, one can produce another distribution by adding up independent identical
copies. For example, adding up α independent Exponential Distributions gives a Gamma
Distribution. As α approaches infinity one approaches a very light-tailed Normal Distribution.

One can get a more heavy-tailed distribution by the change of variables y = ln(x).
Let G(x) = F[ln(x)]. For example, if F is the Normal Distribution, then G is the heavier-tailed
LogNormal Distribution. Loss Models refers to this as "exponentiating", since if y = ln(x),
then x = ey.

One can get new distributions by the change of variables y = x1/τ. Loss Models refers to this as

"raising to a power". Let G(x) = F[x1/τ], τ > 0. For example, if F is the Exponential Distribution with

mean θ, then G is a Weibull Distribution, with scale parameter θτ. For τ > 1 the Weibull Distribution
has a lighter tail than the Exponential Distribution. For τ < 1 the Weibull Distribution has a heavier tail
than the Exponential Distribution.

For τ > 0, Loss Models refers to the new distribution as transformed, for example Transformed
Gamma versus Gamma.175

If τ < 0, then G(x) = 1 - F[x1/τ].


For τ < 0, this is called the inverse transformed distribution, such as the Inverse Transformed
Gamma versus the Gamma. This can be usefully thought of as two separate changes of variables:
raising to a positive power and inverting.
For the special case, τ = -1, this is the inverse distribution as discussed previously, such as the
Inverse Gamma versus the Gamma.

174
We need to subtract from one, so that G(0) = 0 and G(∞) = 1.
175
However, some distributions retain their special names. For example the Weibull is not called the transformed
Exponential, nor is the Burr called the transformed Pareto.

Exercise: X is Exponential with mean 10. Determine the form of the distribution of Y = X3 .
[Solution: F(x) = 1 - exp[-x/10]. Y = X3 . X = Y1/3.
FY(y) = FX[x] = FX[y1/3] = 1 - exp[-y1/3/10] = 1 - exp[-(y/1000)1/3].

A Weibull Distribution with θ = 1000 and τ = 1/3.


Alternately, FY(y) = Prob[Y ≤ y] = Prob[X3 ≤ y] = Prob[X ≤ y1/3] = 1 - exp[-y1/3/10].

FY(y) = 1 - exp[-(y/1000)1/3]. A Weibull Distribution with θ = 1000 and τ = 1/3.


Alternately, f(x) = exp[-x/10]/10.
fY(y) = fX[y1/3] |dx/dy| = exp[-y1/3/10]/10 (1/3)y-2/3 = (1/3) (y/1000)1/3 exp[-(y/1000)1/3] / y.

A Weibull Distribution with θ = 1000 and τ = 1/3.


Comment: A change of variables as in calculus class.]
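
The result of this exercise can be checked numerically. Here is a minimal sketch in Python (not part of the original guide), assuming numpy is available; the seed, sample size, and test points are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    x = rng.exponential(scale=10, size=200_000)   # X is Exponential with mean 10
    y = x ** 3                                    # Y = X^3

    # Weibull with theta = 1000 and tau = 1/3: S(y) = exp[-(y/1000)^(1/3)]
    for t in (100.0, 1000.0, 10_000.0):
        empirical = (y > t).mean()
        theoretical = np.exp(-(t / 1000) ** (1 / 3))
        print(t, round(empirical, 4), round(theoretical, 4))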

One can get additional distributions as a mixture of distributions. As will be discussed in a


subsequent section, the Pareto can be obtained as a mixture of an Exponential by an Inverse
Gamma.176 Usually such mixing produces a heavier tail; the Pareto has a heavier tail than an
Exponential. The Negative Binomial which can be obtained as a mixture of Poissons via a Gamma
has a heavier tail than the Poisson. Loss Models refers to this as Mixing. Another method of getting
new distributions is to weight together two or more existing distributions. Such mixed distributions,
referred to by Loss Models as n-point or two-point mixtures, are discussed in a subsequent
section.177

One can get additional distributions as a ratio of independent variables each of which follows a
known distribution. For example an F-distribution is a ratio of Chi-Squares.178 As a special
case, the Pareto can be obtained as a ratio of an Exponential variable and a Gamma
variable.179 The Beta Distribution can be obtained as a combination of two Gammas.180
Generally the distributions so obtained have heavier tails.

176
Loss Models Example 5.4 shows that mixing an exponential via an inverse exponential yields a Pareto
Distribution. This is just a special case of the Inverse Gamma-Exponential, with mixed distribution a Pareto.
Example 5.6 shows that mixing an Inverse Weibull via a Transformed Gamma with the same τ parameter, gives an
Inverse Burr Distribution.
177
Loss Models, Section 4.2.3.
178
The F-Distribution from Statistics is related to the Generalized Pareto Distribution.
179
See p. 47, Loss Distributions by Hogg & Klugman.
180
If X is a random draw from a Gamma Distribution with shape parameter α and scale parameter θ, and Y is a random
draw from a Gamma Distribution with shape parameter β and scale parameter θ,
then Z = X / (X+Y) is a random draw from a Beta Distribution with parameters α and β.

Finally, one can introduce a scale parameter. If one had the distribution F(x) = 1 - e-x, one can create
a family of distributions by substituting x/θ everywhere x appears. θ is now a scale parameter.

F(x) = 1 - e-x/θ. Introducing a scale parameter does not affect either the tail behavior or the shape of
the distribution. Loss Models refers to this as "multiplying by a constant".

Exercise: X is uniform from 0 to 0.1. Y = √(10/X) - 10. Determine the distribution of Y.

[Solution: F(x) = 10x, 0 ≤ x ≤ 0.1. X = 10/(Y + 10)2 .


x = 0 ⇔ y = ∞. x = 0.1 ⇔ y = 0. Small Y ⇔ Large X. ⇒ Need to take 1 - F(x).
FY(y) = 1 - FX[10/(y + 10)2 ] = 1 - 102 /(y + 10)2 . A Pareto Distribution with α = 2 and θ = 10.

Alternately, FY(y) = Prob[Y ≤ y] = Prob[√(10/X) - 10 ≤ y] = Prob[√(10/X) ≤ y + 10] =
Prob[10/X ≤ (y + 10)^2] = Prob[X ≥ 10/(y + 10)^2] = 1 - Prob[X ≤ 10/(y + 10)^2] =
1 - (10){10/(y + 10)^2} = 1 - 10^2/(y + 10)^2.
FY(y) = 1 - 102 /(y + 10)2 , a Pareto Distribution with α = 2 and θ = 10.
Alternately, f(x) = 10, 0 ≤ x ≤ 0.1.
fY(y) = fX[10/(y + 10)2 ] |dx/dy| = (10) 20/(Y + 10)3 = (2) 102 /(Y + 10)3 .

A Pareto Distribution with α = 2 and θ = 10.]

Percentiles:

A one-to-one increasing transformation, such as ln(x), e^x, or x^2 (for x > 0), preserves the percentiles,
including the median. For example, the median of a Normal Distribution is µ, which implies that the median of a
LogNormal Distribution is e^µ.
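
As a small numerical illustration (a sketch, not from the text; numpy assumed, with the parameters an arbitrary choice), simulating a Normal and exponentiating shows the sample median landing near e^µ:

    import numpy as np

    rng = np.random.default_rng(seed=2)
    mu, sigma = 0.9163, 1.1774
    normal_sample = rng.normal(mu, sigma, size=200_000)
    lognormal_sample = np.exp(normal_sample)   # exponentiating preserves percentiles

    print(np.median(lognormal_sample))         # close to exp(mu)
    print(np.exp(mu))                          # about 2.50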



Problems:

29.1 (4 points) X follows a Standard Normal Distribution, with mean zero and standard deviation
of 1. Y = 1/X.
(a) (1 point) What is the density of y?
(b) (2 points) Graph the density of y.
(c) (1 point) What is E[Y]?

29.2 (1 point) X follows an Exponential Distribution with mean 1. Let Y = θ exp[X/α].


What is the distribution of Y?

29.3 (3 points) X follows a Weibull Distribution with parameters θ and τ. Let Y = -ln[X].
What are the algebraic forms of the distribution and density functions of Y?

29.4 (3 points) ln(X) follows a LogNormal Distribution with µ = 1.3 and σ = 0.4.
Determine the density function of X.

29.5 (3 points) X follows a Standard Normal Distribution, with mean 0 and standard deviation of 1.
Let Yτ = X2 , for τ > 0. Determine the form of the Distribution of Y.

29.6 (2 points) Let X follows the density f(x) = e-x / (1 + e-x)2 , -∞ < x < ∞.
Let Y = θ eX / γ , for θ > 0, γ > 0. Determine the form of the distribution of Y.

29.7 (2 points) Let X be a uniform distribution from 0 to 1.
Let Y = θ ln[(1 - p)/(1 - p^X)], for θ > 0, 1 > p > 0. Determine the form of the distribution function of Y.

29.8 (2 points) X follows an Exponential Distribution with hazard rate λ. Let Y = exp[-δX].
What is the distribution of Y?

29.9 (3 points) X is uniform on (0, 1). Given X = x, Y is uniform on (0, √x).


(a) What is the distribution of Y?
(b) Determine E[Y].

29.10 (1 point) X follows an Exponential Distribution with mean 10. Let Y = 1/X.
What is the distribution of Y?

29.11 (4 points) You are given the following:


• The random variable X has a Normal Distribution, with mean zero and standard deviation σ.
• The random variable Y also has a Normal Distribution, with mean zero and standard deviation σ.
• X and Y are independent.
• R2 = X2 + Y2 .
Determine the form of the distribution of R.
Hint: The sum of squares of ν independent Standard Normals is a Chi-Square Distribution with ν
degrees of freedom, which is a Gamma Distribution with α = ν/2 and θ = 2.

29.12 (CAS Part 2 Exam, 1965, Q. 44) (1.5 points)


Given the probability density function f(x) = x/2, 0 ≤ x ≤ 2.
Find the probability density function for y, where y = x2 /2.

29.13 (2, 5/90, Q.36) (1.7 points) If Y is uniformly distributed on the interval (0, 1) and if
Z = -a ln(1 - Y) for some a > 0, then to which of the following families of distributions does Z belong?
A. Pareto B. LogNormal C. Normal D. Exponential E. Uniform

29.14 (4B, 11/97, Q.11) (2 points) You are given the following:
• The random variable X has a Pareto distribution, with parameters θ and α.
• Y is defined to be ln(1 + X/θ).
Determine the form of the distribution of Y.
A. Negative Binomial B. Exponential C. Pareto D. Lognormal E. Normal

29.15 (IOA 101, 4/00, Q.13) (3.75 points) Suppose that the distribution of a physical coefficient,
X, can be modeled using a uniform distribution on (0, 1).
A researcher is interested in the distribution of Y, an adjusted form of the reciprocal of the coefficient,
where Y = (1/X) - 1.
(i) (2.25 points) Determine the probability density function of Y.
(ii) (1.5 points) Determine the mean of Y.

29.16 (1, 11/01, Q.13) (1.9 points) An actuary models the lifetime of a device using the random
variable Y = 10X0.8, where X is an exponential random variable with mean 1 year.
Determine the probability density function f(y), for y > 0, of the random variable Y.
(A) 10 y0.8 exp[-8y-0.2]
(B) 8 y-0.2 exp[-10y0.8]
(C) 8 y-0.2 exp[-(0.1y)1.25]
(D) (0.1y)1.25 exp[-1.25(0.1y).25]
(E) 0.125 (0.1y)0.25 exp[-(0.1y)1.25]

29.17 (CAS3, 11/05, Q.19) (2.5 points)


Claim size, X, follows a Pareto distribution with parameters α and θ.

A transformed distribution, Y, is created such that Y = X1/τ.


Which of the following is the probability density function of Y?
A. τθyτ−1 / (y + θ)τ+1

B. αθατy τ−1 / (yτ + θ)α+1

C. θαθ / (y + α)θ+1

D. ατ(y/θ)τ / {y[1 + (y/θ)τ]α+1}

E. αθα / (yτ + θ)α+1

29.18 (CAS3, 5/06, Q.27) (2.5 points)


The following information is available regarding the random variables X and Y:
• X follows a Pareto Distribution with α = 2 and θ = 100.
• Y = ln[1 + (X/θ)]
Calculate the variance of Y.
A. Less than 0.10
B. At least 0.10, but less than 0.20
C. At least 0.20, but less than 0.30
D. At least 0.30, but less than 0.40
E. At least 0.40

Solutions to Problems:

29.1. (a) f(x) = exp[-0.5x^2] / √(2π). x = 1/y. dx/dy = -1/y^2.
g(y) = f(x) |dx/dy| = (exp[-0.5/y^2] / y^2) / √(2π), -∞ < y < ∞.
(b) This density is zero at zero, is symmetric, and has maxima at ±1/√2:

[Graph of the density g(y) for -6 ≤ y ≤ 6: zero at y = 0, symmetric, with peaks near ±1/√2 at a height of about 0.3.]

(c) E[Y] = 2 ∫_0^∞ (exp[-0.5/y^2] / y) dy / √(2π). Now as y → ∞, exp[-0.5/y^2] → e^0 = 1.
Therefore, for large values of y, the integrand is basically 1/y, which has no finite integral since
ln[∞] = ∞. Therefore, the first moment of Y does not exist.
Comment: This is an Inverse Normal Distribution, none of whose positive moments exist.

29.2. F(x) = 1 - exp[-x]. y = θ exp[x/α]. ⇒ exp[x/α] = y/θ. ⇒ exp[x] = (y/θ)α.

Substituting into F(x), F(y) = 1 - (θ/y)α, a Single Parameter Pareto Distribution.


Comment: While x goes from 0 to ∞, y goes from θ exp[0/α] = θ to ∞.
In general, if ln[Y] follows a Gamma, then Y follows what is called a LogGamma.
A special case is when ln[Y] is Exponential with mean α. Then Y follows a LogExponential, which is
just a Single Parameter Pareto Distribution with θ = 1.

29.3. F(x) = 1 - exp[-(x/θ)τ], x > 0. y = -ln[x]. ⇒ x = e-y. x = 0 ⇔ y = ∞, and x = ∞ ⇔ y = -∞.


Since large x corresponds to small y, we need to substitute into S(x) rather than F(x).
Substituting into S(x), F(y) = exp[-(e-y/θ)τ] = exp[-e-τy/θτ], -∞ < y < ∞.

Differentiating, f(y) = exp[-e-τy/θτ] τ e-τy/θτ, -∞ < y < ∞.

Alternately, for the Weibull Distribution, f(x) = τxτ−1 exp(-(x/θ)τ) / θτ.

f(y) = f(x) |dx/dy| = {τ e-y(τ−1) exp(-(e-y/θ)τ) / θτ} e-y = exp[-e-τy/θτ] τ e-τy/θτ, -∞ < y < ∞.
Comment: This distribution is sometimes called the Gumbel Distribution.
For τ = 1 and θ = 1, F(y) = exp[-e-y], and f(x) = exp[-y - e-y], -∞ < y < ∞, which is a form of what is
called the Extreme Value Distribution, the Fisher-Tippet Type I Distribution, or the Doubly
Exponential Distribution.

29.4. The LogNormal Distribution has support starting at 0, so we want ln(x) > 0. ⇒ x > 1.
F(x) = LogNormal Distribution at ln(x): Φ[{ln(ln(x)) - 1.3}/0.4].
f(x) = φ[{ln(ln(x)) - 1.3}/0.4] d ln(ln(x))/dx =
{exp[-{ln(ln(x)) - 1.3}^2 / (2 × 0.4^2)] / (0.4 √(2π))} / {x ln(x)} =
exp[-3.125{ln(ln(x)) - 1.3}^2] / {0.4 x ln(x) √(2π)}, x > 1.
Comment: Beyond what you are likely to be asked on your exam. Just as the LogNormal
Distribution has a much heavier righthand tail than the Normal Distribution, the
“LogLogNormal” Distribution has a much heavier righthand tail than the LogNormal
Distribution.

29.5. f(x) = exp[-x^2/2] / √(2π). X = Y^(τ/2).
Since x is symmetric around zero, but x^2 ≥ 0, we need to double the density of x.
g(y) = 2 f(x) |dx/dy| = 2 {exp[-y^τ/2] / √(2π)} (τ/2) y^(τ/2 − 1) = τ y^(τ/2 − 1) exp[-y^τ/2] / √(2π).
The density of a Transformed Gamma Distribution is: f(x) = τ x^(τα−1) exp[-(x/θ)^τ] / {θ^(τα) Γ(α)}.
Matching parameters, τα = τ/2, and θ^τ = 2. The density of y is a Transformed Gamma
Distribution with parameters α = 1/2, τ, and θ = 2^(1/τ).
Comment: θ^(τα) Γ(α) = (θ^τ)^α Γ(1/2) = 2^(1/2) √π = √(2π).


If τ = 1, then Y has a Gamma Distribution with α = 1/2 and θ = 2,
which is a Chi-Square Distribution with one degree of freedom.

29.6. By integration, F(x) = 1/(1 + e^-x) = e^x / (1 + e^x), -∞ < x < ∞.
e^x = (y/θ)^γ. Therefore, FY(y) = (y/θ)^γ / {1 + (y/θ)^γ}, y > 0. This is a Loglogistic Distribution.
Comment: The original distribution is called a Logistic Distribution. The Loglogistic has a similar
relationship to the Logistic Distribution, as the LogNormal has to the Normal.

29.7. y/θ = ln[(1 - p)/(1 - p^x)]. ⇒ e^(y/θ) = (1 - p)/(1 - p^x). ⇒ (1 - p) e^(-y/θ) = 1 - p^x. ⇒ p^x = 1 - (1 - p) e^(-y/θ). ⇒
x ln(p) = ln[1 - (1 - p) e^(-y/θ)]. ⇒ x = ln[1 - (1 - p) e^(-y/θ)] / ln(p).
For x = 1, y = 0, while as x approaches zero, y approaches infinity.
Since X is uniform, FX(x) = x. ⇒ FY(y) = 1 - ln[1 - (1 - p) e^(-y/θ)] / ln(p), y > 0.
Comment: The density is fY(y) = -(1 - p) e^(-y/θ) / {θ ln(p) [1 - (1 - p) e^(-y/θ)]}, y > 0.
The distribution of Y is called an Exponential-Logarithmic Distribution.
As p approaches 1, the distribution of Y approaches an Exponential Distribution.
The Exponential-Logarithmic Distribution has a declining hazard rate.
If frequency follows a Logarithmic Distribution, and severity is Exponential, then the minimum
of the claim sizes follows an Exponential-Logarithmic Distribution.
Here is a graph comparing the density of an Exponential with mean 100
and an Exponential-Logarithmic Distribution with p = 0.2 and θ = 100:

[Graph comparing the two densities for sizes 0 to 300: the Exponential-Logarithmic density starts higher, near 0.025, and eventually falls below the Exponential density, which starts at 0.010.]

29.8. F(x) = 1 - exp[-λx]. X = -ln[Y]/δ.


When x is big y is small and vice-versa. As x goes from zero to infinity, y goes from 1 to 0.
Therefore, we get the distribution function of Y by plugging into the survival function of X:
F(y) = exp[-λ (-ln[y]/δ)] = yλ/δ, 0 < y < 1. ⇒ f(y) = (λ/δ) yλ/δ - 1, 0 < y < 1.
Y follows a Beta Distribution, with parameters θ = 1, a = λ/δ, and b = 1.
Comment: If X is the future lifetime, and δ is the force of interest, then Y is the present value of a life
insurance that pays 1. The actuarial present value of this insurance is:
E[Y] = θ a/(a + b) = (λ/δ) / (λ/δ + 1) = λ/(λ + δ).
As discussed in Life Contingencies, if the distribution of future lifetimes is Exponential with hazard
rate λ, then λ/(λ + δ) is the actuarial present value of a life insurance that pays 1.

29.9. a. F[y | x] = y/√x for 0 ≤ y ≤ √x. F[y | x] = 1 for y > √x.
In other words, F[y | x] = 1 for 0 ≤ x ≤ y^2, and F[y | x] = y/√x for y^2 ≤ x ≤ 1.
Thus, F[y] = ∫_0^(y^2) 1 dx + ∫_(y^2)^1 (y/√x) dx = y^2 + 2y√x ]_(x=y^2)^(x=1) = y^2 + 2y - 2y^2 = 2y - y^2, 0 ≤ y ≤ 1.
b. S(y) = 1 + y^2 - 2y. E[Y] = ∫_0^1 (1 + y^2 - 2y) dy = 1 + 1/3 - 1 = 1/3.
Alternately, f(y) = 2 - 2y, 0 ≤ y ≤ 1.
This is a Beta Distribution with a = 1, b = 2, and θ = 1.
E[Y] = θ a / (a + b) = 1/3.

29.10. F(x) = 1 - e-x/10. Let G be the distribution function of Y.


G(y) = 1 - F(x) = 1 - F(1/y) = exp[-0.1/y].
This is an Inverse Exponential Distribution with θ = 0.1.
Comment: We need to subtract from one, so that G(0) = 0 and G(∞) = 1.

29.11. For σ = 1, R2 is the sum of two unit Normals, and thus a Chi-Square with 2 degrees of
freedom, which is an Exponential Distribution with θ = 2.
Now R = (R2 )1/2, so we have a power transformation, and thus R is Weibull with τ = 2.
Specifically, the survival function of R is: S(r) = survival function of R2 = exp[-r2 /2].
Now if σ ≠ 1, we just have a scale transformation, and r is divided by σ wherever r appears in the
survival function:
S(r) = exp[-r^2/(2σ^2)] = exp[-{r/(σ√2)}^2].
R follows a Weibull Distribution with τ = 2 and θ = σ√2.


Comment: This is called a Rayleigh Distribution.
In general, if X is Exponential with mean 1, then X1/τ is Weibull with θ =1 and τ.

29.12. F(x) = x2 /4. Thus F(y) = (2y) / 4 = y/2, 0 ≤ y ≤ 2. f(y) = 1/2, 0 ≤ y ≤ 2.


Alternately, f(y) = f(x) / (dy/dx) = (x/2) / x = 1/2.

29.13. D. F(z) = Prob[Z ≤ z] = Prob[-a ln(1 - Y) ≤ z] = Prob[ln( 1 - Y) ≥ -z/a] =


Prob[1 - Y ≥ e-z/a] = Prob[1 - e-z/a ≥ Y] = 1 - e-z/a. An Exponential Distribution with θ = a.
Comment: For Y uniform on [0, 1], Prob[Y ≤ y] = y.
This is the basis of one way to simulate an Exponential Distribution.
Z = -a ln(Y), also follows an Exponential Distribution with θ = a, which is the basis of another way
to simulate an Exponential Distribution.
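
Here is a minimal sketch (assuming numpy; a = 5 and the seed are arbitrary) of the simulation idea mentioned in the comment; both Z = -a ln(1 - Y) and Z = -a ln(Y) produce Exponential samples with mean a:

    import numpy as np

    rng = np.random.default_rng(seed=3)
    a = 5.0
    y = rng.uniform(size=200_000)

    z1 = -a * np.log(1 - y)   # inverse transform applied to 1 - Y
    z2 = -a * np.log(y)       # equivalent, since Y and 1 - Y are both uniform on (0, 1)

    print(z1.mean(), z2.mean())            # both close to a = 5
    print((z1 > 10).mean(), np.exp(-2))    # S(10) = exp(-10/5) = 0.1353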

29.14. B. If y = ln(1+ x/θ), then dy/dx = (1/θ) / (1+x/θ) = 1/(θ+x). Note that ey = 1+ x/θ.

g(y) = f(x) / |dy/dx| = {(αθα)(θ + x)−(α + 1)} / (1/(θ+x)) = α(1 + x/θ)−α = α(ey)−α = αe−αy.
Thus y is distributed as per an Exponential.
Comment: See for example page 107 of Insurance Risk Models by Panjer and Willmot, not on the
Syllabus.

29.15. (i) F(x) = x, 0 < x < 1. y = 1/x - 1. ⇒ x = 1/(1 + y). ⇒ F(y) = 1 - 1/(1+y), 0 < y < ∞.

⇒ f(y) = 1/(1+y)2 , 0 < y < ∞.


Alternately, f(x) = 1, 0 < x < 1. dy/dx= -1/x2 . f(y) = f(x)/(|dy/dx|) = 1/(1+y)2 .
When x is 0, y is ∞, and when x = 1, y is 0. ⇒ f(y) = 1/(1+y)2 , 0 < y < ∞.
(ii) Y follows a Pareto Distribution with α = 1 (and θ = 1), and therefore the mean does not exist.
Alternately, E[Y] is the integral from 0 to ∞ of y/(1+y)2 , which does not exist, since for large y the
integrand acts like 1/y.

29.16. E. S(x) = exp[-x]. y = 10x^0.8. ⇒ x = (y/10)^1.25. ⇒ S(y) = exp[-(y/10)^1.25].
f(y) = 1.25 y^0.25 exp[-(y/10)^1.25] / 10^1.25 = 0.125 (0.1y)^0.25 exp[-(0.1y)^1.25].
Comment: Y follows a Weibull Distribution with τ = 1.25 and θ = 10.

29.17. B. Y= X1/τ. x = yτ. dx/dy = τ yτ−1.

f(x) = αθα/(x + θ)α+1.

f(y) = dF/dy = dF/dx dx/dy = {αθα/(x + θ)α+1} τ yτ−1 = {αθα/(yτ + θ)α+1} τ yτ−1

= αθατyτ−1 / (yτ + θ)α+1.

Alternately, F(x) = 1 - {θ/(x + θ)}α. x = yτ. ⇒ F(y) = 1 - {θ/(yτ + θ)}α.

Differentiating with respect to y, f(y) = αθατyτ−1 / (yτ + θ)α+1.


Comment: Basically, just a change of variables from calculus. The result is a Burr Distribution, but
with a somewhat different treatment of the scale parameter than in Loss Models.
If τ = 1, one should just get the density of the original Pareto. This is not the case for choices A and
C, eliminating them. While it is not obvious, choice D does pass this test.

29.18. C. Y = ln(1 + (X/θ)). ⇔ X = θ(eY - 1) = 100(eY - 1).


F(x) = 1 - {100/(100 + x)}2 . F(y) = 1 - {100/(100 + 100(ey - 1))}2 = 1 - e-2y.
Thus Y follows an Exponential Distribution with θ = 1/2, and variance θ2 = 1/4.

Section 30, Tails of Loss Distributions

Actuaries are often interested in the behavior of a size of loss distribution as the size of claim gets
very large. The question of interest is how quickly the right-hand tail probability, as quantified in the
survival function S(x) = 1 - F(x), goes to zero as x approaches infinity. If the tail probability goes to
zero slowly, one describes that as a "heavy-tailed distribution."
For example, for the Pareto distribution S(x) = {θ/(θ+x)}α, which goes to zero as per x−α.
If the tail probability goes to zero quickly, then one describes the distribution as "light-tailed".
For the Exponential Distribution, S(x) = e-x/θ, which goes to zero very quickly as x → ∞.
The heavier tailed distribution will have both its density and its survival function go to zero more
slowly as x approaches infinity. For example, for a Pareto, f(x) = αθ^α / (θ + x)^(α+1), which goes to zero
more slowly than the density of an Exponential Distribution, f(x) = e^(-x/θ) / θ.

For example, here is a comparison starting at 300 of the Survival Function of an Exponential
Distribution with θ = 100 versus that of a Pareto Distribution with θ = 200, α = 3, and mean of 100:

[Graph of S(x) for 300 ≤ x ≤ 1000: the Pareto survival function lies above that of the Exponential and declines more slowly.]

The Pareto with a heavier righthand tail has its Survival Function go to zero more slowly as x
approaches infinity, than the Exponential. The Exponential has less probability in its righthand tail
than the Pareto. The Exponential has a lighter righthand tail than the Pareto.

Exercise: Compare S(1000) for the Exponential Distribution with θ = 100 versus that of a
Pareto Distribution with θ = 200, α = 3, and mean 100.
[Solution: For the Exponential, S(1000) = e-1000/100 = 0.00454%.
For the Pareto, S(1000) = (200/1200)3 = 0.46296%.
Comment: The Pareto Distribution has a much higher probability of a loss of size greater than 1000
than does the Exponential Distribution.]
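
A sketch reproducing these two values in plain Python (no special libraries needed):

    import math

    def s_exponential(x, theta):
        return math.exp(-x / theta)

    def s_pareto(x, alpha, theta):
        return (theta / (theta + x)) ** alpha

    print(s_exponential(1000, 100))   # 4.54e-05, i.e. 0.00454%
    print(s_pareto(1000, 3, 200))     # 0.00463, i.e. 0.46296%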

Exercise: What are the mean and second moment of a Pareto Distribution with parameters
α = 3 and θ = 10?
[Solution: The mean is: θ/(α−1) = 10/2 = 5. The second moment is: 2θ^2/{(α − 1)(α − 2)} = 200/2 = 100.]

Exercise: What are the mean and second moment of a LogNormal Distribution with parameters
µ = 0.9163 and σ = 1.1774?

[Solution: The mean is: exp(µ + σ2/2) = exp(1.6094) = 5.

The second moment is: exp(2µ + 2σ2) = exp(4.605) = 100.]

Thus a Pareto Distribution with parameters α = 3 and θ = 10 and a LogNormal Distribution with
parameters µ = 0.9163 and σ = 1.1774 have the same mean and second moment, and therefore
the same variance. However, while their first two moments match, the Pareto has a heavier tail. This
can be seen by calculating the density functions for some large values of x.

Exercise: What are f(10), f(100), f(1000) and f(10,000) for a Pareto Distribution with parameters
α = 3 and θ = 10?
[Solution: For a Pareto, f(x) = αθ^α / (θ + x)^(α+1). So that f(10) = 3000/20^4 = 0.01875,
f(100) = 2.05 x 10^-5, f(1000) = 2.88 x 10^-9, f(10,000) = 2.99 x 10^-13.]



Exercise: What are f(10), f(100), f(1000) and f(10,000) for a LogNormal Distribution with
parameters µ = 0.9163 and σ = 1.1774?
[Solution: For a LogNormal, f(x) = exp[-(ln(x) − µ)^2/(2σ^2)] / {x σ √(2π)}, so that
f(10) = 0.0169, f(100) = 2.50 x 10^-5, f(1000) = 8.07 x 10^-10, f(10,000) = 5.68 x 10^-16.]

x          Pareto Density (α = 3, θ = 10)     LogNormal Density (µ = 0.9163, σ = 1.1774)
10         1.87e-2                            1.69e-2
100        2.05e-5                            2.50e-5
1000       2.88e-9                            8.07e-10
10000      2.99e-13                           5.68e-16

While at 10 and 100 the two densities are similar, by the time we get to 1000, the LogNormal
Density has started to go to zero more quickly. This LogNormal has a lighter tail than this Pareto.
In general any LogNormal has a lighter tail than any Pareto Distribution.
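
The table can be reproduced with a few lines of Python (a sketch using only the standard library; the density formulas are those quoted in the two exercises above):

    import math

    def pareto_pdf(x, alpha, theta):
        return alpha * theta ** alpha / (theta + x) ** (alpha + 1)

    def lognormal_pdf(x, mu, sigma):
        z = (math.log(x) - mu) / sigma
        return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

    for x in (10, 100, 1000, 10_000):
        print(x, pareto_pdf(x, 3, 10), lognormal_pdf(x, 0.9163, 1.1774))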

For the LogNormal, ln f(x) = -0.5 ({ln(x)−µ} /σ)2 - ln(x) - ln(σ) - ln(2π)/2.

For very large x this is approximately: -0.5 ln(x)2 /σ2.

For the Pareto, ln f(x) = ln(α) + αln(θ) - (α+1) ln(θ + x).


For very large x this is approximately: -(α+1) ln(x).

Since the square of ln(x) eventually gets much bigger than ln(x), the log density of the Lognormal
(eventually) goes to minus infinity faster than that of the Pareto. In other words, for very large x, the
density of the Lognormal goes to zero more quickly than the Pareto. The LogNormal is lighter-tailed
than the Pareto.

There are a number of methods by which one can distinguish which distribution or empirical data set
has the heavier tail. Light-tailed distributions have more moments that exist. For example, all of the
(positive) moments of the Gamma Distribution exist. Heavy-tailed distributions do not have all of their
higher moments exist. For example, for the Pareto, only those moments with n < α exist.

In general, computing the nth moment involves integrating x^n f(x) with upper limit of infinity. Thus if
f(x) goes to zero as x^-m as x approaches infinity, then the integrand is x^(n-m) for large x; thus the integral
only converges if n - m < -1, in other words if m > n + 1. The nth moment will only exist if f(x) goes to zero
faster than x^-(n+1).

For example, the Burr Distribution has f(x) = {αγ x^(γ−1) / θ^γ} {1 / (1 + (x/θ)^γ)}^(α+1), which goes to zero as per
x^((γ−1) − γ(α+1)) = x^−(γα + 1), so the nth moment exists only if αγ > n.
For example, a Burr Distribution with α = 2.2 and γ = 0.8 has a first moment but fails to have a

second moment, since αγ = 1.76 ≤ 2.
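
This existence of moments behavior can be seen numerically. Below is a sketch (numpy assumed; θ = 1 is an assumption made only so the Burr density can be written down, since the example does not specify a scale): the partial integral for the first moment settles down as the upper limit grows, while the partial integral for the second moment keeps growing.

    import numpy as np

    alpha, gamma = 2.2, 0.8   # theta = 1 assumed for illustration

    def burr_pdf(x):
        # f(x) = alpha * gamma * x^(gamma-1) * (1 + x^gamma)^-(alpha+1), with theta = 1
        return alpha * gamma * x ** (gamma - 1) * (1 + x ** gamma) ** (-(alpha + 1))

    for upper in (1e3, 1e6, 1e9):
        x = np.geomspace(1e-6, upper, 400_000)   # logarithmic grid for the numerical integral
        first = np.trapz(x * burr_pdf(x), x)
        second = np.trapz(x ** 2 * burr_pdf(x), x)
        print(f"upper = {upper:.0e}: partial first moment ~ {first:.2f}, partial second moment ~ {second:.0f}")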

If it exists, the larger the coefficient of variation, the heavier-tailed the distribution. For example, for the
Pareto with α > 2, the Coefficient of Variation = √{α/(α - 2)}, which increases as α approaches 2.
Thus as α decreases, the tail of the Pareto gets heavier.

Skewness:

Similarly, when it exists, the larger the skewness, the heavier the tail of distribution. The Normal
Distribution is symmetric and thus has a skewness of zero. For the common size of loss distributions,
the skewness is usually positive when it exists.

The Gamma, Pareto and LogNormal all have positive skewness. For small τ the Weibull has
positive skewness, but has negative skewness for large enough τ.

The Gamma Distribution has skewness of 2/√α, which is always positive.


The skewness of the Pareto Distribution does not exist for α ≤ 3.
For α > 3, the Pareto skewness is: 2 {(α + 1)/(α − 3)} √{(α − 2)/α} > 0.

For the LogNormal Distribution the skewness = {exp(3σ^2) - 3 exp(σ^2) + 2} / {exp(σ^2) - 1}^1.5.
The denominator is positive, since exp(σ^2) > 1 for σ^2 > 0.
The numerator is positive since it can be written as y^3 - 3y + 2, for y = exp(σ^2) > 1.
The derivative is 3y^2 - 3 > 0 for y > 1.
At y = 1 this numerator is zero, thus for y > 1 this numerator is positive.
Thus the skewness of the LogNormal is positive.

For the Weibull Distribution the skewness is:
{Γ(1 + 3/τ) − 3 Γ(1 + 1/τ) Γ(1 + 2/τ) + 2 Γ(1 + 1/τ)^3} / {Γ(1 + 2/τ) − Γ(1 + 1/τ)^2}^1.5.
Note that the skewness depends on the shape parameter τ but not on the scale parameter θ.
For large τ the skewness is negative; for example, the table later in this section shows a skewness
of about -0.64 for τ = 10.
The Weibull has positive skewness for τ < 3.6 and a negative skewness for τ > 3.6.

Mean Excess Loss (Mean Residual Lives):

Heavy-tailed distributions have mean excess losses (mean residual lives), e(x) that increase to
infinity as x approaches infinity.181 For example, for the Pareto the mean excess loss increases
linearly.

Light-tailed distributions have mean excess losses (mean residual lives) that increase slowly or
decrease as x approaches infinity. For example, the Exponential Distribution has a constant mean
excess loss, e(x) = θ.
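
A sketch (numpy assumed; the parameters, sample size, and thresholds are arbitrary) comparing empirical mean excess losses: for the simulated Exponential they stay near θ = 100, while for the simulated Pareto with α = 3 and θ = 100 they grow roughly like (d + θ)/(α - 1), in line with the footnote later in this section.

    import numpy as np

    rng = np.random.default_rng(seed=4)
    n = 1_000_000
    theta, alpha = 100.0, 3.0

    exponential = rng.exponential(scale=theta, size=n)
    u = 1 - rng.uniform(size=n)                          # uniform on (0, 1]
    pareto = theta * (u ** (-1 / alpha) - 1)             # inverse transform for the Pareto

    for d in (100, 300, 500):
        e_exp = exponential[exponential > d].mean() - d  # empirical e(d) for the Exponential
        e_par = pareto[pareto > d].mean() - d            # empirical e(d) for the Pareto
        print(d, round(e_exp, 1), round(e_par, 1))       # roughly 100 versus (d + 100)/2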

Hazard Rate (Force of Mortality):182

The hazard rate / force of mortality is defined as:


h(x) = f(x) / S(x).

If the force of mortality is large, then the chance of being alive at large ages very quickly goes to
zero. If the hazard rate (force of mortality) is large, then the density drops off quickly to zero. Thus if
the hazard rate is increasing, the tail is light. Conversely, if the hazard rate decreases as x
approaches infinity, then the tail is heavy.

The hazard rate for an Exponential Distribution is constant, h(x) = 1/θ.

Relation of the Tail          Hazard          Mean
to the Exponential            Rate            Residual Life

Heavier                       Decreasing      Increasing

Lighter                       Increasing      Decreasing

181
Mean Excess Losses (Mean Residual Lives) are discussed further in a subsequent section.
182
Hazard rates are discussed further in a subsequent section.

Heavier vs. Lighter Tails:

Heavier or lighter tail is a comparative concept; there are no strict definitions of heavy-tailed and
light-tailed:

Heavier Tailed Lighter Tailed

f(x) goes to zero more slowly f(x) goes to zero more quickly

Few Moments exist All (positive) moments exist

Larger Coefficient of Variation183 Smaller Coefficient of Variation

Higher Skewness184 Lower Skewness185

e(x) Increases to Infinity186 e(x) goes to a constant187

Decreasing Hazard Rate Increasing Hazard Rate

183
Very heavy tailed distributions may not even have a (finite) coefficient of variation.
184
Very heavy tailed distributions may not even have a (finite) skewness.
185
Very light tailed distributions may have a negative skewness
186
The faster the mean excess loss increases to infinity the more heavy the tail.
187
For very light-tailed distributions (such as the Weibull with τ > 1) the mean excess loss may go to zero as x
approaches infinity.

Here is a list of loss distributions, arranged in increasing heaviness of the tail:188

Distribution          Mean Excess Loss (Mean Residual Life)          Do All Positive Moments Exist
Normal decreases to zero approximately as 1/x Yes

Weibull for τ > 1 decreases to zero less quickly than 1/x Yes

Trans. Gamma for τ > 1 decreases to zero less quickly than 1/x Yes

Gamma for α > 1 decreases to a constant Yes189

Exponential constant Yes

Gamma for α < 1 increases to a constant Yes

Inverse Gaussian increases to a constant Yes

Weibull for τ < 1 increases to infinity less than linearly Yes

Trans. Gamma for τ < 1 increases to infinity less than linearly Yes

LogNormal increases to infinity just less than linearly Yes190

Pareto increases to infinity linearly No

Single Parameter Pareto increases to infinity linearly No

Burr increases to infinity linearly No

Generalized Pareto increases to infinity linearly No

Inverse Gamma increases to infinity linearly No

Inverse Trans. Gamma increases to infinity linearly No

188
The Pareto, Single Parameter Pareto, Burr, Generalized Pareto, Inverse Transformed Gamma and Inverse Gamma
all have tails that are not very different. The Gamma and Inverse Gaussian have tails that are not very different. The
Weibull and Transformed Gamma have tails that are not very different.
189
The Gamma Distribution with α < 1 is heavier tailed than the Exponential (α = 1).
The Gamma Distribution with α > 1 is lighter tailed than the Exponential (α = 1).
One way to remember which one is heavier than an Exponential, is that as α → ∞, the Gamma Distribution is a sum of
many independent identically distributed Exponentials, which approaches a Normal Distribution.
The Normal Distribution is lighter tailed, and therefore so is a Gamma Distribution for α > 1.
190
While the moments exist for the LogNormal, the Moment Generating Function does not.

Comparing Tails:

There is an analytic technique one can use to more precisely compare the tails of distributions.
One takes the limit as x approaches infinity of the ratios of the densities.191

Exercise: What is the limit as x approaches infinity of the ratio of the density of a Pareto Distribution
with parameters α and θ to the density of a Burr Distribution with parameters, α, θ and γ.

[Solution: For the Pareto, f(x) = αθ^α (θ + x)^−(α + 1).
For the Burr, (using g to distinguish it from the Pareto),
g(x) = αγ(x/θ)^γ (1 + (x/θ)^γ)^−(α + 1) / x.
lim x→∞ f(x)/g(x) = lim x→∞ αθ^α (θ + x)^−(α + 1) / {αγ(x/θ)^γ (1 + (x/θ)^γ)^−(α + 1) / x} =
lim x→∞ θ^α x^−(α + 1) / {(γ x^(γ−1) / θ^γ) θ^(γ(α + 1)) x^−γ(α + 1)} = lim x→∞ θ^(α − γα) x^(α(γ−1)) / γ.
For γ > 1 the limit is infinity. For γ < 1 the limit is zero.
For γ = 1 the limit is one; for γ = 1, the Burr Distribution is a Pareto.]

Let f(x) and g(x) be the two densities, then if:
lim x→∞ f(x)/g(x) = ∞, f has a heavier tail than g.
lim x→∞ f(x)/g(x) = 0, f has a lighter tail than g.
lim x→∞ f(x)/g(x) = positive constant, f has a similar tail to g.

Exercise: Compare the tails of Pareto Distribution with parameters α and θ, and a Burr Distribution
with parameters, α, θ and γ.
[Solution: The comparison depends on the γ, the second shape parameter of the Burr.
For γ > 1, the Pareto has a heavier tail than the Burr.
For γ < 1, the Pareto has a lighter tail than the Burr.
For γ = 1, the Burr is equal to the Pareto, thus they have similar, in fact identical tails.]
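
A numerical sketch of this limit (plain Python; α = 2 and θ = 1 are arbitrary choices, not taken from the exercise): with γ = 2 the ratio of the Pareto density to the Burr density blows up, and with γ = 0.5 it goes to zero.

    alpha, theta = 2.0, 1.0

    def pareto_pdf(x):
        return alpha * theta ** alpha / (x + theta) ** (alpha + 1)

    def burr_pdf(x, gamma):
        return alpha * gamma * (x / theta) ** gamma / (x * (1 + (x / theta) ** gamma) ** (alpha + 1))

    for x in (1e2, 1e4, 1e6):
        print(x, pareto_pdf(x) / burr_pdf(x, 2.0), pareto_pdf(x) / burr_pdf(x, 0.5))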

191
See Loss Models, Section 3.4.2.

Note, Loss Models uses the notation f(x) ~ g(x), x → ∞, when lim x→∞ f(x)/g(x) = 1.

Two distributions have similar tails if f(x) ~ c g(x), x →∞, for some constant c > 0.

Instead of taking the limit of the ratio of densities, one can equivalently take the limit of the ratios of the
survival functions.192

Exercise: What is the limit as x approaches infinity of the ratio of the Survival Function of a Pareto
Distribution with parameters α and θ to the Survival Function to a Burr Distribution with parameters,
α, θ and γ.

[Solution: For the Pareto, S(x) = θ^α (θ + x)^−α = (1 + x/θ)^−α.
For the Burr, (using T to distinguish it from the Pareto), T(x) = {1 + (x/θ)^γ}^−α.
lim x→∞ S(x)/T(x) = lim x→∞ {(1 + (x/θ)^γ) / (1 + x/θ)}^α = lim x→∞ {(x/θ)^(γ−1)}^α = lim x→∞ θ^(α − γα) x^(α(γ−1)).
For γ > 1 the limit is infinity. For γ < 1 the limit is zero. For γ = 1 the limit is one; for γ = 1, the Burr
Distribution is a Pareto.]

Therefore the comparison of the tails of the Burr and Pareto depends on the value of γ, the second
shape parameter of the Burr. For γ > 1 the Burr has a lighter tail than the Pareto.
For γ < 1 the Burr has a heavier tail than the Pareto. For γ = 1, the Burr is equal to the Pareto, thus
they have similar, in fact identical, tails.

This makes sense, since for γ > 1, x^γ increases more quickly than x. Thus a Burr with γ = 2 has x^2 in
the denominator of its survival function, where the Pareto only has x. Thus the survival function of a
Burr with γ = 2 goes to zero more quickly than the Pareto, indicating it is lighter-tailed than the Pareto.
The reverse is true if γ = 1/2. Then the Burr has √x in the denominator of its survival function, where
the Pareto has x.

This same technique can also be used to compare the tails of distributions from the same family.

192
The derivative of the survival function is minus the density. Since as x approaches infinity, S(x) approaches zero,
one can apply L'Hospital's Rule. Let the two densities be f and g. Let the two survival functions be S and T. Limit as x
approaches infinity of S(x)/T(x) = limit x approaches infinity of S'(x)/T'(x) =
limit x approaches infinity of - f(x)/(- g(x)) = limit x approaches infinity of f(x)/g(x).

Exercise: The first Distribution is a Gamma with parameters α and θ. The second Distribution is a
Gamma with parameters a and q. Which distribution has the heavier tail?
[Solution: The density of the first Gamma is: f1 (x) ~ xα-1 exp(-x/θ).

The density of the second Gamma is: f2 (x) ~ xa-1 exp(-x/q). f1 (x)/f2 (x) ~ xα-a exp(x(1/q - 1/θ)).

If 1/q - 1/θ > 0, then the limit of f1 (x)/f2 (x) as x approaches infinity is ∞.

If 1/q - 1/θ < 0, then the limit of f1 (x)/f2 (x) as x approaches infinity is 0.

If 1/q - 1/θ = 0, then the limit of f1 (x)/f2 (x) as x approaches infinity is ∞ if α > a, and 0 if a > α.

Thus we have that: If θ > q, then the first Gamma is heavier-tailed.


If θ < q, then the second Gamma is heavier-tailed.
If θ = q and α > a, then the first Gamma is heavier-tailed.
If θ = q and α < a, then the second Gamma is heavier-tailed.

Comment: Multiplicative constants such as Γ(α) or θ−α, which appear in the density, have been
ignored since they will not affect whether the limit of the ratio of densities goes to zero or infinity.]

Thus we see that the tails of two Gammas while not very different are not precisely similar.193
Whichever Gamma has the larger scale parameter is heavier-tailed. If they have the same scale
parameter, whichever Gamma has the smaller shape parameter is heavier-tailed.194

Inverse Gaussian Distribution vs. Gamma Distribution:

The skewness of the Inverse Gaussian Distribution, 3√(µ/θ), is always three times its coefficient of
variation, √(µ/θ). In contrast, the Gamma Distribution has its skewness, 2/√α, always twice its
coefficient of variation, 1/√α. Thus if a Gamma and Inverse Gaussian have the same mean and
variance, then the Inverse Gaussian has a larger skewness; if a Gamma and Inverse Gaussian have
the same mean and variance, then the Inverse Gaussian has a heavier tail.

A data set for which a Gamma is a good candidate usually also has an Inverse Gaussian as a
good candidate. The fits of the two types of curves differ largely based on the relative
magnitude of the skewness of the data set compared to its coefficient of variation. For data sets
with less volume, there may be no way statistically to distinguish the fits.
193
Using the precise mathematical definitions in Loss Models. Casualty actuaries rarely use this concept to compare
the tails of two Gammas. It would be more common to compare a Gamma to let's say a LogNormal. (A LogNormal
Distribution has a significantly heavier-tail than a Gamma Distribution.)
194
If they have the same scale parameters and the same shape parameters, then the two Gammas are identical and
have the same tail.

Tails of the Transformed Beta Distribution:

Since many other distributions are special cases of the Transformed Beta
Distribution,195 it is useful to know its tail behavior. The density of the Transformed Beta Distribution is:
f(x) = {Γ(α + τ) / (Γ(α) Γ(τ))} γ θ^−γτ x^(γτ−1) (1 + (x/θ)^γ)^−(α + τ).

For large x the density acts as x^(γτ−1) / x^(γ(α + τ)) = 1/x^(γα + 1). If we multiply by x^n we get
x^(n − γα − 1); if we then integrate to infinity, we get a finite answer provided n - γα - 1 < -1.
Thus the nth moment exists for n < γα.

The larger the product γα, the more moments exist and the lighter the (righthand) tail. The α shape
parameter is that of a Pareto. The γ shape parameter is the power transform by which the Burr is
obtained from the Pareto. Their product, γα, determines the (righthand) tail behavior of the
Transformed Beta Distribution and its special cases. Provided αγ >1, the mean excess loss exists

and increases to infinity approximately linearly; for large x, e(x) ≅ x / (αγ - 1).
This tail behavior carries over to special cases such as: the Burr, Generalized Pareto, Pareto,
LogLogistic, ParaLogistic, Inverse Burr, Inverse Pareto, and Inverse ParaLogistic.
All have mean excess losses, when they exist, that increase approximately linearly for large x.196

One can examine the behavior of the left hand tail197, as x approaches zero, in a similar manner.
For small x the density acts as xγ τ−1. If we multiply by x-n we get xγ τ−1−n; if we then integrate to
zero, we get a finite answer provided γτ - 1 - n > -1. Thus the negative nth moment exists for γτ > n.
Thus the behavior of the left hand tail is determined by the product of the two shape parameters of
the Inverse Burr Distribution.

Thus we see that of the three shape parameters of the Transformed Beta, τ (one more than the
power to which x is taken in the Incomplete Beta Function, i.e., the first parameter of the Incomplete
Beta Function) affects the left hand tail, α (the shape parameter of the Pareto) affects the righthand
tail, and γ (the power transform parameter of the Burr and LogLogistic) affects both tails.

195
See Figures 5.2 and 5.4 in Loss Models.
196
The mean excess loss of a Pareto, when it exists, is linear in x. e(x) = (x+θ)/(α-1).
197
Since casualty actuaries are chiefly concerned with the behavior of loss distributions in the righthand tail, as x
approaches infinity, assume that unless specified otherwise, "tail behavior" refers to the behavior in the righthand
tail as x approaches infinity.

An Example of Distributions fit to Hurricane Data:

Hogg & Klugman in Loss Distributions show the results of fitting different distributions to a set of
hurricane data.198 The hurricane data set is truncated from below at $5 million and consists of 35
storms with total losses adjusted to 1981 levels of more than $5 million.199 This serves as a good
example of the importance of the tails of the distribution to practical applications. Here are the
parameters of different distributions fit by Hogg & Klugman via maximum likelihood, as well as their
means, coefficients of variation (when they exist) and their skewnesses (when they exist):

Distribution Type   Parameters                                   Mean ($ million)   Coefficient of Variation   Skewness
Weibull             θ = 88588730, τ = .51907                     166                2.12                       6.11
LogNormal           µ = 17.953, σ = 1.6028                       226                3.47                       52.26
Pareto              α = 1.1569, θ = 73674000                     470                N.D.                       N.D.
Burr                α = 3.7697, θ = 585453983, γ = .65994        197                3.75                       N.D.
Gen. Pareto         α = 2.8330, θ = 862660000, τ = .33292        157                2.79                       N.D.

It is interesting to compare the tails of the different distributions by comparing the estimated
probabilities of a storm greater than $1 billion or $5 billion:

Distribution    Probability of storm   Probability of storm   Probability of storm   Estimated Annual Frequency of Hurricanes Greater than
Type            > 5 million            > 1 billion            > 5 billion            $1 billion     $5 billion
Weibull         79.86%                 2.9637%                0.0300%                4.0591%        0.0410%
LogNormal       94.26%                 4.1959%                0.3142%                4.8686%        0.3646%
Pareto          92.68%                 4.5069%                0.7475%                5.3185%        0.8821%
Burr            85.28%                 3.5529%                0.2122%                4.5567%        0.2722%
Gen. Pareto     43.56%                 0.0142%                0.0002%                0.0358%        0.0004%

The lighter-tailed Weibull produces a much lower estimate of the chance of a huge hurricane than a
heavier-tailed distribution such as the Pareto. The estimate from the LogNormal, which is
heavier-tailed than a Weibull, but lighter-tailed than a Pareto, is somewhere in between.

198
Loss Distributions was on the syllabus of the old Part 4B exam .
199
The data is shown in Table 4.1 of Loss Distributions, and Table 11.8 of Loss Models.
See Exercise 11.2 in Loss Models.
In millions of dollars, the trended hurricane sizes are: 6.766, 7.123, 10.562, 14.474, 15.351, 16.983, 18.383,
19.030, 25.304, 29.112, 30.146, 33.727, 40.596, 41.409, 47.905, 49.397, 52.600, 59.917, 63.123, 77.809,
102.942, 103.217, 123.680, 140.136, 192.013, 198.446, 227.338, 329.511, 361.200, 421.680, 513.586,
545.778, 750.389, 863.881, 1638.000.

There were 35 hurricanes greater than 5 million in constant 1981 dollars observed in 32 years.
Thus one could estimate the frequency of such hurricanes as 35/32 = 1.09 per year. Then using the
curves fit to the data truncated from below one could estimate the frequency of hurricanes greater
than size x as: (1.09)S(x) / S(5 million). For example, for the Pareto Distribution the estimated
annual frequency of hurricanes greater than 5 billion in 1981 dollars is: (1.09)(.7475%)/(92.68%) =
.8821%. This is a mean return time of: 1/.8821% = 113 years. The return times estimated using the
other curves are much longer:

Distribution     Mean Return Time (Years)
Type             storm > $1 billion    storm > $5 billion
Weibull          25                    2,438
LogNormal        21                    274
Pareto           19                    113
Burr             22                    367
Gen. Pareto      2,796                 229,068
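
A sketch reproducing the Pareto row of these calculations in plain Python (parameters as fitted by Hogg & Klugman in the table above):

    # Pareto fitted to the hurricane data: alpha = 1.1569, theta = 73,674,000.
    alpha, theta = 1.1569, 73_674_000

    def s_pareto(x):
        return (theta / (theta + x)) ** alpha

    annual_frequency = 35 / 32   # 35 storms greater than $5 million observed in 32 years

    for size in (1e9, 5e9):
        freq = annual_frequency * s_pareto(size) / s_pareto(5e6)
        print(size, freq, 1 / freq)   # frequency per year and mean return time in years (about 19 and 113)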

It is interesting to note that even the most heavy-tailed of these curves would seem with
twenty-twenty hindsight to have underestimated the chance of large hurricanes such as
Hurricane Andrew.200 The small amount of data does not allow a good estimate of the extreme
tail; the observation of just one very large hurricane would have significantly changed the
results. Also due to changing or cyclic weather patterns and the increase in homes near the
coast, this may just not be an appropriate technique to apply to this particular problem. The
preferred technique currently is to simulate possible hurricanes using meteorological data and
estimate the likely damage using exposure data on the location and characteristics of insured
homes combined with engineering and physics data.201

200
The losses in Hogg & Klugman are adjusted to 1981 levels. Hurricane Andrew in 8/92 with nearly $16 billion in
insured losses, probably exceeded $7 billion dollars in loss in 1981 dollars. It is generally believed that Hurricanes
that produce such severe losses have a much shorter average return time than a century.
201
See for example, "A Formal Approach to Catastrophe Risk Assessment in Management", by Karen M. Clark,
PCAS 1986, or "Use of Computer Models to Estimate Loss Costs," by Michael A. Walters and Francois Morin,
PCAS 1997.

Coefficient of Variation versus Skewness, Two Parameter Distributions:

For the following two parameter distributions: Pareto, LogNormal, Gamma and Weibull, the
Coefficient of Variation and Skewness depend on a single shape parameter.
Values are tabulated below:

Shape Pareto LogNormal Gamma Weibull


Parameter C.V. Skew C.V. Skew C.V. Skew C.V. Skew
0.2 N.A. N.A. 0.202 0.056 2.236 4.472 15.843 190.1
0.4 N.A. N.A. 0.417 0.355 1.581 3.162 3.141 11.35
0.6 N.A. N.A. 0.658 1.207 1.291 2.582 1.758 4.593
0.8 N.A. N.A. 0.947 3.399 1.118 2.236 1.261 2.815
1 N.A. N.A. 1.311 9.282 1.000 2.000 1.000 2.000
1.2 N.A. N.A. 1.795 26.840 0.913 1.826 0.837 1.521
1.4 N.A. N.A. 2.470 87.219 0.845 1.690 0.724 1.198
1.6 N.A. N.A. 3.455 331 0.791 1.581 0.640 0.962
1.8 N.A. N.A. 4.953 1503 0.745 1.491 0.575 0.779
2 N.A. N.A. 7.321 8208 0.707 1.414 0.523 0.631
2.2 3.317 N.A. 11.201 53948 0.674 1.348 0.480 0.509
2.4 2.449 N.A. 17.786 426061 0.645 1.291 0.444 0.405
2.6 2.082 N.A. 29.354 4036409 0.620 1.240 0.413 0.315
2.8 1.871 N.A. 50.391 4.6e+7 0.598 1.195 0.387 0.237
3 1.732 N.A. 90.012 6.2e+8 0.577 1.155 0.363 0.168
3.2 1.633 25.720 167 1.0e+10 0.559 1.118 0.343 0.106
3.4 1.558 14.117 324 2.0e+11 0.542 1.085 0.325 0.051
3.6 1.500 10.222 652 4.6e+12 0.527 1.054 0.309 0.001
3.8 1.453 8.259 1366 1.3e+14 0.513 1.026 0.294 -0.045
4 1.414 7.071 2981 4.3e+15 0.500 1.000 0.281 -0.087
5 1.291 4.648 268337 2.7e+24 0.447 0.894 0.229 -0.254
6 1.225 3.810 6.6e+7 1.5e+35 0.408 0.816 0.194 -0.373
7 1.183 3.381 4.4e+10 7.6e+47 0.378 0.756 0.168 -0.463
8 1.155 3.118 7.9e+13 3.5e+62 0.354 0.707 0.148 -0.534
9 1.134 2.940 3.9e+17 1.4e+79 0.333 0.667 0.133 -0.591
10 1.118 2.811 5.2e+21 5.2e+97 0.316 0.632 0.120 -0.638

The shape parameters for these distributions are: Pareto: α, LogNormal: σ, Gamma: α, Weibull: τ.



As mentioned previously, for the Gamma Distribution the skewness is twice the CV:202

Skewness
10

6 alpha < 1

2 E
alpha > 1
CV
1 2 3 4 5

For α > 1 the Gamma Distribution is lighter-tailed than an Exponential, and has CV < 1 and
skewness < 2. Conversely, for α < 1 the Gamma Distribution is heavier-tailed than an Exponential,
and has CV > 1 and skewness > 2. The Exponential Distribution (α = 1), shown above as E, has
CV = 1 and skewness = 2.

For the Pareto Distribution the skewness is more than twice the CV, when they exist.
For the Pareto, the CV > 1 and the skewness > 2:

[Graph of skewness versus CV: the Pareto curve lies above the Gamma line for CV > 1, with the Exponential point E at (1, 2).]
202
For the Inverse Gaussian, the Skewness is three times the CV.

As α goes to infinity, the Pareto approaches the Exponential which has CV = 1 and
skewness = 2. As α approaches 3, the skewness approaches infinity.
Here is a similar graph for the LogNormal Distribution versus the Gamma Distribution:

[Graph of skewness versus CV for the LogNormal Distribution together with the Gamma line and the Exponential point E.]

For the LogNormal, as σ approaches zero, the coefficient of variation and skewness each approach
zero. For σ = 1, CV = 1.311 and the skewness = 9.282.
As σ approaches infinity both the skewness and CV approach infinity.
Finally, here is a similar graph for the Weibull Distribution versus the Gamma Distribution:

[Graph of skewness versus CV: the Weibull with τ < 1 (CV > 1, skewness > 2) and with τ > 1 (CV < 1, skewness < 2), together with the Gamma line and the Exponential point E.]

For τ > 1 the Weibull Distribution is lighter-tailed than an Exponential, and has CV < 1 and
skewness < 2. Conversely, for τ < 1 the Weibull Distribution is heavier-tailed than an Exponential,
and has CV > 1 and skewness > 2. The Exponential Distribution (τ = 1), shown above as E, has
CV = 1 and skewness = 2.

The CV is positive by definition. The skewness is positive for curves skewed to the right
and negative for curves skewed to the left. The Pareto, LogNormal and Gamma all have
positive Skewness. The Weibull has positive skewness for tau < 3.60235 and negative skewness
for tau > 3.60235.

Existence of Moment Generating Functions:203

The moment generating function for a continuous loss distribution is given by:204

M(t) = ∫_0^∞ f(x) e^(xt) dx = E[e^(xt)].

For example for the Gamma Distribution:

M(t) = (1 - θt)−α, for t < 1/θ.

The moments of the function can be obtained as the derivatives of the moment generating function
at zero. Thus if the Moment Generating Function exists (within an interval around zero) then so do all
the moments. However the converse is not true.

The moment generating function, when it exists, can be written as a power series in t:205

M(t) = Σ_(n=0)^∞ E[X^n] t^n / n!.
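
As a check on this relationship, here is a sketch (assuming sympy is available) that differentiates the Gamma moment generating function at zero; the nth derivative is α(α+1)...(α+n-1) θ^n, the nth moment of the Gamma Distribution.

    import sympy as sp

    t, theta, alpha = sp.symbols("t theta alpha", positive=True)
    M = (1 - theta * t) ** (-alpha)   # Gamma moment generating function, valid for t < 1/theta

    for n in range(1, 4):
        nth_derivative_at_zero = sp.factor(sp.diff(M, t, n).subs(t, 0))
        print(n, nth_derivative_at_zero)
    # prints expressions equivalent to alpha*theta, alpha*(alpha+1)*theta**2, alpha*(alpha+1)*(alpha+2)*theta**3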

203
See “Mahlerʼs Guide to Aggregate Distributions.”
See also Definition 12.2.2 in Actuarial Mathematics or Definition 3.8 in Loss Models.
204
With support from zero to infinity. In general the integral goes over the support of the probability distribution.
205
This is just the usual Taylor Series, substituting in the moments for the derivatives at zero of the Moment
Generating Function.

In order for the moment generating function to converge (in an interval around zero), the moments
E[Xn ] may not grow too quickly as n gets large. This is yet another way to distinguish lighter and
heavier tailed distributions. Those with Moment Generating Functions are lighter-tailed than those
without Moment Generating Functions.

Thus the Weibull for τ > 1, whose m.g.f. exists, is lighter-tailed than the Weibull with τ < 1, whose
m.g.f. does not. The Transformed Gamma has the same behavior as the Weibull; for τ > 1 the
Moment Generating Function exists and the distribution is lighter-tailed than τ < 1 for which the
Moment Generating Function does not exist. (For τ = 1, one gets a Gamma, for which the
Moment Generating Function exists.)

The LogNormal Distribution has its moments increase rapidly and thus it does not have a Moment
Generating Function. The LogNormal is the heaviest-tailed of those distributions which have all their
moments.

Problems:

30.1 (2 points) You are given the following information on three (3) size of loss distributions:
Distribution Coefficient of Variation Skewness
I 2 3
II 1.22 3.81
III 1 2
Which of these three loss distributions can not be a Gamma Distribution?
A. I B. II C. III D. I, II, III E. None of A,B,C, or D

30.2 (1 points) Which of the following distributions always have positive skewness?
1. Weibull
2. Normal
3. Gamma
A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D

30.3 (2 points) Which of the following statements is true?


1. For the Pareto Distribution, the standard deviation (when it exists) is always greater than the mean.
2. For the Pareto Distribution, the skewness (when it exists) is always greater than twice the
coefficient of variation.
3. For the LogNormal distribution, f(x) goes to zero more quickly as x approaches infinity than for the
Transformed Gamma distribution.
Hint: For the Transformed Gamma distribution, f(x) = τ(x/θ)τα exp[-(x/θ)τ] / {x Γ(α)}.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D

30.4 (2 points) Rank the tails of the following three distributions, from lightest to heaviest:
1. Weibull with τ = 0.5 and θ = 10.
2. Weibull with τ = 1 and θ = 100.
3. Weibull with τ = 2 and θ = 1000.
A. 1, 2, 3 B. 2,1, 3 C. 1, 3, 2 D. 3, 2, 1 E. None of A, B, C or D

30.5 (3 points) Rank the tails of the following three distributions, from lightest to heaviest:
1. Gamma with α = 0.7 and θ = 10.
2. Inverse Gaussian with µ = 3 and θ = 4 .
3. Inverse Gaussian with µ = 5 and θ = 2 .
A. 1, 2, 3 B. 2, 1, 3 C. 1, 3, 2 D. 3, 2, 1 E. None of A, B, C or D

30.6 (1 point) Rank the tails of the following three distributions, from lightest to heaviest:
1. Exponential
2. Lognormal
3. Single Parameter Pareto
A. 1, 2, 3 B. 2, 1, 3 C. 1, 3, 2 D. 3, 2, 1 E. None of A, B, C or D

30.7 (1 point) The Inverse Exponential Distribution has a righthand tail similar to which of the
following distributions?
A. Lognormal B. Pareto α = 1 C. Pareto α = 2 D. Weibull τ < 1 E. Weibull τ > 1

30.8 (3 points) You are given the following:


• Claim sizes for Risk A follow a Exponential distribution, with mean 400.
• Claim sizes for Risk B follow a Gamma distribution, with parameters θ = 200, α = 2.
• r is the ratio of the proportion of Risk B's claims (in number) that exceed d to the
proportion of Risk A's claims (in number) that exceed d.
Determine the limit of r as d goes to infinity.
A. 0 B. 1/2 C. 1 D. 2 E. ∞

30.9 (3 points) Compare the righthand tails of the Paralogistic and Inverse Paralogistic distributions.

30.10 (5 points) X follows a Weibull Distribution.


Y P is the corresponding per payment variable for a deductible, d > 0.
Compare the righthand tails of X and YP.

30.11 (2 points) f(x) = (2/π) / (1 + x^2), x > 0.
Compare the righthand tail of the above density to that of an Inverse Pareto.

30.12 (4B, 11/92, Q.2) (1 point) Which of the following are true?
1. The random variable X has a lognormal distribution with parameters µ and σ,
if Y = ex has a normal distribution with mean µ and standard deviation σ.
2. The lognormal and Pareto distributions are positively skewed.
3. The lognormal distribution generally has greater probability in the tail than the Pareto distribution.
A. 1 only B. 2 only C. 1, 3 only D. 2, 3 only E. 1, 2, 3

30.13 (4B, 11/93, Q.21) (1 point) Which of the following statements are true for statistical
distributions?
1. Linear combinations of independent normal random variables are also normal.
2. The lognormal distribution is often useful as a model for claim size distribution because it is
positively skewed.
3. The Pareto probability density function tapers away to zero much more slowly than
the lognormal probability density function.
A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

30.14 (4B, 11/99, Q.19) (2 points) You are given the following:
• Claim sizes for Risk A follow a Pareto distribution, with parameters θ = 10,000 and α = 2.

• Claim sizes for Risk B follow a Burr distribution, F(x) = 1 - {1/(1+(x/θ)γ)}α,


with parameters θ = 141.42, α = 2, and γ = 2.
• r is the ratio of the proportion of Risk A's claims (in number) that exceed d to
the proportion of Risk B's claims (in number) that exceed d.
Determine the limit of r as d goes to infinity.
A. 0 B. 1 C. 2 D. 4 E. ∞

30.15 (CAS3, 11/03, Q.16) (2.5 points)


Which of the following is/are true, based on the existence of moments test?
I. The Loglogistic Distribution has a heavier tail than the Gamma Distribution.
ll. The Paralogistic Distribution has a heavier tail than the Lognormal Distribution.
Ill. The Inverse Exponential has a heavier tail than the Exponential Distribution.
A. I only
B. I and II only
C. I and Ill only
D. II and III only
E. I, ll, and Ill

Solutions to Problems:

30.1. E. Distributions I and II canʼt be a Gamma, for which


the skewness = twice coefficient of variation.

30.2. D. The Normal is symmetric, so it has skewness of zero. The Gamma has skewness of
2/√α > 0. The Weibull has either a positive or negative skewness, depending on the value of τ.

30.3. A. 1. True. This is the same as saying the CV > 1 for the Pareto. 2. True.
3. False. For the LogNormal, ln f(x) = -0.5 {(ln(x) - µ)/σ}² - ln(x) - ln(σ) - ln(2π)/2. For very large x this
is approximately: -0.5 ln(x)²/σ². For the Transformed Gamma,
ln f(x) = ln(τ) + (τα - 1) ln(x) - τα ln(θ) - (x/θ)^τ - ln Γ(α). For very large x this is approximately: -x^τ/θ^τ.
Thus the log density of the LogNormal goes to minus infinity more slowly than that of the
Transformed Gamma. Therefore the density function of the LogNormal goes to zero less quickly as
x approaches infinity than that of the Transformed Gamma.
The LogNormal has a heavier tail than the Transformed Gamma.

30.4. D. The three survival functions are: S1(x) = exp[-(x/10)^0.5], S2(x) = exp(-x/100),
S3(x) = exp[-(x/1000)²]. S1(x)/S2(x) = exp[x/100 - (x/10)^0.5]. The limit as x approaches infinity of
S1(x)/S2(x) is ∞, since x increases more quickly than √x. Thus the first Weibull is heavier-tailed
than the second. Similarly, S2(x)/S3(x) = exp[(x/1000)² - x/100], whose limit as x approaches infinity is ∞,
since x² increases more quickly than x. Thus the second Weibull is heavier-tailed than the third.
Alternately, just calculate the densities or log densities for an extremely large value of x, for example 1 billion = 10^9.
(The log densities are more convenient to work with; the ordering of the densities and the log
densities are the same.)
For the Weibull, f(x) = τ (x/θ)^τ exp(-(x/θ)^τ) / x. ln f(x) = ln(τ) + τ ln(x/θ) - ln(x) - (x/θ)^τ.
For the first Weibull, ln f(1 billion) = ln(0.5) + (0.5) ln(100 million) - ln(1 billion) - √(100 million) ≅ -10,000.
For the second Weibull, ln f(1 billion) = ln(1) + (1) ln(10 million) - ln(1 billion) - 10 million ≅ -10,000,000.
For the third Weibull, ln f(1 billion) = ln(2) + (2) ln(1 million) - ln(1 billion) - (1 million)² ≅ -1,000,000,000,000.
Thus f(1 billion) is much larger for the first Weibull than for the second Weibull, while f(1 billion) is much
larger for the second Weibull than for the third Weibull. Thus the third Weibull has the lightest tail, while
the first Weibull has the heaviest tail.
Comment: For the Weibull, the smaller the shape parameter τ, the heavier the tail. The values of the
scale parameter θ, have no effect on the heaviness of the tail. However by changing the scale, the
third Weibull with θ = 1000 does take longer before its density falls below the others than if it instead
had θ = 1. The (right) tail behavior refers to the behavior as x approaches infinity, thus how long it
takes the density to get smaller does not affect which has a lighter tail. While the third Weibull might
be lighter-tailed, for some practical applications with a maximum covered loss you may be
uninterested in the large values of x at which its density is smaller than the others.

30.5. B. The three density functions are:
f1(x) = 10^(-0.7) x^(-0.3) exp(-x/10) / Γ(0.7),
f2(x) = √(4/(2π)) x^(-1.5) exp(-4(x/3 - 1)²/(2x)) = √(2/π) x^(-1.5) exp(-2x/9 + 4/3 - 2/x),
f3(x) = √(1/π) x^(-1.5) exp(-x/25 + 2/5 - 1/x).
We will take the limit as x approaches infinity of the ratios of these densities, ignoring any annoying
multiplicative constants such as 10^(-0.7)/Γ(0.7) or √(2/π) e^(4/3).
f1(x)/f2(x) ~ x^(-0.3) exp(-x/10) / {x^(-1.5) exp(-2x/9 - 2/x)} = x^1.2 exp(0.122x + 2/x).
The limit as x approaches infinity of f1(x)/f2(x) is ∞.
Thus the Gamma is heavier-tailed than the first Inverse Gaussian with µ = 3 and θ = 4.
f1(x)/f3(x) ~ x^(-0.3) exp(-x/10) / {x^(-1.5) exp(-x/25 - 1/x)} = x^1.2 exp(-0.06x + 1/x).
The limit as x approaches infinity of f1(x)/f3(x) is 0, since exp(-0.06x) goes to zero very quickly.
Thus the Gamma is lighter-tailed than the second Inverse Gaussian with µ = 5 and θ = 2.
Thus the second Inverse Gaussian has the heaviest tail, followed by the Gamma, followed by the
first Inverse Gaussian.
Comment: In general the Inverse Gaussian and the Gamma have somewhat similar tails; they both
have their mean residual lives go to a constant as x approaches infinity. Which is heavier-tailed
depends on the particular parameters of the distributions. Assume we have a Gamma with
shape parameter α and scale parameter β (using beta rather than theta, since theta is also a
parameter of the Inverse Gaussian), and an Inverse Gaussian with parameters µ and θ.
Then the density of the Gamma is f1(x) ~ x^(α-1) exp(-x/β),
and the density of the Inverse Gaussian is f2(x) ~ x^(-1.5) exp[-xθ/(2µ²) - θ/(2x)].
f1(x)/f2(x) ~ x^(α+0.5) exp[x{θ/(2µ²) - 1/β} + θ/(2x)].
If θ/(2µ²) > 1/β, then the limit as x approaches infinity of f1(x)/f2(x) is ∞, and the Gamma is
heavier-tailed than the Inverse Gaussian.
If θ/(2µ²) < 1/β, then the limit as x approaches infinity of f1(x)/f2(x) is 0, and the Gamma is
lighter-tailed than the Inverse Gaussian.
If θ/(2µ²) = 1/β, then f1(x)/f2(x) ~ x^(α+0.5) exp[θ/(2x)], and the limit as x approaches infinity of f1(x)/f2(x)
is ∞, and the Gamma is heavier-tailed than the Inverse Gaussian.

30.6. A. The Single Parameter Pareto does not have all of its moments, and thus is heavier tailed
than the other two. The Lognormal has an increasing mean excess loss, while that for the Exponential
is constant. Thus the Lognormal is heavier tailed than the Exponential.
Comment: The Single Parameter Pareto has a tail similar to that of the Pareto.

30.7. B. The Inverse Exponential does not have a mean and neither does the Pareto for
α = 1. More specifically, the density of the Inverse Exponential is θe^(-θ/x)/x², which is approximately
θ/x² for large x, while the density of the Pareto for α = 1 is θ/(x+θ)², which is also approximately θ/x²
for large x.
Comment: The Inverse Gamma Distribution has a tail similar to that of the Pareto Distribution for the
same shape parameter α. The Inverse Exponential is the Inverse Gamma for α = 1.

30.8. A. SA(d) = e^(-d/400). fB(x) = x e^(-x/200) / 40,000.
SB(d) = ∫_d^∞ x e^(-x/200) / 40,000 dx = (1/40,000) {-200x e^(-x/200) - 40,000 e^(-x/200)}, evaluated from x = d to x = ∞,
= (1 + d/200) e^(-d/200).
r = SB(d) / SA(d) = (1 + d/200) / e^(0.0025d). As d goes to infinity the denominator increases faster
than the numerator; thus as d goes to infinity, r goes to zero.
Comment: Similar to 4B, 11/99, Q.19.
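This limit can also be checked numerically; a minimal Python sketch (the function names are just for illustration):

import math

def S_A(d):
    # Exponential survival function, mean 400
    return math.exp(-d / 400)

def S_B(d):
    # Gamma(alpha = 2, theta = 200) survival function: (1 + d/200) e^(-d/200)
    return (1 + d / 200) * math.exp(-d / 200)

for d in [1000, 10000, 100000]:
    print(d, S_B(d) / S_A(d))
# r = (1 + d/200) e^(-d/400), which goes to 0 as d goes to infinity.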

30.9. For the Paralogistic: f(x) = α² x^(α-1) / (θ^α {1 + (x/θ)^α}^(α+1)).
For large x, this is approximately proportional to: x^(α-1) / (x^α)^(α+1) = 1 / x^(α²+1).
For the Inverse Paralogistic: f(x) = τ² (x/θ)^(τ²) / (x {1 + (x/θ)^τ}^(τ+1)).
For large x, this is approximately proportional to: x^(τ²-1) / (x^τ)^(τ+1) = 1 / x^(τ+1).
Thus for α² + 1 = τ + 1, in other words for τ = α², the two righthand tails are similar.
For τ > α², the Inverse Paralogistic has a lighter righthand tail than the Paralogistic.
For τ < α², the Inverse Paralogistic has a heavier righthand tail than the Paralogistic.

30.10. For the Weibull, S(x) = exp[-(x/θ)^τ].
The survival function of Y^P is: S(y+d) / S(d) = exp[-({y+d}/θ)^τ] / exp[-(d/θ)^τ].
Thus the ratio of the survival function of Y^P at x and that of the Weibull is:
exp[-({x+d}/θ)^τ + (x/θ)^τ + (d/θ)^τ] = exp[{x^τ + d^τ - (x+d)^τ} / θ^τ].
The behavior of this ratio as x approaches infinity depends on that of: x^τ + d^τ - (x+d)^τ.
For τ = 1, x^τ + d^τ - (x+d)^τ = 0, exp[{x^τ + d^τ - (x+d)^τ} / θ^τ] = 1, and the two tails are equal.
For example, for τ = 2, x^τ + d^τ - (x+d)^τ = -2dx, which approaches minus infinity as x approaches
infinity. In general, for τ > 1, x^τ + d^τ - (x+d)^τ approaches minus infinity,
exp[{x^τ + d^τ - (x+d)^τ} / θ^τ] approaches zero,
and thus Y^P has a lighter righthand tail than the Weibull.
For example, for τ = 1/2, x^τ + d^τ - (x+d)^τ = x^0.5 + d^0.5 - (x+d)^0.5 = x^0.5 + d^0.5 - x^0.5 (1 + d/x)^0.5
≅ x^0.5 + d^0.5 - x^0.5 {1 + (1/2)(d/x)} = d^0.5 - (1/2) d/x^0.5.
This approaches a constant d^0.5 as x approaches infinity.
In general, for τ < 1, x^τ + d^τ - (x+d)^τ approaches a constant d^τ,
exp[{x^τ + d^τ - (x+d)^τ} / θ^τ] approaches a positive constant,
and thus Y^P has a similar righthand tail to the Weibull.
Comment: One could instead work with the ratio of densities rather than survival functions.
For τ = 1 we have an Exponential, and Y^P follows the same Exponential as X.

30.11. For large x, the given density is approximately proportional to x^(-2).
For the Inverse Pareto: f(x) = τθ x^(τ-1) / (x + θ)^(τ+1).
For large x, this is approximately proportional to: x^(τ-1) / x^(τ+1) = x^(-2).
Thus the two righthand tails are similar.
Comment: The given density is the one-sided standard Cauchy density, not on the syllabus.

30.12. B. 1. False. Ln(X) has a Normal distribution if X has a LogNormal distribution.
2. True. The skewness of the Pareto does not exist for α ≤ 3.
For α > 3, the Pareto skewness is: 2 {(α+1)/(α-3)} √{(α-2)/α} > 0.
LogNormal Skewness = (exp[3σ²] - 3 exp[σ²] + 2) / (exp[σ²] - 1)^1.5. The denominator is positive,
since exp(σ²) > 1 for σ² > 0. The numerator is positive since it can be written as:
y³ - 3y + 2, for y = exp(σ²) > 1. (The derivative of y³ - 3y + 2 is 3y² - 3, which is positive for
y > 1. At y = 1, y³ - 3y + 2 is zero, thus for y > 1 it is positive.)
Since the numerator and denominator are both positive, so is the skewness.
3. False. The Pareto is heavier-tailed than the Lognormal distribution. This can be seen by a
comparison of the mean residual lives. That of the lognormal increases less than linearly, while the
mean residual life of the Pareto increases linearly. Another way to see this is that all of the moments
of the LogNormal distribution exist, while higher moments of the Pareto distribution do not exist.
Comments: The LogNormal and the Pareto distributions are both heavy-tailed and heavy-tailed
distributions have positive skewness, (are skewed to the right.) These are statements that practicing
actuaries should know.

30.13. E. 1. True. 2. True. 3. True.


Comment: Statement 3 is another way of saying that the Pareto has a heavier tail than the
LogNormal.

30.14. E. SA(d) = {10,000/(10,000+d)}². SB(d) = {1/(1 + (d/141.42)²)}² = {20,000/(20,000 + d²)}².
r = SA(d) / SB(d) = {(20,000 + d²) / (2(10,000 + d))}². As d goes to infinity the numerator increases
faster than the denominator; thus as d goes to infinity, r goes to infinity.


Comment: For γ > 1, the Burr Distribution has a lighter tail than the Pareto Distribution, while for γ < 1,
the Burr Distribution would have a heavier tail than the Pareto Distribution with the same α.

30.15. E. I. The Loglogistic does not have all its moments, while the Gamma does.
⇒ The Loglogistic Distribution has a heavier tail than the Gamma Distribution.
II. The Paralogistic does not have all its moments, while the Lognormal does.
⇒ The Paralogistic Distribution has a heavier tail than the Lognormal Distribution.
III. The Inverse Exponential does not have all its moments, while the Exponential does.
⇒ The Inverse Exponential Distribution has a heavier tail than the Exponential Distribution.

Section 31, Limited Expected Values

As discussed previously, the Limited Expected Value E[X ∧ x] is the average size of loss with all
losses limited to a given maximum size x. Thus the Limited Expected Value, E[X ∧ x], is the mean of
the data censored from above at x.

The Limited Expected Value is closely related to other important quantities: the Loss Elimination
Ratio, the Mean Excess Loss, and the Excess (Pure Premium) Ratio. The Limited Expected Value
can be used to price Increased Limit Factors. The ratio of losses expected for an increased
limit L, compared to a basic limit B, is E[X ∧ L] / E[X ∧ B].

The Limited Expected Value is generally the sum of two pieces. Each loss of size less than or equal
to u contributes its own size, while each loss greater than u contributes just u to the average.

For a discrete distribution:

E[X ∧ u] = Σ_{xi ≤ u} xi Prob[X = xi] + u Prob[X > u].

For a continuous distribution:

E[X ∧ u] = ∫_0^u t f(t) dt + u S(u).

Rather than calculating this integral, make use of Appendix A of Loss Models, which has formulas for
the limited expected value for each distribution.206
For example, the formula for the Limited Expected Value of the Pareto is:207
E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}, α ≠ 1.

Exercise: For a Pareto with α = 4 and θ = 1000, compute E[X], E[X ∧ 500] and E[X ∧ 5000].
[Solution: E[X] = θ/(α-1) = 1000/(4-1) = 333.33. E[X ∧ x] = (1000/3) {1 - (1000/(1000+x))³}.
E[X ∧ 500] = 234.57. E[X ∧ 5000] = 331.79.]
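A quick numerical check of these values, as a minimal Python sketch (the helper name pareto_lev is just for illustration):

def pareto_lev(x, alpha, theta):
    # E[X ∧ x] for a Pareto with alpha != 1, using the formula above
    return (theta / (alpha - 1)) * (1 - (theta / (theta + x)) ** (alpha - 1))

print(pareto_lev(500, 4, 1000))    # about 234.57
print(pareto_lev(5000, 4, 1000))   # about 331.79
print(1000 / (4 - 1))              # E[X] = 333.33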

206
In some cases the formula for the Limited Expected Value (Limited Expected First Moment) is not given. In those
cases, one takes k = 1, in the formula for the Limited Expected Moments.
207
For α =1, E[X ∧ x] = -θ ln(θ/(θ+x)).

Here are the formulas for the limited expected value for some distributions:

Distribution          Limited Expected Value, E[X ∧ x]

Exponential           θ (1 - e^(-x/θ))

Pareto                {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}, α ≠ 1

LogNormal             exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}

Gamma                 αθ Γ[α+1 ; x/θ] + x {1 - Γ[α ; x/θ]}

Weibull               θ Γ(1 + 1/τ) Γ[1 + 1/τ ; (x/θ)^τ] + x exp[-(x/θ)^τ]

Single Parameter Pareto   αθ/(α-1) - θ^α/{(α-1) x^(α-1)} = θ {α - (θ/x)^(α-1)}/(α-1), x ≥ θ

Exercise: For a LogNormal Distribution with µ = 9.28 and σ = 0.916, determine E[X ∧ 25000].
[Solution: E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}.
E[X ∧ 25000] =
exp(9.6995) Φ[0.01] + 25000 {1 - Φ[0.92]} = (16310)(0.5040) + (25000)(1 - 0.8212) = 12,705.]
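The same calculation can be reproduced numerically; a minimal Python sketch (Φ is built from the error function, and the name lognormal_lev is just for illustration):

import math

def Phi(z):
    # standard Normal distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def lognormal_lev(x, mu, sigma):
    # E[X ∧ x] for a LogNormal, using the formula above
    return (math.exp(mu + sigma**2 / 2) * Phi((math.log(x) - mu - sigma**2) / sigma)
            + x * (1 - Phi((math.log(x) - mu) / sigma)))

print(lognormal_lev(25000, 9.28, 0.916))   # about 12,700 (12,705 with the rounded values above)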

Relationship to the LER, Excess Ratio, and Mean Excess Loss:

The following relationships hold between the mean, the Limited Expected Value E[X ∧ x], the
Excess Ratio R(x), the Mean Excess Loss e(x), and the Loss Elimination Ratio LER(x):

mean = E[X ∧ infinity].

e(x) = { mean - E[X ∧ x] } / S(x).

R(x) = 1 - { E[X ∧ x] / mean } = 1 - LER(x).

R(x) = e(x) S(x) / mean.

LER(x) = E[X ∧ x] / mean.
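For the Pareto with α = 4 and θ = 1000 used above, these relationships can be verified numerically; a minimal Python sketch (names are just for illustration):

def pareto_lev(x, alpha, theta):
    return (theta / (alpha - 1)) * (1 - (theta / (theta + x)) ** (alpha - 1))

alpha, theta, x = 4, 1000, 500
mean = theta / (alpha - 1)
S = (theta / (theta + x)) ** alpha        # survival function at x
lev = pareto_lev(x, alpha, theta)

LER = lev / mean                          # Loss Elimination Ratio
R = 1 - LER                               # Excess Ratio
e = (mean - lev) / S                      # Mean Excess Loss
print(LER, R, e, R - e * S / mean)        # the last value should be 0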



Layer Average Severity:

The Limited Expected Value can be useful when dealing with layers of loss. For example, suppose
we are estimating the expected value (per loss) of the layer of loss greater than $1 million but less
than $5 million.208 Note here we are taking the average over all losses, including those that are too
small to contribute to the layer. This Layer Average Severity is equal to the Expected Value
Limited to $5 million minus the Expected Value Limited to $1 million.

Layer Average Severity = E[X ∧ top of Layer] - E[X ∧ bottom of layer].

The Layer Average Severity is the insurerʼs average payment per loss to an insured, when there is
a deductible of size equal to bottom of the layer and a maximum covered loss equal to the top of
the layer. Loss Models refers to this as the expected payment per loss.
expected payment per loss = average amount paid per loss =
E[X ∧ Maximum Covered Loss] - E[X ∧ Deductible Amount].

Exercise: Losses follow a Pareto with α = 4 and θ = 1000. There is a deductible of 500 and a
maximum covered loss of 5000. What is the insurerʼs average payment per loss?
[Solution: From previous solutions: E[X ∧ 500] = 234.57. E[X ∧ 5000] = 331.79.
The Layer Average Severity = E[X ∧ 5000] - E[X ∧ 500] = 331.79 - 234.57 ≅ 97.20.]

Each small loss, x ≤ d, contributes nothing to a layer from d to u.


Each medium size loss, d < x ≤ u, contributes x - d to a layer from d to u.
Each large loss, u < x, contributes u - d to a layer from d to u.
Therefore, Layer Average Severity = ∫_d^u (t - d) f(t) dt + S(u) (u - d).

Average Non-zero Payment:

Besides the average amount paid per loss to the insured, one can also calculate the average
amount paid per non-zero payment by the insurer. Loss Models refers to this as the expected
payment per payment.209

With a deductible, there are many instances where the insured suffers a small loss, but the insurer
makes no payment. Therefore, if the denominator only includes those situations where the insurer
makes a non-zero payment, the average will be bigger. The average payment per payment is
greater than or equal to the average payment per loss.
208
This might be useful for pricing a reinsurance contract.
209
See Section 8.2 of Loss Models.

Exercise: Losses follow a Pareto with α = 4 and θ = 1000.


There is a deductible of 500, and a maximum covered loss of 5000.
What is the average payment per non-zero payment by the insurer?
[Solution: From the previous solution the average payment per loss to the insured is 97.20.
However, the insurer only makes a payment S(500) = 19.75% of the time the insured has a loss.
Thus the average per non-zero payment by the insurer is: 97.20 / 0.1975 = 492.08.]

expected payment per payment = {E[X ∧ Maximum Covered Loss] - E[X ∧ Deductible]} / S(Deductible).

Coinsurance:

Sometimes, the insurer will only pay a percentage of the amount it would otherwise pay.210
As discussed previously, this is referred to as a coinsurance clause. For example with a 90%
coinsurance factor, after the application of any maximum covered loss and/or deductible, the insurer
would only pay 90% of what it would pay in the absence of the coinsurance clause.

Exercise: Losses follow a Pareto with α = 4 and θ = 1000. There is a deductible of 500, a maximum
covered loss of 5000, and a coinsurance factor of 80%. What is the insurerʼs average payment per
loss to the insured?
[Solution: From a previous solution in the absence of the coinsurance factor, the average payment is
97.20.
With the coinsurance clause each payment is multiplied by 0.8, so the average is:
(0.8)(97.20) = 77.76.]

In general each payment is multiplied by the coinsurance factor, thus so is the average. This is just a
special case of multiplying a variable by a constant. The nth moment is multiplied by the constant to
the nth power. The variance is therefore multiplied by the square of the coinsurance factor.

Exercise: Losses follow a Pareto with α = 4 and θ = 1000.


There is a deductible of 500, a maximum covered loss of 5000, and a coinsurance factor of 80%.
What is the insurerʼs average payment per non-zero payment by the insurer?
[Solution: From a previous solution the average payment per loss to the insured is 77.76.
However, the insurer only makes a payment S(500) = 19.75% of the time the insured has a loss.
Thus the average per non-zero payment by the insurer is: 77.76 / 0.1975 = 393.66.]

210
For example, coinsurance clauses are sometimes used in Health Insurance, Homeowners Insurance, or
Reinsurance.

Formulas for Average Payments:

These are the general formulas:211

Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then
the average payment per (non-zero) payment by the insurer is:
c {E[X ∧ u] - E[X ∧ d]} / S(d), which for u = ∞ reduces to c e(d).

Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then
the insurerʼs average payment per loss to the insured is:
c (E[X ∧ u] - E[X ∧ d]).

The insurerʼs average payment per loss to the insured is the Layer of Loss between the Deductible
Amount d to the Maximum Covered Loss u, E[X ∧ u] - E[X ∧ d], all multiplied by the coinsurance
factor. The average per non-zero payment by the insurer is the insurerʼs average payment per loss
to the insured divided by the ratio of the number of non-zero payments by the insurer to the
number of losses by the insured, S(d).
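As a numerical illustration of these two formulas, the sketch below (Python; names are just for illustration) reproduces the Pareto example above with α = 4, θ = 1000, d = 500, u = 5000, and c = 80%:

def pareto_lev(x, alpha, theta):
    return (theta / (alpha - 1)) * (1 - (theta / (theta + x)) ** (alpha - 1))

def pareto_S(x, alpha, theta):
    return (theta / (theta + x)) ** alpha

alpha, theta = 4, 1000
d, u, c = 500, 5000, 0.80

per_loss = c * (pareto_lev(u, alpha, theta) - pareto_lev(d, alpha, theta))
per_payment = per_loss / pareto_S(d, alpha, theta)
print(per_loss, per_payment)   # about 77.8 and 393.7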

Limited Expected Value as an Integral of the Survival Function:

The Limited Expected Value can be written as an Integral of the Survival Function, S(x) = 1 - F(x).
E[X ∧ x] = ∫_0^x t f(t) dt + x S(x).

Using integration by parts and the fact that the integral of f(x) is -S(x):212
E[X ∧ x] = {-x S(x) + ∫_0^x S(t) dt} + x S(x) = ∫_0^x S(t) dt.

Thus the Limited Expected Value can be written as an integral of the Survival Function
from 0 to the limit, for a distribution with support starting at zero.213

211
See Theorem 8.7 in Loss Models.
More general formulas that include the effects of inflation will be discussed in a subsequent section.
212
Note that the derivative of S(x) is dS(x) /dx = d(1-F(x) / dx = - f(x). Thus an indefinite integral of f(x) is
-S(x) = F(x) - 1. (There is always an arbitrary constant in an indefinite integral.)
213
Thus this formula does not apply to the Single Parameter Pareto Distribution. For the Single Parameter Pareto
Distribution with support starting at θ, E[X ∧ x] = θ + integral from θ to x of S(t). More generally, E[X ∧ x] is the sum of
the integral from -∞ to 0 of -F(t) and the integral from 0 to x of S(t). See Equation 3.9 in Loss Models.
E[X ∧ x] = ∫_0^x S(t) dt.

Since the mean is E[X ∧ ∞], it follows that the mean can be written as an integral of the Survival
Function from 0 to infinity, for a distribution with support starting at zero.214 215

E[X] = ∫_0^∞ S(t) dt.

The losses in the Layer from a to b are given as a difference of Limited Expected Values:

E[X ∧ b] - E[X ∧ a] = ∫_a^b S(t) dt.

Thus the Losses in a Layer can be written as an integral of the Survival Function from
the bottom of the Layer to the top of the Layer.216
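This representation is easy to check numerically. The sketch below (Python; a crude midpoint Riemann sum, names just for illustration) integrates the survival function of an Exponential with θ = 10,000 over the layer from 1,000 to 25,000 and compares the result to the difference of limited expected values:

import math

theta = 10000

def S(t):
    return math.exp(-t / theta)

def lev(x):
    # Exponential: E[X ∧ x] = theta (1 - e^(-x/theta))
    return theta * (1 - math.exp(-x / theta))

a, b, n = 1000, 25000, 100000
dt = (b - a) / n
integral = sum(S(a + (i + 0.5) * dt) for i in range(n)) * dt
print(integral, lev(b) - lev(a))   # both about 8227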

Expected Amount by Which Aggregate Claims are Less than a Given Value:

The amount by which x is less than y is defined as (y - X)+: 0 if x > y and y - x if x ≤ y.

For example, (10 - 2)+ = 8, while (10 - 15)+ = 0.

x 10 - X (10 - X)+ X ∧ 10 (10 - X)+ + (X ∧ 10)

2 8 8 2 10
7 3 3 7 10
15 -5 0 10 10

So we see that (10 - x)+ + (x ∧ 10) = 10, regardless of x.

214
See formula 3.5.2 in Actuarial Mathematics.
215
Do not apply this formula to a Single Parameter Pareto Distribution. For a continuous distribution with support on
(a, b), the mean is: a + the integral from a to b of S(x). For the Single Parameter Pareto Distribution with support
(θ, ∞), E[X] = θ + integral from θ to ∞ of S(x).
216
These are the key ideas behind Lee Diagrams, discussed subsequently.

In general, (y - X)+ + (X ∧ y) = y. ⇒ E[(y - X)+] + E(X ∧ y) = y. ⇒ E[(y - X)+] = y - E[X ∧ y].

More generally, the expected amount by which losses are less than y is:
E[(y - X)+] = ∫_0^y (y - x) f(x) dx = y ∫_0^y f(x) dx - ∫_0^y x f(x) dx = y F(y) - {E[X ∧ y] - y S(y)} = y - E[X ∧ y].

Therefore, the expected amount by which losses are less than y is:
E[(y - X)+ ] = y - E[X ∧ y].

This can also be seen via a Lee Diagram, as discussed in a subsequent section.

The expected amount by which aggregate losses are less than a given amount is sometimes
called the “savings.” 217

For example, assume policyholder dividends are 1/3 of the amount by which that policyholderʼs
aggregate annual claims are less than 1000. Let L be aggregate annual claims. Then:
Policyholder Dividend = (1000 - L)/3 if L < 1000, and 0 if L ≥ 1000.

Then the expected policyholder dividend is one third times the average amount by which aggregate
claims are less than 1000.
Therefore, the expected dividend is: (1000 - E[L ∧ 1000])/3.

Exercise: The aggregate annual claims for a policyholder follow the following discrete distribution:
Prob[X = 200] = 30%, Prob[X = 500] = 40%, Prob[X = 2000] = 20%, Prob[X = 5000] = 10%.
Policyholder dividends are 1/4 of the amount by which that policyholderʼs aggregate annual claims
are less than 1000. (No dividend is paid if annual claims exceed 1000.)
What is the expected policyholder dividend?
[Solution: E[X ∧ 1000] = (0.3)(200) + (0.4)(500) + (0.3)(1000) = 560.
Therefore, the expected amount by which aggregate annual claims are less than 1000 is:
1000 - E[X ∧ 1000] = 1000 - 560 = 440. Expected policyholder dividend is: 440/4 = 110.
Alternately, if aggregate claims are 200, then the dividend is: (1000 - 200)/4 = 200.
If aggregate claims are 500, then the dividend is: (1000 - 500)/4 = 125.
If aggregate claims are 2000 or 5000, then no dividend is paid.
Expected dividend (including those cases where no dividend is paid) is:
(0.3)(200) + (0.4)(125) = 110.]
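The identity E[(y - X)+] = y - E[X ∧ y] is easy to confirm directly on this discrete distribution; a minimal Python sketch:

dist = {200: 0.30, 500: 0.40, 2000: 0.20, 5000: 0.10}
y = 1000

lev = sum(min(x, y) * p for x, p in dist.items())            # E[X ∧ 1000] = 560
shortfall = sum(max(y - x, 0) * p for x, p in dist.items())  # E[(1000 - X)+] = 440
print(lev, shortfall, y - lev, shortfall / 4)                # 560, 440, 440, 110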

217
Insurance Savings as used in Retrospective Rating is discussed for example in Gillam and Snader, “Fundamentals
of Individual Risk Rating.”

Use the formula, E[(y - X)+] = y - E[X ∧ y], if the distribution is continuous rather than discrete.

Exercise: Assume aggregate annual claims for a policyholder are LogNormally distributed, with
µ = 4 and σ = 2.5.218 Policyholder dividends are 1/3 of the amount by which that policyholderʼs
aggregate annual claims are less than 1000. No dividend is paid if annual claims exceed 1000.
What are the expected policyholder dividends?
[Solution: For the LogNormal distribution,
E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.
E[X ∧ 1000] = exp(7.125)Φ(-1.337) + (1000){1 - Φ(1.163)} =
(1242.6)(0.0906) + (1000)(1 - 0.8776) = 235.
Therefore, the expected dividend is (1000 - E[X ∧ 1000])/3 = 255.]

Sometimes, the dividend or bonus is stated in terms of the loss ratio, which is losses divided by
premiums. In this case, the same technique can be used to determine the average dividend or
bonus.

Exercise: An insurance agent will receive a bonus if his loss ratio is less than 75%.
The agent will receive a percentage of earned premium equal to 1/5 of the difference between 75%
and his loss ratio. The agent receives no bonus if his loss ratio is greater than 75%. His earned
premium is 10 million. His incurred losses are distributed according to a Pareto distribution with
α = 2.5 and θ = 12 million. Calculate the expected value of his bonus.
[Solution: A loss ratio of 75% corresponds to (0.75)(10 million) = $7.5 million in losses.
If his losses are L, his loss ratio is L/10 million. If L < 7.5 million, his bonus is:
(1/5)(0.75 - L/10 million)(10 million) = (1/5)(7.5 million - L).
Therefore, his bonus is 1/5 the amount by which his losses are less than $7.5 million.
For the Pareto distribution, E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}.
Therefore, E[X ∧ 7.5 million] = (12 million/1.5) {1 - (12/19.5)^1.5} = 4.138 million.
Therefore, the expected bonus is: (1/5){7.5 million - 4.138 million} = 672 thousand.]

E[(X-d)+] versus E[(d - X)+]:

E[(X-d)+] = E[X] - E[X ∧ d], is the expected losses excess of d.

E[(d-X)+] = d - E[X ∧ d], is the expected amount by which losses are less than d.

Therefore, E[(X-d)+] - E[(d-X)+] = E[X] - d = E[X - d].


218
Note that we are applying the mathematical concept of a limited expected value to the distribution of aggregate
losses in the same manner as was done to a distribution of sizes of loss. Aggregate distributions are discussed
further in “Mahlerʼs Guide to Aggregate Distributions.”

In fact, (X-d)+ - (d-X)+ = X - d, since for x ≥ d the difference is x - d, and for x < d it is -(d - x) = x - d.

Exercise: For a Poisson distribution, determine E[(N-1)+].


[Solution: E[N ∧ 1] = 0 f(0) + 1 Prob[N ≥ 1] = Prob[N ≥ 1].
E[(N-1)+] = E[N] - E[N ∧ 1] = λ - Prob[N ≥ 1] = λ + e^(-λ) - 1.
Alternately, E[(N-1)+] = E[(1-N)+] + E[N] - 1 = Prob[N = 0] + λ - 1 = e^(-λ) + λ - 1.]
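The closed form λ + e^(-λ) - 1 can be confirmed by direct summation over the Poisson probabilities; a minimal Python sketch (the truncation at 60 terms is just for illustration):

import math

lam = 2.5
direct = sum(max(n - 1, 0) * math.exp(-lam) * lam**n / math.factorial(n) for n in range(60))
print(direct, lam + math.exp(-lam) - 1)   # both about 1.5821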

Exercise: In baseball a team bats in an inning until it makes 3 outs. In the fifth inning of today's game,
each batter for the Bad News Bears has a 45% chance of walking and a 55% chance of striking out,
independent of any other batter. What is the expected number of runs the Bad News Bears will
score in the fifth inning?
(If there are three men on base, a walk forces in a run. Assume no wild pitches, passed balls, etc.
Assume nobody steals any bases, is picked off base, etc.)
[Solution: Treat a walk as a failure for the defense, and striking out as a success for the defense.
An inning ends when there are three successes. The number of walks (failures) is Negative Binomial
with r = 3 and β = (chance of failure)/(chance of success) = 0.45/0.55 = 9/11.
f(0) = 1/(1 + β)^r = (11/20)^3 = 0.1664. f(1) = rβ/(1 + β)^(r+1) = (3)(9/11)(11/20)^4 = 0.2246.
f(2) = {(r)(r+1)/2} β²/(1 + β)^(r+2) = (3)(4/2)(9/11)²(11/20)^5 = 0.2021.
E[N ∧ 3] = 0 f(0) + 1 f(1) + 2 f(2) + 3 {1 - f(0) - f(1) - f(2)} = 0.2246 + (2)(0.2021) + (3)(0.4069) =
1.8495. If there are 3 or fewer walks in the inning, they score no runs. With a total of 4 walks they
score 1 run, with a total of 5 walks they score 2 runs, etc. ⇒ Number of runs scored = (N - 3)+.
Expected number of runs scored = E[(N - 3)+] = E[N] - E[N ∧ 3] = (3)(9/11) - 1.8495 = 0.605.]
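The expected number of runs can also be checked by brute force over the Negative Binomial probabilities; a minimal Python sketch (the pmf is built recursively, and the truncation at 200 terms is just for illustration):

r, beta = 3, 9 / 11
p = 1 / (1 + beta) ** r            # f(0)
expected_runs = 0.0
for n in range(200):
    expected_runs += max(n - 3, 0) * p
    p *= (r + n) / (n + 1) * beta / (1 + beta)   # f(n+1) from f(n)
print(expected_runs)               # about 0.605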

Average Size of Losses in an Interval:

As discussed previously, the Limited Expected Value is generally the sum of two pieces.
Each loss of size less than x contributes its own size, while each loss greater than or equal to x
contributes just x to the average:
E[X ∧ x] = ∫_0^x y f(y) dy + x S(x).

This formula can be rewritten to put the integral in terms of the limited expected value E[X ∧ x] and
the survival function S(x), both of which are given in the Appendix A of Loss Models:
∫_0^x y f(y) dy = E[X ∧ x] - x S(x).



This integral represents the dollars of loss on losses of size 0 to x.


Dividing by the probability of such claims, F(x), would give the average size of such losses.
Dividing instead by the mean would give the percentage of losses represented by those losses.

The dollars of loss represented by the losses in an interval from a to b is just the difference of two
integrals of the type we have been discussing:
∫_0^b y f(y) dy - ∫_0^a y f(y) dy = {E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}.

Dividing by F(b) - F(a) would give the average size of loss for losses in this interval.

Average Size of Losses in the Interval [a, b] = ({E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}) / {F(b) - F(a)}.

Exercise: For a LogNormal Distribution, with parameters µ = 8 and σ = 3, what is the average size of
those losses with sizes between $1 million and $5 million?
[Solution: For the LogNormal: F(5 million) = Φ[{ln(5 million)−µ} / σ] = Φ[{ln(5 million)-8} / 3] =
Φ[(15.425-8)/3] = Φ[2.475] = 0.9933.

F(1 million) = Φ[{ln(1 million)-8} / 3] = Φ[1.939] = 0.9737.

E[X ∧ 5 million] = exp(µ + σ²/2) Φ[(ln(5 mil) - µ - σ²)/σ] + (5 mil) {1 - Φ[(ln(5 mil) - µ)/σ]} =
(268,337) Φ[-0.525] + (5,000,000)(1 - Φ[2.475]) = (268,337)(0.2998) + (5,000,000)(0.0067) = 113,679.
E[X ∧ 1 million] = exp(µ + σ²/2) Φ[(ln(1 mil) - µ - σ²)/σ] + (1 mil) {1 - Φ[(ln(1 mil) - µ)/σ]} =
(268,337) Φ[-1.061] + (1,000,000)(1 - Φ[1.939]) = (268,337)(0.1444) + (1,000,000)(0.0263) = 65,048.
Thus, the average size of loss for those losses of size between $1 million and $5 million is:
({E[X ∧ 5m ] - (5m) S(5m)} - {E[X ∧ 1m] - (1m)S(1m)}) / {F(5m) - F(1m)} =
({113,679 - (5,000,000)(0.0067)} - {65,048 - (1,000,000)(0.0263)}) / (0.9933 -0.9737) =
41,700/0.0196 = $2.13 million.
Comment: Note that the average size of loss is not at the midpoint of the interval, which is
$3 million. In the case of the LogNormal, E[X ∧ x ] - xS(x) = exp(µ + σ2/2)Φ[(ln(x)− µ − σ2)/σ].
Thus, the mean loss size for the interval a to b is:
exp(µ + σ2/2){Φ[(lnb - µ - σ2)/σ] - Φ[(lna - µ - σ2)/σ]} / {Φ[(lnb - µ)/σ] - Φ[(lna - µ)/σ]},
which would have saved some computation in this case. ]
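The same answer can be reproduced numerically, reusing the LogNormal limited expected value; a minimal Python sketch (Phi is the standard Normal distribution function, and the names are just for illustration):

import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def lognormal_lev(x, mu, sigma):
    return (math.exp(mu + sigma**2 / 2) * Phi((math.log(x) - mu - sigma**2) / sigma)
            + x * (1 - Phi((math.log(x) - mu) / sigma)))

mu, sigma = 8, 3
a, b = 1.0e6, 5.0e6
F = lambda x: Phi((math.log(x) - mu) / sigma)
dollars = ((lognormal_lev(b, mu, sigma) - b * (1 - F(b)))
           - (lognormal_lev(a, mu, sigma) - a * (1 - F(a))))
print(dollars / (F(b) - F(a)))   # about 2.1 million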

Dividing instead by the mean would give the percentage of dollars of total losses represented by
those claims.

Proportion of Total Losses from Losses in the Interval [a, b] =
({E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}) / E[X].

Exercise: For a LogNormal Distribution, with parameters µ = 8 and σ = 3, what percentage of the
total losses are from those losses with sizes between $1 million and $5 million?
[Solution: E[X] = exp(µ + σ²/2) = e^12.5 = 268,337.
From a previous solution, ({E[X ∧ 5m] - (5m) S(5m)} - {E[X ∧ 1m] - (1m) S(1m)}) = 41,700.
41,700/268,337 = 15.5%.
Comment: In the case of the LogNormal, the percentage of losses from losses of size a to b is:
exp(µ + σ²/2) {Φ[(ln(b) - µ - σ²)/σ] - Φ[(ln(a) - µ - σ²)/σ]} / exp(µ + σ²/2) =
Φ[(ln(b) - µ - σ²)/σ] - Φ[(ln(a) - µ - σ²)/σ]. ]

Questions about the losses in an interval have to be distinguished from those about layers of loss.
For example, the losses in the layer from $100,000 and $1 million are part of the dollars from losses
of size greater than $100,000. Each loss of size between $100,000 and $1 million contributes its
size minus $100,000 to this layer, while those of size greater than $1 million contribute the width of
the layer, $900,000, to this layer.219

219
See the earlier section on Layers of Loss.

Payments Subject to a Minimum:220

Assume a disabled worker is paid his weekly wage, subject to a minimum payment of 300.221
Let X be a workerʼs weekly wage. Then, while he is unable to work, he is paid Max[X, 300].

Min[X, 300] + Max[X, 300] = X + 300.


Therefore, E[Max[X, 300]] = 300 + E[X] - E[Min[X, 300]] = 300 + E[X] - E[X ∧ 300].

Let Y = amount the worker is paid = Max[X, 300].


Then Y - 300 = 0 if X ≤ 300, and Y - 300 = X - 300 if X > 300.
Therefore, E[Y - 300] = E[(X - 300)+] = E[X] - E[X ∧ 300].

⇒ E[Y] = 300 + E[X] - E[X ∧ 300], matching the previous result.

Exercise: Weekly wages are distributed as follows:


200 @ 20%, 300 @30%, 400 @30%, 500 @ 10%, 1000 @10%.
Determine the average weekly payment to a worker who is disabled.
[Solution: E[X] = (20%)(200) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(1000) = 400.
E[X ∧ 300] = (20%)(200) + (30%)(300) + (30%)(300) + (10%)(300) + (10%)(300) = 280.
300 + E[X] - E[X ∧ 300] = 300 + 400 - 280 = 420.
Alternately, one can list all of the possibilities:
Wage Payment Probability
200 300 20%
300 300 30%
400 400 30%
500 500 10%
1000 1000 10%
(20%)(300) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(1000) = 420.]

Another way to look at this is that the average payment is:


mean wage + (the average amount by which the wage is less than 300) =
E[X] + (300 - E[X ∧ 300]) = 300 + E[X] - E[X ∧ 300], matching the previous result.

In general, E[Max[X, a]] = a + E[X] - E[X ∧ a].

220
See for example, SOA M, 11/06, Q. 20.
221
This is a very simplified version of benefits under Workers Compensation.

Payments Subject to both a Minimum and a Maximum:222

Assume a disabled worker is paid his weekly wage, subject to a minimum payment of 300, and a
maximum payment of 700.223 Let X be a workerʼs weekly wage. Then, while he is unable to work, he
is paid Min[Max[X, 300], 700].

Let Y = amount the worker is paid = Min[Max[X, 300], 700].


Then Y - 300 = 0 if X ≤ 300, Y - 300 = X - 300 if 300 < X < 700, and Y - 300 = 400 if X ≥ 700.
Therefore, E[Y - 300] = the layer from 300 to 700 = E[X ∧ 700] - E[X ∧ 300].
⇒ E[Y] = 300 + E[X ∧ 700] - E[X ∧ 300].

Exercise: Weekly wages are distributed as follows:


200 @ 20%, 300 @30%, 400 @30%, 500 @ 10%, 1000 @10%.
Determine the average weekly payment to a worker who is disabled.
[Solution: E[X ∧ 300] = (20%)(200) + (80%)(300) = 280.
E[X ∧ 700] = (20%)(200) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(700) = 370.
300 + E[X ∧ 700] - E[X ∧ 300] = 300 + 370 - 280 = 390.
Alternately, one can list all of the possibilities:
Wage Payment Probability
200 300 20%
300 300 30%
400 400 30%
500 500 10%
1000 700 10%
(20%)(300) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(700) = 390.]

Another way to arrive at the same result is that the average payment is:
mean wage + (average amount by which the wage is less than 300) - (layer above 700) =
E[X] + (300 - E[X ∧ 300]) - (E[X] - E[X ∧ 700]) = 300 + E[X ∧ 700] - E[X ∧ 300], matching the
previous result.

In general, E[Min[Max[X , a] , b]] = a + E[X ∧ b] - E[X ∧ a].

We note that if b = ∞, in other words the payments are not subject to a maximum, this reduces to
the result previously discussed for that case, E[Max[X, a]] = a + E[X] - E[X ∧ a].

If instead a = 0, in other words the payment is not subject to a minimum, this reduces to
E[Min[X , b]] = E[X ∧ b], which is the definition of the limited expected value.
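Both reductions, and the general identity, are easy to confirm on the discrete wage distribution used above; a minimal Python sketch:

wages = {200: 0.20, 300: 0.30, 400: 0.30, 500: 0.10, 1000: 0.10}
a, b = 300, 700

lev = lambda y: sum(min(x, y) * p for x, p in wages.items())
direct = sum(min(max(x, a), b) * p for x, p in wages.items())
print(direct, a + lev(b) - lev(a))   # both 390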
222
This mathematics is a simplified version of the premium calculation under a Retrospectively Rated Policy.
See for example, “Individual Risk Rating” by Margaret Tiller Sherwood, in Foundations of Casualty Actuarial Science.
223
This is a simplified version of benefits under Workers Compensation.

Normal Distribution:224

For the Standard Normal:

∫_{-∞}^x t f(t) dt = ∫_{-∞}^x t exp[-t²/2] / √(2π) dt = -exp[-t²/2] / √(2π), evaluated from t = -∞ to t = x,
= -exp[-x²/2] / √(2π) = -φ(x).

For the nonstandard Normal:

∫_{-∞}^x t f(t) dt = ∫_{-∞}^x t φ[(t - µ)/σ] / σ dt = ∫_{-∞}^{(x-µ)/σ} (σy + µ) φ[y] dy =
σ ∫_{-∞}^{(x-µ)/σ} y φ[y] dy + µ ∫_{-∞}^{(x-µ)/σ} φ[y] dy = -σ φ[(x-µ)/σ] + µ Φ[(x-µ)/σ].

Thus, E[X ∧ x] = ∫_{-∞}^x t f(t) dt + x S(x) = -σ φ[(x-µ)/σ] + µ Φ[(x-µ)/σ] + x (1 - Φ[(x-µ)/σ])

= x - σ φ[(x-µ)/σ] - (x - µ) Φ[(x-µ)/σ].

Exercise: For a Normal Distribution with µ = 33 and σ = 10, determine E[X ∧ 38].
[Solution: 38 - (10) φ[(38 - 33)/10] - (38 - 33) Φ[(38 - 33)/10] = 38 - 10 φ[0.5] - 5 Φ[0.5] =
38 - (10) exp[-0.5²/2] / √(2π) - (5)(0.6915) = 31.02.]
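A quick numerical check of this formula; a minimal Python sketch (phi and Phi are the standard Normal density and distribution function):

import math

def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_lev(x, mu, sigma):
    z = (x - mu) / sigma
    return x - sigma * phi(z) - (x - mu) * Phi(z)

print(normal_lev(38, 33, 10))   # about 31.02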

224
One could use the formula for the limited expected value of a Normal Distribution to derive the formula for TVaR
discussed in “Mahlerʼs Guide to Risk Measures” and Example 3.15 in Loss Models.

Problems:

31.1 (1 point) You are given the following:


• The size of loss distribution is given by
f(x) = 2e^(-2x), x > 0
• Under a basic limits policy, individual losses are capped at 1.
• The expected annual claim frequency is 13.
What are the expected annual total loss payments on a basic limits policy?
A. less than 5.0
B. at least 5.0 but less than 5.5
C. at least 5.5 but less than 6.0
D. at least 6.0 but less than 6.5
E. at least 6.5

Use the following information for the next 7 questions.


Assume the unlimited losses follow a LogNormal Distribution with parameters µ = 7 and σ = 3.
Assume an average of 200 losses per year.

31.2 (1 point) What is the total cost expected per year?


A. less than $19 million
B. at least $19 million but less than $20 million
C. at least $20 million but less than $21 million
D. at least $21 million but less than $22 million
E. at least $22 million

31.3 (2 points) If the insurer pays no more than $1 million per loss, what is the insurerʼs total cost
expected per year?
A. less than $7 million
B. at least $7 million but less than $8 million
C. at least $8 million but less than $9 million
D. at least $9 million but less than $10 million
E. at least $10 million

31.4 (2 points) If the insurer pays no more than $5 million per loss, what is the insurerʼs total cost
expected per year?
A. less than $7 million
B. at least $7 million but less than $8 million
C. at least $8 million but less than $9 million
D. at least $9 million but less than $10 million
E. at least $10 million

31.5 (1 point) What are the dollars in the layer from $1 million to $5 million expected per year?
A. less than $3.5 million
B. at least $3.5 million but less than $3.7 million
C. at least $3.7 million but less than $3.9 million
D. at least $3.9 million but less than $4.1 million
E. at least $4.1 million

31.6 (1 point) What are the total dollars excess of $5 million per loss expected per year?
A. less than $7 million
B. at least $7 million but less than $8 million
C. at least $8 million but less than $9 million
D. at least $9 million but less than $10 million
E. at least $10 million

31.7 (2 points) What is the average size of loss for those losses between $1 million and $5 million
in size?
A. less than $1.4 million
B. at least $1.4 million but less than $1.7 million
C. at least $1.7 million but less than $2.0 million
D. at least $2.0 million but less than $2.3 million
E. at least $2.3 million

31.8 (1 point) What is the expected total cost per year of those losses between
$1 million and $5 million in size?
A. less than $3.5 million
B. at least $3.5 million but less than $3.7 million
C. at least $3.7 million but less than $3.9 million
D. at least $3.9 million but less than $4.1 million
E. at least $4.1 million

31.9 (2 points) A Pareto Distribution with parameters α = 2.5 and θ = $15,000 appears to be a
good fit to liability claims. What is the expected average size of loss for a policy issued with a
$250,000 limit of liability?
A. less than 9200
B. at least 9200 but less than 9400
C. at least 9400 but less than 9600
D. at least 9600 but less than 9800
E. at least 9800

Use the following information for the next 4 questions:


• The weekly wages for workers in a state follow a Pareto Distribution with α = 4 and θ = 1800.

• Injured workers are paid weekly benefits equal to 2/3 of their pre-injury average
weekly wage, but subject to a maximum benefit of the state average weekly wage
and a minimum benefit of 1/4 of the state average weekly wage.
• Injured workers have the same wage distribution as all workers.
• The duration of payments is independent of the workerʼs wage.

31.10 (1 point) What is the state average weekly wage?


A. less than $500
B. at least $500 but less than $530
C. at least $530 but less than $560
D. at least $560 but less than $590
E. at least $590

31.11 (2 points) For a Pareto Distribution with parameters α = 4 and θ = 1800, what is E[X ∧ 900]?
A. less than $400
B. at least $400 but less than $430
C. at least $430 but less than $460
D. at least $460 but less than $490
E. at least $490

31.12 (2 points) For a Pareto Distribution with parameters α = 4 and θ = 1800, what is E[X ∧ 225]?
A. less than $100
B. at least $100 but less than $130
C. at least $130 but less than $160
D. at least $160 but less than $190
E. at least $190

31.13 (3 points) What is the average weekly benefit received by injured workers?
A. less than $300
B. at least $300 but less than $320
C. at least $320 but less than $340
D. at least $340 but less than $360
E. at least $360
Hint: Use the solutions to the previous three questions.

Use the following information for the next 15 questions:


Losses follow an Exponential Distribution with θ = 10,000.

31.14 (1 point) What is the average loss?


A. less than 8500
B. at least 8500 but less than 9000
C. at least 9000 but less than 9500
D. at least 9500 but less than 10,000
E. at least 10,000

31.15 (1 point) Assuming a 25,000 policy limit, what is the average payment by the insurer?
A. less than 9000
B. at least 9000 but less than 9100
C. at least 9100 but less than 9200
D. at least 9200 but less than 9300
E. at least 9300

31.16 (1 point) Assuming a 1000 deductible (with no maximum covered loss),


what is the average payment per loss?
A. less than 9000
B. at least 9000 but less than 9100
C. at least 9100 but less than 9200
D. at least 9200 but less than 9300
E. at least 9300

31.17 (1 point) Assuming a 1000 deductible (with no maximum covered loss),


what is the average payment per non-zero payment by the insurer?
A. less than 8500
B. at least 8500 but less than 9000
C. at least 9000 but less than 9500
D. at least 9500 but less than 10,000
E. at least 10,000

31.18 (1 point) Assuming a 1000 deductible and a 25,000 maximum covered loss,
what is the average payment per loss?
A. less than 8500
B. at least 8500 but less than 9000
C. at least 9000 but less than 9500
D. at least 9500 but less than 10,000
E. at least 10,000

31.19 (1 point) Assuming a 1000 deductible and a 25,000 maximum covered loss,
what is the average payment per (non-zero) payment by the insurer?
A. less than 9000
B. at least 9000 but less than 9100
C. at least 9100 but less than 9200
D. at least 9200 but less than 9300
E. at least 9300

31.20 (1 point) Assuming a 75% coinsurance factor (with no deductible or maximum covered loss),
what is the average payment by the insurer?
A. less than 6700
B. at least 6700 but less than 6800
C. at least 6800 but less than 6900
D. at least 6900 but less than 7000
E. at least 7000

31.21 (1 point) Assuming a 75% coinsurance factor and a 1000 deductible (with no maximum
covered loss), what is the average payment per loss?
A. less than 6700
B. at least 6700 but less than 6800
C. at least 6800 but less than 6900
D. at least 6900 but less than 7000
E. at least 7000

31.22 (1 point) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum
covered loss, what is the average payment per non-zero payment by the insurer?
A. less than 6700
B. at least 6700 but less than 6800
C. at least 6800 but less than 6900
D. at least 6900 but less than 7000
E. at least 7000

31.23 (2 points) What is the average size of the losses in the interval from 1000 to 25000?
Assume no deductible, no maximum covered loss, and no coinsurance factor.
A. less than 7500
B. at least 7500 but less than 8000
C. at least 8000 but less than 8500
D. at least 8500 but less than 9000
E. at least 9000

31.24 (2 points) What is the proportion of total dollars of loss from the losses in the interval from
1000 to 25000? Assume no deductible, no maximum covered loss, and no coinsurance factor.
A. less than 74%
B. at least 74% but less than 76%
C. at least 76% but less than 78%
D. at least 78% but less than 80%
E. at least 80%

31.25 (3 points) Assuming a 1000 deductible, what is the average size of the insurerʼs payments
for those payments greater than 500 and at most 4000?
A. less than 2100
B. at least 2100 but less than 2130
C. at least 2130 but less than 2160
D. at least 2160 but less than 2190
E. at least 2190

31.26 (3 points) Assuming a 75% coinsurance factor, and a 1000 deductible, what is the average
size of the insurerʼs payments for those payments greater than 500 and at most 4000?
A. less than 2100
B. at least 2100 but less than 2130
C. at least 2130 but less than 2160
D. at least 2160 but less than 2190
E. at least 2190

31.27 (4 points) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum
covered loss, what is the average size of the insurerʼs payments for those payments greater than
15,000 and at most 19,000?
A. less than 17,400
B. at least 17,400 but less than 17,500
C. at least 17,500 but less than 17,600
D. at least 17,600 but less than 17,700
E. at least 17,700

31.28 (1 point) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum
covered loss, what is the mean of the insurerʼs payments per loss?
A. less than 4000
B. at least 4000 but less than 5000
C. at least 5000 but less than 6000
D. at least 6000 but less than 7000
E. at least 7000

31.29 (3 points) You are given the following information about a policyholder:
• His loss ratio is calculated as incurred losses divided by earned premium.
• He will receive a policyholder dividend as a percentage of earned premium equal to
1/4 of the difference between 60% and his loss ratio.
• He receives no policyholder dividend if his loss ratio is greater than 60%.
• His earned premium is 40,000.
• His incurred losses are distributed via a LogNormal Distribution, with µ = 6 and σ = 3.
Calculate the expected value of his policyholder dividend.
(A) 4800 (B) 5000 (C) 5200 (D) 5400 (E) 5600

Use the following information for the next two questions:


• In the state of Minnehaha, each town is responsible for its snow removal.
• However, a state fund shares the cost if a town has a lot of snow during a winter.
• In exchange, a town is required to pay into this state fund when it has a winter with
a small amount of snow.
• Let x be the number of inches of snow a town has during a winter.
• If x < 20, then the town pays the state fund c(20 - x), where c varies by town.
• If x > 50, then the state fund pays the town c(x - 50).
• c = 1000 for the town of Frostbite Falls.

31.30 (3 points) The number of inches of snow the town of Frostbite Falls has per winter is
equally likely to be: 8, 10, 16, 21, 35, 57, 70, or 90.
What is the expected net amount the state fund pays Frostbite Falls (expected amount state fund
pays town minus expected amount town pays the state fund) per winter?
A. 3000 B. 3500 C. 4000 D. 4500 E. 5000

31.31 (5 points) The number of inches of snow the town of Frostbite Falls has per winter is
LogNormal, with µ = 2.4 and σ = 1.5.
What is the expected net amount the state fund pays Frostbite Falls (expected amount state fund
pays town minus expected amount town pays the state fund) per winter?
A. 7000 B. 7500 C. 8000 D. 8500 E. 9000

31.32 (2 points) N follows a Poisson Distribution, with λ = 2.5. Determine E[(N - 3)+].
A. 0.2 B. 0.3 C. 0.4 D. 0.5 E. 0.6

31.33 (2 points) The lifetime of batteries is Exponential with mean 6. Batteries are sold for $100
each. If a battery lasts less than 2 years, the manufacturer will pay the purchaser the pro rata share of
the purchase price. For example if the battery lasts only 1.5 years, the manufacturer will pay the
purchaser (100)(2 - 1.5)/2 = 25.
What is the expected amount paid by the manufacturer per battery sold?
(A) 11 (B) 13 (C) 15 (D) 17 (E) 19

31.34 (4 points) XYZ Insurance Company writes insurance in a state with a catastrophe fund for
hurricanes. For any hurricane on which XYZ has more than $30 million in losses in this state, the
Catastrophe Fund will pay XYZ 75% of its hurricane losses above $30 million, subject to a
maximum payment from the fund of $90 million.
The amount XYZ pays in this state on a hurricane that hits this state is distributed via a LogNormal
Distribution, with µ = 15 and σ = 2. What is the expected value of the amount XYZ will receive from the
Catastrophe Fund due to the next hurricane to hit this state?
(A) 4 million (B) 5 million (C) 6 million (D) 7 million (E) 8 million

Use the following information for the next two questions:


• Losses follow a Pareto Distribution, with parameters α = 5 and θ = 40,000.
• Three losses are expected each year.
• For each loss less than or equal to 5,000, the insurer makes no payment.

31.35 (2 points) You are given the following:


For each loss greater than 5,000, the insurer pays the amount of the loss up to the maximum
covered loss of 25,000, less a 5000 deductible.
(Thus for a loss of 7000 the insurer pays 2000; for a loss of 80,000 the insurer pays 20,000.)
Determine the insurer's expected annual payments.
A. Less than 7,500
B. At least 7,500, but less than 12,500
C. At least 12,500, but less than 17,500
D. At least 17,500, but less than 22,500
E. At least 22,500

31.36 (2 points) For each loss greater than 5,000, the insurer pays the entire amount of the loss up
to the maximum covered loss of 25,000.
Determine the insurer's expected annual payments.
A. Less than 7,500
B. At least 7,500, but less than 12,500
C. At least 12,500, but less than 17,500
D. At least 17,500, but less than 22,500
E. At least 22,500

31.37 (2 points) Losses follow an Exponential Distribution with θ = 20,000.


Calculate the percent of expected losses within the layer 5,000 to 50,000.
A. Less than 50%
B. At least 50%, but less than 55%
C. At least 55%, but less than 60%
D. At least 60%, but less than 65%
E. At least 65%

31.38 (4 points) Losses follow a LogNormal Distribution with µ = 9.4 and σ = 1.


Calculate the percent of expected losses within the layer 5,000 to 50,000.
A. Less than 50%
B. At least 50%, but less than 55%
C. At least 55%, but less than 60%
D. At least 60%, but less than 65%
E. At least 65%

31.39 (3 points) Losses follow a Pareto Distribution with α = 3 and θ = 40,000.


Calculate the percent of expected losses within the layer 5,000 to 50,000.
A. Less than 50%
B. At least 50%, but less than 55%
C. At least 55%, but less than 60%
D. At least 60%, but less than 65%
E. At least 65%

31.40 (3 points) N follows a Geometric Distribution, with β = 2.5. Determine E[(N - 3)+].
A. 0.9 B. 1.0 C. 1.1 D. 1.2 E. 1.3

31.41 (3 points) Losses follow a Pareto Distribution with α = 3 and θ = 12,000.


Policy A has a deductible of 3000. Policy B has a maximum covered loss of u.
The average payment per loss under Policy A is equal to that under Policy B. Determine u.
A. 4000 B. 5000 C. 6000 D. 7000 E. 8000

31.42 (1 point) X is 5 with probability 80% and 25 with probability 20%.


If E[(y - X)+ ] = 8, determine y.
A. 10 B. 15 C. 20 D. 25 E. 30

31.43 (2 points) X is Exponential with θ = 2. Y is equal to 1 - X if X < 1, and Y is 0 if X ≥ 1.


What is the expected value of Y?
A. 0.15 B. 0.17 C. 0.19 D. 0.21 E. 0.23

31.44 (3 points) Let R be the weekly wage for a worker compared to the statewide average.
R follows a LogNormal Distribution with σ = 0.4. Determine the percentage of overall wages earned
by workers whose weekly wage is less than twice the statewide average.
A. 88% B. 90% C. 92% D. 94% E. 96%

31.45 (2 points) You observe the following 35 losses: 6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34,
40, 41, 48, 49, 53, 60, 63, 78, 85, 103, 124, 140, 192, 198, 227, 330, 361, 421, 514, 546, 750,
864, 1638. What is the (empirical) Limited Expected Value at 50?
A. less than 38
B. at least 38 but less than 39
C. at least 39 but less than 40
D. at least 40 but less than 41
E. at least 41

31.46 (2 points) Alexʼs pay is based on the annual profit made by his employer.
Alex is paid 2% of the profit, subject to a minimum payment of 100.
The annual profits for Alexʼs company, X, follow a distribution F(x).
Which of the following represents Alexʼs expected payment?
A. 100F(100) + E[X]/50 - E[X ∧ 100]
B. 100F(5000) + E[X]/50 - E[X ∧ 5000]/50
C. 100 + 0.02(E[X] - E[X ∧ 5000])
D. 0.02(E[X ∧ 5000] - E[X ∧ 100]) + 100S(5000)
E. None of A, B, C, or D

31.47 (2 points) In the previous question, assume F(x) = 1 - {20,000/(20,000 + x)}^3.


Determine Alexʼs expected payment.
A. 200 B. 230 C. 260 D. 290 E. 320

31.48 (2 points) The size of losses follows a Gamma distribution with parameters α = 3, θ = 100.
What is the limited expected value at 500, E[X ∧ 500] ?
Hint: Use Theorem A.1 in Appendix A of Loss Models:
Γ(n; x) = 1 - Σ_{j=0}^{n-1} x^j e^(-x) / j!, for n a positive integer.

A. less than 275


B. at least 275 but less than 280
C. at least 280 but less than 285
D. at least 285 but less than 290
E. at least 290

31.49 (2 points) Donald Adams owns the Get Smart Insurance Agency.
Let L be the annual losses from the insurance policies that Donʼs agency writes for the Control
Insurance Company. L follows a Single Parameter Pareto distribution with α = 3 and θ = 100,000.
Don gets a bonus from the Control Insurance Company calculated as (170,000 - L)/4 if this quantity
is positive and 0 otherwise. Calculate Donʼs expected bonus.
A Less than 10,000
B. At least 10,000, but less than 12,000
C. At least 12,000, but less than 14,000
D. At least 14,000, but less than 16,000
E. At least 16,000

31.50 (1 point) In the previous question, calculate the expected value of Donʼs bonus conditional on
his bonus being positive.

31.51 (1 point) X follows the density f(x), with support from 0 to infinity.
∫_0^1000 f(x) dx = 0.87175. ∫_0^1000 x f(x) dx = 350.61.

Determine E[X ∧ 1000].


A. Less than 480
B. At least 480, but less than 490
C. At least 490, but less than 500
D. At least 500, but less than 510
E. At least 510

31.52 (3 points) The size of loss is modeled by a two parameter Pareto distribution with θ = 5000
and α = 3. An insurance has the following provisions:
(i) It pays 75% of the first 2000 of any loss.
(ii) It pays 90% of any portion of a loss that is greater than 10,000.
Calculate the average payment per loss.
A Less than 1050
B. At least 1050, but less than 1100
C. At least 1100, but less than 1150
D. At least 1150, but less than 1200
E. At least 1200

31.53 (3 points) The mean number of minutes used per month by owners of cell phones varies
between owners via a Single Parameter Pareto Distribution with α = 1.5 and θ = 20.
The Telly Savalas Phone Company is planning to sell a new unlimited calling plan.
Only those whose current average usage is greater than the overall average will sign up for the plan.
In addition, those who sign up will use on average 50% more minutes than currently.
What is the expected number of minutes used per month under the new plan?
A. 150 B. 180 C. 210 D. 240 E. 270

31.54 (2 points) Define the first moment distribution, G(x), as the percentage of total loss dollars that
come from those losses of size less than x.
If the size of loss distribution follows a LogNormal Distribution, with parameters µ and σ, determine
the form of the first moment distribution.

31.55 (3 points) Define the quartiles as the 25th, 50th, and 75th percentiles.
Define the trimmed mean as the average of those values between the first (lower) quartile and the
third (upper) quartile.
Determine the trimmed mean for an Exponential Distribution.

31.56 (2 points) For a Pareto Distribution with α = 1, derive the formula for the Limited Expected
Value that is shown in Appendix A of Loss Models, attached to the exam.

31.57 (4 points) The value of a Property Claims Service (PCS) index is determined by the
catastrophe losses for the insurance industry in a certain region of the country over a certain period of
time. Each $100 million of catastrophe losses corresponds to one point on the index.
A 100/150 call spread would pay: (200) {(S - 100)+ - (S - 150)+},
where S is the value of the PCS index at expiration, and x+ = x if x ≥ 0, and x+ = 0 if x < 0.
You assume that the catastrophe losses in a certain region follow a LogNormal Distribution with
parameters µ = 20 and σ = 2.
What is the expected payment on a 100/150 call spread on the PCS Index for this region?
A. 200 B. 300 C. 400 D. 500 E. 600

31.58 (2 points) For the Normal Distribution with mean µ and standard deviation σ:
E[X ∧ x] = x - (x-µ) Φ[(x-µ)/σ] - σ φ[(x-µ)/σ].
For µ = 62 and σ = 20, calculate E[X ∧ 75].

31.59 (3 points) Define the quantile Qα to be such that F[Qα] = α.

For α between 0 and 1/2, compute the Windsorized mean by:


1. Replace all values below Qα by Qα.
2. Replace all values above Q1−α by Q1−α.
3. Take the average.
Determine the algebraic form of the Windsorized mean for an Exponential Distribution.

31.60 (5 points) As a state insurance department actuary in the state of Pennisota, you have to
oversee the operations of the State Medical Reinsurance Fund. The Fund covers the layer from
$250,000 to $1 million ($750,000 in excess of $250,000) for each Medical Malpractice claim
incurred in Pennisota.
The severity distribution for Medical Malpractice Insurance in the state of Pennisota is Pareto with
α = 3 and θ = 400,000.
The State is considering two possible legislative bills for Medical Malpractice in Pennisota.
Legislative Bill 1 would increase the size of all claims by 5%.
Legislative Bill 2 would only increase the size of claims with values originally less than $500,000 by
10%, subject to the constraint that claims that are originally less than $500,000 would become no
greater than $500,000. Under Legislative Bill 2 the value of all claims in excess of $500,000 would
not change.
a. (2 points) Estimate for Bill 1 its percentage impact on the losses paid by
the State Medical Reinsurance Fund.
b. (3 points) Estimate for Bill 2 its percentage impact on the losses paid by
the State Medical Reinsurance Fund.

31.61 (4 points) Define πp as the pth percentile.

Define the trimmed mean as the average of those values between π1-p and πp .
For p = 95%, determine the algebraic form of the trimmed mean for a Pareto Distribution.

31.62 (3 points) Losses follow a Pareto Distribution with α = 2.6 and θ = 5000.
An insurance policy has a deductible of 2000 applied after a maximum covered loss of 3000.
A new insurance policy will instead apply a deductible of 1000 prior to a limit of 1000.
What is the percentage increase in expected losses paid under the new policy?
A. 30% B. 35% C. 40% D. 45% E. 50%

31.63 (2 points) A general contractor will be paid a bonus if a skyscraper is completed in less than
450 days. (The skyscraper will have to pass final inspection.)
Let S be the number of days taken to complete the skyscraper.
The bonus is 30,000 times the amount by which S is less than 450.
No bonus is paid if S ≥ 450.
E[S] = 500.
E[(S-450)+] = 90.
Determine the expected bonus paid.
A. 0.8 million B. 0.9 million C. 1.0 million D. 1.1 million E. 1.2 million

31.64 (3 points) The distribution of annual loss ratios for Permanent Assurance Company is
LogNormal with µ = -0.65 and σ = 0.6.
Conditional on the loss ratio being greater than 0.7, compute the average loss ratio.
A. 95% B. 100% C. 105% D. 110% E. 115%

31.65 (8 points) You assume that the distribution of the wealth in a country follows a
Single Parameter Pareto Distribution.
(a) If α = 1.5, determine the average wealth of those whose wealth exceeds 10 million (where 10 million > θ).
(b) For α > 1 and y > θ, determine the average wealth of those whose wealth exceeds y.
(c) Let Πp and Πq be two percentiles of the distribution, with q > p. Determine Πq / Πp .
(d) What part of the total wealth of the top 10% of the wealth distribution is owned by the top 1%?

31.66 (2 points) An insurer has excess-of-loss reinsurance on auto insurance. You are given:
• The individual losses have a Pareto distribution with
F(x) = 1 - {4000 / (4000 + x)}^3, x > 0.
• Reinsurance will pay the excess of each loss over 3000.
If the reinsurance instead paid the excess of each loss over 5000, what would be the percent
increase in the expected payment per loss by the insurer?
A. 15% B. 17% C. 19% D. 21% E. 23%

31.67 (4, 5/88, Q.61) (3 points) Losses for a given line of insurance are distributed according to
the probability density function f(x) = 0.015 - 0.0001x, 0 < x < 100.
An insurer has issued policies each with a deductible of 10 for this line.
On these policies, what is the average expected payment by the insurer per non-zero payment by
the insurer?
A. Less than 30
B. At least 30, but less than 35
C. At least 35, but less than 40
D. At least 40, but less than 45
E. 45 or more

31.68 (4, 5/90, Q.53) (2 points) Loss Models defines two functions:
1. the limited expected value function, E[X ∧ x] and
2. the Mean Excess Loss function, e(x)
If F(x) = Pr{X ≤ x} and the expected value of X is denoted by E[X], then which of the following
equations expresses the relationship between E[X ∧ x] and e(x)?
A. E[X ∧ x] = E[X] - e(x) / {1- F(x)}
B. E[X ∧ x] = E[X] - e(x)
C. E[X ∧ x] = E[X] - e(x)(1 - F(x))
D. E[X ∧ x] = E[X](1 - F(x)) - e(x)
E. None of the above

31.69 (4B, 11/93, Q.16) (1 point) Which of the following statements are true regarding loss
distribution models?
1. For small samples, method of moments estimators have smaller variances than
maximum likelihood estimators.
2. The limited expected value function evaluated at any point d > 0 equals
E[X ∧ d] = ∫_{0}^{d} x fX(x) dx + d {1 - FX(d)}, where fX(x) and FX(x) are the probability density
and distribution functions, respectively, of the loss random variable X.
3. A consideration in model selection is agreement between the empirical and fitted
limited expected value functions.
A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

31.70 (4B, 5/95, Q.22) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = 1000 and α = 2.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
• For each loss that occurs, the insurer's payment is equal to the entire amount of the
loss if the loss is greater than 100. The insurer makes no payment if the loss is less
than or equal to 100.
Determine the insurer's expected annual payments.
A. Less than 8,000
B. At least 8,000, but less than 9,000
C. At least 9,000, but less than 9,500
D. At least 9,500, but less than 9,900
E. At least 9,900

31.71 (4B, 11/98, Q.28) (2 points) You are given the following:
• Losses follow a lognormal distribution, with parameters µ = 10 and σ = 1.
• One loss is expected each year.
• For each loss less than or equal to 50,000, the insurer makes no payment.
• For each loss greater than 50,000, the insurer pays the entire amount of the
loss up to the maximum covered loss of 100,000.
Determine the insurer's expected annual payments.
A. Less than 7,500
B. At least 7,500, but less than 12,500
C. At least 12,500, but less than 17,500
D. At least 17,500, but less than 22,500
E. At least 22,500

31.72 (3, 5/00, Q.25) (2.5 points) An insurance agent will receive a bonus if his loss ratio is less
than 70%. You are given:
(i) His loss ratio is calculated as incurred losses divided by earned premium on his block
of business.
(ii) The agent will receive a percentage of earned premium equal to 1/3 of the difference
between 70% and his loss ratio.
(iii) The agent receives no bonus if his loss ratio is greater than 70%.
(iv) His earned premium is 500,000.
(v) His incurred losses are distributed according to the Pareto distribution:
F(x) = 1 - {600,000 / (x + 600,000)}3 , x > 0.
Calculate the expected value of his bonus.
(A) 16,700 (B) 31,500 (C) 48,300 (D) 50,000 (E) 56,600

31.73 (3, 11/00, Q.27 & 2009 Sample Q.116) (2.5 points) Total hospital claims for a health plan
were previously modeled by a two-parameter Pareto distribution with α = 2 and θ = 500.
The health plan begins to provide financial incentives to physicians by paying a bonus of 50% of
the amount by which total hospital claims are less than 500.
No bonus is paid if total claims exceed 500.
Total hospital claims for the health plan are now modeled by a new Pareto distribution
with α = 2 and θ = K. The expected claims plus the expected bonus under the revised
model equals expected claims under the previous model.
Calculate K.
(A) 250 (B) 300 (C) 350 (D) 400 (E) 450

31.74 (3, 11/02, Q.37 & 2009 Sample Q.96) (2.5 points) Insurance agent Hunt N. Quotum will
receive no annual bonus if the ratio of incurred losses to earned premiums for his book of business is
60% or more for the year. If the ratio is less than 60%, Huntʼs bonus will be a percentage of his
earned premium equal to 15% of the difference between his ratio and 60%. Huntʼs annual earned
premium is 800,000. Incurred losses are distributed according to the Pareto distribution,
with θ = 500,000 and α = 2. Calculate the expected value of Huntʼs bonus.
(A) 13,000 (B) 17,000 (C) 24,000 (D) 29,000 (E) 35,000

31.75 (1 point) In the previous question, (3, 11/02, Q. 37), calculate the expected value of Huntʼs
bonus, given that Hunt receives a (positive) bonus.
(A) 46,000 (B) 48,000 (C) 50,000 (D) 52,000 (E) 54,000

31.76 (CAS3, 11/03, Q.21) (2.5 points)


The cumulative loss distribution for a risk is F(x) = 1 - 10^6 / (x + 10^3)^2.
Calculate the percent of expected losses within the layer 1,000 to 10,000.
A. 10% B. 12% C. 17% D. 34% E. 41%

31.77 (SOA3, 11/03, Q.3 & 2009 Sample Q.84) (2.5 points) A health plan implements an
incentive to physicians to control hospitalization under which the physicians will be paid a bonus B
equal to c times the amount by which total hospital claims are under 400 (0 ≤ c ≤ 1). The effect the
incentive plan will have on underlying hospital claims is modeled by assuming that the new total
hospital claims will follow a two-parameter Pareto distribution with α = 2 and θ = 300. E(B) = 100.
Calculate c.
(A) 0.44 (B) 0.48 (C) 0.52 (D) 0.56 (E) 0.60

31.78 (SOA3, 11/04, Q.7 & 2009 Sample Q.123) (2.5 points) Annual prescription drug costs
are modeled by a two-parameter Pareto distribution with θ = 2000 and α = 2.
A prescription drug plan pays annual drug costs for an insured member subject to the
following provisions:
(i) The insured pays 100% of costs up to the ordinary annual deductible of 250.
(ii) The insured then pays 25% of the costs between 250 and 2250.
(iii) The insured pays 100% of the costs above 2250 until the insured has paid 3600 in total.
(iv) The insured then pays 5% of the remaining costs.
Determine the expected annual plan payment.
(A) 1120 (B) 1140 (C) 1160 (D) 1180 (E) 1200

31.79 (CAS3, 11/05, Q.22) (2.5 points) An insurance agent gets a bonus based on the
underlying losses, L, from his book of business.
L follows a Pareto distribution with parameters α = 3 and θ = 600,000.
His bonus, B, is calculated as (650,000 - L)/3 if this quantity is positive and 0 otherwise.
Calculate his expected bonus.
A. Less than 100,000
B. At least 100,000, but less than 120,000
C. At least 120,000, but less than 140,000
D. At least 140,000, but less than 160,000
E. At least 160,000

31.80 (SOA M, 11/05, Q.14) (2.5 points) You are given:


(i) T is the future lifetime random variable.
(ii) h(t) = µ, t ≥ 0, where h(t) is the hazard rate.
(iii) Var[T] = 100.
Calculate E[T ∧ 10].
(A) 2.6 (B) 5.4 (C) 6.3 (D) 9.5 (E) 10.0

31.81 (CAS3, 5/06, Q.37) (2.5 points) Between 9 am and 3 pm Big National Bank employs 2
tellers to service customer transactions. The time it takes Teller X to complete each transaction
follows an exponential distribution with a mean of 10 minutes. Transaction times for Teller Y follow an
exponential distribution with a mean of 15 minutes. Both Teller X and Teller Y are continuously busy
while the bank is open.
On average every third customer transaction is a deposit and the amount of the deposit follows a
Pareto distribution with parameter α = 3 and θ = $5000.
Each transaction that involves a deposit of at least $7500 is handled by the branch manager.
Calculate the expected total deposits made through the tellers each day.
A. Less than $31,000
B. At least $31,000, but less than $32,500
C. At least $32,500, but less than $35,000
D. At least $35,000, but less than $37,500
E. At least $37,500

31.82 (SOA M, 11/06, Q.20 & 2009 Sample Q.281) (2.5 points)
For a special investment product, you are given:
(i) All deposits are credited with 75% of the annual equity index return, subject to a minimum
guaranteed crediting rate of 3%.
(ii) The annual equity index return is normally distributed with a mean of 8%
and a standard deviation of 16%.
(iii) For a random variable X which has a normal distribution with mean µ and standard deviation σ,
you are given the following limited expected values:
E[X ∧ 3%]
µ = 6% µ = 8%
σ = 12% -0.43% 0.31%
σ = 16% -1.99% -1.19%

E[X ∧ 4%]
µ = 6% µ = 8%
σ = 12% 0.15% 0.95%
σ = 16% -1.43% -0.58%

Calculate the expected annual crediting rate.


(A) 8.9% (B) 9.4% (C) 10.7% (D) 11.0% (E) 11.6%
2016-C-2, Loss Distributions, §31 Limited Expected Values HCM 10/21/15, Page 463

31.83 (SOA M, 11/06, Q.31 & 2009 Sample Q.286) (2.5 points) Michael is a professional
stuntman who performs dangerous motorcycle jumps at extreme sports events around the world.
The annual cost of repairs to his motorcycle is modeled by a two parameter Pareto distribution
with θ = 5000 and α = 2.
An insurance reimburses Michaelʼs motorcycle repair costs subject to the following provisions:
(i) Michael pays an annual ordinary deductible of 1000 each year.
(ii) Michael pays 20% of repair costs between 1000 and 6000 each year.
(iii) Michael pays 100% of the annual repair costs above 6000 until Michael has paid 10,000 in
out-of-pocket repair costs each year.
(iv) Michael pays 10% of the remaining repair costs each year.
Calculate the expected annual insurance reimbursement.
(A) 2300 (B) 2500 (C) 2700 (D) 2900 (E) 3100

Solutions to Problems:

31.1. C. The distribution is an Exponential Distribution with θ = 1/2.

For the Exponential Distribution E[X ∧ x] = θ (1 - e-x/θ).


The average size of the capped losses is: E[X ∧ 1] = (1/2)(1 - e-2) = 0.432.
Thus the expected annual total loss payments on a basic limits policy are: (13)(0.432) = 5.62.
Alternately, one can use the relation between the mean excess loss and the Limited Expected
Value: e(x) = { mean - E[X ∧ x] } / {1 - F(x)}, therefore E[X ∧ x] = mean - e(x){1 - F(x)}.
For the Exponential Distribution, the mean excess loss is a constant = θ = mean.

Therefore E[X ∧ x] = mean - e(x){1 - F(x)} = θ - θ(e-x/ θ). Proceed as before.

31.2. B. mean = exp(µ + σ2/2) = 98,716. Therefore, with 200 claims expected per year, the

expected total cost per year is: (200)(98716) = $19.74 million.

31.3. A. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.


E[X ∧ 1 mill.] = exp(7 + 9/2)Φ[(ln(1000000) - 7 - 9 )/3] + (1000000){1 - Φ[(ln(1000000) − 7)/3]}
= (98,716)Φ[-0.73] + (1,000,000)(1 − Φ[2.27]) = (98,716)(1 - 0.7673) + (1,000,000)(1 - 0.9884)
= 22,971 + 11,600 = 34,571.
With a limit of $1 million per claim and 200 claims expected per year, the expected total cost per
year is: 200 E[X ∧ 1 million] = (200)(34,571) = $6.91 million.

31.4. E. E[X ∧ 5 million] = exp(7 + 9/2)Φ[(ln(5000000) - 7 - 9 )/3] +


(5000000) {1 - Φ[(ln(5000000) - 7)/3]} = (98716)Φ[-0.19] + (5,000,000)(1 - Φ[2.81])
= (98716)(1 - 0.5753) + (5,000,000)(1 - 0.9975) = 41,925 + 12,500 = 54,425.
200 E[X ∧ 5 million] = $10.88 million.

31.5. D. The dollars in the layer from $1 million to $5 million is the difference between the dollars
limited to $5 million and the dollars limited to $1 million. Using the answers to the two previous
questions: $10.88 million - $6.91 million = $3.97 million.
Comment: In terms of the limited expected values and the expected number of losses N, the
dollars in the layer from $1 million to $5 million equals: N{E[X ∧ 5 million] - E[X ∧ 1 million]}.
In this case N = 200.

31.6. C. The dollars excess of $5 million per loss is the difference between the total cost and the
cost limited to $5 million per loss. Using the answers to two prior questions:
$19.74 million - $10.88 million = $8.86 million.
Comment: The dollars excess of $5 million per losses equals:
N{E[X ∧ ∞] - E[X ∧ 5 million]} = N{mean - E[X ∧ 5 million]}. In this case N = 200 losses.

31.7. D. First calculate the dollars of loss on these losses per total number of losses:
{E[X ∧ 5 million] - 5 million S(5 million)} - {E[X ∧ 1 million] - 1 million S(1 million)} =
{54,425 - (5 million)(1 - 0.9975)} - {34,571 - (1 million)(1 - 0.9884)} = 41,925 - 22,971 = $18,954.
Then divide by the probability of a loss being of this size:
F(5 million) - F(1 million) = Φ[(ln(5000000) - 7)/3] - Φ[(ln(1000000) - 7)/3] =
Φ[2.81] - Φ[2.27] = (0.9975 - 0.9884) = 0.0091. $18,954 / 0.0091 = $2.08 million.

31.8. C. Either one can calculate the expected number of losses of this size per year as
(200){F(5 million) - F(1 million)} = (200){0.9975 - 0.9884} = 1.82 and multiply by the average size
calculated in the previous question: (1.82)($2.08 million) = $3.8 million. Alternately, one can multiply
the expected number of losses per year times the dollars on these losses per loss calculated in a
previous question: (200)($18,954) = $3.8 million.
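As a numerical cross-check of Solutions 31.2 through 31.8 (my addition, not part of the original study guide), the LogNormal limited expected values can be evaluated directly in Python; scipy is an assumed dependency. Using the exact Normal distribution function rather than the rounded table values reproduces the dollar figures above to within rounding.

from math import exp, log
from scipy.stats import norm

def lev_lognormal(x, mu, sigma):
    # E[X ^ x] = exp(mu + sigma^2/2) Phi[(ln x - mu - sigma^2)/sigma] + x {1 - Phi[(ln x - mu)/sigma]}
    m = exp(mu + sigma**2 / 2)
    return (m * norm.cdf((log(x) - mu - sigma**2) / sigma)
            + x * (1 - norm.cdf((log(x) - mu) / sigma)))

mu, sigma, claims = 7.0, 3.0, 200
mean = exp(mu + sigma**2 / 2)                                  # about 98,716
lev_1m, lev_5m = lev_lognormal(1e6, mu, sigma), lev_lognormal(5e6, mu, sigma)
print(claims * mean / 1e6)               # total cost, about $19.7 million
print(claims * (lev_5m - lev_1m) / 1e6)  # layer from $1 million to $5 million, about $4.0 million
print(claims * (mean - lev_5m) / 1e6)    # excess of $5 million, about $8.9 million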

31.9. E. For the Pareto Distribution, E[X ∧ x] = {θ/(α-1)} {1-(θ/(θ+x))α−1}.


E[X ∧ 250000] = {15000/1.5} {1 - (15000/(15000+250000))2.5-1} = 10000(0.9865) = 9865.

31.10. E. The mean of a Pareto is: θ/(α-1) = 1800/3 = 600.

31.11. B. For the Pareto Distribution, E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))α−1}.


E[X ∧ 900] = {1800/3} {1 - (1800/(1800+900))3 } = 422.22.

31.12. D. For the Pareto Distribution, E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))α−1}.

E[X ∧ 225] = {1800/3} {1 - (1800/(1800+225))3} = 178.60.



31.13. B. The average weekly wage is $600, from a previous solution. Thus the maximum benefit
is $600, while the minimum benefit is $600/4 = $150. These correspond to pre-injury wages of
$600/(2/3) = $900 and $150/(2/3) = $225 respectively. (If a workers pre-injury wage is more than
$900 his benefit is only $600. If his pre-injury wage is less than $225, his benefit is still $150.)
Let x be the workerʼs pre-injury wage, then the workerʼs benefits are:
$150 if x ≤ $225, 2x/3 if x ≥ $225 and x ≤ $900, $600 if x ≥ $900.
Thus the average benefit is made up of three terms (low, medium, and high wages):
150 F(225) + (2/3) ∫_{225}^{900} x f(x) dx + 600 S(900).
∫_{225}^{900} x f(x) dx = ∫_{0}^{900} x f(x) dx - ∫_{0}^{225} x f(x) dx = E[X ∧ 900] - 900 S(900) - {E[X ∧ 225] - 225 S(225)}.

Thus the average benefit is:


150F(225) + 150S(225) + 600S(900) - 600S(900) + (2/3)(E[X ∧ 900] - E[X ∧ 225]) =
150 + (2/3)(E[X ∧ 900] -E[X ∧ 225]) = 150 + (2/3)(422.22 - 178.60) = 312.41.
Alternately, the benefits can be described as:
150 + (2/3)(layer of wages between 900 and 225) = 150 + (2/3)(E[X ∧ 900] -E[X ∧ 225]).
Comment: Extremely unlikely to be asked on the exam. Relates to the calculation of Law
Amendment Factors used in Workersʼ Compensation Ratemaking. Geometrically oriented
students may benefit by reviewing the subsection on payments subject to both a minimum and
a maximum in the subsequent section on Lee Diagrams.

31.14. E. E[X] = θ = 10,000.

31.15. C. E[X ∧ 25000] = 10,000 (1-e-25000/10000) = 9179.

31.16. B. E[X ∧ x] = θ (1 - e-x/θ) = 10,000 (1 - e-1000/10000) = 952.


E[X] - E[X ∧ 1000] = 10,000 - 952 = 9048.

31.17. E. {E[X] - E[X ∧ 1000]}/S(1000) = (10,000 - 952)/0.9048 = 10,000.


Alternately, the average size of the data truncated and shifted from below is the mean excess loss.
For the Exponential e(x) = θ = 10,000.
Comment: For the Exponential, {E[X] - E[X ∧ x]}/S(x) = θ. Thus for the Exponential Distribution, in the
absence of any maximum covered loss, the average size of the insurerʼs payment per non-zero
payment by the insurer does not depend on the deductible amount; the mean excess loss is
constant for the Exponential Distribution.

31.18. A. E[X ∧ 25000] = 10,000 (1 - e-25000/10000) = 9179.


E[X ∧ 25000] - E[X ∧ 1000] = 9179 - 952 = 8,227.

31.19. B. {E[X ∧ 25000] - E[X ∧ 1000]} / S(1000) = (9179 - 952)/0.9048 = 9093.

31.20. E. Each payment is 75% of the insuredʼs loss, so the average is:
(0.75)E[X] = (0.75)(10,000) = 7500.

31.21. B. Each payment is 75% of what it would have been without any coinsurance, so the
average is: (0.75)(E[X] - E[X ∧ 1000]) = (0.75)(10,000 - 952) = 6786.

31.22. C. Each payment is 75% of what it would have been without any coinsurance, so the
average is (0.75)(E[X ∧ 25000] - E[X ∧ 1000])/ S(1000) = (0.75)(9179 - 952)/0.9048 = 6819.

31.23. D. Average Size of Losses in the Interval [1000, 25000] =


{E[X ∧ 25000] - 25000S(25000) - (E[X ∧ 1000] - 1000S(1000))}/ {F(25000) - F(1000)} =
{9179 - 25000(0.08208) - (952 - (1000)(0.9048))} / (0.9179 - 0.0952) = 7080/0.8227 = 8606.

31.24. A. {E[X ∧ 25000] - 25000S(25000) - (E[X ∧ 1000] - 1000S(1000))} / E[X] =


{9179 - 25000(0.08208) - (952 - (1000)(0.9048))} / 10,000 = 7080/10,000 = 70.8%.

31.25. C. The payments from 500 to 4000 correspond to losses of size between 1500 and
5000. These losses have average size:
{E[X ∧ 5000] - 5000S(5000) - (E[X ∧ 1500] - 1500S(1500))} / {F(5000) - F(1500)} =
{3935 - 5000(0.6065) - (1393 - (1500)(0.8607))} / (0.3935 - 0.1393) = 3148.
The average size of the payments is 1000 less: 3148 - 1000 = 2148.

31.26. B. The payments from 500 to 4000 correspond to losses of size between
1000 + (500/.75) = 1667 and 1000+(4000/.75) = 6333. These losses have average size:
{E[X ∧ 6333] - 6333S(6333) - (E[X ∧ 1667] - 1667S(1667))} / {F(6333) - F(1667)} =
{4692 - 6333(0.5308) - (1535 - (1667)(0.8465))} / (0.4692 - 0.1535) = 3820.
The average size of the payments is 1000 less and then multiplied by 0.75:
(3820 - 1000)(0.75) = 2115.

31.27. B. The most the insurer will pay is: (0.75)(25,000 - 1000) = 18,000.
For any loss of size greater than or equal to 25,000 the insurer pays 18,000.
Let X be the size of loss.
Then the payment is 18,000 if X ≥ 25,000, and (0.75)(X - 1000) if 25,000 > X > 1000.
A payment of 15,000 corresponds to a loss of: (15,000/0.75) + 1000 = 21,000.
Thus the dollars of payments greater than 15,000 and at most 19,000 is the payments on losses of
size greater than 21,000, which we split into two pieces:
∫_{21,000}^{25,000} 0.75(x - 1000) f(x) dx + 18,000 S(25,000) =
0.75 ∫_{21,000}^{25,000} x f(x) dx - 750{F(25,000) - F(21,000)} + 18,000 S(25,000) =

0.75 {E[X ∧ 25000] - 25000S(25000) - (E[X ∧ 21000] - 21000S(21000))}


+ 750{S(25000) - S(21000)} + 18,000S(25000) =
0.75E[X ∧ 25000] - 0.75E[X ∧ 21000] + 15,750S(21000) - 18,750S(25000) - 750S(21000)
+ 750S(25000) + 18,000S(25000) =
0.75E[X ∧ 25000] - 0.75E[X ∧ 21000] + 15,000S(21000) =
0.75(9179.2) - (0.75)(8775.4) + 15,000(0.12246) = 2139.8.
In order to get the average size we need to divide the payments by the percentage of the number
of losses represented by losses greater than 21,000, S(21,000) = 0.12246:
2139.8 / 0.12246 = 17,473.
Comment: Long and difficult. In this case it may be easier to calculate the integral of xf(x)dx, rather
than put it in terms of the Limited Expected Values and Survival Functions.

31.28. D. 0.75{E[X ∧ 25000] - E[X ∧ 1000]} = (0.75)(9179 - 952) = 6170.



31.29. B. A loss ratio of 60% corresponds to (0.6)(40000) = $24,000 in losses.


If his losses are x, and x < 24,000, then he gets a dividend of (1/4)(24,000 - x).
The expected dividend is:
(1/4) ∫_{0}^{24,000} (24,000 - x) f(x) dx = (1/4){24,000 F(24,000) - (E[X ∧ 24,000] - 24,000 S(24,000))}
= (1/4){24,000 - E[X ∧ 24,000]}. For a LogNormal Distribution,


E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}. Therefore,
E[X ∧ 24000] = exp(6 + 9/2)Φ[(ln24000 - 6 - 9)/3] + 24000 {1 - Φ[(ln24000 - 6)/3]} =
(36316)Φ[-1.64] + (24000){1 - Φ[1.36]} = (36,316)(1 - 0.9495) + (24,000)(1 - 0.9131) = 3920.
Therefore, his expected dividend is: (1/4)(24000 - 3920) = $5020.

31.30. E. If x < 20, then Frostbite Falls pays the state fund 1000(20 - x).
The expected amount by which x is less than 20, (the “savings” at 20), is: 20 - E[X ∧ 20].
E[X ∧ 20] = (8 + 10 + 16 + 100)/8 = 16.75.
Therefore, the expected amount paid by the town to the state fund per winter is:
(1000)(20 - E[X ∧ 20]) = 3250.
If x > 50, then the state fund pays Frostbite Falls 1000(x - 50). The expected amount by which x is
more than 50, (the inches of snow excess of 50), is: E[X] - E[X ∧ 50].
E[X] = (8 + 10 + 16 + 21 + 35 + 57 + 70 + 90)/8 = 38.375.
E[X ∧ 50] = (8 + 10 + 16 + 21 + 35 + 150)/8 = 30.
Therefore, the expected amount paid by the state fund to the town per winter is:
(1000)(E[X] - E[X ∧ 50]) = (1000)(38.375 - 30) = 8375.
Expected amount state fund pays town minus expected amount town pays the state fund is:
8375 - 3250 = 5125.
Alternately, one can list what happens in each possible situation:
Snow Paid by State
8 -12,000
10 -10,000
16 -4,000
21 0
35 0
57 7,000
70 20,000
90 40,000
Average 5,125
Comment: (12 + 10 + 4)/8 = 3.250 = 20 - E[X ∧ 20]. (7 + 20 + 40)/8 = 8.375 = E[X] - E[X ∧ 50].
A very simplified example of retrospective rating. See for example, “Individual Risk Rating,” by
Margaret Tiller Sherwood in Foundations of Casualty Actuarial Science.
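As a quick empirical check of Solution 31.30 (my addition, not in the original text), the limited expected values can be computed directly from the eight observed snowfalls in Python:

snow = [8, 10, 16, 21, 35, 57, 70, 90]

def empirical_lev(data, d):
    # Empirical E[X ^ d]: each observation is capped at d.
    return sum(min(x, d) for x in data) / len(data)

mean = sum(snow) / len(snow)                            # 38.375
town_pays = 1000 * (20 - empirical_lev(snow, 20))       # 1000 x 3.25 = 3250
state_pays = 1000 * (mean - empirical_lev(snow, 50))    # 1000 x 8.375 = 8375
print(state_pays - town_pays)                           # 5125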

31.31. A. If x < 20, then Frostbite Falls pays the state fund 1000(20 - x).
The expected amount by which x is less than 20, (the “savings” at 20), is: 20 - E[X ∧ 20].
E[X ∧ 20] = exp(µ + σ2/2) Φ[(ln20 − µ − σ2)/σ] + 20{1 - Φ[(ln20 − µ)/σ]} =
(33.954)Φ[-1.10] + (20){1 - Φ[.40]) = (33.954)(0.1357) + (20)(1 - 0.6554) = 11.50.
Therefore, the expected amount paid by the town to the state fund per winter is:
(1000)(20 - E[X ∧ 20]) = 8500.
If x > 50, then the state fund pays Frostbite Falls 1000(x - 50). The expected amount by which x is
more than 50, (the inches of snow excess of 50), is: E[X] - E[X ∧ 50].
E[X] = exp(µ + σ2/2) = 33.954.

E[X ∧ 50] = exp(µ + σ2/2) Φ[(ln50 − µ − σ2)/σ] + 50{1 - Φ[(ln50 − µ)/σ]} =


(33.954)Φ[-0.49] + (50){1 - Φ[1.01]} = (33.954)(0.3121) + (50)(1 - 0.8438) = 18.41.
Therefore, the expected amount paid by the state fund to the town per winter is:
(1000)(E[X] - E[X ∧ 50]) = (1000)(33.954 - 18.41) = 15,544.
Expected amount state fund pays town minus expected amount town pays the state fund is:
15,544 - 8500 = 7,044.
Comment: In the following Lee Diagram, other than the constant c = 1000, the expected amount
paid by the town to the state (when there is little snow) corresponds to Area A, below a horizontal
line at 20 and above the curve. Other than the constant c = 1000, the expected amount paid by the
state to the town (when there is a lot of snow) corresponds to Area B, above a horizontal line at 50
and below the curve.

[Lee Diagram omitted: inches of snow (vertical axis, 0 to 200) versus probability (horizontal axis, 0 to 1), with horizontal lines at 20 and 50; Area A lies below 20 and above the curve, Area B above 50 and below the curve.]



31.32. C. E[(N-3)+] = E[N] - E[N ∧ 3] = λ - (Prob[N = 1] + 2Prob[N = 2] + 3Prob[N ≥ 3]) =

λ - λe−λ - λ2e−λ - (3)(1 - e−λ - λe−λ - λ2e−λ/2) = λ + 3e−λ + 2λe−λ + λ2e−λ/2 - 3 =


2.5 + 3e-2.5 + 2(2.5)e-2.5 + (2.52 )e-2.5/2 - 3 = 0.413.
Alternately, E[(N-3)+] = E[(3-N)+] + E[N] - 3 = 3Prob[N = 0] + 2Prob[N = 1] + Prob[N = 2] + λ - 3 =

λ + 3e−λ + 2λe−λ + λ2e−λ/2 - 3 = 0.413.
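A brief numerical verification of Solution 31.32 (my addition, not from the original); the Poisson probabilities are taken from scipy as an assumed convenience:

from scipy.stats import poisson

lam, d = 2.5, 3
# E[(N - d)+] by direct summation over the tail of the Poisson distribution.
direct = sum((k - d) * poisson.pmf(k, lam) for k in range(d + 1, 200))
# Equivalently, E[(N - d)+] = E[N] - E[N ^ d].
lev_d = sum(min(k, d) * poisson.pmf(k, lam) for k in range(0, 200))
print(direct, lam - lev_d)   # both approximately 0.413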

31.33. C. The expected amount by which lifetimes are less than 2 is:
2 - E[X ∧ 2] = 2 - (6)(1 - e-2/6) = 0.2992.
The expected amount paid per battery is: (100)(0.2992/2) = 14.96.

31.34. B. For the LogNormal Distribution,


E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.
E[X ∧ 30 million] =
exp(15 + 22 /2)Φ[(ln30000000 - 15 - 22 )/2] + 30000000 {1 - Φ[(ln30000000 - 15)/2]} =
24,154,953 Φ[-0.89] + 30,000,000 {1 - Φ[1.11]} =
(24,154,953)(0.1867) + (30,000,000)(1 - 0.8665) = 8.51 million.
E[X ∧ 150 million] =
exp(15 + 22 /2) Φ[(ln150000000 - 15 - 22 )/2] + 150000000 {1 - Φ[(ln150000000 - 15)/2]} =
24154953 Φ[-0.09] + 150000000 {1 - Φ[1.91]} =
(24,154,953)(0.4641) + (150,000,000)(1 - 0.9719) = 15.43 million.
The maximum payment of $90 million correspond to a loss by XYZ of: 30 + 90/0.75 =
150 million. Therefore the average payment to XYZ per hurricane is:
0.75 (E[X ∧ 150 million] - E[X ∧ 30 million]) = (0.75)(15.43 - 8.51) = 5.2 million.
Comment: The portion of hurricanes on which XYZ receives non-zero payments is:
S(30 million) = 1 - Φ[(ln30000000 - 15)/2] = 1 - Φ[1.11] = 0.1335.
Therefore, the average payment per nonzero payment is: (0.75)(15.43 - 8.51) / 0.1335 =
38.9 million. A very simplified version of the Florida Hurricane Catastrophe Fund.

31.35. C. Per loss, the insurer would pay the layer from 5,000 to 25,000, which is:
E[X ∧ 25,000] - E[X ∧ 5,000]. For the Pareto: E[X ∧ x] = {θ/(α-1)} {1-(θ/(θ+x))α−1} =
10000 {1 - (40000/(40000+x))4 }. E[X ∧ 25,000] = 10000 {1 - (40/65)4 } = 8566.
E[X ∧ 5,000] = 10000 {1 - (40/45)4 } = 3757. E[X ∧ 25,000] - E[X ∧ 5,000] = 8566 - 3757 = 4809.
Three losses expected per year, thus the insurerʼs expected payment is: (3)(4809) = 14,427.

31.36. E. Without the feature that the insurer pays the entire loss (up to 25,000) for each loss
greater than 5,000, the insurer would pay the layer from 5,000 to 25,000, which is:
E[X ∧ 25,000] - E[X ∧ 5,000]. As calculated in the solution to the previous question,
E[X ∧ 25,000] - E[X ∧ 5,000] = 8566 - 3757 = 4809. However, that extra provision adds 5,000 per
large loss, or 5,000S(5000) = 5,000{θ/(θ+5000)}α = 5000 (40/45)5 = 2775. Thus per loss the
insurer pays: 5,000 S(5000) + E[X ∧ 25,000] - E[X ∧ 5,000] = 2775 + 4809 = 7584.
There are three losses expected per year, thus the insurerʼs expected payment is:
(3)(7584) = 22,752.

31.37. E. The expected losses within the layer 5,000 to 50,000 is:
∫_{5000}^{50,000} S(x) dx = ∫_{5000}^{50,000} e^(-x/20,000) dx = 13,934.

The percent of expected losses within the layer 5,000 to 50,000 is: 13934/20000 = 69.7%.
Alternately, for the Exponential Distribution, LER(x) = 1 - e-x/θ.
LER(50000) - LER(5000) = e-5000/20000 - e-50000/20000 = e-0.25 - e-2.5 = 69.7%.

31.38. D. E[X] = exp(µ + σ2/2) = e9.9 = 19,930.

E[X ∧ 5000] = exp(µ + σ2/2) Φ[(ln5000 - µ - σ2)/σ] + 5000{1 - Φ[(ln5000 - µ)/σ]} =


(19930)Φ[-1.88] + (5000){1 - Φ[-0.88]) = (19930)(0.0301) + (5000)(0.8106) = 4653.
E[X ∧ 50000] = exp(µ + σ2/2) Φ[(ln50000 - µ - σ2)/σ] + 50000{1 - Φ[(ln50000 - µ)/σ]} =
(19930)Φ[0.42] + (50000)(1 - Φ[1.42]) = (19930)(0.6628) + (50000)(1 - 0.9222) = 17,100.
The percent of expected losses within the layer 5,000 to 50,000 is:
(E[X ∧ 50,000] - E[X ∧ 5000])/E[X] = (17,100 - 4653)/19,930 = 62.5%.

31.39. C. E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ + x))α−1} = 20000{1 - (40000/(40000 + x))2 }.


E[X ∧ 5000] = 4198. E[X ∧ 50,000] = 16,049. E[X] = θ/(α−1) = 20,000.
The percent of expected losses within the layer 5,000 to 50,000 is:
(E[X ∧ 50,000] - E[X ∧ 5000])/E[X] = (16049 - 4198)/20,000 = 59.3%.

31.40. A. E[(N - 3)+] = E[N] - E[N ∧ 3] = β - (Prob[N = 1] + 2Prob[N = 2] + 3Prob[N ≥ 3]) =

β - β/(1+β)2 - 2β2/(1+β)3 - 3β3/(1+β)3 = {β(1+β)3 - β(1+β) - 2β2 - 3β3}/(1+β)3 = β4/(1+β)3


= 2.54 /3.53 = 0.911.
Alternately, E[(N-3)+] = E[(3-N)+] + E[N] - 3 = 3Prob[N = 0] + 2Prob[N = 1] + Prob[N = 2] + β - 3 =

β + 3/(1+β) + 2β/(1+β)2 + β2/(1+β)3 - 3 = 2.5 + 3/3.5 + (2)(2.5)/3.52 + 2.52 /3.53 - 3 = 0.911.

Alternately, the Geometric shares the memoryless property of the Exponential ⇒

E[(N-3)+] / Prob[N ≥ 3] = E[N] = β. ⇒ E[(N-3)+] = β Prob[N≥3] = β β3/(1+β)3 = β4/(1+β)3 = 0.911.

Comment: For integral j, for the Geometric, E[(N - j)+] = βj+1/(1+β)j.

31.41. E. For Policy A the average payment per loss is: E[X] - E[X ∧ 3000] =
θ/(α-1) - {θ/(α-1)} {1 - (θ/(θ+3000))α−1} = 6000(12/15)2 = 6000(.64).

For Policy B the average payment per loss is: E[X ∧ u] = {θ/(α-1)} {1 - (θ/(θ+u))α−1} =
6000{1 - (12000/(12000+u))2 }. Setting this equal to 6000(0.64):
6000(0.64) = 6000{1 - (12000/(12000+u))2 } ⇒ (12000/(12000+u))2 = 0.36. ⇒ u = 8000.

31.42. B. E[(25 - X)+ ] = (25 - 5)(80%) + (0)(20%) = 16 > 8. ⇒ y must be less than 25.

Therefore, E[(y - X)+ ] = (0.8)(y - 5) = 8. ⇒ y = 15.

31.43. D. E[Y] = E[(1 - X)+] = 1 - E[X ∧ 1] = 1 - θ(1 - e−1/θ) = 1 - 2(1 - e-1/2) = 0.213.
Alternately, E[Y] = ∫_{0}^{1} (1 - x) e^(-x/2)/2 dx = [-e^(-x/2) + x e^(-x/2) + 2 e^(-x/2)] evaluated from x = 0 to x = 1
= 2e^(-1/2) - 1 = 0.213.

31.44. D. Since by definition E[R] = 1, the LogNormal Distribution has mean of 1.


exp[µ + σ2/2] = 1. ⇒ µ = -σ2/2 = -0.08.
Percentage of overall wages earned by workers with R < 2 is:
{E[X ∧ 2] - 2S(2)} / E[X] = Φ[(ln2 - µ - σ2)/σ] = Φ[(ln2 + 0.08 - 0.42 )/0.4] = Φ[1.53] = 93.7%.
Comment: Such wage tables are used to price the impact of changes in the laws governing
Workers Compensation Benefits.

31.45. B. Each loss below 50 is counted as its size, while each of the 19 losses ≥ 50 counts as 50.
E[X ∧ 50] =
{6 + 7+ 11+ 14 + 15+17+ 18 + 19 + 25+ 29 + 30 + 34 + 40 + 41 + 48 + 49 + (19)(50)} / 35 =
(403 + 950)/35 = 38.66.
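A one-line check of Solution 31.45 (my addition, not from the original text), in Python:

losses = [6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34, 40, 41, 48, 49, 53, 60, 63, 78, 85, 103,
          124, 140, 192, 198, 227, 330, 361, 421, 514, 546, 750, 864, 1638]
# Empirical E[X ^ 50]: each of the 35 losses is capped at 50.
print(sum(min(x, 50) for x in losses) / len(losses))   # 38.66, answer B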

31.46. C. 100/2% = 5000. If the profit is less then 5000, then Alex gets 100.
Thus we want: E[Max[0.02 X, 100]] = 0.02 E[max[X, 5000]].
E[Max[X, 5000]] = 5000 F(5000) + ∫_{5000}^{∞} x f(x) dx = 5000 F(5000) + ∫_{0}^{∞} x f(x) dx - ∫_{0}^{5000} x f(x) dx
= 5000 F(5000) + E[X] - {E[X ∧ 5000] - 5000 S(5000)} = 5000 + E[X] - E[X ∧ 5000].
0.02 E[Max[X, 5000]] = 100 + 0.02 (E[X] - E[X ∧ 5000]).
Alternately, let Y = max[X, 5000]. Y - 5000 = 0 if X ≤ 5000, and Y - 5000 = X - 5000 if X > 5000.
Therefore, E[Y - 5000] = E[(X - 5000)+] = E[X] - E[X ∧ 5000].

⇒ E[Y] = 5000 + E[X] - E[X ∧ 5000].


Expected value of Alexʼs pay is: 0.02E[Y] = 100 + 0.02(E[X] - E[X ∧ 5000]).
Comment: Similar to SOA M, 11/06, Q.20.

31.47. B. F is a Pareto Distribution with α = 3 and θ = 20,000.


E[X] = 20,000/(3 - 1) = 10,000.
E[X ∧ 5000] = (10,000)(1 - {20,000/(20,000 + 5000)}2 ) = 3600.
Alexʼs expected payment is: 100 + .02(E[X] - E[X ∧ 5000]) = 100 + (0.02)(10000 - 3600) = 228.
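To make the minimum-payment logic concrete, here is a short sketch (my addition, not from the original) that evaluates Solution 31.47 both in closed form and by simulation; numpy is an assumed dependency.

import numpy as np

alpha, theta = 3.0, 20000.0
mean = theta / (alpha - 1)                                       # E[X] = 10,000
lev_5000 = mean * (1 - (theta / (theta + 5000))**(alpha - 1))    # E[X ^ 5000] = 3600
print(100 + 0.02 * (mean - lev_5000))                            # closed form: 228

rng = np.random.default_rng(0)
# Simulate Pareto profits by inverse transform: X = theta {(1 - U)^(-1/alpha) - 1}.
u = rng.random(1_000_000)
profits = theta * ((1 - u)**(-1.0 / alpha) - 1)
print(np.maximum(0.02 * profits, 100).mean())                    # simulation: approximately 228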

31.48. C. Γ[3 ; 5] = 1 - e-5(1 + 5 + 52 /2) = 0.875. Γ[4 ; 5] = 1 - e-5(1 + 5 + 52 /2 +53 /6) = 0.735.
For the Gamma Distribution, E[X ∧ 500] = (αθ)Γ[α+1 ; 500/θ] + 500 {1 - Γ[α ; 500/θ]} =
300Γ[4 ; 5] + 500{1 - Γ[3 ; 5]} = (300)(0.735) + (500)(1 - 0.875) = 283.
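As a check on Solution 31.48 (my addition, not part of the original), scipy's regularized lower incomplete gamma function plays the role of Γ(α; x) in the Loss Models appendix:

from scipy.special import gammainc   # regularized lower incomplete gamma, Gamma(alpha; x)

alpha, theta, limit = 3.0, 100.0, 500.0
# E[X ^ d] = alpha theta Gamma(alpha+1; d/theta) + d {1 - Gamma(alpha; d/theta)}
lev = alpha * theta * gammainc(alpha + 1, limit / theta) + limit * (1 - gammainc(alpha, limit / theta))
print(lev)   # about 283, answer C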

31.49. A. E[X ∧ x] = αθ/(α - 1) - θ3/{(α - 1)xα−1}.


E[L ∧ 170,000] = (3)(100,000)/(3 - 1) - 1000003 /{(3 - 1) 1700003-1} = 132,699.
E[(170,000 - L)+] = 170,000 - E[L ∧ 170000] = 170,000 - 132,699 = 37,301.
E[Bonus] = E[(170,000 - L)+/4] = 37,301/4 = 9325.
Comment: Similar to CAS3, 11/05, Q.22.

31.50. His bonus is positive when L < 170,000.


F(170,000) = 1 - (100,000/170,000)^3 = 0.79646.

E[Bonus | Bonus > 0] = E[Bonus] / Prob[Bonus > 0] = E[Bonus] / F(170,000) =


9325 / 0.79646 = 11,708.

31.51. A. E[X ∧ 1000] = ∫_{0}^{1000} x f(x) dx + 1000 S(1000) = 350.61 + (1000)(1 - 0.87175) = 478.86.
Comment: Based on a LogNormal Distribution with µ = 6.0 and σ = 0.8.

31.52. D. For this Pareto Distribution, E[X ∧ x] = (5000/2) {1 - 50002 /(5000 + x)2 } .
E[X ∧ 2000] = 1224. E[X ∧ 10000] = 2222. E[X] = θ/(α-1) = 2500.
The average payment per loss is:
(75%) E[X ∧ 2000] + (90%)(E[X] - E[X ∧ 10000]) = (75%)(1224) + (90%)(2500 - 2222) = 1168.
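A compact numerical check of Solution 31.52 (my addition, not from the original text):

alpha, theta = 3.0, 5000.0

def lev_pareto(x):
    # E[X ^ x] for the two parameter Pareto.
    return theta / (alpha - 1) * (1 - (theta / (theta + x))**(alpha - 1))

mean = theta / (alpha - 1)   # 2500
print(0.75 * lev_pareto(2000) + 0.90 * (mean - lev_pareto(10000)))   # about 1168, answer D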

31.53. E. The mean of the Single Parameter Pareto is: αθ/(α - 1) = (1.5)(20)/(1.5 - 1) = 60.
Thus we want the average size of loss for those losses of size greater than 60.
E[X ∧ x] = αθ/(α - 1) - θα / {xα−1 (α - 1)}.
E[X ∧ 60] = (1.5)(20)/(1.5 - 1) - 201.5 / {601.5-1 (1.5 - 1)} = 36.906.
Average size of loss for those losses of size greater than 60 is:
{E[X] - (E[X ∧ 60] - 60S(60))}/S(60) = (60 - 36.906)/{(20/60)1.5} + 60 = 180.
Taking into account the 50% increase: (1.5)(180) = 270.
Alternately, the average size of those losses of size greater than 60 is:
∫_{60}^{∞} x f(x) dx / S(60) = {∫_{60}^{∞} x (1.5)(20^1.5) x^(-2.5) dx} / (20/60)^1.5 = (1.5)(60^1.5) ∫_{60}^{∞} x^(-1.5) dx
= (1.5)(60^1.5)(2)(60^(-0.5)) = 180. Taking into account the 50% increase: (1.5)(180) = 270.

31.54. The contribution of the small losses, those losses of size less than x is: E[X ∧ x] - x S(x).
The percentage of loss dollars from those losses of size less than x is: {E[X ∧ x] - x S(x)} / E[X].
For the LogNormal Distribution, E[X ∧ x] - x S(x) = exp[µ + σ²/2] Φ[(ln[x] - µ - σ²)/σ].
Thus for the LogNormal Distribution, G(x) = {E[X ∧ x] - x S(x)} / E[X] = Φ[(ln[x] - µ - σ²)/σ].

G(x) is also LogNormal with parameters: µ + σ2, and σ.



31.55. 0.25 = 1 - exp[-Q0.25 / θ]. ⇒ Q0.25 = θ ln[4/3].

0.75 = 1 - exp[-Q0.75 / θ]. ⇒ Q0.75 = θ ln[4].


∫_{θ ln[4/3]}^{θ ln[4]} x exp[-x/θ]/θ dx = [-x exp[-x/θ] - θ exp[-x/θ]] evaluated from x = θ ln[4/3] to x = θ ln[4]
= θ ln[4/3] (3/4) + θ (3/4) - θ ln[4]/4 - θ/4 = θ {1/2 + ln[4]/2 - (3/4) ln[3]}.


One half of the total probability is between the first and third quartile.
Trimmed Mean = θ{1/2 + ln[4]/2 - ln[3] 3/4} / (1/2) = θ{1 + ln[4] - ln[3] 3/2} = 0.7384 θ.

Alternately, E[X ∧ x] = θ (1 - e-x/θ).


E[X ∧ Q0.25] = 0.25 θ. E[X ∧ Q0.75] = 0.75 θ.
The average size of those losses of size between Q0.25 and Q0.75 is:
[{E[X ∧ Q0.75] - Q0.75 S(Q0.75)} - {E[X ∧ Q0.25] - Q0.25 S(Q0.25)}] / {F(Q0.75) - F(Q0.25)}
= [{0.75θ - (ln[4] θ)(0.25)} - {0.25θ - (ln[4/3] θ)(0.75)}] / (0.75 - 0.25) = θ {0.5 + 0.5 ln[4] - 0.75 ln[3]} / (1/2)
= θ {1 + ln[4] - (3/2) ln[3]} = 0.7384 θ.
Comment: Here the trimmed mean excludes 25% probability in each tail.
One could instead for example exclude 10% probability in each tail
The trimmed mean could be applied to a small set of data in order to estimate the mean of the
distribution from which the data was drawn. For a symmetric distribution such as a Normal
Distribution, the trimmed mean would be an unbiased estimator of the mean. If instead you
assumed the data was from a skewed distribution such as an Exponential, then the trimmed mean
would be a biased estimator of the mean. If the data was drawn from an Exponential, then the
trimmed mean divided by 0.7384 would be an unbiased estimator of the mean.
The trimmed mean would be a robust estimator; it would not be significantly affected by unusual
values in the sample. In contrast, the sample mean can be significantly affected by one unusually
large value in the sample.
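The 0.7384θ factor can also be verified by simulation; here is a short sketch (my addition, not from the original), assuming numpy:

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, 2_000_000)          # Exponential sample with theta = 1
q1, q3 = np.quantile(x, [0.25, 0.75])
print(x[(x > q1) & (x < q3)].mean())         # simulated trimmed mean, about 0.738
print(1 + np.log(4) - 1.5 * np.log(3))       # closed form: 0.7384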

31.56. For α = 1, S(x) = θ/(θ + x).
E[X ∧ x] = ∫_{0}^{x} S(t) dt = ∫_{0}^{x} θ/(θ + t) dt = θ ln(θ + t) evaluated from t = 0 to t = x = θ ln(θ + x) - θ ln(θ) = -θ ln[θ/(θ + x)].

Comment: The mean only exists if α > 1. However, since the values entering its computation are
limited, the limited expected value exists as long as α > 0.

31.57. D. (S - 100)+ - (S - 150)+ is the amount in the layer from 100 to 150 on the index.
This is 1/(100 million) times the layer from 10 billion to 15 billion on catastrophe losses.
(100 million times 150 is 15 billion.)
Thus the payment on the spread is 1/500,000 times the layer from 10 billion to 15 billion on
catastrophe losses.
For the LogNormal, E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 10 billion] = exp[20 + 2²/2] Φ[(ln[10 billion] - 20 - 2²)/2] + (10 billion) {1 - Φ[(ln[10 billion] - 20)/2]}
= (3.5849 billion) Φ[-0.49] + (10 billion) {1 - Φ[1.51]} = (3.5849 billion)(0.3121) + (10 billion)(1 - 0.9345) = 1.774 billion.
E[X ∧ 15 billion] = exp[20 + 2²/2] Φ[(ln[15 billion] - 20 - 2²)/2] + (15 billion) {1 - Φ[(ln[15 billion] - 20)/2]}
= (3.5849 billion) Φ[-0.28] + (15 billion) {1 - Φ[1.72]} = (3.5849 billion)(0.3897) + (15 billion)(1 - 0.9573) = 2.038 billion.
{E[X ∧ 15 billion] - E[X ∧ 10 billion]} / 500,000 = (2.038 billion - 1.774 billion) / 500,000 = 528.
Comment: Not intended as a realistic model of catastrophe losses.
Catastrophe losses would be from hurricanes, earthquakes, etc.
An insurer could hedge its catastrophe risk by buying a lot of these or similar call spreads. An insurer
who owned many of these call spreads, would be paid money in the event of a lot of catastrophe
losses in this region for the insurance industry. This should offset to some extent the insurerʼs own
losses due to these catastrophes, in a manner somewhat similar to reinsurance.
528 is the amount expected to be paid by someone who sold one of these calls (in other words
owned a put.) The probability of paying anything is low, but this person who sold a call could pay
up to a maximum of: (200)(50) = 10,000.

31.58. E[X ∧ 75] = 75 - (75 - 62) Φ[(75-62)/20] - (20) φ[(75-62)/20] = 75 - 13 Φ[0.65] - 20 φ[0.65]
= 75 - (13)(0.7422) - (20) exp[-0.65²/2] / √(2π) = 58.89.
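A one-line numerical check of Solution 31.58 (my addition, not from the original), using scipy's Normal density and distribution functions:

from scipy.stats import norm

mu, sigma, x = 62.0, 20.0, 75.0
z = (x - mu) / sigma
print(x - (x - mu) * norm.cdf(z) - sigma * norm.pdf(z))   # E[X ^ 75], about 58.9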

31.59. The small values each contribute Qα. Their total contribution is αQα.

The large values each contribute Q1−α. Their total contribution is αQ1−α .
The medium values each contribute their value x.
Their total contribution is: ∫_{Qα}^{Q1−α} x f(x) dx =

E[X ∧ Q1−α] - Q1−α S(Q1−α) - {E[X ∧ Qα] - Qα S(Qα)} =

E[X ∧ Q1−α] - αQ1−α - {E[X ∧ Qα] - (1-α)Qα}.


Thus adding up the three contributions, the Windsorized mean is:
αQ α + αQ 1−α + E[X ∧ Q1−α] - αQ1−α - {E[X ∧ Qα] - (1-α)Qα} =
E[X ∧ Q1−α] - E[X ∧ Qα] + Qα.

For the Exponential, Qα = -θ ln(1 - α). Q 1−α = -θ ln(α).

E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ Qα] = θα. E[X ∧ Q1−α] = θ(1-α).

Thus the Windsorized mean is: θ(1-α) - θα - θ ln(1 - α) = θ {1 - 2α - ln(1-α)}.


Comment: The trimmed mean excludes probability in each tail.
In contrast, the Windsorized mean substitutes for extreme values the corresponding quantile.
The Windsorized mean could be applied to a small set of data in order to estimate the mean of the
distribution from which the data was drawn.
For example if α = 10%, then all values below the 10th percentile are replaced by the 10th
percentile, and all values above the 90th percentile are replaced by the 90th percentile, prior to
taking an average. For a symmetric distribution such as a Normal Distribution, the Windsorized mean
would be an unbiased estimator of the mean. If instead you assumed the data was from a skewed
distribution such as an Exponential, then the Windsorized mean would be a biased estimator of the
mean.
The Windsorized mean would be a robust estimator; it would not be significantly affected by
unusual values in the sample. In contrast, the sample mean can be significantly affected by one
unusually large value in the sample.
For the Exponential, here is a graph of the Windsorized mean divided by the mean, in other words
the Windsorized mean for θ = 1, as a function of alpha:

[Graph omitted: the Windsorized mean for θ = 1 (vertical axis, from 0.6 to 1.0) as a function of alpha (horizontal axis, from 0 to 0.5).]
As alpha increases, we are substituting for more of the values in the tails.
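The closed form above can be tabulated directly; a short sketch (my addition, not from the original), matching the range shown in the graph:

import numpy as np

def windsorized_mean_exponential(alpha, theta=1.0):
    # theta {1 - 2 alpha - ln(1 - alpha)}, as derived above.
    return theta * (1 - 2 * alpha - np.log(1 - alpha))

for a in (0.0, 0.1, 0.25, 0.5):
    print(a, windsorized_mean_exponential(a))
# 0.0 -> 1.000, 0.1 -> 0.905, 0.25 -> 0.788, 0.5 -> 0.693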

31.60. a) For the Pareto Distribution:
E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ + x))^(α-1)} = (200,000) {1 - (400,000/(400,000 + x))²}.

The current cost is proportional to: E[X ∧ 1 million] - E[X ∧ 250,000] =


183,673 - 124,260 = 59,413.
The cost under Bill #1 is proportional to: 1.05 (E[X ∧ 1 million/1.05] - E[X ∧ 250,000/1.05]) =
(1.05)(182,504 - 121,408) = 64,151.
Impact of Bill #1 is: 64,151/59,413 - 1 = 8.0%.
b) Split the reinsurer's coverage into two layers:
250K to 500K and 500K to 1000K.
The second layer is unaffected by Bill #2. The first layer is affected by Bill #2 in the exact same way
as if there had been 10% inflation to all losses.
(If 454,545 < X ≤ 500,000, then after 10% inflation the contribution to the first layer is as if the loss
became $500,000. If X > $500,000, then after 10% inflation the contribution to the first layer is as if
the loss stayed the same.)
Thus the cost under Bill # 2 is:
1.1 E[X ∧ 454,545] - 1.1 E[X ∧ 227,273] + E[X ∧ 1 million] - E[X ∧ 500,000] =
(1.1)(156,179) - (1.1)(118,673) + 183,673 - 160,494 = 64,436.
Impact of Bill #2 is: 64,436/59,413 - 1 = 8.5%.
Alternately, if X is the original cost and Y is the cost under Bill #2, then:
If X ≤ 500,000 / 1.1 = 454,545, then Y = 1.1 X.
If 454,545 < X ≤ 500,000, then Y = 500,000.
If 500,000 < X, then Y = X.
Let f, F, and S refer to the original Pareto, then the cost under Bill # 2 is:

∫_{250,000/1.1}^{500,000/1.1} (1.1x - 250,000) f(x) dx + (500,000 - 250,000){F(500,000) - F(454,545)}
+ ∫_{500,000}^{1,000,000} (x - 250,000) f(x) dx + 750,000 S(1,000,000) =
1.1 ∫_{227,273}^{454,545} x f(x) dx - 250,000{F(454,545) - F(227,273)} + 250,000{F(500,000) - F(454,545)}
+ ∫_{500,000}^{1,000,000} x f(x) dx - 250,000{F(1,000,000) - F(500,000)} + 750,000 S(1,000,000) =
1.1{E[X ∧ 454,545] - 454,545 S(454,545) - E[X ∧ 227,273] + 227,273 S(227,273)}
+ (250,000){S(454,545) - S(227,273)} - (250,000){S(500,000) - S(454,545)}
+ E[X ∧ 1 million] - 1,000,000 S(1,000,000) - E[X ∧ 500,000] + 500,000 S(500,000)
+ 250,000{S(1,000,000) - S(500,000)} + 750,000 S(1,000,000) =
1.1 E[X ∧ 454,545] - 1.1 E[X ∧ 227,273] + E[X ∧ 1 million] - E[X ∧ 500,000] =
(1.1)(156,179) - (1.1)(118,673) + 183,673 - 160,494 = 64,436.
Impact of Bill #2 is: 64,436/59,413 - 1 = 8.5%.
Comment: Based on CAS9, 11/99, Q.40.
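A numerical sketch of Solution 31.60 (my addition, not part of the original), evaluating the impact of both bills on the layer from 250,000 to 1 million:

alpha, theta = 3.0, 400000.0

def lev(x):
    # E[X ^ x] for the Pareto with alpha = 3, theta = 400,000.
    return theta / (alpha - 1) * (1 - (theta / (theta + x))**(alpha - 1))

current = lev(1e6) - lev(250000)
bill1 = 1.05 * (lev(1e6 / 1.05) - lev(250000 / 1.05))
bill2 = 1.1 * (lev(500000 / 1.1) - lev(250000 / 1.1)) + lev(1e6) - lev(500000)
print(bill1 / current - 1, bill2 / current - 1)   # about 0.080 and 0.085, i.e., 8.0% and 8.5%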

31.61. Using the formula for VaR for the Pareto, πp = θ {(1-p)-1/α - 1}.

π 0.05 = θ {0.95-1/α - 1}. π 0.95 = θ {0.05-1/α - 1}.

E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ + x))^(α-1)}, α ≠ 1.
E[X ∧ π0.05] = {θ/(α-1)} {1 - (0.95^(1/α))^(α-1)} = {θ/(α-1)} {1 - 0.95^(1 - 1/α)}.
E[X ∧ π0.95] = {θ/(α-1)} {1 - (0.05^(1/α))^(α-1)} = {θ/(α-1)} {1 - 0.05^(1 - 1/α)}.
The trimmed mean, the average size of those losses of size between π0.05 and π0.95, is:
[{E[X ∧ π0.95] - π0.95 S(π0.95)} - {E[X ∧ π0.05] - π0.05 S(π0.05)}] / {F(π0.95) - F(π0.05)}
= [{θ/(α-1)} (0.95^(1-1/α) - 0.05^(1-1/α)) + 0.95 π0.05 - 0.05 π0.95] / 0.9
= θ [{1/(α-1)} (0.95^(1-1/α) - 0.05^(1-1/α)) + 0.95^(1-1/α) - 0.05^(1-1/α) - 0.9] / 0.9
= θ {α (0.95^(1-1/α) - 0.05^(1-1/α)) / [(0.9)(α-1)] - 1}, α ≠ 1.

For α = 1, E[X ∧ x] = -θ ln[θ/(θ + x)].
π 0.05 = θ {0.95-1 - 1} = θ/19. π 0.95 = θ {0.05-1 - 1} = 19θ.

E[X ∧ π0.05] = θ ln(20/19). E[X ∧ π0.95] = θ ln(20).

Therefore, the trimmed mean is: θ {ln(20) - ln(20/19) + 0.95/19 - (0.05)(19)} / 0.9 = 2.2716 θ.
Comment: Here the trimmed mean excludes 5% probability in each tail.
One could instead for example exclude 10% probability in each tail.
Even though we have excluded an equal probability in each tail, for the positively skewed Pareto
Distribution, the trimmed mean is less than the mean.
As α approaches 1, the mean approaches infinity, while the trimmed mean approaches 2.2716 θ.
Here is a graph of the ratio of the trimmed mean to the mean:

[Graph omitted: Trimmed Mean over Mean (vertical axis, from about 0.2 to 0.8) as a function of alpha (horizontal axis, from 1 to 10).]
For example, for α = 3, the trimmed mean is 0.384436θ , while the mean is θ/2;
for α = 3 the ratio of the trimmed mean to the mean is 0.768872.
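The ratio plotted above follows from the closed form derived in the solution; a short sketch (my addition, not from the original):

def trimmed_over_mean(alpha):
    # Ratio of the 5%/95% trimmed mean to the mean theta/(alpha - 1), for alpha > 1.
    trimmed_over_theta = alpha * (0.95**(1 - 1/alpha) - 0.05**(1 - 1/alpha)) / (0.9 * (alpha - 1)) - 1
    return trimmed_over_theta * (alpha - 1)

for a in (1.5, 2.0, 3.0, 10.0):
    print(a, trimmed_over_mean(a))   # at alpha = 3 this gives about 0.7689, as in the comment above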

31.62. D. For the Pareto Distribution,


E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ + x))^(α-1)} = 3125 {1 - (5000/(5000 + x))^1.6}.

The average payment per loss for the first policy is:
E[X ∧ 3000] - E[X ∧ 2000] = 1651.81 - 1300.91 = 350.90.
The second policy pays nothing for X < 1000, pays 1000 for X > 2000,
and pays X - 1000 for 1000 ≤ X ≤ 2000. This is equivalent to d = 1000 and u = 2000.
The average payment per loss for the second policy is:
E[X ∧ 2000] - E[X ∧ 1000] = 1300.91 - 790.68 = 510.23.
The percentage increase in expected losses is: 510.23/350.90 - 1 = 45.4%.

31.63. E. 90 = E[(S-450)+] = E[S] - E[S ∧ 450]. ⇒ E[S ∧ 450] = 500 - 90 = 410.


Therefore, the expected amount by which S is less than 450 is: 450 - E[S ∧ 450] = 40.
Therefore, the expected bonus is (40)(30,000) = 1,200,000.

31.64. D. F(0.7) = Φ[(ln(0.7) + 0.65)/0.6] = Φ[0.49] = 0.6879.
E[X] = exp[-0.65 + 0.6²/2] = 0.6250.
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}.
E[X ∧ 0.7] = exp[-0.65 + 0.6²/2] Φ[(ln(0.7) + 0.65 - 0.6²)/0.6] + (0.7) {1 - Φ[(ln(0.7) + 0.65)/0.6]}
= (0.6250) Φ[-0.11] + (0.7)(1 - Φ[0.49]) = (0.6250)(0.4562) + (0.7)(1 - 0.6879) = 0.5036.
Thus, conditional on the loss ratio being greater than 0.7, the average loss ratio is:
{E[X] - (E[X ∧ 0.7] - 0.7 S(0.7))} / {1 - F(0.7)} = {0.6250 - 0.5036 + (0.7)(1 - 0.6879)} / (1 - 0.6879) = 1.089.
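A numerical check of Solution 31.64 (my addition, not from the original), assuming Python with scipy:

from math import exp, log
from scipy.stats import norm

mu, sigma, r = -0.65, 0.6, 0.7
mean = exp(mu + sigma**2 / 2)
s_r = 1 - norm.cdf((log(r) - mu) / sigma)                  # S(0.7)
small = mean * norm.cdf((log(r) - mu - sigma**2) / sigma)  # E[X ^ 0.7] - 0.7 S(0.7)
print((mean - small) / s_r)                                # conditional mean, about 1.09, answer D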

31.65. (a) S(10) = (θ/10)^1.5.
The wealth of such individuals is: ∫_{10}^{∞} x f(x) dx = ∫_{10}^{∞} x (1.5)(θ^1.5) / x^2.5 dx = (1.5/0.5) θ^1.5 / 10^0.5.
⇒ average wealth of such individuals is: {(1.5/0.5) θ^1.5 / 10^0.5} / (θ/10)^1.5 = (3)(10) = 30 million.
(b) S(y) = (θ/y)^α. The wealth of such individuals is: ∫_{y}^{∞} x f(x) dx = ∫_{y}^{∞} x α θ^α / x^(α+1) dx = {α/(α-1)} θ^α / y^(α-1).
⇒ average wealth of such individuals is: [{α/(α-1)} θ^α / y^(α-1)] / (θ/y)^α = {α/(α-1)} y.
Alternately, for the Single Parameter Pareto Distribution, e(x) = x / (α-1).
Thus the average size of the data left truncated at y is: y + y/(α-1) = {α/(α-1)} y.

(c) Πp = VaRp(X) = θ (1-p)^(-1/α). ⇒ Πq / Πp = {(1 - p)/(1 - q)}^(1/α).
(d) From part (c), Π99% / Π90% = {(1 - 0.9)/(1 - 0.99)}^(1/α) = 10^(1/α). From part (b), the average wealth of
each group is proportional to its lower endpoint, the corresponding percentile.
Thus the ratio of the average wealth of the top 1% to the top 10% is 10^(1/α).
However, there are only 1/10 as many individuals in the top 1% as in the top 10%.
Thus the ratio of the total wealth of the top 1% to the top 10% is: 10^(1/α) / 10 = 0.1^((α-1)/α).
Alternately, TVaRp(X) = {α/(α-1)} θ (1-p)^(-1/α). ⇒ TVaR99%(X) / TVaR90%(X) = (0.01/0.1)^(-1/α) = 10^(1/α).
Thus the ratio of the average wealth of the top 1% to the top 10% is 10^(1/α). Proceed as before.
Comment: As alpha approaches one, in other words as the distribution of wealth becomes more
unequal between individuals, α/(α-1) approaches infinity.
For example, for α = 1.5, the ratio of the total wealth of the top 1% to the top 10% is:
0.1^((α-1)/α) = 0.1^(1/3) = 46.4%.
For α = 1.5, in the same way, the top 0.1% owns 46.4% of the total wealth of the top 1%.
If instead α = 2.5, then the ratio of the total wealth of the top 1% to the top 10% is:
0.1^((α-1)/α) = 0.1^0.6 = 25.1%.

31.66. C. E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ + x))^(α-1)} = (4000/2) {1 - (4000/(4000 + x))²}.

E[X ∧ 3000] = (2000) {1 - (4/7)2 } = 1346.94.


E[X ∧ 5000] = (2000) {1 - (4/9)2 } = 1604.94.
1604.94/1346.94 - 1 = 19.16%.
Comment: The reinsurerʼs expected payment per ground-up loss goes
from 2000 - 1346.94 = 653.06, to 2000 - 1604.94 = 395.06.

31.67. C. For a loss of size x, the insurer pays 0 if x < 10, and x - 10 if 100 ≥ x ≥ 10.
(There are no losses greater than 100.) The average payment, excluding from the average small
losses on which the insurer makes no payment is:
[∫_{10}^{100} (x - 10) f(x) dx] / [∫_{10}^{100} f(x) dx] = [∫_{10}^{100} (x - 10)(0.015 - 0.0001x) dx] / [∫_{10}^{100} (0.015 - 0.0001x) dx]
= 32.4 / 0.855 = 37.9.

Alternately, S(10) = ∫_{10}^{100} f(x) dx = ∫_{10}^{100} (0.015 - 0.0001x) dx = 0.855.
E[X] = ∫_{0}^{100} x f(x) dx = ∫_{0}^{100} x (0.015 - 0.0001x) dx = 41.67.
E[X ∧ 10] = ∫_{0}^{10} x f(x) dx + 10 S(10) = ∫_{0}^{10} x (0.015 - 0.0001x) dx + (10)(0.855)
= 0.72 + 8.55 = 9.27.
Average payment per payment is: (E[X] - E[X ∧ 10 ])/S(10) = (41.67 - 9.27)/0.855 = 37.9.
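The two integrals in Solution 31.67 can also be checked numerically; a short sketch (my addition, not from the original) using scipy:

from scipy.integrate import quad

f = lambda x: 0.015 - 0.0001 * x                   # density on (0, 100)
numerator, _ = quad(lambda x: (x - 10) * f(x), 10, 100)
denominator, _ = quad(f, 10, 100)
print(numerator, denominator, numerator / denominator)   # 32.4, 0.855, about 37.9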

31.68. C. e(x) = (losses excess of x) / (claims excess of x) = (E[X] - E[X ∧ x]) / S(x).
Therefore, E[X ∧ x] = E[X] - e(x){1 - F(x)}.

31.69. D. 1. False. For small samples, either of the two methods may have the smaller
variance. For large samples, the method of maximum likelihood has the smallest variance. 2. True.
This is the definition of the Limited Expected Value. 3. True.
2016-C-2, Loss Distributions, §31 Limited Expected Values HCM 10/21/15, Page 487

31.70. E. Expected amount paid per loss = ∫_{100}^{∞} x f(x) dx = ∫_{0}^{∞} x f(x) dx - ∫_{0}^{100} x f(x) dx =
Mean - {E[X ∧ 100] - 100 S(100)}. S(100) = {θ/(θ+100)}² = (1000/1100)² = 0.8264.

E[X ∧ 100] = {θ/(α−1)} {1 - (θ/(θ+100))α−1} = {1000/(2-1)} {1 - (1000/1100)2-1} = 90.90.


Mean = θ/(α-1) = 1000. Therefore, Expected amount paid per loss =
1000 - {90.90 - 82.64} = 991.74. Expect 10 losses per year, so the average cost per year is:
(10)(991.74) = $9917.
Alternately, the expected cost per year of 10 losses is:
10 ∫_{100}^{∞} x f(x) dx = (10)(2)(1000²) ∫_{100}^{∞} x (1000 + x)^(-3) dx
= 10^7 [-x (1000 + x)^(-2)] evaluated from x = 100 to x = ∞ + 10^7 ∫_{100}^{∞} (1000 + x)^(-2) dx = 10^7 {100/1100² + 1/1100} = $9917.
Alternately, the average severity per loss > $ 100 is: 100 + e(100) = 100 + (θ+100)/(α -1)
= 1100 + 100 = $1200. Expected number of losses > $100 = 10S(100) = 8.2645.
Expected annual payment = ($1200) (8.2645) = $9917.
Comment: This is the franchise deductible.
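A short check of Solution 31.70 (my addition, not from the original): under the franchise deductible the insurer pays the entire loss whenever the loss exceeds 100.

alpha, theta, d, losses_per_year = 2.0, 1000.0, 100.0, 10

mean = theta / (alpha - 1)
lev_d = theta / (alpha - 1) * (1 - (theta / (theta + d))**(alpha - 1))   # E[X ^ 100]
s_d = (theta / (theta + d))**alpha                                       # S(100)
per_loss = mean - (lev_d - d * s_d)     # about 991.7 per loss
print(losses_per_year * per_loss)       # about 9917, answer E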

31.71. C. For the LogNormal: E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx - µ - σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.

E[X ∧ 100,000] = exp(10 + 12/2) Φ[(ln100000 - 10 - 12 )/1] + 100000{1 - Φ[(ln100000 - 10)/1]}


= e10.5Φ(0.51) + 100000{1 - Φ(1.51)} = 36,316(0.6950) + 100,000(1 - 0.9345) = 31,790.
E[X ∧ x] - xS(x) = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ]. E[X ∧ 50,000] - 50,000S(50000) =
e10.5Φ[(ln50000 - 10 - 12 )/1] = 36,316 Φ(-0.18) = 36,316(0.4286) = 15,565.
Without the feature that the insurer pays the entire loss (up to $100,000) for each loss greater than
$50,000, the insurer would pay the layer from 50,000 to 100,000, which is
E[X ∧ 100,000] - E[X ∧ 50,000]. That extra provision adds 50,000 per large loss, or
50,000 S(50000). Thus the insurer pays: 50,000S(50000) + E[X ∧ 100,000] - E[X ∧ 50,000] =
E[X ∧ 100,000] - {E[X ∧ 50,000] - 50,000S(50000)} = 31,790 - 15,565 = 16,225.
Alternately, the insurer pays for all dollars of loss in the layer less than $100,000, except it pays
nothing for losses of size less than $50,000. The former is: E[X ∧ 100,000];
the latter is: E[X ∧ 50,000] - 50,000S(50000). Thus the insurer pays:
E[X ∧ 100,000] - {E[X ∧ 50,000] - 50,000S(50000)}. Proceed as above.
Alternately, the insurer pays all dollars for losses greater than $50,000, except it pays nothing in the
layer above $100,000. The former is:
E[X] - {E[X ∧ 50,000] - 50,000(S(50000)}; the latter is: E[X] - E[X ∧ 100,000].
Thus subtracting the two values the insurer pays:
E[X ∧ 100,000] - { E[X ∧ 50,000] - 50,000(S(50000) }. Proceed as above.
Alternately, the insurer pays all dollars for losses greater than $50,000 and less than $100,000, and
pays $100,000 per loss greater than $100,000. The former is:
{E[X ∧ 100,000] - 100,000S(100000)} - {E[X ∧ 50,000] - 50,000S(50000)}; the latter is:
100,000S(100,000). Thus adding the two contributions the insurer pays:
E[X ∧ 100,000] - { E[X ∧ 50,000] - 50,000S(50000)}. Proceed as above.

31.72. E. A loss ratio of 70% corresponds to (0.7)(500,000) = $350,000 in losses.
If the losses are x, and x < 350,000, then the agent gets a bonus of (1/3)(350,000 - x).
On the other hand, if x ≥ 350,000, then the bonus is zero.
Therefore the expected bonus is:

(1/3) ∫_0^350,000 (350,000 - x) f(x) dx = (1/3)(350,000) ∫_0^350,000 f(x) dx - (1/3) ∫_0^350,000 x f(x) dx

= (1/3)(350,000) F(350,000) - (1/3){E[X ∧ 350,000] - 350,000 S(350,000)}

= (1/3){350,000 - E[X ∧ 350,000]}.
The distribution of losses is Pareto with α = 3 and θ = 600,000. Therefore,

E[X ∧ 350,000] = {θ/(α-1)} {1 - (θ/(θ + 350,000))^(α-1)} = (600,000/2)(1 - (600/950)²) = 180,332.

Therefore, the expected bonus is: (1/3)(350,000 - 180,332) = 56,556.
Alternately, the expected amount by which losses are less than y is: y - E[L ∧ y].
Therefore, expected bonus = (1/3)(expected amount by which losses are less than 350,000) =
E[(350,000 - L)+]/3 = (1/3)(350,000 - E[L ∧ 350,000]). Proceed as before.
Alternately, his losses must be less than 350,000 for him to receive a bonus.
S(350,000) = (600/(350 + 600))³ = 0.25193 = probability that he receives no bonus.
The mean of the "small" losses (< 350,000) is: {E[L ∧ 350,000] - 350,000 S(350,000)}/F(350,000)
= (180,332 - (350,000)(0.25193))/(1 - 0.25193) = 123,192.
123,192 / 500,000 = 24.638% is the expected loss ratio when he gets a bonus.
Therefore, the expected bonus when he gets a bonus is: 500,000(70% - 24.638%)/3 = 75,603.
His expected overall bonus is: (1 - 0.25193)(75,603) + (0.25193)(0) = 56,556.
Comment: Note that since the bonus is zero if x ≥ 350,000, we only integrate from zero to
350,000. Therefore, it is not the case that E[Bonus] = (1/3)(350,000 - E[X]).

31.73. C. Let total dollars of claims be A. Let B = the Bonus.


Then B = (500-A)/2 if A < 500 and 0 if A ≥ 500. Let y = A if A < 500 and 500 if A ≥ 500.
Then E[y] = E[A ∧ 500]. 2B + y = 500, regardless of A. Therefore 2E[B] + E[y] = 500.
Therefore E[B] = (500 - E[A ∧ 500])/2 = 250 - E[A ∧ 500]/2.
For the Pareto Distribution, E[X] = θ/(α-1), and E[X ∧ x] = {θ/(α-1)}{1- (θ/(x+θ))α−1}.
For the revised model, E[A ∧ 500] = K{1- (K/(500+K))} = 500K / (500 + K).
Thus for the revised model, E[B] = 250 - 250K/(500 + K) = 125,000/(500 + K).
Expected aggregate claims under the revised model are: K/(2-1) = K.
Expected aggregate claims under the previous model are: 500/(2-1) = 500.
So we are given that: K + 125,000/(500 + K) = 500.
500K + K2 + 125000 = 250000 + 500K. K2 = 125000. K = 353.
Comment: The expected amount by which claims are less than 500 is:
E[(500 - A)+)] = 500 - E[A ∧ 500].

31.74. E. A loss ratio of 60% corresponds to: (60%)(800000) = 480,000 in losses.


For the Pareto distribution, E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}.
E[X ∧ 480,000] = {500,000/(2-1)} {1 - (500,000/(480,000 + 500,000))^(2-1)} = 244,898.
If losses are less than 480,000 a bonus is paid.
Bonus = (15%)(amount by which losses < 480,000).
Expected bonus = (15%)E[(480000 - L)+] =
(15%){480,000 - E[L ∧ 480000]} = (15%)(480000 - 244898) = 35,265.

31.75. B. From the previous solution, his expected bonus is: 35,265.
He gets a bonus when the aggregate losses are less than 480,000.
The probability of this is: F(480,000) = 1 - {500/(500 + 480)}² = 0.73969.
Expected value of Huntʼs bonus, given that Hunt receives a (positive) bonus, is:
35,265/0.73969 = 47,675.
Comment: This question asks about an analog to the expected payment per
(non-zero) payment, while the exam question asks about an analog to the expected payment per
loss. In this question we only consider situations where the bonus is positive, while the exam
question includes those situations where the bonus is zero.

31.76. E. The expected losses within the layer 1,000 to 10,000 are:

∫_1000^10,000 S(x) dx = ∫_1000^10,000 10^6 / (x + 10³)² dx = [-10^6 / (x + 10³)] evaluated from x = 1000 to 10,000
= 10^6 (1/2000 - 1/11,000) = 1000(1/2 - 1/11).

E[X] = ∫_0^∞ S(x) dx = ∫_0^∞ 10^6 / (x + 10³)² dx = [-10^6 / (x + 10³)] evaluated from x = 0 to ∞ = 1000.

Therefore the percent of expected losses within the layer 1,000 to 10,000 is:
1000(1/2 - 1/11)/1000 = 1/2 - 1/11 = 40.9%.
Alternately, this is a Pareto Distribution with α = 2 and θ = 1000.

E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ + x))α−1} = 1000{1 - 1000/(1000 + x)} = 1000x/(1000 + x).


E[X ∧ 1000] = 500. E[X ∧ 10,000] = 909. E[X] = θ/(α-1) = 1000.
The percent of expected losses within the layer 1,000 to 10,000 is:
(E[X ∧ 10,000] - E[X ∧ 1000]) / E[X] = (909 - 500)/1000 = 40.9%.

31.77. A. E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}. E[X ∧ 400] = 300(1 - 3/7) = 171.43.

100 = E[B] = c(400 - E[X ∧ 400]) = c(400 - 171.43) = 228.57c. ⇒ c = 100/228.57 = 0.4375.

31.78. C. At 5100 in loss, the insured pays: 250 + (25%)(2250 - 250) + (5100 - 2250) = 3600.
⇒ For annual losses > 5100, the insured pays 5% of the amount > 5100.
⇒ The insurer pays: 75% of the layer from 250 to 2250, 0% of the layer 2250 to 5100,
and 95% of the layer from 5100 to ∞.
E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))α−1} = (2000){1 - 2000/(2000 + x)} = 2000x/(2000 + x).
E[X ∧ 250] = (2000)(250) / (2000 + 250) = 222.
E[X ∧ 2250] = (2000)(2250) / (2000 + 2250) = 1059.
E[X ∧ 5100] = (2000)(5100) / (2000 + 5100) = 1437.
E[X] = θ/(α-1) = 2000 / (2 - 1) = 2000.
The expected annual plan payment:
(75%)(E[X ∧ 2250] - E[X ∧ 250]) + (95%)(E[X] - E[X ∧ 5100]) =
(75%)(1059 - 222) + (95%)(2000 - 1437) = 1163.
Comment: Provisions are similar to those in the 2006 Medicare Prescription Drug Program.
Here is a detailed breakdown of the layers of loss:
Layer             Expected Losses in Layer   Insured Share   Insurer Share
5100 to ∞                  563                     5%             95%
2250 to 5100               378                   100%              0%
250 to 2250                837                    25%             75%
0 to 250                   222                   100%              0%
Total                     2000

E[X] - E[X ∧ 5100] = 2000 - 1437 = 563. E[X ∧ 5100] - E[X ∧ 2250] = 1437 - 1059 = 378.
E[X ∧ 2250] - E[X ∧ 250] = 1059 - 222 = 837. E[X ∧ 250] = 222.
For example, for an annual loss of 1000, insured pays: 250 + (25%)(1000 - 250) = 437.5,
and insurer pays: (75%)(1000 - 250) = 562.5.
For an annual loss of 4000, insured pays: 250 + (25%)(2250 - 250) + (4000 - 2250) = 2500, and
insurer pays: (75%)(2250 - 250) = 1500.
For an annual loss of 8000, insured pays: 250 + (25%)(2250 - 250) + (5100 - 2250) +
(5%)(8000 - 5100) = 3745, and insurer pays: (75%)(2250 - 250) + (95%)(8000 - 5100) = 4255.

31.79. C. For this Pareto,


E[L ∧ 650,000] = {600,000/(3 - 1)}{1 - (600,000/(650,000 + 600,000))²} = 230,880.
E[(650,000 - L)+] = 650,000 - E[L ∧ 650000] = 650,000 - 230,880 = 419,120.
E[Bonus] = E[(650,000 - L)+/3] = 419,120/3 = 139,707.

31.80. C. A constant force of mortality is an Exponential Distribution.


Variance = θ² = 100. ⇒ θ = 10.

E[T ∧ 10] = θ(1 - e^(-10/θ)) = (10)(1 - e^(-1)) = 6.32.

31.81. B. Teller X completes on average 6 transactions per hour, while Teller Y completes on
average 4 transactions per hour. (6)(6 + 4) = 60 transactions by tellers expected in total.
1/3 of all transactions are deposits, and therefore we expect 20 deposits.
Expected number of deposits handled by tellers: 20 F(7500).
Average size of those deposits of size less than 7500 is:
{E[X ∧ 7500] - 7500 S(7500)} / F(7500).
Expected total deposits made through the tellers each day:
20{E[X ∧ 7500] - 7500 S(7500)} = 20{(5000/2)(1 - (5/12.5)²) - 7500(5/12.5)³}
= (20){2100 - (7500)(0.064)} = 32,400.
Comment: While the above is the intended solution of the CAS, it is not what I would have done to
solve this poorly worded exam question.
Let y be total number of deposits expected per day.
Then we expect S(7500)y deposits to be handled by the manager, and F(7500)y deposits to be
handled by the tellers. Expect 60 - F(7500)y non-deposits to be handled by the tellers.
1/3 of all transactions are deposits, presumably including those handled by the manager.
{60 + S(7500)y}/3 = y. ⇒ y = 60/{3 - S(7500)}.
Expected number of deposits handled by tellers: F(7500)y = F(7500) 60/{3 - S(7500)}.
Multiply by the average size of those deposit of size less than 7500:
(F(7500) 60/{3 - S(7500)}) {E[X ∧ 7500] - 7500S(7500)}/F(7500) =
60{E[X ∧ 7500] - 7500S(7500)}/(3 - S(7500))
= (60){2100 - (7500)(0.064)}/(3 - 0.064) = 33,106.
Resulting in a different answer than the intended solution.

31.82. B. 3%/75% = 4%. If the index return is less than 4%, then the depositor gets 3%.
Thus we want: E[Max[0.75X, 3%]] = 75% E[Max[X, 4%]].
E[Max[X, 4%]] = 4F(4) + ∫_4^∞ x f(x) dx = 4F(4) + ∫_0^∞ x f(x) dx - ∫_0^4 x f(x) dx
= 4F(4) + E[X] - {E[X ∧ 4] - 4S(4)} = 4 + E[X] - E[X ∧ 4] = 4 + 8 - (-0.58) = 12.58.
75% E[Max[X, 4%]] = (75%)(12.58%) = 9.43%.
Alternately, let Y = max[X, 4]. Then Y - 4 = 0 if X ≤ 4, and Y - 4 = X - 4 if X > 4.
Therefore, E[Y - 4] = E[(X - 4)+] = E[X] - E[X ∧ 4].

⇒ E[Y] = 4 + E[X] - E[X ∧ 4] = 4 + 8 - (-0.58) = 12.58. (75%)(12.58%) = 9.43%.


Alternately, as discussed in “Mahlerʼs Guide to Risk Measures” for the Normal Distribution:
TVaRp [X] = µ + σ φ[zp ] / (1 - p). We are interested in the tail value at risk for a 4% interest rate.
For the Normal with mean 8% and standard deviation 16%, 4% corresponds to:
zp = (4% - 8%) / 16% = -0.25. ⇒ p = 1 - 0.5987 = 0.4013.

Therefore, TVaR = 0.08 + 0.16 {exp[-(-0.25)²/2] / √(2π)} / 0.5987 = 0.1833.


Now 40.13% of the time the return on the equity index is less than 4%, while the remaining 59.87%
of the time the return is greater than 4%.
Therefore, the expected annual crediting rate is:
(75%) {(40.13%)(4%) + (59.87%)(0.1833)} = 9.43%.
Given the table of limited expected values, this alternate solution is harder.
Comment: In general, Min[X, 4] + Max[X, 4] = X + 4.
Therefore, E[Max[X, 4]] = E[X] + 4 - E[X ∧ 4].

31.83. C. Let X be such that Michael just has paid 10,000 in out-of-pocket repair costs:
10000 = 1000 + (20%)(6000 - 1000) + (X - 6000). ⇒ X = 14,000.
Thus the insurance pays 80% of the layer from 1000 to 6000, plus 90% of the layer above 14,000.
For this Pareto Distribution, E[X ∧ x] = 5000{1 - 5000/(5000 + x)} = 5000x/(5000 + x).
E[X ∧ 1000] = 833. E[X ∧ 6000] = 2727. E[X ∧ 14000] = 3684. E[X] = θ/(α-1) = 5000.
Expected annual payment by the insurer is:
80%(E[X ∧ 6000] - E[X ∧ 1000]) + 90%(E[X] - E[X ∧ 14000]) =
80%(2727 - 833) + 90%(5000 - 3684) = 2700.
Comment: Similar to SOA3, 11/04, Q.7.
Here is a detailed breakdown of the layers of loss:
Layer             Expected Losses in Layer      Michaelʼs Share   Insurer Share
14,000 to ∞        5000 - 3684 = 1316                10%              90%
6000 to 14,000     3684 - 2727 = 957                100%               0%
1000 to 6000       2727 - 833 = 1894                 20%              80%
0 to 1000          833                              100%               0%
Total              5000

Section 32, Limited Higher Moments

One can get limited higher moments in a manner parallel to the limited expected value.
Just as the limited expected value at u, E[X ∧ u], is the first moment of data limited to u,
the limited second moment, E[(X ∧ u)²], is the second moment of the data limited to u.
First limit the losses, then square, then take the expected value.

Exercise: Prob[X = 2] = 70%, and Prob[X = 9] = 30%. Determine E[X ∧ 5] and E[(X ∧ 5)²].
[Solution: E[X ∧ 5] = (70%)(2) + (30%)(5) = 2.9. E[(X ∧ 5)²] = (70%)(2²) + (30%)(5²) = 10.3.
Comment: Var[X ∧ 5] = 10.3 - 2.9² = 1.89.]
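
Readers who like to check such discrete calculations numerically can use a few lines of Python. This is only an illustrative sketch (not part of the text), and the helper name limited_moments is my own:

def limited_moments(outcomes, u):
    # outcomes: list of (probability, size) pairs for a discrete severity distribution
    first = sum(p * min(x, u) for p, x in outcomes)
    second = sum(p * min(x, u)**2 for p, x in outcomes)
    return first, second, second - first**2   # E[X ∧ u], E[(X ∧ u)²], Var[X ∧ u]

print(limited_moments([(0.70, 2), (0.30, 9)], 5))   # roughly (2.9, 10.3, 1.89)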

As with the limited expected value, one can write the limited second moment as a contribution of
small losses plus a contribution of large losses:

E[(X ∧ u)²] = ∫_0^u t² f(t) dt + S(u) u².

The losses of size larger than u each contribute u², while the losses of size u or less each contribute
their size squared. E[(X ∧ u)²] can be computed by integration in the same manner as the moments
and Limited Expected Values. As shown in Appendix A attached to the exam, here are the
formulas for the limited higher moments for some distributions:225

Distribution                E[(X ∧ x)^n]

Exponential                 n! θ^n Γ(n+1; x/θ) + x^n e^(-x/θ)

Pareto                      {n! θ^n Γ(α-n) / Γ(α)} β[n+1, α-n; x/(θ+x)] + x^n (θ/(θ+x))^α

Gamma                       {θ^n Γ(α+n) Γ(α+n; x/θ) / Γ(α)} + x^n {1 - Γ(α; x/θ)}

LogNormal                   exp[nµ + n²σ²/2] Φ[(ln(x) - µ - nσ²)/σ] + x^n {1 - Φ[(ln(x) - µ)/σ]}

Weibull                     θ^n Γ(1 + n/τ) Γ(1 + n/τ; (x/θ)^τ) + x^n exp[-(x/θ)^τ]

Single Parameter Pareto     αθ^n/(α - n) - n θ^α / {(α - n) x^(α - n)},  x ≥ θ.
225
The formula for the limited moments of the Pareto involving Incomplete Beta Functions, reduces to the formula
shown subsequently for n=2. However, it requires integration by parts and a lot of algebraic manipulation.

One obtains the Limited Expected Value by setting n = 1, while one obtains the limited second
moment for n = 2.226

Distribution                E[(X ∧ x)²]

Exponential                 2θ² - 2θ²e^(-x/θ) - 2θxe^(-x/θ)

Single Parameter Pareto     αθ²/(α - 2) - 2θ^α / {(α - 2) x^(α - 2)},  x ≥ θ.

Pareto                      {2θ² / ((α - 1)(α - 2))} {1 - (1 + x/θ)^(1-α) (1 + (α-1)x/θ)}

LogNormal                   exp[2µ + 2σ²] Φ[(ln(x) - µ - 2σ²)/σ] + x² {1 - Φ[(ln(x) - µ)/σ]}

Exercise: For a LogNormal Distribution with µ = 7 and σ = 0.5, what is E[(X ∧ 1000)²]?

[Solution: E[(X ∧ 1000)²] = e^14.5 Φ[{ln(1000) - 7.5} / 0.5] + 1000² {1 - Φ[{ln(1000) - 7} / 0.5]}
= 1,982,759 Φ[-1.184] + 1,000,000 {1 - Φ[-0.184]}
= (1,982,759)(0.1182) + (1,000,000)(1 - 0.4270) = 807,362.]

Generally, E[(X ∧ u)2 ] is less than E[X2 ]. For low censorship points u or more skewed distributions
the difference can be quite substantial. For example, in the above exercise,
E[X2 ] = exp[2µ + 2σ2] = e14.5 = 1,982,759,

while E[(X ∧ 1000)2 ] = 807,362.
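
If desired, the closed-form LogNormal limited second moment can be checked by brute-force numerical integration. The sketch below is my own illustration (it assumes scipy is available) and is not part of the syllabus:

from math import exp, log
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma, u = 7.0, 0.5, 1000.0

def closed_form(u):
    # exp(2µ + 2σ²) Φ[(ln u - µ - 2σ²)/σ] + u² {1 - Φ[(ln u - µ)/σ]}
    return (exp(2*mu + 2*sigma**2) * norm.cdf((log(u) - mu - 2*sigma**2)/sigma)
            + u**2 * (1 - norm.cdf((log(u) - mu)/sigma)))

def numerical(u):
    # E[(X ∧ u)²] = ∫ min(x, u)² f(x) dx; start slightly above zero to avoid log(0)
    f = lambda x: norm.pdf((log(x) - mu)/sigma) / (x*sigma)    # LogNormal density
    return quad(lambda x: x**2 * f(x), 1e-9, u)[0] + u**2 * (1 - norm.cdf((log(u) - mu)/sigma))

print(closed_form(u), numerical(u))   # both near 807,000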

Gamma Distribution:

For the Gamma Distribution: E[(X ∧ x)²] = θ² α(α+1) Γ(α+2; x/θ) + x² {1 - Γ(α; x/θ)}.
Using Theorem A.1 in Appendix A of Loss Models,
Γ(3; x/θ) = 1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2. Also Γ(1; x/θ) = 1 - e^(-x/θ).

Thus for the Exponential, which is a Gamma with α = 1, E[(X ∧ x)²] = 2θ² Γ(3; x/θ) + x² {1 - Γ(1; x/θ)} =

2θ²{1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2} + x² e^(-x/θ) = 2θ² - 2θ²e^(-x/θ) - 2θxe^(-x/θ).

226
The limited second moments of a Exponential and Pareto are not shown in Loss Models in these forms, but as
shown below these formulas are correct.

Exercise: For an Exponential Distribution with θ = 10, what is E[(X ∧ 30)²]?

[Solution: E[(X ∧ x)²] = 2θ² - 2θ²e^(-x/θ) - 2θxe^(-x/θ). E[(X ∧ 30)²] = 200 - 200e^(-3) - 600e^(-3) = 160.2.]

Second Limited Moment in Terms of Other Quantities of Interest:

It is sometimes useful to write E[(X ∧ u)²] in terms of the Survival Function, the Excess Ratio R(x), or
the Loss Elimination Ratio LER(x), as follows. Using integration by parts and the fact that an
antiderivative of f(t) is -S(t):

E[(X ∧ u)²] = ∫_0^u t² f(t) dt + S(u)u² = [-t² S(t)] evaluated from t = 0 to u + ∫_0^u 2t S(t) dt + S(u)u² = ∫_0^u 2t S(t) dt.

In particular, for u = ∞, one can write the second moment as twice the integral of the survival function
times t:227

E[X²] = 2 ∫_0^∞ t S(t) dt.

More generally, E[(X ∧ u)^n] = ∫_0^u n t^(n-1) S(t) dt.228

For u = ∞, E[X^n] = ∫_0^∞ n t^(n-1) S(t) dt.

Using integration by parts and the fact that ∫_0^x S(t) dt = E[X ∧ x] = µ LER(x):

E[(X ∧ u)²] = 2 ∫_0^u t S(t) dt = [2µ t LER(t)] evaluated from t = 0 to u - 2µ ∫_0^u LER(t) dt.

E[(X ∧ u)²] = 2µ {u LER(u) - ∫_0^u LER(t) dt} = 2µ {∫_0^u R(t) dt - u R(u)}.

227 See formula 3.5.3 in Actuarial Mathematics. Recall that the mean can be written as an integral of the survival
function. One can proceed in the same manner to get higher moments in terms of integrals of the survival function
times a power of x.
228 The form shown here is true for distributions with support x > 0. More generally, the nth limited moment is the
sum of an integral from -∞ to 0 of -n t^(n-1) F(t) and an integral from 0 to u of n t^(n-1) S(t).
See Equation 3.9 in Loss Models.
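
As a numerical sanity check of E[(X ∧ u)²] = ∫_0^u 2t S(t) dt, here is a short Python sketch for an Exponential with θ = 10 and u = 30; it should reproduce the 160.2 of the earlier exercise. Purely illustrative, assuming scipy:

from math import exp
from scipy.integrate import quad

theta, u = 10.0, 30.0
S = lambda t: exp(-t/theta)                        # Exponential survival function

via_survival = quad(lambda t: 2*t*S(t), 0, u)[0]   # ∫_0^u 2 t S(t) dt
closed_form = 2*theta**2 - 2*theta**2*exp(-u/theta) - 2*theta*u*exp(-u/theta)
print(via_survival, closed_form)                   # both about 160.2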

So for example for the Pareto distribution: µ = θ/(α-1), R(x) = {θ/(θ+x)}^(α-1).

∫_0^u R(t) dt = [-θ^(α-1) / {(α-2)(θ+t)^(α-2)}] evaluated from t = 0 to u = {θ^(α-1)/(α-2)}{θ^(2-α) - (θ+u)^(2-α)}.

E[(X ∧ u)²] = {2θ^α / ((α-1)(α-2))}{θ^(2-α) - (θ+u)^(2-α) - (α-2)u(θ+u)^(1-α)}.

E[(X ∧ u)²] = {2θ² / ((α-1)(α-2))}{1 - (1 + u/θ)^(2-α) - (α-2)(u/θ)(1 + u/θ)^(1-α)}.

E[(X ∧ u)²] = E[X²] {1 - (1 + u/θ)^(1-α)[1 + (α-1)u/θ]}.


Letting u go to infinity, it follows that: E[X²] = 2 E[X] ∫_0^∞ R(x) dx.

⇒ ∫_0^∞ R(x) dx = E[X²] / (2 E[X]).

Now the excess ratio is: R(x) = (expected excess losses)/mean = {∫_x^∞ S(y) dy} / E[X].

Therefore, ∫_0^∞ ∫_x^∞ S(y) dy dx = E[X] ∫_0^∞ R(x) dx = E[X] {E[X²] / (2 E[X])} = E[X²]/2.

I will use the above result to show that the variance is equal to: ∫_0^∞ (∫_x^∞ S(y) dy)² {f(x) / S(x)²} dx.

Using integration by parts, let u = (∫_x^∞ S(y) dy)² and dv = {f(x)/S(x)²} dx.

du = 2 {∫_x^∞ S(y) dy} (-S(x)) dx. v = 1/S(x).

Therefore, ∫_0^∞ (∫_x^∞ S(y) dy)² {f(x)/S(x)²} dx =
[(∫_x^∞ S(y) dy)² / S(x)] evaluated from x = 0 to ∞ + 2 ∫_0^∞ ∫_x^∞ S(y) dy dx.

Now as was shown previously, E[X²] = 2 ∫_0^∞ t S(t) dt.

Therefore, if there is a finite second moment, ∫_0^∞ t S(t) dt must be finite.
If in the extreme righthand tail S(t) ~ 1/t², then this integral would be infinite.
Therefore, in the extreme righthand tail, S(t) must go down faster than 1/t².

If in the extreme righthand tail S(t) ~ 1/t^(2+ε), then ∫_x^∞ S(y) dy / √S(x) ~ (1/x^(1+ε)) x^(1+ε/2) = 1/x^(ε/2).

Therefore, if in the extreme righthand tail S(t) goes down faster than 1/t²,
then ∫_x^∞ S(y) dy / √S(x) approaches zero as x approaches infinity.

Therefore, as x approaches infinity, (∫_x^∞ S(y) dy)² / S(x) approaches zero.

Thus, ∫_0^∞ (∫_x^∞ S(y) dy)² {f(x)/S(x)²} dx = -E[X]²/S(0) + 2 E[X²]/2 = E[X²] - E[X]² = Var[X].

Pareto Distribution:

As discussed previously, ∫_0^∞ R(x) dx = E[X²] / (2 E[X]).229

For example, for the Pareto Distribution, R(x) = {θ/(θ+x)}^(α-1).

∫_0^∞ R(x) dx = θ^(α-1) ∫_0^∞ (θ + x)^(1-α) dx = θ^(α-1) θ^(2-α)/(α-2) = θ/(α-2)
= {2θ² / ((α-1)(α-2))} / {2θ/(α-1)} = E[X²] / (2 E[X]).

As discussed previously, for the Pareto Distribution,

E[(X ∧ x)²] = E[X²] {1 - (1 + x/θ)^(1-α) (1 + (α-1)x/θ)}.230

Exercise: For a Pareto with α = 4 and θ = 1000, compute E[X²], E[(X ∧ 500)²], and E[(X ∧ 5000)²].
[Solution: E[X²] = 2θ² / {(α - 1)(α - 2)} = 333,333,
E[(X ∧ 500)²] = E[X²] {1 - (1 + 500/θ)^(1-α)(1 + (α-1)500/θ)} = 86,420, and
E[(X ∧ 5000)²] = E[X²] {1 - (1 + 5000/θ)^(1-α)(1 + (α-1)5000/θ)} = 308,642.]
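
Here is a quick Python check of this Pareto limited second moment formula against direct integration, for the same α = 4 and θ = 1000. A sketch only; the helper names are mine:

from scipy.integrate import quad

alpha, theta = 4.0, 1000.0
f = lambda x: alpha * theta**alpha / (theta + x)**(alpha + 1)   # Pareto density
S = lambda x: (theta / (theta + x))**alpha                      # Pareto survival function

def lim2_formula(u):
    ex2 = 2*theta**2 / ((alpha - 1)*(alpha - 2))
    return ex2 * (1 - (1 + u/theta)**(1 - alpha) * (1 + (alpha - 1)*u/theta))

def lim2_numerical(u):
    return quad(lambda x: x**2 * f(x), 0, u)[0] + u**2 * S(u)

for u in (500, 5000):
    print(u, lim2_formula(u), lim2_numerical(u))   # about 86,420 and 308,642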

The limited higher moments can also be used to calculate the variance, coefficient of variation, and
skewness of losses subject to a maximum covered loss.

Exercise: For a Pareto Distribution with α = 4 and θ = 1000, and for a maximum covered loss of
5000, compute the variance and coefficient of variation (per single loss).
[Solution: From previous solutions, E[X ∧ 5000] = 331.79 and E[(X ∧ 5000)²] = 308,642.
Thus the variance is: 308,642 - 331.79² = 198,557. Thus the CV is: √198,557 / 331.79 = 1.34.]

229
Where R(x) is the excess ratio, and the distribution has support starting at zero.
230
While this formula was derived above, it is not in the Appendix attached to the exam.

Second Moment of a Layer of Loss:

Also, one can use the Limited Second Moment to calculate the second moment of a layer of loss.231

The second moment of the layer from d to u is:232

E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d {E[X ∧ u] - E[X ∧ d]}.

Exercise: For a Pareto with α = 4 and θ = 1000, compute the second moment of the layer
from 500 to 5000.
[Solution: From the solutions to previous exercises, E[X ∧ 500] = 234.57, E[X ∧ 5000] = 331.79,
E[(X ∧ 500)2 ] = 86,420, and E[(X ∧ 5000)2 ] = 308,642.
The second moment of the layer from 500 to 5000 is:
E[(X ∧ 5000)2 ] - E[(X ∧ 500)2 ] - 2(500){E[X ∧ 5000] - E[X ∧ 500]} =
308,642 - 86,420 - (1000)(331.79 - 234.57) = 125,002.
Comment: Note this is the second moment per loss, including those losses that do not penetrate
the layer, in the same way that E[X ∧ 5000] - E[X ∧ 500] is the first moment of the layer per loss.]

With no maximum covered loss, in other words with u = ∞ in the above formula, then:
E[(X-d)+2 ] = E[X2 ] - E[(X ∧ d)2 ] - 2d {E[X] - E[X ∧ d]}.

Exercise: For a Pareto with α = 4 and θ = 1000, compute the second moment of (X - 500)+.

[Solution: E[X2 ] - E[(X ∧ 500)2 ] - (2)(500) {E[X] - E[X ∧ 500]} =


333,333 - 86,420 - (1000)(333.33 - 234.57) = 148,153.
Comment: Recall that (X - 500)+ includes the zero payments.]

Exercise: For a Pareto with α = 4 and θ = 1000, compute the second moment of (X - 5000)+.
[Solution: E[X2 ] - E[(X ∧ 5000)2 ] - (2)(5000) {E[X] - E[X ∧ 5000]} =
333,333 - 308,642 - (10,000)(333.33 - 331.79) = 9291.]

With instead no deductible, in other words with d = 0 in the above formula, then the second
moment of the layer from 0 to u is E[(X ∧ u)2 ], as it should be.

231
Recall that the expected value of a Layer of Loss is the difference of the Limited Expected Value at the top of the
layer and the Limited Expected Value at the bottom of the layer. For second and higher moments the relationships
are more complicated.
232
See Theorem 8.8 in Loss Models. Here we are referring to the second moment of the per loss variable; similar to
the average payment per loss, we are including those times a small loss contributes zero to the layer.

Derivation of the Second Moment of a Layer of Loss:

For the layer from d to u, the medium size losses contribute x - d, while the large losses contribute
the width of the interval u - d.

Therefore, the second moment of the layer from d to u is:

∫_d^u (x - d)² f(x) dx + (u - d)² S(u) = ∫_d^u (x² - 2dx + d²) f(x) dx + u²S(u) - 2duS(u) + d²S(u)

= ∫_0^u x² f(x) dx + u²S(u) - ∫_0^d x² f(x) dx - d²S(d) + d²S(d) - 2d ∫_d^u x f(x) dx
+ d²{F(u) - F(d)} - 2duS(u) + d²S(u)

= E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - uS(u) - (E[X ∧ d] - dS(d))}
+ d²{S(d) + F(u) - F(d)} - 2duS(u) + d²S(u)

= E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - E[X ∧ d]}
+ d{2uS(u) - 2dS(d) + dS(d) + dF(u) - dF(d) - 2uS(u) + dS(u)}

= E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - E[X ∧ d]} + d{d(F(u) + S(u)) - d(F(d) + S(d))}

= E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - E[X ∧ d]} + d(d - d)

= E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - E[X ∧ d]}.

Variance of a Layer of Loss:

Given the first and second moments of a layer of loss, one can compute the variance and the
coefficient of variation of the layer.

Exercise: For a Pareto with α = 4 and θ = 1000, compute the variance of the losses in the layer from
500 to 5000.
[Solution: From the previous exercise, the second moment is 125,002 and the mean is
E[X ∧ 5000] - E[X ∧ 500] = 331.79 - 234.57 = 97.22.
The variance of the layer from 500 to 5000 is: 125,002 - 97.22² = 115,550.]

Exercise: For a Pareto with α = 4 and θ = 1000, compute the coefficient of variation of the losses in
the layer from 500 to 5000.
[Solution: From the previous exercise, the variance of the layer from 500 to 5000 is:
125,002 - 97.22² = 115,550 and the mean is 97.22. Thus the CV = √115,550 / 97.22 = 3.5.]

Using the formulas for the first and second moments of a layer of loss,
the variance of the layer of losses from d to u is:

E[(X ∧ u)²] - E[(X ∧ d)²] - 2d {E[X ∧ u] - E[X ∧ d]} - {E[X ∧ u] - E[X ∧ d]}².
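
A minimal Python sketch of these layer calculations, for the Pareto with α = 4 and θ = 1000 and the layer from 500 to 5000; it should reproduce a mean near 97.2, a second moment near 125,002, and a variance near 115,550. The function names are mine:

alpha, theta = 4.0, 1000.0

def lim1(u):   # E[X ∧ u] for the Pareto
    return theta/(alpha - 1) * (1 - (theta/(theta + u))**(alpha - 1))

def lim2(u):   # E[(X ∧ u)²] for the Pareto
    return (2*theta**2/((alpha - 1)*(alpha - 2))
            * (1 - (1 + u/theta)**(1 - alpha) * (1 + (alpha - 1)*u/theta)))

d, u = 500.0, 5000.0
mean_layer = lim1(u) - lim1(d)
second_layer = lim2(u) - lim2(d) - 2*d*(lim1(u) - lim1(d))
print(mean_layer, second_layer, second_layer - mean_layer**2)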

Since the average payment per loss under a maximum covered loss of u and a deductible of d is
the layer from d to u, this provides a formula for the variance of the average payment per loss under
a maximum covered loss of u and a deductible of d.

Exercise: Assume losses are given by a LogNormal Distribution with µ = 8 and σ = 0.7.
An insured has a deductible of 1000, and a maximum covered loss of 10,000.
What is the expected amount paid per loss?
[Solution: For the LogNormal Distribution:
E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx - µ - σ2)/σ] + x {1 - Φ[(lnx - µ)/σ]}.

E[X ∧ 1000] = e8.245 Φ[{ln(1000) - 8 - 0.49} / 0.7] + 1000{1 - Φ[{ln(1000) - 8} / 0.7] }


= 3808.54 Φ[-2.260] + 1000{1 - Φ[-1.560]}
= 3808.54 (0.0119) + 1000(1 - 0.0594) = 986.
E[X ∧ 10000] = e8.245 Φ[{ln(10000) - 8 - 0.49} / 0.7] + 10,000{1 - Φ[{ln(10000) - 8} / 0.7] }
= 3808.54 Φ[1.029] + 10000{1 - Φ[1.729]} = 3808.54(0.8483) + 10000(1 - 0.9581) = 3650.
E[X ∧ 10000] - E[X ∧ 1000] = 3650 - 986 = 2664.]

Exercise: Assume losses are given by a LogNormal Distribution with µ = 8 and σ = 0.7.
An insured has a deductible of 1000, and a maximum covered loss of 10,000.
What is the variance of the amount paid per loss?
[Solution: For the LogNormal Distribution:
E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) - (µ + 2σ2)} / σ] + x2 {1 - Φ[{ln(x) - µ} / σ]}.

E[(X ∧ 1000)2 ] = e16.98 Φ[{ln(1000) - 8.98} / 0.7] + 10002 {1 - Φ[{ln(1000) - 8} / 0.7] }


= 23,676,652 Φ[-2.960] + 1,000,000{1 - Φ[-1.560]}
= 23,676,652 (0.0015) + 1,000,000(1 - 0.0594) = 976,115.
E[(X ∧ 10000)2 ] = e16.98 Φ[{ln(10000) - 8.98} / 0.7] + 100002 {1 - Φ[{ln(10000) - 8} / 0.7] }
= 23,676,652 Φ[0.329] + 100,000,000{1 - Φ[1.729]}
= 23,676,652 (0.6289) + 100,000,000(1 - 0.9581) = 19,080,246.
E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - E[X ∧ d]} - {E[X ∧ u] - E[X ∧ d]}²
= 19,080,246 - 976,115 - (2000)(2664) - 2664² = 5.68 million.
Comment: It would take too long to perform all of these computations for a single exam question!]

If one has a coinsurance factor of c, then each payment is multiplied by c; therefore the variance is
multiplied by c².

Exercise: Assume losses are given by a LogNormal Distribution with µ = 8 and σ = 0.7.
An insured has a deductible of 1000, a maximum covered loss of 10,000, and a coinsurance factor
of 80%. What is the variance of the amount paid per loss?
[Solution: (0.8²)(5.68 million) = 3.64 million.]
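
For readers who want to reproduce this LogNormal example (µ = 8, σ = 0.7, deductible 1000, maximum covered loss 10,000, coinsurance 80%), here is a rough Python sketch; it uses scipy's normal CDF rather than rounded table values, so it gives a value near, but not exactly, 3.64 million:

from math import exp, log
from scipy.stats import norm

mu, sigma = 8.0, 0.7
d, u, c = 1000.0, 10000.0, 0.8

def lim1(x):   # E[X ∧ x] for the LogNormal
    return (exp(mu + sigma**2/2) * norm.cdf((log(x) - mu - sigma**2)/sigma)
            + x * (1 - norm.cdf((log(x) - mu)/sigma)))

def lim2(x):   # E[(X ∧ x)²] for the LogNormal
    return (exp(2*mu + 2*sigma**2) * norm.cdf((log(x) - mu - 2*sigma**2)/sigma)
            + x**2 * (1 - norm.cdf((log(x) - mu)/sigma)))

mean_per_loss = lim1(u) - lim1(d)
second_per_loss = lim2(u) - lim2(d) - 2*d*mean_per_loss
print(c**2 * (second_per_loss - mean_per_loss**2))   # roughly 3.6 million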

Variance of Non-Zero Payments:

Exercise: For a Pareto with α = 4 and θ = 1000, compute the average non-zero payment given a
deductible of 500 and a maximum covered loss of 5000.
[Solution: (E[X ∧ 5000] - E[X ∧ 500]) / S(500) = (331.79 - 234.57) / 0.1975 = 492.2. ]

One can compute the second moment of the non-zero payments in a manner similar to the second
moment of the payments per loss. Given a deductible of d and a maximum covered loss of u, the
2nd moment of the non-zero payments is:233

∫_d^u (x - d)² {f(x)/S(d)} dx + (u - d)² S(u)/S(d) = (2nd moment of the payments per loss) / S(d) =
{E[(X ∧ u)²] - E[(X ∧ d)²] - 2d(E[X ∧ u] - E[X ∧ d])} / S(d).
So just as with the first moment, the second moment of the non-zero payments has S(d) in the
denominator. If one has a coinsurance factor of c, then the second moment is multiplied by c².

Exercise: For a Pareto with α = 4 and θ = 1000, compute the second moment of the non-zero
payments given a deductible of 500 and a maximum covered loss of 5000.
[Solution: {E[(X ∧ 5000)²] - E[(X ∧ 500)²] - 2(500){E[X ∧ 5000] - E[X ∧ 500]}} / S(500) =
{308,642 - 86,420 - (1000)(331.79 - 234.57)} / (1000/1500)^4 = 125,002/0.197531 = 632,823.]

Thus given a deductible of d and a maximum covered loss of u the variance of the non-zero
payments is: {E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d(E[X ∧ u] - E[X ∧ d])} / S(d) - {(E[X ∧ u] - E[X ∧ d]) / S(d)}2 .

Exercise: For a Pareto with α = 4 and θ = 1000, compute the variance of the non-zero payments
given a deductible of 500 and a maximum covered loss of 5000.
[Solution: From the solutions to previous exercises, variance = 632,823 - 492.2² = 390,562.]

If one has a coinsurance factor of c, then each payment is multiplied by c, therefore the variance is
multiplied by c2 .

Exercise: For a Pareto with α = 4 and θ = 1000, compute the variance of the non-zero payments
given a deductible of 500, a maximum covered loss of 5000, and a coinsurance factor of 85%.
[Solution: (0.85²)(390,562) = 282,181.]

233
Note that the density is f(x)/S(d) from d to u, and has a point mass of S(u)/S(d) at u.
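
A short Python sketch of the per-payment calculations above (Pareto α = 4, θ = 1000, deductible 500, maximum covered loss 5000, coinsurance 85%); the helper names are mine, and the printed variance should be near 282,000:

alpha, theta = 4.0, 1000.0
d, u, c = 500.0, 5000.0, 0.85

S = lambda x: (theta/(theta + x))**alpha
lim1 = lambda x: theta/(alpha - 1) * (1 - (theta/(theta + x))**(alpha - 1))
lim2 = lambda x: (2*theta**2/((alpha - 1)*(alpha - 2))
                  * (1 - (1 + x/theta)**(1 - alpha) * (1 + (alpha - 1)*x/theta)))

mean_pp = (lim1(u) - lim1(d)) / S(d)                               # per-payment mean
second_pp = (lim2(u) - lim2(d) - 2*d*(lim1(u) - lim1(d))) / S(d)   # per-payment 2nd moment
print(c**2 * (second_pp - mean_pp**2))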

Exponential Distribution:

As shown in Appendix A of Loss Models attached to the exam, for the Exponential Distribution:
E[(X ∧ x)^n] = n! θ^n Γ(n+1; x/θ) + x^n e^(-x/θ).

However, the Incomplete Gamma for positive integer alpha is:234

Γ(α; x) = 1 - Σ_{i=0}^{α-1} x^i e^(-x) / i!.

Thus, Γ(3; x/θ) = 1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2.

Therefore, for the Exponential, E[(X ∧ x)²] = 2! θ² Γ(2+1; x/θ) + x² e^(-x/θ) =

(2)(θ²){1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2} + x² e^(-x/θ) = 2θ² - 2θ²e^(-x/θ) - 2θxe^(-x/θ).

Exercise: What is the variance of the payment per loss, Y_L, for an Exponential with deductible d?
[Solution: The first moment is: E[X] - E[X ∧ d] = θe^(-d/θ).
The second moment is: E[X²] - E[(X ∧ d)²] - 2d{E[X] - E[X ∧ d]} =
2θ² - {2θ² - 2θ²e^(-d/θ) - 2θde^(-d/θ)} - (2d){θ - θ(1 - e^(-d/θ))} = 2θ²e^(-d/θ).

Thus Var[Y_L] = 2θ²e^(-d/θ) - (θe^(-d/θ))² = θ²(2e^(-d/θ) - e^(-2d/θ)).

Alternately, the payment per payment variable, Y_P, is also Exponential with mean θ.
E[Y_L] = E[Y_P] S(d) = θe^(-d/θ).

The second moment of Y_L is: (second moment of Y_P) S(d) = 2θ²e^(-d/θ). Proceed as before.]

Exercise: Losses follow an Exponential Distribution with θ = 200.
For a deductible of 100, what is the variance of the payment per loss variable, Y_L?
[Solution: θ²(2e^(-d/θ) - e^(-2d/θ)) = 200²(2e^(-0.5) - e^(-1)) = 33,807.]

234
See “Mahlerʼs Guide to Frequency Distributions.”
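
Before turning to the problems, here is a small Python check of Var[Y_L] = θ²(2e^(-d/θ) - e^(-2d/θ)) for the last exercise (θ = 200, d = 100), comparing the closed form with the limited-moment route; a sketch only:

from math import exp

theta, d = 200.0, 100.0

closed = theta**2 * (2*exp(-d/theta) - exp(-2*d/theta))   # closed form derived above

lim1 = theta * (1 - exp(-d/theta))                                        # E[X ∧ d]
lim2 = 2*theta**2 - 2*theta**2*exp(-d/theta) - 2*theta*d*exp(-d/theta)    # E[(X ∧ d)²]
mean_yl = theta - lim1                                                    # E[Y_L]
second_yl = 2*theta**2 - lim2 - 2*d*mean_yl                               # E[Y_L²]
print(closed, second_yl - mean_yl**2)                                     # both about 33,807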

Problems:

Use the following information for the next 7 questions.

Assume the unlimited losses follow a LogNormal Distribution with parameters µ = 10 and σ = 1.5.
Assume an average of 10,000 losses per year. For simplicity, ignore any variation in costs due to
variations in the number of losses per year.

32.1 (2 points) What is the coefficient of variation of the total cost expected per year?
A. less than 0.014
B. at least 0.014 but less than 0.018
C. at least 0.018 but less than 0.022
D. at least 0.022 but less than 0.026
E. at least 0.026

32.2 (3 points) If the insurer pays no more than $250,000 per loss,
what is the coefficient of variation of the insurerʼs total cost expected per year?
A. less than 0.014
B. at least 0.014 but less than 0.018
C. at least 0.018 but less than 0.022
D. at least 0.022 but less than 0.026
E. at least 0.026

32.3 (3 points) If the insurer pays no more than $1,000,000 per loss,
what is the coefficient of variation of the insurerʼs total cost expected per year?
A. less than 0.014
B. at least 0.014 but less than 0.018
C. at least 0.018 but less than 0.022
D. at least 0.022 but less than 0.026
E. at least 0.026

32.4 (1 point) If the insurer pays the layer from $250,000 to $1 million per loss,
what are the insurerʼs total costs expected per year?
A. less than $135 million
B. at least $135 million but less than $140 million
C. at least $140 million but less than $145 million
D. at least $145 million but less than $150 million
E. at least $150 million

32.5 (3 points) If the insurer pays the layer from $250,000 to $1 million per loss,
what is the coefficient of variation of the insurerʼs total cost expected per year?
A. less than 0.03
B. at least 0.03 but less than 0.05
C. at least 0.05 but less than 0.07
D. at least 0.07 but less than 0.09
E. at least 0.09

32.6 (3 points) What is the coefficient of skewness of the total cost expected per year?
A. less than 0.15
B. at least 0.15 but less than 0.20
C. at least 0.20 but less than 0.25
D. at least 0.25 but less than 0.30
E. at least 0.30

32.7 (3 points) If the insurer pays no more than $250,000 per loss, what is the coefficient of
skewness of the insurerʼs total cost expected per year?
A. less than 0.015
B. at least 0.015 but less than 0.020
C. at least 0.020 but less than 0.025
D. at least 0.025 but less than 0.030
E. at least 0.030

Use the following information for the next 7 questions


Losses follow an Exponential Distribution with θ = 10,000.

32.8 (1 point) What is the variance of losses?


A. less than 105 million
B. at least 105 million but less than 110 million
C. at least 110 million but less than 115 million
D. at least 115 million but less than 120 million
E. at least 120 million

32.9 (2 points) Assuming a 25,000 policy limit, what is the variance of payments by the insurer?
A. less than 35 million
B. at least 40 million but less than 45 million
C. at least 45 million but less than 50 million
D. at least 50 million but less than 55 million
E. at least 55 million

32.10 (3 points) Assuming a 1000 deductible (with no maximum covered loss),


what is the variance of the payments per loss?
A. less than 95 million
B. at least 95 million but less than 100 million
C. at least 100 million but less than 105 million
D. at least 105 million but less than 110 million
E. at least 110 million

32.11 (2 points) Assuming a 1000 deductible (with no maximum covered loss),


what is the variance of non-zero payments by the insurer?
A. less than 95 million
B. at least 95 million but less than 100 million
C. at least 100 million but less than 105 million
D. at least 105 million but less than 110 million
E. at least 110 million

32.12 (3 points) Assuming a 1000 deductible and a 25,000 maximum covered loss,
what is the variance of the payments per loss?
A. less than 55 million
B. at least 55 million but less than 56 million
C. at least 56 million but less than 57 million
D. at least 57 million but less than 58 million
E. at least 58 million

32.13 (3 points) Assuming a 1000 deductible and a 25,000 maximum covered loss,
what is the variance of the non-zero payments by the insurer?
A. less than 53 million
B. at least 53 million but less than 54 million
C. at least 54 million but less than 55 million
D. at least 55 million but less than 56 million
E. at least 56 million

32.14 (2 points) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum
covered loss, what is the variance of the insurerʼs payments per loss?
A. less than 15 million
B. at least 15 million but less than 20 million
C. at least 20 million but less than 25 million
D. at least 25 million but less than 30 million
E. at least 30 million

32.15 (2 points) Let X be the result of rolling a fair six-sided die, with the numbers 1 through 6 on its
faces. Calculate Var(X ∧ 4).
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

32.16 (3 points) The size of loss distribution has the following characteristics:
(i) E[X] = 245.
(ii) E[X ∧ 100] = 85.
(iii) S(100) = 0.65.
(iv) E[X2 | X > 100] = 250,000.
There is an ordinary deductible of 100 per loss.
Calculate the second moment of the payment per loss.
(A) 116,000 (B) 118,000 (C)120,000 (D) 122,000 (E) 124,000

Use the following information for the next four questions:


• Losses follow a LogNormal Distribution with parameters µ = 9.7 and σ = 0.8.
• The insured has a deductible of 10,000, maximum covered loss of 50,000, and a
coinsurance factor of 90%.

32.17 (3 points) What is the average payment per loss?


A. less than 7,000
B. at least 7,000 but less than 8,000
C. at least 8,000 but less than 9,000
D. at least 9,000 but less than 10,000
E. at least 10,000

32.18 (2 points) What is E[(X ∧ 50,000)2 ]?


A. less than 600 million
B. at least 600 million but less than 700 million
C. at least 700 million but less than 800 million
D. at least 800 million but less than 900 million
E. at least 900 million

32.19 (2 points) What is E[(X ∧ 10,000)2 ]?


A. less than 80 million
B. at least 80 million but less than 90 million
C. at least 90 million but less than 100 million
D. at least 100 million but less than 110 million
E. at least 110 million

32.20 (2 points) What is the variance of the payment per loss?


A. less than 110 million
B. at least 110 million but less than 120 million
C. at least 120 million but less than 130 million
D. at least 130 million but less than 140 million
E. at least 140 million

32.21 (3 points) You are given:


Claim Size Number of Claims
0-25 30
25-50 32
50-100 20
100-200 8
Assume a uniform distribution of claim sizes within each interval.
Estimate E[(X ∧ 80)2 ].
(A) 2300 (B) 2400 (C) 2500 (D) 2600 (E) 2700

Use the following information for the next 5 questions:


• Losses are uniform from 0 to 40.

32.22 (1 point) What is E[X ∧ 10]?


A. 8.5 B. 8.75 C. 9.0 D. 9.25 E. 9.5

32.23 (1 point) What is E[X ∧ 25]?


A. 15 B. 16 C. 17 D. 18 E. 19

32.24 (2 points) What is E[(X ∧ 10)2 ]?


A. 79 B. 80 C. 81 D. 82 E. 83

32.25 (2 points) What is E[(X ∧ 25)2 ]?


A. 360 B. 365 C. 370 D. 375 E. 380

32.26 (2 points) What is the variance of the layer of loss from 10 to 25?
A. 37 B. 39 C. 41 D. 43 E. 45

Use the following information for the next 7 questions.

Assume the following discrete size of loss distribution:


10 50%
50 30%
100 20%

32.27 (2 points) What is the coefficient of variation of this size of loss distribution?
A. less than 0.6
B. at least 0.6 but less than 0.8
C. at least 0.8 but less than 1.0
D. at least 1.0 but less than 1.2
E. at least 1.2

32.28 (3 points) What is the coefficient of skewness of this size of loss distribution?
A. less than 0
B. at least 0 but less than 0.2
C. at least 0.2 but less than 0.4
D. at least 0.4 but less than 0.6
E. at least 0.6

32.29 (2 points) If the insurer pays no more than 25 per loss,


what is the coefficient of variation of the distribution of the size of payments?
A. less than 0.6
B. at least 0.6 but less than 0.8
C. at least 0.8 but less than 1.0
D. at least 1.0 but less than 1.2
E. at least 1.2

32.30 (2 points) If the insurer pays no more than 60 per loss,


what is the coefficient of variation of the distribution of the size of payments?
A. less than 0.6
B. at least 0.6 but less than 0.8
C. at least 0.8 but less than 1.0
D. at least 1.0 but less than 1.2
E. at least 1.2

32.31 (3 points) If the insurer pays no more than 60 per loss,


what is the coefficient of skewness of the distribution of the size of payments?
A. less than 0
B. at least 0 but less than 0.2
C. at least 0.2 but less than 0.4
D. at least 0.4 but less than 0.6
E. at least 0.6

32.32 (1 point) If the insurer pays the layer from 30 to 70 per loss,
what is the insurerʼs expected payment per loss?
A. 10 B. 12 C. 14 D. 16 E. 18

32.33 (2 points) If the insurer pays the layer from 30 to 70 per loss,
what is the coefficient of variation of the insurerʼs payments per loss?
A. less than 0.6
B. at least 0.6 but less than 0.8
C. at least 0.8 but less than 1.0
D. at least 1.0 but less than 1.2
E. at least 1.2

32.34 (2 points) X follows the density f(x), with support from 0 to infinity.
∫_0^500 f(x) dx = 0.685.  ∫_0^500 x f(x) dx = 217.  ∫_0^500 x² f(x) dx = 76,616.

Determine the variance of the limited loss variable with u = 500, X ∧ 500.
A. 14,000 B. 15,000 C. 16,000 D. 17,000 E. 18,000

32.35 (4 points) The size of loss follows a Single Parameter Pareto Distribution
with α = 3 and θ = 200.
A policy has a deductible of 250, a maximum covered loss of 1000,
and a coinsurance of 80%.
Determine the variance of YP, the per payment variable.
A. 12,000 B. 13,000 C. 14,000 D. 15,000 E. 16,000

32.36 (3 points)
The loss severity random variable X follows an exponential distribution with mean θ.
Determine the coefficient of variation of the excess loss variable Y = max(X - d, 0).
2016-C-2, Loss Distributions, §32 Limited Higher Moments HCM 10/21/15, Page 514

32.37 (21 points) Let U be the losses in the layer from a to b.


Let V be the losses in the layer from c to d. a < b ≤ c < d.
(i) (3 points) Determine an expression for the covariance of U and V in terms of
Limited Expected Values.
(ii) (2 points) For an Exponential Distribution with mean of 8, determine the covariance of
the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity.
(iii) (3 points) For an Exponential Distribution with mean of 8, determine the variance of
the losses in the layer from 0 to 10.
Hint: For the Exponential, E[(X ∧ x)2 ] = 2θ2 - 2θ2e-x/θ - 2θxe-x/θ.
(iv) (3 points) For an Exponential Distribution with mean of 8, determine the variance of
the losses in the layer from 10 to infinity.
(v) (1 point) For an Exponential Distribution with mean of 8, determine the correlation of
the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity.
(vi) (2 points) For a Pareto Distribution with α = 3 and θ = 16, determine the covariance of
the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity.
(vii) (3 points) For a Pareto Distribution with α = 3 and θ = 16, determine the variance of
the losses in the layer from 0 to 10.
2 θ2
Hint: For the Pareto, E[(X ∧ x)2 ] = {1 - (1 + x/θ)1−α (1 + (α-1)x/θ)}.
(α - 1) (α - 2)

(viii) (3 points) For a Pareto Distribution with α = 3 and θ = 16, determine the variance of
the losses in the layer from 10 to infinity.
(ix) (1 point) For a Pareto Distribution with α = 3 and θ = 16, determine the correlation of
the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity.

32.38 (14 points) Let X be the price of a stock at time 1.


X is distributed via a LogNormal Distribution with µ = 4 and σ = 0.3.
Let Y be the payoff on a one-year 70-strike European Call on this stock.
Y = 0 if X ≤ 70, and Y = X - 70 if X > 70.
(i) (1 point) What is the mean of X?
(ii) (2 points) What is the variance of X?
(iii) (3 points) What is the mean of Y?
(iv) (4 points) What is the variance of Y?
(v) (3 points) What is the covariance of X and Y?
(vi) (1 point) What is the correlation of X and Y?

Use the following information for the next 3 questions:


Limit Limited Expected Value Limited Second Moment
100,000 55,556 4444 million
250,000 80,247 12,346 million
500,000 91,837 20,408 million
1,000,000 97,222 27,778 million
Infinite 100,000 40,000 million

32.39 (2 points) Determine the coefficient of variation of the layer of loss from 100,000 to 500,000.
(A) Less than 3
(B) At least 3, but less than 4
(C) At least 4, but less than 5
(D) At least 5, but less than 6
(E) At least 6

32.40 (2 points) Determine the coefficient of variation of the layer of loss from 250,000 to 1 million.
(A) Less than 3
(B) At least 3, but less than 4
(C) At least 4, but less than 5
(D) At least 5, but less than 6
(E) At least 6

32.41 (2 points) Determine the coefficient of variation of the layer of loss excess of 250,000.
(A) Less than 3
(B) At least 3, but less than 4
(C) At least 4, but less than 5
(D) At least 5, but less than 6
(E) At least 6

Use the following information for the next 2 questions:


• Losses are uniform from 2 to 20.
• There is a deductible of 5.

32.42 (1 point) Determine the variance of YP, the per-payment variable.


A. 17 B. 18 C. 19 D. 20 E. 21

32.43 (3 points) Determine the variance of YL , the per-loss variable.


A. 23 B. 26 C. 29 D. 32 E. 35

32.44 (3 points) The loss severity random variable X follows the Pareto distribution with
α = 5 and θ = 400.
Determine the coefficient of variation of the excess loss variable Y = max(X - 300, 0).
(A) 6.5 (B) 7.0 (C) 7.5 (D) 8.0 (E) 8.5

32.45 (4 points) Severity follows a LogNormal Distribution with µ = 7 and σ = 0.6.


There is a 1000 deductible.
Determine the variance of the payment per loss variable.
(A) 300,000 (B) 400,000 (C) 500,000 (D) 600,000 (E) 700,000

32.46 (3 points) You are given the following information:


Limit Limited Expected Value Limited Second Moment Survival Function
1000 976.66 963,617 0.91970
5000 2828.18 9,830,381 0.12465
Infinite 3000 12,000,000

There is a 1000 deductible and a 5000 maximum covered loss.


Determine the standard deviation of the per payment variable, YP.
A. 1050 B. 1100 C. 1150 D. 1200 E. 1250

32.47 (3, 11/00, Q.21 & 2009 Sample Q.115) (2.5 points)
A claim severity distribution is exponential with mean 1000.
An insurance company will pay the amount of each claim in excess of a deductible of 100.
Calculate the variance of the amount paid by the insurance company for one claim,
including the possibility that the amount paid is 0.
(A) 810,000 (B) 860,000 (C) 900,000 (D) 990,000 (E) 1,000,000

32.48 (1, 5/01, Q.35) (1.9 points) The warranty on a machine specifies that it will be replaced at
failure or age 4, whichever occurs first.
The machineʼs age at failure, X, has density function 1/5 for 0 < x < 5.
Let Y be the age of the machine at the time of replacement.
Determine the variance of Y.
(A) 1.3 (B) 1.4 (C) 1.7 (D) 2.1 (E) 7.5

32.49 (4, 11/03, Q.37 & 2009 Sample Q.28) (2.5 points) You are given:
Claim Size (X) Number of Claims
(0, 25] 25
(25, 50] 28
(50, 100] 15
(100, 200] 6
Assume a uniform distribution of claim sizes within each interval.
Estimate E(X2 ) - E[(X ∧ 150)2 ].
(A) Less than 200
(B) At least 200, but less than 300
(C) At least 300, but less than 400
(D) At least 400, but less than 500
(E) At least 500

32.50 (SOA3, 11/03, Q.28) (2.5 points) For (x):


(i) K is the curtate future lifetime random variable.
(ii) qx+k = 0.1(k + 1), k = 0, 1, 2,…, 9
Calculate Var(K ∧ 3).
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

32.51 (4, 5/07, Q.13) (2.5 points)


The loss severity random variable X follows the exponential distribution with mean 10,000.
Determine the coefficient of variation of the excess loss variable Y = max(X - 30,000, 0).
(A) 1.0 (B) 3.0 (C) 6.3 (D) 9.0 (E) 39.2

Solutions to Problems:

32.1. E. For the sum of N independent losses, both the variance and the mean are N times those for
a single loss. The standard deviation is multiplied by √N.
Thus the CV, which is the ratio of the standard deviation to the mean, is divided by √N.
Per loss, mean = exp(µ + σ²/2) = e^11.125, and the second moment is exp(2µ + 2σ²) = e^24.5.
Therefore for a single loss, CV = √(E[X²]/E[X]² - 1) = √(e^24.5/e^22.25 - 1) = √(e^2.25 - 1) = 2.91.
For 10,000 losses we divide by √10,000 = 100, thus the CV for the total losses is 0.0291.
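
If you want to check the √N scaling numerically, a few lines of Python suffice; this sketch is mine and is not needed on the exam:

from math import exp, sqrt

mu, sigma, n = 10.0, 1.5, 10000
mean = exp(mu + sigma**2/2)            # per-loss mean
second = exp(2*mu + 2*sigma**2)        # per-loss second moment
cv_one = sqrt(second/mean**2 - 1)      # per-loss CV
print(cv_one, cv_one/sqrt(n))          # about 2.91 and 0.0291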

32.2. A. E[X ∧ x] = exp(µ + σ²/2) Φ[(ln x - µ - σ²)/σ] + x {1 - Φ[(ln x - µ)/σ]}.
E[X ∧ 250,000] = exp(11.125) Φ[(ln(250,000) - 10 - 2.25)/1.5] +
(250,000){1 - Φ[(ln(250,000) - 10)/1.5]} = (67,846)Φ[0.12] + (250,000)(1 - Φ[1.62])
= (67,846)(0.5478) + (250,000)(1 - 0.9474) = 50.3 thousand.
E[(X ∧ L)²] = exp[2µ + 2σ²] Φ[{ln(L) - (µ + 2σ²)}/σ] + L² {1 - Φ[{ln(L) - µ}/σ]}.
E[(X ∧ 250,000)²] = exp(24.5) Φ[-1.38] + 6.25 x 10^10 {1 - Φ[1.62]} =
(4.367 x 10^10)(0.0838) + (6.25 x 10^10)(1 - 0.9474) = 6.95 x 10^9.
Therefore for a single loss, Coefficient of Variation =
√(E[(X ∧ 250,000)²] / E[X ∧ 250,000]² - 1) = √(6.95 x 10^9 / 2.53 x 10^9 - 1) = 1.32.
For 10,000 losses we divide by √10,000 = 100, thus the CV is 0.0132.



32.3. C. E[X ∧ x] = exp(µ + σ²/2) Φ[(ln x - µ - σ²)/σ] + x {1 - Φ[(ln x - µ)/σ]}.
E[X ∧ 1,000,000] =
exp(11.125) Φ[(ln(1,000,000) - 10 - 2.25)/1.5] + (1,000,000){1 - Φ[(ln(1,000,000) - 10)/1.5]}
= (67,846)Φ[1.04] + (1,000,000)(1 - Φ[2.54]) = (67,846)(0.8508) + (1,000,000)(1 - 0.9945)
= 63.2 thousand.
E[(X ∧ L)²] = exp[2µ + 2σ²] Φ[{ln(L) - (µ + 2σ²)}/σ] + L² {1 - Φ[{ln(L) - µ}/σ]}.
E[(X ∧ 1,000,000)²] = exp(24.5) Φ[-0.46] + (10^12){1 - Φ[2.54]} =
(4.367 x 10^10)(0.3228) + (10^12)(1 - 0.9945) = 1.960 x 10^10.
Therefore, Coefficient of Variation = √(E[(X ∧ 1,000,000)²] / E[X ∧ 1,000,000]² - 1) =
√(1.960 x 10^10 / 3.99 x 10^9 - 1) = 1.98.
For 10,000 losses we divide by √10,000 = 100, thus the CV is 0.0198.
Comment: The Coefficient of Variation of the limited losses is less than that of the unlimited losses.
The CV of the losses limited to 250,000 is lower than that of the losses limited to $1 million.

32.4. A. Since the insurer expects 10,000 losses per year, the expected dollars in the layer from
250,000 to $1 million are:
10,000{E[X ∧ 1 million] - E[X ∧ 250,000]} = 10,000(63.2 thousand - 50.3 thousand) =
129 million.

32.5. C. The mean for the layer is: E[X ∧ 1 million] - E[X ∧ 250,000] =
63.2 thousand - 50.3 thousand = 12.9 thousand. The second moment for the layer is:
E[(X ∧ 1 million)²] - E[(X ∧ 250,000)²] - 2(250,000)(E[X ∧ 1 million] - E[X ∧ 250,000]) =
1.960 x 10^10 - 6.95 x 10^9 - 6.45 x 10^9 = 6.20 x 10^9.
Therefore for a single loss, Coefficient of Variation = √(6.20 x 10^9 / 1.66 x 10^8 - 1) = 6.0.
For 10,000 losses we divide by √10,000 = 100, thus the CV is 0.060.
Comment: The CV of a layer depends on how high the layer is, the width of the layer, as well as the
loss distribution. A higher layer usually has a larger CV.

32.6. E. For the sum of N independent losses, both the variance and the third central moment
are N times those for a single loss. (For the sum of independent random variables, second and third
central moments each add.) Thus the skewness, which is the ratio of the third central moment to the
variance to the 3/2 power, is divided by √N.
Per loss, mean = exp(µ + σ²/2) = e^11.125, second moment is: exp(2µ + 2σ²) = e^24.5, and third
moment is: exp(3µ + 4.5σ²) = e^40.125. Therefore, the variance is: e^24.5 - e^22.25 = 3.907 x 10^10.
The third central moment is: e^40.125 - 3e^24.5 e^11.125 + 2e^33.375 = 2.585 x 10^17.
Thus for one loss the skewness is: 2.585 x 10^17 / (3.907 x 10^10)^1.5 = 33.5. For 10,000 losses we
divide by √10,000 = 100, thus the skewness for the total losses is 0.335.

32.7. B. E[X ∧ 250,000] = exp(11.125) Φ[(ln(250,000) - 10 - 2.25)/1.5] +
(250,000){1 - Φ[(ln(250,000) - 10)/1.5]} = 50.3 thousand.
E[(X ∧ 250,000)²] = exp(24.5) Φ[-1.38] + 6.25 x 10^10 {1 - Φ[1.62]} = 6.95 x 10^9.
Thus the variance of a limited loss is: 6.95 x 10^9 - 2.53 x 10^9 = 4.42 x 10^9.
E[(X ∧ L)³] = exp[3µ + 4.5σ²] Φ[{ln(L) - (µ + 3σ²)}/σ] + L³ {1 - Φ[{ln(L) - µ}/σ]}.
E[(X ∧ 250,000)³] = e^40.125 Φ[-2.88] + 1.5625 x 10^16 {1 - Φ[1.62]} = 1.355 x 10^15.
The third central moment is:
1.355 x 10^15 - 3(50.3 thousand)(6.95 x 10^9) + 2(50.3 thousand)³ = 5.6 x 10^14.
Thus for one loss the skewness is: 5.6 x 10^14 / (4.42 x 10^9)^1.5 = 1.9.
For 10,000 losses we divide by √10,000 = 100; thus the skewness for the total losses is 0.019.
Comment: The skewness of the limited losses is much smaller than that of the unlimited losses. Rare
very large losses have a large impact on the skewness of the unlimited losses.

32.8. A. θ² = 100 million.

32.9. E. The second moment is E[(X ∧ x)²] = 2θ² Γ(3; x/θ) + x² e^(-x/θ).
E[(X ∧ 25,000)²] = 200,000,000 Γ(3; 2.5) + 625,000,000e^(-2.5) =
(200 million){1 - e^(-2.5)(1 + 2.5 + 2.5²/2)} + 51.30 million = 142.54 million.
E[X ∧ 25,000] = θ(1 - e^(-2.5)) = 9179.
Variance = 142.54 million - 9179² = 58.28 million.
Comment: Γ(α; x) = 1 - Σ_{i=0}^{α-1} x^i e^(-x) / i!.

32.10. B. The second moment is: E[X²] - E[(X ∧ 1000)²] - (2)(1000){E[X] - E[X ∧ 1000]}.
E[X] - E[X ∧ 1000] = 10,000 - 952 = 9048. E[X²] = 2θ² = 200 million.
E[(X ∧ 1000)²] = 200,000,000 Γ(3; 0.1) + 1,000,000e^(-0.1) =
(200 million){1 - e^(-0.1)(1 + 0.1 + 0.1²/2)} + 1,000,000e^(-0.1) = 0.936 million.
The second moment is: 200 million - 0.936 million - (2000)(9048) = 180.97 million.
Variance = 180.97 million - 9048² = 99.1 million.
Alternately, the payment per payment variable, Y_P, is also Exponential with mean 10,000.
E[Y_L] = E[Y_P] S(d) = 10,000e^(-1000/10,000) = 9048.
The second moment of Y_L is: (second moment of Y_P) S(d) = 2θ²e^(-d/θ) = (200,000,000)e^(-1000/10,000)
= 180.97 million. Variance = 180.97 million - 9048² = 99.1 million.
Comment: In this situation, Var[Y_L] = 2θ²e^(-d/θ) - (θe^(-d/θ))² = θ²(2e^(-d/θ) - e^(-2d/θ)).


32.11. C. The second moment is: ∫_1000^∞ (x - 1000)² {f(x)/S(1000)} dx
= (2nd moment of the payment per loss) / S(1000) = 180.97 million / e^(-0.1) = 180.97 million / 0.9048 =
200.00 million.
Variance = 200.00 million - 10,000² = 100 million.
Comment: Due to the memoryless property of the Exponential, the variance is the same as in the
absence of a deductible.

32.12. D. E[X ∧ 25,000] = 9179. E[X ∧ 1000] = 952.
E[(X ∧ 25,000)²] = 142.54 million. E[(X ∧ 1000)²] = 0.936 million.
Second moment of the layer of loss =
E[(X ∧ 25,000)²] - E[(X ∧ 1000)²] - (2)(1000){E[X ∧ 25,000] - E[X ∧ 1000]} =
142.54 million - 0.936 million - (2000)(9179 - 952) = 125.15 million.
Variance = 125.15 million - (9179 - 952)² = 57.46 million.

32.13. D. The second moment is: ∫_1000^25,000 (x - 1000)² {f(x)/S(1000)} dx + 24,000² S(25,000)/S(1000)
= (2nd moment of the payment per loss) / S(1000)
= 125.15 million / e^(-0.1) = 125.15 million / 0.9048 = 138.32 million.
The mean is: (9179 - 952)/0.9048 = 9093. Variance = 138.32 million - 9093² = 55.63 million.

32.14. E. The second moment is:
0.75² {E[(X ∧ 25,000)²] - E[(X ∧ 1000)²] - (2)(1000){E[X ∧ 25,000] - E[X ∧ 1000]}}
= 0.5625{142.54 million - 0.936 million - (2000)(9179 - 952)} = 70.40 million.
The mean is: (0.75)(9179 - 952) = 6170. Variance = 70.40 million - 6170² = 32.33 million.

32.15. C. E[X ∧ 4] = (1/6)(1) + (1/6)(2) + (1/6)(3) + (3/6)(4) = 3.
E[(X ∧ 4)²] = (1/6)(1²) + (1/6)(2²) + (1/6)(3²) + (3/6)(4²) = 10.33.
Var(X ∧ 4) = 10.33 - 3² = 1.33.

32.16. E. E[X² | X > 100] = ∫_100^∞ x² f(x) dx / S(100). ⇒ ∫_100^∞ x² f(x) dx = S(100) E[X² | X > 100]
= (0.65)(250,000) = 162,500.
∫_100^∞ x f(x) dx = ∫_0^∞ x f(x) dx - ∫_0^100 x f(x) dx = E[X] - {E[X ∧ 100] - 100S(100)}
= 245 - {85 - (100)(0.65)} = 225.
∫_100^∞ f(x) dx = S(100) = 0.65.
With a deductible of 100 per loss, the second moment of the payment per loss is:
∫_100^∞ (x - 100)² f(x) dx = ∫_100^∞ x² f(x) dx - 200 ∫_100^∞ x f(x) dx + 10,000 ∫_100^∞ f(x) dx
= 162,500 - (200)(225) + (10,000)(0.65) = 124,000.
Comment: Similar to SOA M, 5/05, Q.17, in “Mahlerʼs Guide to Aggregate Distributions.”
The variance of the payment per loss is: 124,000 - (245 - 85)² = 98,400.
With a deductible of d, the second moment of the payment per loss is:
E[X² | X > d] S(d) - 2d(E[X] - {E[X ∧ d] - dS(d)}) + d² S(d) =
E[X² | X > d] S(d) - 2d E[X] + 2d E[X ∧ d] - d² S(d).

32.17. E. E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.


E[X ∧ 50000] = exp(10.02)Φ[(ln(50,000) -9.7 - .64 )/.8] + (50,000) {1 − Φ[(ln(50,000) − 9.7)/.8]} =
(22,471)Φ[.60] + (50,000) {1 - Φ[1.40]} = (22,471)(0.7257) + (50,000) (1 - 0.9192) = 20,347.
E[X ∧ 10000] = exp(10.02)Φ[(ln(10,000) - 9.7 - 0.64 )/0.8] + (10,000){1 - Φ[(ln(10,000) - 9.7)/0.8]}
= (22,471)Φ[-1.41] + (10,000) {1 - Φ[-0.61]} = (22,471)(0.0793) + (10,000) (0.7291) = 9073.
The average payment per loss is:
(0.9)(E[X ∧ 50000] - E[X ∧ 10000]) = (0.9)(20,347 - 9073) = 10,147.

32.18. B. E[(X ∧ x)2 ] = exp[2µ + 2σ2] Φ[{ln(x) − (µ+ 2σ2)} / σ] + x2 {1 - Φ[{ln(x) −µ} / σ] }

E[(X ∧ 50,000)2 ] = exp[20.68] Φ[{ln(50000) - 10.98}/0.8] + 500002 {1 - Φ[{ln(50000) - 9.7}/0.8] } =


957,656,776Φ[-0.20] + 2,500,000,000{1 - Φ[1.40]} =
(957,656,776)(0.4207) + (2,500,000,000){1 - 0.9192} = 604.9 million.

32.19. B. E[(X ∧ 10,000)2 ] =

exp[20.68]Φ[{ln(10000) - 10.98} / 0.8] + 100002 {1 - Φ[{ln(10000) - 9.7} / 0.8] } =

957,656,776Φ[-2.21] + 100,000,000{1 - Φ[-0.61]} =


(957,656,776)(0.0136) + (100,000,000)(0.7291) = 85.9 million.

32.20. D. c² {E[(X ∧ L)²] - E[(X ∧ d)²] - 2d{E[X ∧ L] - E[X ∧ d]} - {E[X ∧ L] - E[X ∧ d]}²}
= 0.9² {E[(X ∧ 50,000)²] - E[(X ∧ 10,000)²] - 2(10,000){E[X ∧ 50,000] - E[X ∧ 10,000]}
- {E[X ∧ 50,000] - E[X ∧ 10,000]}²} =
0.81{604.9 million - 85.9 million - 20,000(20,347 - 9073) - (20,347 - 9073)²} =
(0.81)(166.4 million) = 134.8 million.
2016-C-2, Loss Distributions, §32 Limited Higher Moments HCM 10/21/15, Page 524

32.21. A. For a uniform distribution on (a, b), E[X2 ] = (b3 - a3 )/{3(b - a)}.
For those intervals above 80, E[(X ∧ 80)2 ] = 802 = 6400.
We need to divide the interval from 50 to 100 into two pieces.
For losses uniform on 50 to 100, 2/5 are expected to be greater than 80, so we assign
(2/5)(20) = 8 to the interval 80 to 100 and the remaining 12 to the interval 50 to 80.
Bottom of Top of Number Expected 2nd Moment
Interval Interval of Claims Limited to 80
0 25 30 208
25 50 32 1458
50 80 12 4300
80 100 8 6400
100 200 8 6400
Average 2299

32.22. B., 32.23. C., 32.24. E., 32.25. B. , 32.26. C.


E[X ∧ 10] = ∫[0 to 10] x/40 dx + (3/4)(10) = 8.75.
E[X ∧ 25] = ∫[0 to 25] x/40 dx + (15/40)(25) = 17.1875.
E[(X ∧ 10)²] = ∫[0 to 10] x²/40 dx + (3/4)(10²) = 83.333.
E[(X ∧ 25)²] = ∫[0 to 25] x²/40 dx + (15/40)(25²) = 364.583.
Layer Average Severity is: E[X ∧ 25] - E[X ∧ 10] = 17.1875 - 8.75 = 8.4375.
2nd moment of the layer = E[(X ∧ 25)²] - E[(X ∧ 10)²] - (2)(10)(E[X ∧ 25] - E[X ∧ 10]) =
364.583 - 83.333 - (2)(10)(8.4375) = 112.5. Variance of the layer = 112.5 - 8.4375² = 41.31.
Alternately, the contribution to the layer from each small loss is 0, from each medium loss is x - 10,
and from each large loss is 15. Thus the second moment of the layer is:
∫[10 to 25] (x - 10)²/40 dx + (15/40)(15²) = 28.125 + 84.375 = 112.5. Proceed as before.
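For those who want to confirm 32.22-32.26 numerically, here is a small Python simulation (my own check, not part of the original solutions): losses are uniform on (0, 40) and the payments are for the layer from 10 to 25.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 40, 1_000_000)
layer = np.clip(x, 10, 25) - 10     # min(X, 25) - min(X, 10): 0 below 10, x - 10 in between, 15 above 25
print(layer.mean(), layer.var())    # roughly 8.44 and 41.3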



32.27. C. Mean = (50%)(10) + (30%)(50) + (20%)(100) = 40.


Second Moment = (50%)(10²) + (30%)(50²) + (20%)(100²) = 2800.
Variance = 2800 - 40² = 1200. coefficient of variation = √1200 / 40 = 0.866.

32.28. E. Third Central Moment = (50%)(10 - 40)³ + (30%)(50 - 40)³ + (20%)(100 - 40)³ = 30,000.
Skewness = Third Central Moment / Variance^1.5 = 30,000 / 1200^1.5 = 0.722.

32.29. A. Mean = (50%)(10) + (30%)(25) + (20%)(25) = 17.5.


Second Moment = (50%)(10²) + (30%)(25²) + (20%)(25²) = 362.5.
Variance = 362.5 - 17.5² = 56.25.
coefficient of variation = √56.25 / 17.5 = 0.429.

32.30. B. Mean = (50%)(10) + (30%)(50) + (20%)(60) = 32.


Second Moment = (50%)(10²) + (30%)(50²) + (20%)(60²) = 1520.
Variance = 1520 - 32² = 496.
coefficient of variation = √496 / 32 = 0.696.

32.31. B. Third Central Moment = (50%)(10 - 32)³ + (30%)(50 - 32)³ + (20%)(60 - 32)³ = 816.
Skewness = Third Central Moment / Variance^1.5 = 816 / 496^1.5 = 0.074.

32.32. C. (50%)(0) + (30%)(20) + (20%)(40) = 14.

32.33. D. Second Moment = (50%)(0²) + (30%)(20²) + (20%)(40²) = 440.
Variance = 440 - 14² = 244. coefficient of variation = √244 / 14 = 1.116.

32.34. B. E[X ∧ 500] = ∫[0 to 500] x f(x) dx + 500 S(500) = 217 + (500)(1 - 0.685) = 374.5.
E[(X ∧ 500)²] = ∫[0 to 500] x² f(x) dx + 500² S(500) = 76,616 + (500²)(1 - 0.685) = 155,366.
Var[X ∧ 500] = E[(X ∧ 500)²] - E[X ∧ 500]² = 155,366 - 374.5² = 15,116.


Comment: Based on a Gamma Distribution with α = 4.3 and θ = 100.

32.35. C. From the Tables attached to the exam, for the Single Parameter Pareto, for x ≥ θ:
E[X ∧ x] = αθ/(α - 1) - θ^α / {(α - 1) x^(α-1)}.  E[(X ∧ x)²] = αθ²/(α - 2) - 2θ^α / {(α - 2) x^(α-2)}.
Thus E[X ∧ 250] = (3)(200)/2 - 200³ / {(2)(250²)} = 236.
E[X ∧ 1000] = (3)(200)/2 - 200³ / {(2)(1000²)} = 296.  S(250) = (200/250)³ = 0.512.
Thus the mean payment per payment is: (80%)(296 - 236) / 0.512 = 93.75.
E[(X ∧ 250)²] = (3)(200²)/1 - (2)(200³)/250 = 56,000.
E[(X ∧ 1000)²] = (3)(200²)/1 - (2)(200³)/1000 = 104,000.
Thus the second moment of the non-zero payments is:
(80%)² {104,000 - 56,000 - (2)(250)(296 - 236)} / 0.512 = 22,500.
Thus the variance of the non-zero payments is: 22,500 - 93.75² = 13,711.

32.36. The probability that a loss exceeds d is: e-d/θ.


The non-zero payments excess of a deductible d is the same as the original Exponential.
Zero payments would contribute nothing to the aggregate amount paid.
One can think of Y = (X - d)+ as the aggregate that results from a Bernoulli frequency with q = e-d/θ,

and an Exponential severity with mean θ.


This has a variance of: (mean of Bernoulli)(var. of Expon.) + (mean of Expon.)²(var. of Bernoulli)
= (q)(θ²) + (θ²)(q)(1 - q) = (θ²)(q)(2 - q).
Mean aggregate is: qθ.
Coefficient of variation is: √{(θ²)(q)(2 - q)} / (qθ) = √(2/q - 1) = √(2e^(d/θ) - 1).
Alternately, the payment per payment variable, YP, is also Exponential with mean θ.
E[YL] = E[YP] S(d) = θ e^(-d/θ).
Second Moment of YL is: (Second moment of YP) S(d) = 2θ² e^(-d/θ).
1 + CV² = (Second moment) / (First Moment)² = 2e^(d/θ). ⇒ CV = √(2e^(d/θ) - 1).


Comment: Similar to 4, 5/07, Q.13.
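A quick simulation check of the closed form CV = √(2e^(d/θ) - 1) derived above; this Python sketch is mine, and θ = 10 with d = 5 are arbitrary illustrative values rather than values from any problem.

import numpy as np

theta, d = 10.0, 5.0
rng = np.random.default_rng(1)
y = np.maximum(rng.exponential(theta, 2_000_000) - d, 0.0)   # Y = (X - d)+
print(y.std() / y.mean())                                    # about 1.52
print(np.sqrt(2 * np.exp(d / theta) - 1))                    # also about 1.52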

32.37. (i) E[U] = E[X ∧ b] - E[X ∧ a]. E[V] = E[X ∧ d] - E[X ∧ c].
For x < c, V = 0. For c < x < d, U = b - a, and V = x - c. For d < x, U = b - a, and V = d - c.
Therefore, E[UV] = ∫[c to d] (b - a)(x - c) f(x) dx + (b - a)(d - c) S(d) = (b - a){∫[c to d] (x - c) f(x) dx + (d - c) S(d)}
= (b - a)(expected losses in the layer from c to d) = (b - a){E[X ∧ d] - E[X ∧ c]}.
Cov[U, V] = E[UV] - E[U]E[V] = {b - a - E[X ∧ b] + E[X ∧ a]}{E[X ∧ d] - E[X ∧ c]}.
Cov[U, V] = (width of the lower interval minus the layer average severity of the lower interval)
(layer average severity of the upper interval).
Cov[U, V] = {E[(b - X)+] - E[(a - X)+]} {E[X ∧ d] - E[X ∧ c]}.
(ii) E[X ∧ 10] = 8(1 - e-10/8) = 5.708.
Using the result from part (i) with a = 0, b = c = 10, and d = ∞:
Covariance = {10 - E[X ∧ 10]} {E[X] - E[X ∧ 10]} = (10 - 5.708) (8 - 5.708) = 9.84.
(iii) Mean of the layer from 0 to 10 is: E[X ∧ 10] = 5.708.
E[(X ∧ 10)2 ] = 2(82 ) - 2(82 ) e-10/8 - 2(8)(10) e-10/8 = 45.487.
Second Moment of the layer from 0 to 10 is:
E[(X ∧ 10)2 ] = 2(82 ) - 2(82 ) e-10/8 - 2(8)(10) e-10/8 = 45.487.
Variance of the layer from 0 to 10 is: 45.487 - 5.7082 = 12.906.
(iv) Mean of the layer from 10 to ∞ is: E[X] - E[X ∧ 10] = 8 - 5.708 = 2.292.
E[X2 ] = 2θ2 = 2(82 ) = 128.
Second Moment of the layer from 10 to ∞ is:
E[X2 ] - E[(X ∧ 10)2 ] - (2)(10)(E[X] - E[X ∧ 10]) = 128 - 45.487 - (20)(2.292) = 36.673.
Variance of the layer from 10 to ∞ is: 36.673 - 2.2922 = 31.420.
(v) Correlation of the layer from 0 to 10 and the layer from 10 to infinity is:
9.84 / √{(12.906)(31.420)} = 0.489.

(vi) E[X ∧ 10] = {θ/(α-1)}{1 - (θ/(θ+x))α−1} = (16/2){1 - (16/26)2 } = 4.970.


Using the result from part (i) with a = 0, b = c = 10, and d = ∞:
Covariance = {10 - E[X ∧ 10]} {E[X] - E[X ∧ 10]} = (10 - 4.970) (8 - 4.970) = 15.24.
(vii) Mean of the layer from 0 to 10 is: E[X ∧ 10] = 4.970.
Second Moment of the layer from 0 to 10 is:
E[(X ∧ 10)2 ] = 2(162 ){1 - (1 + 10/16)-2(1 + (2)(10)/16)} / {(2)(1)} = 37.870.
Variance of the layer from 0 to 10 is: 37.870 - 4.9702 = 13.169.
2016-C-2, Loss Distributions, §32 Limited Higher Moments HCM 10/21/15, Page 528

(viii) Mean of the layer from 10 to ∞ is: E[X] - E[X ∧ 10] = 8 - 4.970 = 3.030.
E[X2 ] = 2(162 ) / {(2)(1)} = 256.
Second Moment of the layer from 10 to ∞ is:
E[X2 ] - E[(X ∧ 10)2 ] - (2)(10)(E[X] - E[X ∧ 10]) = 256 - 37.870 - (20)(3.030) = 157.530.
Variance of the layer from 10 to ∞ is: 157.530 - 3.0302 = 148.35.
(ix) Correlation of the layer from 0 to 10 and the layer from 10 to infinity is:
15.24 / √{(13.169)(148.35)} = 0.345.
Comment: Since a larger loss contributes more to both layers than a smaller loss, the losses in the
layers are positively correlated.
Equation 40 in “On the Theory of Increased Limits and Excess of Loss Pricing,”
by Robert S. Miccolis. PCAS 1977 is a special case of the result derived in part (a).
While the layers are positively correlated, this correlation diminishes as the distance between the
layers increases.
See “The Mathematics of Excess Losses,” by Leigh J. Halliwell, Variance Volume 6 Issue 1.
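Parts (ii) through (v) can also be confirmed by simulation. The following Python sketch (mine, not part of the original solution) uses the Exponential with θ = 8 and the layers from 0 to 10 and from 10 to infinity.

import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(8.0, 2_000_000)
u = np.minimum(x, 10.0)           # layer from 0 to 10
v = np.maximum(x - 10.0, 0.0)     # layer from 10 to infinity
print(np.cov(u, v)[0, 1])         # about 9.84
print(np.corrcoef(u, v)[0, 1])    # about 0.489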

32.38. (i) E[X] = exp[4 + 0.32 /2] = 57.11.


(ii) E[X2 ] = exp[(2)(4) + (2)(0.32 )] = 3568.85.
VAR[X] = 3568.85 - 57.112 = 307.3.
(iii) E[Y] = 0 Prob[X < 70] + E[X - 70 | X > 70] Prob[X > 70] = E[X - 70 | X > 70] Prob[X > 70] =
e(70) S(70) = E[X] - E[X ∧ 70 ].
E[X ∧ 70 ] = exp[4 + 0.32 /2] Φ[(ln(70) - 4 - 0.32 )/0.3] + 70 {1 - Φ[(ln(70) - 4)/0.3] =
57.11 Φ[0.53] + (70){1 - Φ[0.83]} = (57.11)(0.7019) + (70)(1 - 0.7967) = 54.32.
Thus E[Y] = E[X] - E[X ∧ 70 ] = 57.11 - 54.32 = 2.79.
(iv) E[Y2 ] = 0 Prob[X < 70] + E[(X - 70)2 | X > 70] Prob[X > 70] =
(second moment of the layer from 70 to infinity) Prob[X > 70] =
E[X2 ] - E[(X ∧ 70)2 ] - (2)(70){E[X] - E[X ∧ 70 ]}.
For the LogNormal Distribution the second limited moment is:
E[(X ∧ x)²] = exp[2µ + 2σ²] Φ[{ln(x) - µ - 2σ²}/σ] + x² {1 - Φ[{ln(x) - µ}/σ]}.
E[(X ∧ 70)2 ] = exp[(2)(4) + (2)(0.32 )] Φ[(ln(70) - 4 - (2)(0.32 ))/0.3] + 702 {1 - Φ[(ln(70) - 4)/0.3]}
= 3568.85 Φ[0.23] + 4900 {1 - Φ[0.83]} = (3568.85)(0.5910) + (4900)(1 - 0.7967) = 3105.36.
Thus, E[Y2 ] = E[X2 ] - E[(X ∧ 70)2 ] - (2)(70){E[X] - E[X ∧ 70 ]} =
3568.85 - 3105.36 - (140)(57.11 - 54.32) = 72.89.
VAR[Y] = E[Y2 ] - E[Y]2 = 72.89 - 2.792 = 65.11.
(v) E[XY] = E[X (X - 70) | X > 70] Prob[X > 70] =
E[(70)(X - 70) - (X - 70) (X - 70) | X > 70] Prob[X > 70] =
{E[(70)(X - 70) | X > 70] - E[(X - 70) (X - 70) | X > 70]} Prob[X > 70] =
E[(70)(X - 70) | X > 70] Prob[X > 70] - E[(X - 70)2 | X > 70] Prob[X > 70] =
70 E[X - 70 | X > 70] Prob[X > 70] - E[Y2 | X > 70] Prob[X > 70] =
70 E[Y] + E[Y2 ] = (70) (2.79) + 72.89 = 268.19.
Cov[X, Y] = E[XY] - E[X] E[Y] = 268.19 - (57.11)(2.79) = 108.85.
(vi) Corr[X, Y] = Cov[X, Y] / √(VAR[X] VAR[Y]) = 108.85 / √{(307.3)(65.11)} = 0.77.

32.39. A. The first moment of the layer is: 91,837 - 55,556 = 36,281.
The second moment of the layer is:
20,408 million - 4444 million - (2)(100,000)(36,281) = 8707.8 million.
Variance of the layer is: 8707.8 million - 36,2812 = 7392 million.
Standard Deviation of the layer is: 85,977.
CV of the layer is: 85,977/36,281 = 2.37.
Comment: Based on a Pareto Distribution with α = 3 and θ = 200,000.

32.40. C. The first moment of the layer is: 97,222 - 80,247 = 16,975.
The second moment of the layer is:
27,778 million - 12,346 million - (2)(250,000)(16,975) = 6944.5 million.
1 + CV2 = 6944.5 million / 16,9752 = 24.100. ⇒ CV = 4.81.

32.41. E. The mean is: E[X] - E[X ∧ 250,000] = 100,000 - 80,247 = 19,753.
Second moment of the payment per loss variable is:
E[X2 ] - E[(X ∧ 250,000)2 ] - (2)(250,000) (E[X] - E[X ∧ 250,000]) =
40,000 million - 12,346 million - (500,000)(19,753) = 17,778 million.
1 + CV2 = 17,778 million / 19,7532 = 45.563. ⇒ CV = 6.68.

32.42. C. The non-zero payments are uniform from 0 to 15, with variance:
(15 - 0)2 / 12 = 18.75.

32.43. A. The non-zero payments are uniform from 0 to 15,


with mean: 7.5, variance: (15 - 0)2 / 12 = 18.75,
and second moment: 18.75 + 7.52 = 75.
The probability of a non-zero payment is: 15/18 = 5/6.
Thus YL is a two-point mixture of a uniform distribution from 0 to 15 and a distribution that is always
zero, with weights 5/6 and 1/6.
The mean of the mixture is: (5/6)(7.5) + (1/6)(0) = 6.25.
The second moment of the mixture is: (5/6)(75) + (1/6)(02 ) = 62.5.
The variance of this mixture is: 62.5 - 6.252 = 23.44.
Alternately, YL can be thought of as a compound distribution, with Bernoulli frequency with mean 5/6
and Uniform severity from 0 to 15.
The variance of this compound distribution is:
(Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) =
(5/6)(18.75) + (7.5)2 {(5/6)(1/6)} = 23.44.

32.44. A. The probability that a loss exceeds 300 is: (4/7)5 = 0.06093.
The losses truncated and shifted from below at 300 is also a Pareto Distribution but with
α = 5 and θ = 400 + 300 = 700.
One can think of Y = (X - 300)+ as the aggregate that results from a Bernoulli frequency with
q = 0.06093, and a Pareto severity with α = 5 and θ = 700.
This Pareto has mean: 700/4 = 175, second moment: (2)(7002 ) / {(4)(3)} = 81,667,
and variance: 81,667 - 1752 = 51,042.
This has a variance of: (mean of Bernoulli)(var. of Pareto) + (mean of Pareto)2 (var. of Bernoulli)
= (0.06093)(51,042) + (1752 )(0.06093)(1 - 0.06093) = 4862.
Mean aggregate is: (0.06093)(175) = 10.663.
Coefficient of variation is: √4862 / 10.663 = 6.54.
Alternately, this is mathematically equivalent to a two point mixture, with 0.06093 weight to a Pareto
with α = 5 and θ = 700 (the non-zero payments) and (1 - 0.06093) weight to a distribution that is
always zero. The mean is: (0.06093)(175) + (1 - 0.06093)(0) = 10.663.
The second moment is the weighted average of the two second moments:
(0.06093)(81,667) + (1 - 0.06093)(0) = 4976.
Therefore, 1 + CV2 = 4976/10.6632 = 43.76. ⇒ CV = 6.54.
Comment: Similar to 4, 5/07, Q.13, which involves an Exponential rather than a Pareto.
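As a check of 32.44, the two-point-mixture formulas quoted above can be evaluated directly; this short Python sketch is mine, not part of the original solution.

alpha, theta, d = 5, 400, 300
q = (theta / (theta + d)) ** alpha                            # S(300) = 0.06093
m1 = (theta + d) / (alpha - 1)                                # mean of the non-zero payments = 175
m2 = 2 * (theta + d) ** 2 / ((alpha - 1) * (alpha - 2))       # their second moment = 81,667
mean = q * m1
second = q * m2
print(mean, (second / mean ** 2 - 1) ** 0.5)                  # about 10.66 and 6.54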

32.45. D. E[X] = exp[7 + 0.62 /2] = 1313.


E[X2 ] = exp[(2)(7) + (2)(0.62 )] = 2,470,670.
E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx - µ - σ2)/σ] + x {1 - Φ[(lnx - µ)/σ]}.

E[X ∧ 1000] = 1313 Φ[{ln(1000) - 7 - 0.62 } / 0.6] + 1000{1 - Φ[{ln(1000) - 7} / 0.6]}


= 1313 Φ[-0.75] + (1000)(1 - Φ[-0.15]) = (1313)(0.2266) + (1000)(1 - 0.4404) = 857.
E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) - (µ + 2σ2)} / σ] + x2 {1 - Φ[{ln(x) - µ} / σ]}.

E[(X ∧ 1000)2 ] = 2,470,670 Φ[{ln(1000) - 7.72} / 0.6] + 10002 {1 - Φ[{ln(1000) - 7} / 0.6] }

= 2,470,670 Φ[-1.35] + 1,000,000 {1 - Φ[-0.15]} = (2,470,670)(0.0885) + (1,000,000)(0.5596)


= 778,254.
Mean of the payment per loss variable is: E[X] - E[X ∧ 1000] = 1313 - 857 = 456.
Second moment of the payment per loss variable is:
E[X2 ] - E[(X ∧ 1000)2 ] - (2)(1000) (E[X] - E[X ∧ 1000])
= 2,470,670 - 778,254 - (2)(1000)(456) = 780,416.
Variance of the payment per loss variable is: 780,416 - 4562 = 572,480.

32.46. E. The first moment of the per payment variable is:


(E[X ∧ 5000] - E[X ∧ 1000]) / S(1000) = (2828.18 - 976.66) / 0.91970 = 2013.
The second moment of the per payment variable is:
{E[(X ∧ 5000)2 ] - E[(X ∧ 1000)2 ] - (2)(1000)(E[X ∧ 5000] - E[X ∧ 1000])} / S(1000) =
{9,830,381 - 963,617 - (2)(1000)(2828.18 - 976.66)} / 0.91970 = 5,614,574.
Variance of the per payment variable is: 5,614,574 - 2013² = 1,562,405.
Standard deviation of the per payment variable is: √1,562,405 = 1250.

32.47. D. An Exponential distribution truncated and shifted from below is the same Exponential
Distribution, due to the memoryless property of the Exponential. Thus the nonzero payments are
Exponential with mean 1000. The probability of a nonzero payment is the probability that a loss is
greater than the deductible of 100; S(100) = e-100/1000 = 0.90484.
Thus the payments of the insurer can be thought of as a compound distribution, with Bernoulli
frequency with mean 0.90484 and Exponential severity with mean 1000. The variance of this
compound distribution is:
(Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) =
(0.90484)(10002 ) + (1000)2 {(0.90484)(0.09516)} = 990,945.
Equivalently, the payments of the insurer in this case are a two point mixture of an Exponential with
mean 1000 and a distribution that is always zero, with weights .90484 and .09516. This has a first
moment of: (1000)(0.90484) + (0.09516)(0) = 904.84, and a second moment of:
{(2)(10002 )}(0.90484) + (0.09516)(02 ) = 1,809,680.
Thus the variance is: 1,809,680 - 904.842 = 990,945.
Alternately, the payment per payment variable, YP, is also Exponential with mean 1000.
E[YL ] = E[YP] S(d) = 1000 e-100/1000 = 904.8.
Second Moment of YL is: (Second moment of YP) S(d) = 2θ2e-d/θ = (2,000,000)e-100/1000
= 1,809,675. Variance = 1,809,675 - 904.82 = 991,011.
Alternately, for the Exponential Distribution, E[X] = θ = 1000, and E[X2 ] = 2θ2 = 2 million.

For the Exponential Distribution, E[X ∧ x] = θ (1 - e-x/θ).

E[X ∧ 100] = 1000(1 - e-100/1000) = 95.16.


For the Exponential, E[(X ∧ x)n ] = n! θn Γ(n+1; x/θ) + xn e-x/θ.
E[(X ∧ 100)2 ] = (2)10002 Γ(3; 100/1000) + 1002 e-100/1000.
According to Theorem A.1 in Loss Models, for integral α, the incomplete Gamma function
Γ(α; y) is 1 minus the first α densities of a Poisson Distribution with mean y.
Γ(3; y) = 1 - e-y(1 + y + y2 /2). Γ(3; .1) = 1 - e-0.1(1 + 0.1 + 0.12 /2) = 0.0001546.
Therefore, E[(X ∧ 100)2 ] = (2 million)(0.0001546) + 10000e-0.1 = 9357.
The first moment of the layer from 100 to ∞ is: E[X] - E[X ∧ 100] = 1000 - 95.16 = 904.84.
The second moment of the layer from 100 to ∞ is:
E[X2 ] - E[(X ∧ 100)2 ] - (2)(100)(E[X] - E[X ∧ 100]) =
2,000,000 - 9357 - (200)(904.84) = 1,809,675.
Therefore, the variance of the layer from 100 to ∞ is: 1,809,675 - 904.842 = 990,940.
Alternately, one can work directly with the integrals, using integration by parts.
The first moment of the layer from 100 to ∞ is:
∫[100 to ∞] (x - 100) e^(-x/1000)/1000 dx = ∫[100 to ∞] x e^(-x/1000)/1000 dx - (1/10) ∫[100 to ∞] e^(-x/1000) dx
= [-x e^(-x/1000) - 1000 e^(-x/1000)] evaluated from x = 100 to ∞, minus 100e^(-0.1)
= 100e^(-0.1) + 1000e^(-0.1) - 100e^(-0.1) = 904.84.
The second moment of the layer from 100 to ∞ is:
∫[100 to ∞] (x - 100)² e^(-x/1000)/1000 dx
= ∫[100 to ∞] x² e^(-x/1000)/1000 dx - ∫[100 to ∞] x e^(-x/1000)/5 dx + 10 ∫[100 to ∞] e^(-x/1000) dx
= [-x² e^(-x/1000) - 2000x e^(-x/1000) - 2,000,000 e^(-x/1000)] evaluated from x = 100 to ∞
+ [200x e^(-x/1000) + 200,000 e^(-x/1000)] evaluated from x = 100 to ∞ + 10,000e^(-0.1)
= e^(-0.1) {10,000 + 200,000 + 2,000,000 - 20,000 - 200,000 + 10,000} = 2,000,000e^(-0.1) = 1,809,675.
Therefore, the variance of the layer from 100 to ∞ is: 1,809,675 - 904.842 = 990,940.
Comment: Very long and difficult, unless one uses the memoryless property of the Exponential
Distribution.
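A brief simulation check of 32.47 (my own Python sketch, not part of the original solution): an Exponential with mean 1000 and an ordinary deductible of 100.

import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1000.0, 2_000_000)
y = np.maximum(x - 100.0, 0.0)      # payment per loss
print(y.mean(), y.var())            # roughly 905 and 991,000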

32.48. C. E[Y] = (2)(4/5) + (4)(1/5) = 2.4.
E[Y²] = ∫[0 to 4] x²/5 dx + 4²(1/5) = 64/15 + 16/5 = 7.4667.
Var[Y] = 7.4667 - 2.4² = 1.7067.

32.49. C. For X ≤ 150, X = X ∧ 150.


So the only contribution to E[X2 ] - E[(X ∧ 150)2 ] comes from any losses of size > 150.
Losses uniform on (100, 200] ⇒ expect 3 claims greater than 150, out of a total of 74.
Uniform on (150, 200], E[X2 ] = variance + mean2 = 502 /12 + 1752 = 30,833.
On (150, 200], each loss is at least 150, and therefore E[(X ∧ 150)2 ] = 1502 = 22,500.
E(X2 ) - E[(X ∧ 150)2 ] = Prob[X>150] E[X2 - (X ∧ 150)2 | X uniform on (150, 200]] =
(3/74)(30,833 - 22,500) = 338.

32.50. A. qx = 0.1. qx+1 = 0.2. qx+2 = 0.3.


Prob[K = 0] = Prob[Die 1st Year] = qx = 0.1.
Prob[K = 1] = P[Alive @ x+1] P[Die 2nd Year | Alive @ x+1] = (1 - qx) qx+1 = (0.9)(0.2) = 0.18.
Prob[K = 2] = P[Alive @ x+2 ] P[Die 3rd Year | Alive @ x+2] =
(1 - qx) (1 - qx+1) qx+2 = (0.9)(0.8)(0.3) = 0.216.
Prob[K ≥ 3] = 1 - (0.1 + 0.18 + 0.216) = 0.504.
K Prob K ∧ 3 (K ∧ 3)2
0 0.1 0 0
1 0.18 1 1
2 0.216 2 4
≥3 0.504 3 9
Avg. 2.124 5.580
(0.1)(0) + (0.18)(1) + (0.216)(2) + (0.504)(3) = 2.124.
(0.1)(0) + (0.18)(1) + (0.216)(4) + (0.504)(9) = 5.580.
Var(K ∧ 3) = E[(K ∧ 3)2 ] - E[K ∧ 3]2 = 5.580 - 2.1242 = 1.069.

32.51. C. The probability that a loss exceeds 30,000 is: e-30000/10000 = 0.049787.
The losses truncated and shifted from below at 30,000 is the same as the original Exponential.
One can think of Y = (X - 30000)+ as the aggregate that results from a Bernoulli frequency with
q = 0.049787, and an Exponential severity with mean 10,000.
This has a variance of: (mean of Bernoulli)(var. of Expon.) + (mean of Expon.)2 (var. of Bernoulli)
= (0.049787)(100002 ) + (100002 )(0.049787)(1 - 0.049787) = 9,710,096.
Mean aggregate is: ( 0.049787)(10000) = 497.87.
Coefficient of variation is: √9,710,096 / 497.87 = 6.26.
Alternately, this is mathematically equivalent to a two point mixture, with 0.049787 weight to an
Exponential with mean 10,000 (the non-zero payments) and (1 - 0.049787) weight to a distribution
that is always zero.
The mean is: (0.049787)(10,000) + (1 - 0.049787)(0) = 497.87.
The second moment is the weighted average of the two second moments:
(0.049787)(2)(10,0002) + (1 - 0.049787)(0) = 9,957,414.

Therefore, 1 + CV2 = 9,957,414/497.872 = 40.17. ⇒ CV = 6.26.


Alternately, E[Y²] = E[(X - 30000)² | X > 30000] Prob[X > 30000]
= (Second moment of an Exponential Distribution with θ = 10000) e^(-30000/10000)
= (2)(10000²)(0.049787) = 9,957,414. Proceed as before.

Alternately, E[Y] = ∫[30,000 to ∞] (x - 30,000) exp[-x/10,000]/10,000 dx
= ∫[0 to ∞] y exp[-(y + 30,000)/10,000]/10,000 dy = e^(-3) ∫[0 to ∞] y exp[-y/10,000]/10,000 dy
= e^(-3) (10,000) = 497.87.
E[Y²] = ∫[30,000 to ∞] (x - 30,000)² exp[-x/10,000]/10,000 dx
= ∫[0 to ∞] y² exp[-(y + 30,000)/10,000]/10,000 dy = e^(-3) ∫[0 to ∞] y² exp[-y/10,000]/10,000 dy
= e^(-3) (2)(10,000²) = 9,957,414. Proceed as before.

Section 33, Mean Excess Loss

As discussed previously, the Mean Excess Loss or Mean Residual Life (complete expectation of
life), e(x), is defined as the mean value of those losses greater than size x, where each loss is
reduced by x. Thus one only includes those losses greater than size x, and only that part of each
such loss greater than x.

e(x) = E[X - x | X > x] = ∫[x to ∞] (t - x) f(t) dt / S(x) = ∫[x to ∞] t f(t) dt / S(x) - x.

The Mean Excess Loss at d, e(d) = average payment per payment with a deductible d.
The Mean Excess Loss is the mean of the loss distribution truncated and shifted at x:
e(x) = (average size of those losses greater in size than x) - x.

Therefore, the average size of those losses greater in size than x = e(x) + x.

On the exam, usually the easiest way to compute the Mean Excess Loss for a distribution is to use
the formulas for the Limited Expected Value in Appendix A of Loss Models, and the identity:
e(x) = {E[X] - E[X ∧ x]} / S(x).

Therefore, e(0) = mean, provided the distribution has support, x > 0.235

Exercise: E[X ∧ $1 million] = $234,109. E[X] = $342,222. S($1 million) = 0.06119.


Determine e($1 million).
[Solution: e($1 million) = (342,222 - 234,109) / 0.06119 = $1.767 million.]
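Since this identity is used constantly, some readers may find it convenient as a small helper function; the Python sketch below is mine, with the inputs taken from the exercise just above.

def mean_excess(mean, lim_ev, survival):
    # e(x) = (E[X] - E[X ^ x]) / S(x)
    return (mean - lim_ev) / survival

print(mean_excess(342_222, 234_109, 0.06119))   # about 1,767,000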

235
Thus e(0) = mean, with the notable exception of the Single Parameter Pareto.

Formulas for the Mean Excess Loss for Various Distributions:

Distribution              Mean Excess Loss, e(x)

Exponential               θ

Pareto                    (θ + x) / (α - 1), α > 1

LogNormal                 exp(µ + σ²/2) {1 - Φ[(ln(x) - µ - σ²)/σ]} / {1 - Φ[(ln(x) - µ)/σ]} - x

Gamma                     αθ {1 - Γ(α+1; x/θ)} / {1 - Γ(α; x/θ)} - x

Weibull                   θ Γ(1 + 1/τ) {1 - Γ(1 + 1/τ; (x/θ)^τ)} exp[(x/θ)^τ] - x

Single Parameter Pareto   x / (α - 1)

Inverse Gaussian          µ {Φ[(1 - x/µ)√(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1)√(θ/x)]}
                            / {Φ[(1 - x/µ)√(θ/x)] - e^(2θ/µ) Φ[-(x/µ + 1)√(θ/x)]} - x

Burr                      {θ Γ(α - 1/γ) Γ(1 + 1/γ) / Γ(α)} β[α - 1/γ, 1 + 1/γ; 1/(1 + (x/θ)^γ)] (1 + (x/θ)^γ)^α, αγ > 1

Trans. Gamma              θ {Γ(α + 1/τ) / Γ(α)} {1 - Γ(α + 1/τ; (x/θ)^τ)} / {1 - Γ[α; (x/θ)^τ]} - x

Gen. Pareto               {θτ / (α - 1)} β[α - 1, τ + 1; θ/(θ + x)] / β[α, τ; θ/(θ + x)], α > 1

Normal                    σ² φ[(x - µ)/σ] / {1 - Φ[(x - µ)/σ]} + µ - x

It should be noted that for heavier-tailed distributions, just as with the mean, the Mean Excess Loss
only exists for certain values of the parameters. Otherwise it is infinite.

For example, for the Pareto for α ≤ 1, the mean excess loss is infinite or does not exist.

The Exponential distribution is the only distribution with a constant Mean Excess Loss.
If F(x) represents the distribution of the ages of death, then e(x) is the (remaining) life expectancy of
a person of age x. A constant Mean Excess Loss is independent of age and is equivalent to a force
of mortality that is independent of age.

Exercise: For a Pareto with α = 4 and θ = 1000, determine e(800).

[Solution: E[X] = θ/(α-1) = 333.3333. E[X ∧ 800] = {θ/(α-1)} {1 - (θ/(θ + 800))α−1} = 276.1774.

S(800) = {θ/(θ + 800)}α = (1/1.8)4 = 0.09526.


e(800) = (333.3333 - 276.1774)/(0.09526) = 600.
Alternately, e(x) = (θ + x) / (α - 1) = (1000 + 800)/(4 - 1) = 600.]

Mean Excess Loss in terms of the Survival Function:

The Mean Excess Loss can be written in terms of the survival function, S(x) = 1 - F(x).
By definition, e(x) is the ratio of loss dollars excess of x divided by S(x).
e(x) = ∫[x to ∞] (t - x) f(t) dt / S(x) = {∫[x to ∞] t f(t) dt - x S(x)} / S(x).
Using integration by parts and the fact that the integral of f(x) is -S(x):236

e(x) = {x S(x) + ∫[x to ∞] S(t) dt - x S(x)} / S(x).

e(x) = ∫[x to ∞] S(t) dt / S(x).

So the Mean Excess Loss at x is the integral of the survival function from x to infinity divided by the
survival function at x.237
236
Note that the derivative of S(x) is d(1-F(x))/dx = -f(x). Remember there is an arbitrary constant for indefinite
integrals. Thus the indefinite integral of f(x) can be taken as either F(x) or -S(x) = F(x) - 1.
237
The Mean Excess Loss as defined here is the same as the complete expectation of life as defined in Life
Contingencies. The formula given here is equivalent to formula 3.5.2 in Actuarial Mathematics by Bowers et. al.,
pp. 62-63. sp_x = S(x+s)/S(x), and therefore: e°x = ∫[0 to ∞] sp_x ds.

For example, for the Pareto Distribution, S(x) = θ^α (θ + x)^(-α).

Therefore, e(x) = {θ^α (θ + x)^(1-α) / (α - 1)} / {θ^α (θ + x)^(-α)} = (θ + x) / (α - 1).


This matches the formula given above for the Mean Excess Loss of the Pareto Distribution.
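The same relationship can be verified numerically. The Python sketch below (mine, not from the text) integrates the Pareto survival function and reproduces e(800) = 600 for the α = 4, θ = 1000 Pareto of the earlier exercise.

from scipy.integrate import quad

alpha, theta, x = 4, 1000.0, 800.0
S = lambda t: (theta / (theta + t)) ** alpha
integral, _ = quad(S, x, float("inf"))
print(integral / S(x), (theta + x) / (alpha - 1))   # both 600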

Behavior of e(x) in the Righthand Tail:

Here is a table of the behavior of the Mean Excess Loss as the loss size approaches infinity for
some distributions:

Distribution             Behavior of e(x) as x→∞                            For Extremely Large x

Exponential              constant                                           e(x) = θ
Single Par. Pareto       increases linearly                                 e(x) = x / (α-1)
Pareto                   increases linearly                                 e(x) = (θ+x) / (α-1)
LogNormal                increases to infinity less than linearly          e(x) ≈ x σ² / ln(x)
Gamma, α > 1             decreases towards a horizontal asymptote          e(x) → θ
Gamma, α < 1             increases towards a horizontal asymptote (238)    e(x) → θ
Inverse Gaussian         increases to a constant                           e(x) → 2µ²/θ
Weibull, τ > 1           decreases to zero                                  e(x) ≅ θ^τ x^(1-τ) / τ
Weibull, τ < 1           increases to infinity less than linearly          e(x) ≅ θ^τ x^(1-τ) / τ
Trans. Gamma, τ > 1      decreases to zero                                  e(x) ≅ θ^τ x^(1-τ) / τ
Trans. Gamma, τ < 1      increases to infinity less than linearly          e(x) ≅ θ^τ x^(1-τ) / τ
Burr                     increases to infinity approximately linearly      e(x) ≅ x / (αγ-1)
Gen. Pareto              increases to infinity approximately linearly      e(x) ≅ x / (α-1)
Inv. Trans. Gamma        increases to infinity approximately linearly      e(x) ≅ x / (ατ-1)
Normal                   decreases to zero approximately as 1/x            e(x) ≅ σ² / (x-µ)

238
For the Gamma Distribution for large x, e(x) ≅ 1 / {1/θ - (α-1)/x}.

Recall that the mean and thus the Mean Excess Loss fails to exist for: Pareto with α ≤ 1,
Inverse Transformed Gamma with ατ ≤ 1, Generalized Pareto with α ≤ 1, and Burr with αγ ≤ 1.

Also the Gamma with α = 1 and the Weibull with τ = 1 are Exponential distributions, and thus have
constant Mean Excess Loss.

The Transformed Gamma with τ = 1 is a Gamma distribution, and thus in this case has the
behavior of the Mean Excess Loss depend on whether alpha is greater than, less than, or
equal to one.

For the LogNormal, e(x) approaches its asymptotic behavior very slowly. Thus the formula
derived above e(x) ≅ x / {(ln(x) − µ) / σ2 -1}, will provide a somewhat better approximation than

the formula e(x) ≅ x σ2 / ln(x), until one reaches truly immense loss sizes.

Those curves with heavier tails have the Mean Excess Loss increase with x. Comparing the Mean
Excess Loss provides useful information on the fit of the curves to the data. Small differences in
the tail of the distributions that may not have been evident in the graphs of the Distribution Function,
are made evident by graphing the Mean Excess Loss.
I have found the Mean Excess Loss particularly useful at distinguishing between the tails of the
different distributions when using them to estimate Excess Ratios.

Below is shown for various Gamma distributions the behavior of the Mean Excess Loss as the loss
size increases. For α = 1, the Exponential Distribution has a constant mean excess loss equal to θ,
in this case 1000. For α > 1, the mean excess loss decreases to θ.
For a Gamma Distribution with α < 1, the mean excess loss increases to θ.
The tail of a Gamma Distribution is similar to that of an Exponential Distribution with the same θ.

[Graph: e(x) versus x for Gamma Distributions with θ = 1000; for alpha > 1 the curve decreases toward 1000, and for alpha < 1 it increases toward 1000.]

For the Weibull with τ = 1, the Exponential Distribution has a constant mean excess loss equal to θ,
in this case 1000. For τ > 1, the mean excess loss decreases to 0.
For a Weibull Distribution with τ < 1, the mean excess loss increases to infinity less than linearly.
[Graph: e(x) versus x for Weibull Distributions; for tau < 1 the curve increases without bound, and for tau > 1 it decreases toward zero.]

The Pareto and LogNormal Distribution each have heavy tails. However, the Pareto Distribution has
its mean excess loss increase linearly, while that of the LogNormal increases slightly less than
linearly. Thus the Pareto has a heavier (righthand) tail than the LogNormal, which in turn has a heavier
tail than the Weibull.239

[Graph: e(x) versus x for the Pareto, LogNormal, and Weibull (tau < 1) Distributions; the Pareto increases linearly, the LogNormal a little less than linearly, and the Weibull more slowly still.]

All three distributions have mean residual lives that increase to infinity. Note that it takes a while for
the mean residual life of the Pareto to become larger than that of the LogNormal.240

239
The mean excess losses are graphed for a Weibull Distribution with θ = 500 and τ = 1/2, a LogNormal Distribution
with µ = 5.5 and σ = 1.678, and a Pareto Distribution with α = 2 and θ = 1000.
All three distributions have a mean of 1000.
240
In this case, the Pareto has a larger mean residual life for loss sizes 15 times the mean and greater.
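A numerical sketch of this comparison (mine, not part of the original text): e(x) is computed as the integral of the survival function divided by the survival function, for the Weibull (θ = 500, τ = 0.5), LogNormal (µ = 5.5, σ = 1.678), and Pareto (α = 2, θ = 1000) of footnote 239, each with mean 1000.

import math
from scipy import stats
from scipy.integrate import quad

def e(surv, x):
    # e(x) = integral of S(t) from x to infinity, divided by S(x)
    return quad(surv, x, float("inf"), limit=200)[0] / surv(x)

weibull = lambda t: stats.weibull_min.sf(t, 0.5, scale=500.0)
lognorm = lambda t: stats.lognorm.sf(t, 1.678, scale=math.exp(5.5))
pareto  = lambda t: (1000.0 / (1000.0 + t)) ** 2     # for this Pareto, e(x) = 1000 + x

for x in (1000.0, 10000.0, 30000.0):
    print(x, e(weibull, x), e(lognorm, x), e(pareto, x))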

The mean residual life of an Inverse Gaussian increases to a constant; e(x)→ 2µ2/θ.
Thus the Inverse Gaussian has a tail that is somewhat similar to a Gamma Distribution.

Here is e(x) for an Inverse Gaussian with µ = 1000 and θ = 500:


[Graph: e(x) versus x, increasing toward its limit of 2µ²/θ = 4000.]

Here is the e(x) for an Inverse Gaussian with µ = 1000 and θ = 2500:

[Graph: e(x) versus x, first decreasing and then increasing toward 800.]

In this case, e(x) initially decreases and then increases towards: (2)(10002 )/2500 = 800.

Determining the Tail Behavior of the Mean Excess Loss:

The fact that the Mean Excess Loss is the integral of S(t) from x to infinity divided by S(x), is the
basis for a method of determining the behavior of e(x) as x approaches infinity. One applies
LʼHospitalʼs Rule twice.


lim[x→∞] e(x) = lim[x→∞] ∫[x to ∞] S(t) dt / S(x) = lim[x→∞] S(x)/f(x) = lim[x→∞] -f(x)/f′(x).

For example for the Gamma distribution:


f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ(α).  f′(x) = (α-1) θ^(-α) x^(α-2) e^(-x/θ) / Γ(α) - θ^(-(α+1)) x^(α-1) e^(-x/θ) / Γ(α).

lim[x→∞] e(x) = lim[x→∞] -f(x)/f′(x) = lim[x→∞] 1 / {1/θ - (α-1)/x} = θ.

When the Mean Excess Loss increases to infinity, it may be useful to look at the limit of x/e(x). Again
one applies LʼHospitalʼs Rule twice.


lim
x→ ∞
x/e(x) = lim
x→ ∞
xS(x) / ∫x S(t) dt = xlim
→∞
(-f(x)x + S(x))/ -S(x) =

lim (-f(x)-xf´(x) -f(x) )/ f(x) = lim {-xf´(x) / f(x) } - 2.


x→ ∞ x→ ∞

For example for the LogNormal distribution:


f(x) = ξ/x, where ξ = exp[-0.5 ({ln(x) - µ}/σ)²] / {σ √(2π)}.
f′(x) = -ξ/x² - {(ln(x) - µ)/(xσ²)} (ξ/x).
lim[x→∞] x/e(x) = lim[x→∞] -x f′(x)/f(x) - 2 = lim[x→∞] {1 + (ln(x) - µ)/σ²} - 2 ≅ ln(x)/σ².

Thus for the LogNormal distribution the Mean Excess Loss increases to infinity, but a little less
quickly than linearly: e(x) ≈ x / {(ln(x) - µ)/σ² - 1} ≅ x σ² / ln(x).

As another example, for the Burr distribution:


f(x) = αγ x^(γ-1) θ^(-γ) (1 + (x/θ)^γ)^(-(α+1)).  f′(x) = (γ-1) f(x)/x - (α+1)γ x^(γ-1) θ^(-γ) f(x) / (1 + (x/θ)^γ).

lim[x→∞] x/e(x) = lim[x→∞] -x f′(x)/f(x) - 2 = lim[x→∞] -(γ-1) + (α+1)γ x^γ θ^(-γ) / (1 + (x/θ)^γ) - 2
= -γ + 1 + αγ + γ - 2 = αγ - 1.

The Mean Excess Loss for the Burr Distribution increases to infinity approximately linearly:
e(x) ≈ x / (αγ-1), provided αγ > 1.

Moments, CV, and the Mean Excess Loss:

When the relevant moments are finite and the distribution has support x > 0, then one can compute
the moments of the distribution in terms of the mean excess loss, e(x).241
We have E[X] = e(0).242 We will now show how to write the second moment in terms of an integral
of the mean excess loss and the survival function.
As shown in a previous section:

E[X²] = 2 ∫[0 to ∞] t S(t) dt.

Note that the integral of S(t)/E[X] from x to infinity is the excess ratio, R(x), and thus
Rʼ(x) = -S(x)/E[X]. Using this fact and integration by parts:
E[X²]/E[X] = 2 ∫[0 to ∞] t S(t)/E[X] dt = [-2t R(t)] evaluated from t = 0 to ∞ + 2 ∫[0 to ∞] R(t) dt.

For a finite second moment, t R(t) goes to zero as t goes to infinity, therefore:243

E[X²] = 2E[X] ∫[0 to ∞] R(t) dt = 2 ∫[0 to ∞] S(t) e(t) dt.

241
Since e(x) determines the distribution, it follows that e(x) determines the moments if they exist.
242
The numerator of e(0) is the losses excess of zero, i.e. all the losses.
The denominator of e(0) is the number of losses larger than 0, i.e., the total number of losses.
The support of the distribution has been assumed to be x > 0.
243
I have used the fact that E[X]R(t) = S(t)e(t). Both are the losses excess of t.

Exercise: For a Pareto Distribution, what is the integral from zero to infinity of S(x)e(x)?
[Solution: S(x) = (1 + x/θ)^(-α).  e(x) = (x + θ)/(α - 1) = (1 + x/θ) θ/(α - 1).
S(x) e(x) = (1 + x/θ)^(-(α-1)) θ/(α - 1).
∫[0 to ∞] S(t) e(t) dt = {θ/(α - 1)} ∫[0 to ∞] (1 + t/θ)^(-(α-1)) dt
= -{θ/(α - 1)} {θ/(α - 2)} (1 + t/θ)^(-(α-2)) evaluated from t = 0 to ∞
= θ² / {(α - 1)(α - 2)}.
Comment: The integral is one half of the second moment for a Pareto, consistent with the above
result.]
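A numerical check of this identity (my sketch, not from the text), using an illustrative Pareto with α = 4 and θ = 1000: twice the integral of S(t)e(t) should equal the second moment 2θ²/{(α-1)(α-2)} = 333,333.

from scipy.integrate import quad

alpha, theta = 4, 1000.0
S = lambda t: (1 + t / theta) ** (-alpha)
e = lambda t: (theta + t) / (alpha - 1)
integral, _ = quad(lambda t: S(t) * e(t), 0, float("inf"))
print(2 * integral, 2 * theta ** 2 / ((alpha - 1) * (alpha - 2)))   # both about 333,333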

Assume that the first two moments are finite and the distribution has support x > 0, and
e(x) > e(0) = E[X] for all x. Then:
E[X²] = 2 ∫[0 to ∞] S(t) e(t) dt > 2 ∫[0 to ∞] S(t) E[X] dt = 2E[X] ∫[0 to ∞] S(t) dt = 2E[X]E[X].

⇒ E[X²] > 2E[X]². ⇒ E[X²]/E[X]² > 2. ⇒ CV² = E[X²]/E[X]² - 1 > 1. ⇒ CV > 1.

When the first two moments are finite and the distribution has support x > 0, then
if e(x) > e(0) = E[X] for all x, then the coefficient of variation is greater than one.244
Note that e(x) > e(0), if e(x) is monotonically increasing.

Examples where this result applies are the Gamma α < 1, Weibull τ < 1, Transformed Gamma with
α < 1 and τ < 1, and the Pareto.245 In each case CV > 1. Note that each of these distributions is
heavier-tailed than an Exponential, which has CV = 1.

While in the tail e(x) for the LogNormal approaches infinity, it is not necessarily true for the
LogNormal that e(x) > e(0) for all x. The mean excess loss of the LogNormal can decrease before it
finally increases as per x/ln(x) in the tail.

For example, here is a graph of the Mean Excess Loss for a LogNormal with µ = 1 and σ = 0.5:

244
See Section 3.4.5 in Loss Models.
245
For α > 2, so that the CV of the Pareto exists.

[Graph: e(x) versus x from 0 to 100, initially decreasing and then increasing.]

In any case, the CV of the LogNormal is: √(exp(σ²) - 1). Thus for the LogNormal,

CV < 1 for σ < √(ln(2)) ≅ 0.83, while CV > 1 for σ > √(ln(2)) ≅ 0.83.

When the first two moments are finite and the distribution has support x > 0, then
if e(x) < e(0) = E[X] for all x, then CV < 1. Note that e(x) < e(0), if e(x) is monotonically decreasing.
Examples where this result applies are the Gamma α > 1, Weibull τ > 1, and the Transformed
Gamma with α > 1 and τ > 1. In each case CV < 1. Note that each of these distributions is
lighter-tailed than an Exponential, which has CV = 1.
One can get similar results to those above for higher moments.

Exercise: Assuming the relevant moments are finite and the distribution has support x>0,
express the integral from zero to infinity of S(x)xn , in terms of moments.
[Solution: One applies integration by parts and the fact that dS(x)/dx = - f(x):
∫[0 to ∞] S(t) t^n dt = [S(t) t^(n+1)/(n+1)] evaluated from t = 0 to ∞ + ∫[0 to ∞] f(t) t^(n+1)/(n+1) dt = E[X^(n+1)]/(n+1).
Where Iʼve used the fact that, if the (n+1)st moment is finite, then S(x) x^(n+1) must go to zero as x
approaches infinity.
Comment: For n = 0 one gets the result that the mean is the integral of the survival function
from zero to infinity. For n = 1 one gets the result used above, that the integral of xS(x) from
zero to infinity is half of the second moment.]
2016-C-2, Loss Distributions, §33 Mean Excess Loss HCM 10/21/15, Page 549

Exercise: Assuming the relevant moments are finite and the distribution has support x>0,
express the integral from zero to infinity of S(x)e(x)xn , in terms of moments.
[Solution: S(x)e(x) = R(x)E[X]. Then one applies integration by parts, differentiating R(t) and
integrating tn . Since the integral of S(t)/E[X] from x to infinity is R(x), the derivative of R(x) is
-S(x)/E[X].
∫[0 to ∞] S(t) e(t) t^n dt = E[X] ∫[0 to ∞] R(t) t^n dt
= E[X] [R(t) t^(n+1)/(n+1)] evaluated from t = 0 to ∞ + E[X] ∫[0 to ∞] {S(t)/E[X]} t^(n+1)/(n+1) dt
= {1/(n+1)} ∫[0 to ∞] S(t) t^(n+1) dt = {1/(n+1)} E[X^(n+2)]/(n+2) = E[X^(n+2)] / {(n+1)(n+2)}.
Where Iʼve used the result of the previous exercise and the fact that, if the (n+2)nd moment is
finite, then R(x) x^(n+1) must go to zero as x approaches infinity.]

Thus we can express moments, when they exist, either as integrals of S(t)e(t) times powers of t,
integrals of R(t) times powers of t, or as S(t) times powers of t.

Assuming the relevant moments are finite and the distribution has support x>0, then
if e(x) > e(0) = E[X] for all x, we have for n ≥ 1:
E[X^(n+1)] / {n(n+1)} = ∫[0 to ∞] S(t) e(t) t^(n-1) dt > E[X] ∫[0 to ∞] S(t) t^(n-1) dt = E[X]E[X^n]/n.

Thus if e(x) > e(0) for all x, E[X^(n+1)] > (n+1)E[X]E[X^n], n ≥ 1.
For n = 1 we get a previous result: E[X²] > 2E[X]².
For n = 2 we get: E[X³] > 3E[X]E[X²].

Conversely, if e(x) < e(0) for all x, E[X^(n+1)] < (n+1)E[X]E[X^n], n ≥ 1.

Equilibrium Distribution:

Given that X follows a distribution with survival function SX, for x > 0, then Loss Models defines the
density of the corresponding “equilibrium distribution” as:246

g(y) = SX(y) / E[X], y > 0.

Exercise: Demonstrate that the above is actually a probability density function.


[Solution: SX(y)/E[X] ≥ 0.
∫[0 to ∞] {SX(y)/E[X]} dy = {1/E[X]} ∫[0 to ∞] SX(y) dy = E[X]/E[X] = 1.]

Exercise: If severity is Exponential with mean θ = 10, what is the density of the corresponding
equilibrium distribution?
[Solution: g(y) = SX(y) / E[X] = exp(-y/10)/ 10.]

In general, if the severity is Exponential, then the corresponding equilibrium distribution is also
Exponential with the same mean.

Exercise: If severity is Pareto, with α = 5 and θ = 1000, what is the corresponding equilibrium
distribution?
[Solution: g(y) = SX(y)/E[X] = (1 + y/1000)^(-5) / 250.

This is the density of another Pareto Distribution, but with α = 4 and θ = 1000.]

The distribution function of the corresponding equilibrium distribution is the loss elimination ratio of the
severity distribution:
G(y) = ∫[0 to y] {SX(t)/E[X]} dt = {1/E[X]} ∫[0 to y] SX(t) dt = E[X ∧ y]/E[X] = LER(y).

Therefore the survival function of the corresponding equilibrium distribution is the excess ratio of the
severity distribution.

246
See Equation 3.11 in Loss Models.

For example, if severity is Pareto, the excess ratio, R(x) = {θ/(θ+x)}α−1, which is the survival function
for a Pareto with the same scale parameter and a shape parameter one less. Thus if severity is
Pareto, (with α > 1), then the distribution of the corresponding equilibrium distribution is also Pareto,
with the same scale parameter and shape parameter of α - 1.

The mean of the corresponding equilibrium distribution is:247

∫[0 to ∞] y {SX(y)/E[X]} dy = {1/E[X]} ∫[0 to ∞] y SX(y) dy = E[X²] / (2 E[X]).

The second moment of the corresponding equilibrium distribution is:

∫[0 to ∞] y² {SX(y)/E[X]} dy = {1/E[X]} ∫[0 to ∞] y² SX(y) dy = E[X³] / (3 E[X]).

Exercise: If severity is Pareto, with α = 5 and θ = 1000, what are the mean and variance of the
corresponding equilibrium distribution?
[Solution: The mean of the Pareto is: 1000/4 = 250. The second moment of the Pareto is:
2(10002 ) / {(5-1)(5-2)} = 166,667. The third moment of the Pareto is:
6(10003 ) / {(5-1)(5-2)(5-3)} = 250 million. The mean of the corresponding equilibrium distribution is:
E[X2 ]/ {2E[X]} = 166,667 / 500 = 333.33.
The second moment of the corresponding equilibrium distribution is: E[X3 ] / {3E[X]} =
250 million/ 750 = 333,333. Thus the variance of the corresponding equilibrium distribution is:
333,333 - 333.332 = 222,222. Alternately, the corresponding equilibrium distribution is a Pareto
Distribution, but with α = 4 and θ = 1000. This has mean: 1000/3 = 333.33, second moment:
2(10002 )/{(3)(2)} = 333,333, and variance: 333,333 - 333.332 = 222,222.]
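The same answers can be obtained by integrating the equilibrium density directly; the Python sketch below is mine, not part of the original exercise.

from scipy.integrate import quad

alpha, theta = 5, 1000.0
S = lambda y: (1 + y / theta) ** (-alpha)
EX = theta / (alpha - 1)                       # 250
g = lambda y: S(y) / EX                        # equilibrium density
mean, _ = quad(lambda y: y * g(y), 0, float("inf"))
second, _ = quad(lambda y: y * y * g(y), 0, float("inf"))
print(mean, second - mean ** 2)                # about 333.33 and 222,222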

The hazard rate of the corresponding equilibrium distribution is:


(density of the corresponding equilibrium distribution) / (survival function of the corresponding equilibrium distribution)
= {S(x)/E[X]} / R(x) = S(x) / {E[X] R(x)} = S(x) / (expected losses excess of x) = 1/e(x).

The hazard rate of the corresponding equilibrium distribution is the inverse of the mean excess loss.

247
See Section 3.4.5 in Loss Models.

Curtate Expectation of Life:

The mean excess loss is mathematically equivalent to what is called the complete expectation of life,
e° x ⇔ e(x).

Exercise: Five individuals live 53.2, 66.3, 70.8, 81.0, and 83.5 years.
What is the observed e(60) = e°60 for this group?
[Solution: (6.3 + 10.8 + 21 + 23.5)/4 = 15.4.]

If instead we ignore any fractions of a year lived, then we get what is called the curtate expectation of
life, ex. Loss Models does not cover the curtate expectation of life.

Exercise: Five individuals live 53.2, 66.3, 70.8, 81.0, and 83.5 years.
What is the observed e60 for this group?
[Solution: (6 + 10 + 21 + 23)/4 = 15.0.]

Since we are ignoring any fractions of a year lived, ex ≤ e° x .

On average we are ignoring about 1/2 year of life, therefore, ex ≅ e° x - 1/2.

Just as we can write e(x) = e°x in terms of an integral of the Survival Function:248
e°x = ∫[x to ∞] S(t) dt / S(x),
one can write the curtate expectation of life in terms of a summation of Survival Functions:249
ex = Σ[t = x+1 to ∞] S(t) / S(x).  e0 = Σ[t = 1 to ∞] S(t).

Exercise: Determine ex for an Exponential Distribution with mean θ.


[Solution: ex = Σ[t = x+1 to ∞] S(t) / S(x) = Σ[t = 1 to ∞] e^(-(x+t)/θ) / e^(-x/θ) = Σ[t = 1 to ∞] e^(-t/θ)
= e^(-1/θ) / (1 - e^(-1/θ)) = 1/(e^(1/θ) - 1).
Comment: ex = 1/(e^(1/θ) - 1) ≅ 1/{1/θ + 1/(2θ²)} = θ/{1 + 1/(2θ)} ≅ θ{1 - 1/(2θ)} = θ - 1/2.]


248
See equation 3.5.2 in Actuarial Mathematics, with tp x = S(x+t)/S(x).
249
See equation 3.5.7 in Actuarial Mathematics, with kp x = S(x+k)/S(x).
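A quick numerical check of the curtate expectation (my Python sketch, not part of the text), for the Exponential with θ = 10 used in the example that follows: summing the survival function reproduces 1/(e^(1/θ) - 1).

import math

theta, x, N = 10.0, 0, 2000
S = lambda t: math.exp(-t / theta)
e_curtate = sum(S(t) for t in range(x + 1, N)) / S(x)
print(e_curtate, 1 / (math.exp(1 / theta) - 1))    # both about 9.508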

For example, for θ = 10, ex = 1/(e^0.1 - 1) = 9.508.
This compares to e°x = θ = 10.

Exercise: Determine e0 for a Pareto Distribution with θ = 1 and α = 2.
[Solution: e0 = S(1) + S(2) + S(3) + ... = (1/2)² + (1/3)² + (1/4)² + (1/5)² + ... = π²/6 - 1 = 0.645.
Comment: e(0) = E[X] = θ/(α - 1) = 1.]



Problems:

33.1 (2 points) Assume you have a Pareto distribution with α = 5 and θ = $1000.
What is the Mean Excess Loss at $2000?
A. less than $500
B. at least $500 but less than $600
C. at least $600 but less than $700
D. at least $700 but less than $800
E. at least $800

33.2 (1 point) Assume you have a distribution F(x) = 1 - e-x/666.


What is the Mean Excess Loss at $10,000?
A. less than $500
B. at least $500 but less than $600
C. at least $600 but less than $700
D. at least $700 but less than $800
E. at least $800

33.3 (3 points) The random variables X and Y have joint density function
f(x, y) = 60,000,000 x exp(-10x2 ) / (100 + y)4 , 0 < x < ∞, 0 < y < ∞.
Determine the Mean Excess Loss function for the marginal distribution of Y evaluated at Y = 1000.
A. less than 200
B. at least 200 but less than 300
C. at least 300 but less than 400
D. at least 400 but less than 500
E. at least 500

33.4 (1 point) Which of the following distributions would be most useful for modeling the age at
death of humans?
A. Gamma B. Inverse Gaussian C. LogNormal D. Pareto E. Weibull

33.5 (1 point) Given the following empirical mean excess losses for 500 claims:
x 0 5 10 15 25 50 100 150 200 250 500 1000
e(x) 15.6 16.7 17.1 17.4 17.6 18.0 18.2 18.3 18.3 18.4 18.5 18.5
Which of the following distributions would be most useful for modeling this data?
A. Gamma with α > 1 B. Gamma with α < 1 C. Pareto
D. Weibull with τ > 1 E. Weibull with τ < 1

33.6 (2 points) You have a Pareto distribution with parameters α and θ.


If e(1000) / e(100) = 2.5, what is θ?
A. 100 B. 200 C. 300 D. 400 E. 500

For the following three questions, assume you have a LogNormal distribution with parameters
µ = 11.6, σ = 1.60.

33.7 (3 points) What is the Mean Excess Loss at $100,000?


A. less than $500,000
B. at least $500,000 but less than $600,000
C. at least $600,000 but less than $700,000
D. at least $700,000 but less than $800,000
E. at least $800,000

33.8 (1 point) What is the average size of those losses greater than $100,000?
A. less than $500,000
B. at least $500,000 but less than $600,000
C. at least $600,000 but less than $700,000
D. at least $700,000 but less than $800,000
E. at least $800,000

33.9 (2 points) What percent of the total loss dollars are represented by those losses greater than
$100,000?
A. less than 0.91
B. at least 0.91 but less than 0.92
C. at least 0.92 but less than 0.93
D. at least 0.93 but less than 0.94
E. at least 0.94

33.10 (2 points) You observe the following 35 losses: 6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34,
40, 41, 48, 49, 53, 60, 63, 78, 85, 103, 124, 140, 192, 198, 227, 330, 361, 421, 514, 546, 750,
864, 1638.
What is the (empirical) Mean Excess Loss at 500?
A. less than 350
B. at least 350 but less than 360
C. at least 360 but less than 370
D. at least 370 but less than 380
E. at least 380

Use the following information for the next two questions:


• The annual frequency of ground up losses is Negative Binomial with r = 4 and β = 1.3.
• The sizes of ground up losses follow a Pareto Distribution with α = 3 and θ = 5000.
• There is a franchise deductible of 1000.

33.11 (2 points) Determine the insurerʼs average payment per nonzero payment.
(A) 2500 (B) 3000 (C) 3500 (D) 4000 (E) 4500

33.12 (2 points) Determine the insurer's expected annual payments.


(A) 8,000 (B) 9,000 (C) 10,000 (D) 11,000 (E) 12,000

33.13 (3 points) F is a continuous size of loss distribution on (0, ∞).


LER(x) is the corresponding loss elimination ratio at x.
Which of the following are true?
A. F(x) ≥ LER(x) for all x > 0.
B. F(x) ≥ LER(x) for all x > c, for some c > 0.
C. F(x) ≥ LER(x) for all x > 0, if and only if e(x) ≥ e(0).
D. F(x) ≥ LER(x) for all x > 0, if and only F is an Exponential Distribution.
E. None of A, B, C or D is true.

33.14 (2 points) The size of loss follows an Exponential Distribution with θ = 5.


The largest integer contained in each loss is the amount paid for that loss.
For example, a loss of size 3.68 results in a payment of 3.
What is the expected payment?
A. 4.46 B. 4.48 C. 4.50 D. 4.52 E. 4.54

33.15 (2 points) Losses follow a Single Parameter Pareto distribution with θ = 1000 and α > 1.
Determine the ratio of the Mean Excess Loss function at x = 3000 to the Mean Excess Loss function
at x = 2000.
A. 1 B. 4/3 C. 3/2 D. 2
E. Cannot be determined from the given information.

33.16 (3 points) For a Gamma Distribution with α = 2, what is the behavior of the mean excess loss
e(x) as x approaches infinity?

33.17 (1 point) For a Pareto Distribution with α > 1:


E[X - 1000 | X > 1000] = 1.3 E[X - 700 | X > 700].
Determine θ.

33.18 (4, 5/86, Q.58) (2 points) For a certain machine part, the Mean Excess Loss e(x) varies as
follows with the age (x) of the part:
Age x e(x)
5 months 12.3 months
10 18.6
20 34.3
50 69.1
Which of the following continuous distributions best fits this pattern of Mean Excess Loss?
A. Exponential B. Gamma (α > 1) C. Pareto
D. Weibull (τ > 1) E. Normal

33.19 (160, 5/87, Q.1) (2.1 points) You are given the following survival function:
S(x) = (b - x/a)1/2, 0 ≤ x ≤ k. The median age is 75. Determine e(75).
(A) 8.3 (B) 12.5 (C) 16.7 (D) 20.0 (E) 33.3

33.20 (4B, 5/92, Q.14) (1 point) Which of the following statements are true about the Mean
Excess Loss function e(x)?
1. If e(x) increases linearly as x increases, this suggests that a Pareto model may be appropriate.
2. If e(x) decreases as x increases, this suggests that a Weibull model may be appropriate.
3. If e(x) remains constant as x increases, this suggests that an exponential model may be
appropriate.
A. 1 only B. 2 only C. 1 and 3 only D. 2 and 3 only E. 1, 2, and 3

33.21 (4B, 5/93, Q.24) (2 points) The underlying distribution function is assumed to be the
following: F(x) = 1 - e-x/10, x ≥ 0
Calculate the value of the Mean Excess Loss function e(x), for x = 8.
A. less than 7.00
B. at least 7.00 but less than 9.00
C. at least 9.00 but less than 11.00
D. at least 11.00 but less than 13.00
E. at least 13.00

33.22 (4B, 5/94, Q.4) (2 points) You are given the following information from an unknown size of
loss distribution for random variable X:
Size k ($000s) 1 3 5 7 9
Count of X ≥ k 180 118 75 50 34
Sum of X ≥ k 990 882 713 576 459
If you are using the empirical Mean Excess Loss function to help you select a distributional family for
fitting the empirical data, which of the following distributional families should you attempt to fit first?
A. Pareto B. Gamma C. Exponential D. Weibull E. Lognormal

33.23 (4B, 5/95, Q.21) (3 points) Losses follow a Pareto distribution, with parameters θ and
α > 1. Determine the ratio of the Mean Excess Loss function at x = 2θ to the Mean Excess Loss
function at x = θ.
A. 1/2 B. 1 C. 3/2 D. 2
E. Cannot be determined from the given information.

33.24 (4B, 11/96, Q.22) (2 points) The random variable X has the density function
f(x) = e-x/λ/λ , 0 < x < ∞, λ > 0.
Determine e(λ), the Mean Excess Loss function evaluated at λ.
A. 1 B. λ C. 1/λ D. λ/e E. e/λ

33.25 (4B, 5/97, Q.13) (1 point) Which of the following statements are true?
1. Empirical Mean Excess Loss functions are continuous.
2. The Mean Excess Loss function of an exponential distribution is constant.
3. If it exists, the Mean Excess Loss function of a Pareto distribution is decreasing.
A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

33.26 (4B, 5/98, Q.3) (3 points) The random variables X and Y have joint density function
f(x, y) = exp(-2x - y/2) 0 < x < ∞, 0 < y < ∞.
Determine the Mean Excess Loss function for the marginal distribution of X evaluated at
X = 4.
A. 1/4 B. 1/2 C. 1 D. 2 E. 4

33.27 (4B, 11/98, Q.6) (2 points) Loss sizes follow a Pareto distribution, with parameters
α = 0.5 and θ = 10,000. Determine the Mean Excess Loss at 10,000.
A. 5,000 B. 10,000 C. 20,000 D. 40,000 E. ∞

33.28 (4B, 11/99, Q.25) (2 points) You are given the following:
• The random variable X follows a Pareto distribution, as per Loss Models, with parameters θ = 100
and α = 2.
• The mean excess loss function, eX(k), is defined to be E[X - k | X ≥ k].
Determine the range of eX(k) over its domain of [0, ∞ ).
A. [0, 100] B. [0, ∞) C. 100 D. [100, ∞) E. ∞

33.29 (4B, 11/99, Q.27) (2 points) You are given the following:
• The random variable X follows a Pareto distribution, as per Loss Models, with parameters θ = 100
and α = 2 .
• The mean excess loss function, eX(k), is defined to be E[X - k | X ≥ k].
Z = min(X, 500).
Determine the range of eZ(k) over its domain of [0, 500].
A. [0, 150] B. [0, ∞) C. [100, 150] D. [100, ∞) E. [150, ∞)

33.30 (SOA3, 11/04, Q.24) (2.5 points) The future lifetime of (0) follows a two-parameter Pareto
distribution with θ = 50 and α = 3.

Calculate e° 20 .
(A) 5 (B) 15 (C) 25 (D) 35 (E) 45

33.31 (CAS3, 5/05, Q.4) (2.5 points) Well-Traveled Insurance Company sells a travel insurance
policy that reimburses travelers for any expenses incurred for a planned vacation that is canceled
because of airline bankruptcies. Individual claims follow a Pareto distribution with α = 2 and θ = 500.
Because of financial difficulties in the airline industry, Well-Traveled imposes a limit of $1,000 on each
claim. If a policyholder's planned vacation is canceled due to airline bankruptcies and he or she has
incurred more than $1,000 in expenses, what is the expected non-reimbursed amount of the claim?
A. Less than $500
B. At least $500, but less than $1,000
C. At least $1,000, but less than $1,500
D. At least $1,500, but less than $2,000
E. $2,000 or more

33.32 (SOA M, 5/05, Q.9 & 2009 Sample Q.162) (2.5 points) A loss, X, follows a 2-parameter
Pareto distribution with α = 2 and unspecified parameter θ. You are given:
E[X - 100 | X > 100] = (5/3) E[X - 50 | X > 50].
Calculate E[X - 150 | X > 150].
(A) 150 (B) 175 (C) 200 (D) 225 (E) 250

33.33 (CAS3, 11/05, Q.10) (2.5 points)


You are given the survival function s(x) as described below:
• s(x) = 1 - x/40 for 0 ≤ x ≤ 40.
• s(x) is zero elsewhere.
Calculate e° 25, the complete expectation of life at age 25.
A. Less than 7.7
B. At least 7.7 , but less than 8.2
C. At least 8.2, but less than 8.7
D. At least 8.7, but less than 9.2
E. At least 9.2

33.34 (CAS3, 5/06, Q.38) (2.5 points) The number of calls arriving at a customer service center
follows a Poisson distribution with λ = 100 per hour. The length of each call follows an exponential
distribution with an expected length of 4 minutes. There is a $3 charge for the first minute or any
fraction thereof and a charge of $1 per minute for each additional minute or fraction thereof.
Determine the total expected charges in a single hour.
A. Less than $375
B. At least $375, but less than $500
C. At least $500, but less than $625
D. At least $625, but less than $750
E. At least $750

Solutions to Problems:

33.1. D. e(2000) = {mean - E[X ∧ 2000]} / S(2000) = (250 - 246.9) / 0.00411 = $754.
Alternately, for the Pareto, e(x) = (θ + x) / (α - 1) = 3000 / 4 = $750.
Alternately, a Pareto truncated and shifted at 2000, is another Pareto with α = 5 and
θ = 1000 + 2000 = 3000. e(2000) is the mean of this new Pareto: 3000/(5 - 1) = $750.
Alternately, e(2000) = ∫[0 to ∞] tp2000 dt = ∫[0 to ∞] S(2000 + t) / S(2000) dt
= ∫[0 to ∞] (2000 + θ)^α / (2000 + θ + t)^α dt = (2000 + θ) / (α - 1) = 3000 / 4 = $750.

33.2. C. For the exponential distribution the mean excess loss is a constant; it is equal to the mean.
The mean in this case is θ = $666.

33.3. E. X and Y are independent since the support doesnʼt depend on x or y and the density can
be factored into a product of terms each just involving x and y.
f(x, y) = 60,000,000 x exp(-10x2 ) / (100 + y)4 = {20 x exp(-10x2 )} {3000000 / (100 + y)4 }.
The former is the density of a Weibull Distribution with θ = 1/√10 and τ = 2. The latter is the density
of a Pareto Distribution with α = 3 and θ = 100. When one integrates from x = 0 to ∞ in order to get
the marginal distribution of y, one is left with just a Pareto, since the Weibull integrates to unity and
the Pareto is independent of x. Thus the marginal distribution is just a Pareto, with parameters α = 3
and θ = 100. Thus e(y) = (θ + y)/(α - 1) = (100 + y)/(3 - 1).
e(1000) = 1100/2 = 550.

33.4. E. Of these distributions, only the Weibull (for τ >1) has mean residual lives decline to zero.
The Weibull (for τ > 1) has the force of mortality increase as the age approaches infinity, as is
observed for humans. The other distributions have the force of mortality decline or approach a
positive constant as the age increases.

33.5. B. The empirical mean residual lives seem to be increasing towards a limit of about 18.5 as x
approaches infinity. This is the behavior of a Gamma with alpha less than 1.
The other distributions given all exhibit different behaviors than this.

33.6. E. For the Pareto Distribution: e(x) = (E[X] - E[X ∧ x])/S(x) = (x + θ)/(α - 1).

e(1000)/e(100) = (1000 + θ)/(100 + θ) = 2.5. ⇒ θ = 500.


Comment: One can not determine α from the given information.

33.7. C. E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx - µ − σ2)/σ] + x {1 - Φ[(lnx - µ)/σ]}.


E[X ∧ 100,000] = exp(12.88)Φ[-1.65] + (100,000) {1 - Φ[-0.05]} =
(392,385)(1 - 0.9505) + (100,000)(0.5199) = 71,413. For the LogNormal,
E[X] = exp(µ + σ2/2) = exp(12.88) = 392,385. For the LogNormal,
F(x) = Φ[{ln(x) − µ} / σ]. F(100000) = Φ[-0.05] = (1 - 0.5199). Therefore,
e(100000) = {E[X] - E[X ∧ 100000] } / {1 - F(100000)} = (392,385 - 71,413) / .5199 ≅ $617,000.
Alternately, for the LogNormal distribution,
e(x) = exp(µ + σ2/2){1 - Φ[(lnx − µ − σ2)/σ] / {1 - Φ[(lnx − µ)/σ]} - x.
For µ = 11.6, σ = 1.60, e(100000) = exp(12.88)(1 - Φ[-1.65]) / {1 - Φ[-.05]} - 100000 =
(392,385)(0.9505) / 0.5199 - 100,000 = $617 thousand.
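
The LogNormal arithmetic above is easy to verify. A minimal Python sketch (a check only, not part of the original solution; it assumes SciPy is available for the standard normal distribution function):

    from math import exp, log
    from scipy.stats import norm

    mu, sigma = 11.6, 1.60
    x = 100000.0

    mean = exp(mu + sigma**2 / 2)                            # E[X] = exp(mu + sigma^2/2)
    limited = mean * norm.cdf((log(x) - mu - sigma**2) / sigma) \
              + x * (1 - norm.cdf((log(x) - mu) / sigma))    # E[X limited to 100,000]
    S = 1 - norm.cdf((log(x) - mu) / sigma)                  # S(100,000)
    e = (mean - limited) / S                                 # mean excess loss e(100,000)

    print(round(mean), round(limited), round(e))             # roughly 392,386, 71,400, 615,000

The small difference from the $617,000 above comes from using exact values of Φ rather than the rounded table values Φ[-1.65] and Φ[-0.05].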

33.8. D. The size of those claims greater than $100,000 = $100,000 + e(100000). But from the
previous question e(100000) ≅ $617,000. Therefore, the solution ≅ $717,000.

33.9. E. Use the results from the previous two questions. F(100000) = Φ[-0.05] = (1 -.5199).

Thus, S(100000) = 0.5199. E[X] = exp(µ + σ2/2) = exp(12.88) = 392,385.


Percent of the total loss dollars represented by those losses greater than $100,000 =
S(100,000) {size of those claims greater than $100,000} / E[X]
= (0.5199)(717,000)/392,385 = 0.95.
Alternately, the losses represented by the small losses are:
E[X ∧ 100,000] - S(100,000)(100,000) = 71,413 − 51,990 = 19,423.
Divide by the mean of 392,385 and get 0.049 of the losses are from small claims.
Thus the percentage of losses from large claims is: 1 - .049 = 0.95.

33.10. C. Each claim above 500 contributes its excess above 500 and then divide by the number
of claims greater than 500. e(500) = {14 + 46+250+364+1138}/ 5 = 362.4.

33.11. D. With a franchise deductible the insurer pays the full value of every large loss and pays
nothing for small losses. Therefore, the Pareto Distribution has been truncated from below at 1000.
The mean of a distribution truncated and shifted from below at 1000 is e(1000) ⇒
the mean of a distribution truncated from below at 1000 is: e(1000) + 1000.
For a Pareto Distribution e(x) = (E[X] - E[X ∧ x])/S(x) = (x + θ)/(α - 1).
e(1000) = (1000 + 5000)/(3 - 1) = 3000. e(1000) + 1000 = 4000.

33.12. E. For the Pareto Distribution, S(1000) = {5000/(5000 + 1000)}3 = 0.5787.


Mean frequency = rβ = (4)(1.3) = 5.2. Expected # of nonzero payments = (0.5787)(5.2) = 3.009.
From the previous solution, average nonzero payment is 4000.
Expected annual payments = (3.009)(4000) = 12,036.
Alternately, with a franchise deductible of 1000 the payment is 1000 more than that for an ordinary
deductible for each large loss, and thus the average payment per loss is:
E[X] - E[X ∧ 1000] + 1000S(1000) =
(5000/2) - (5000/2){1 - (5000/6000)2 } + (1000)(5000/6000)3 = 2315.
Expected annual payments = (5.2)(2315) = 12,038.

33.13. C. F(x) - LER(x) = 1 - S(x) - {1 - S(x)e(x)/E[X]} = {S(x)/E[X]} {e(x) - E[X]} =


{S(x)/E[X]} {e(x) - e(0)}. Therefore, F(x) ≥ LER(x) ⇔ e(x) ≥ e(0).
Alternately, e(x)/e(0) = {E[(X - x)+]/S(x)}/E[X] = {E[(X - x)+]/E[X]}/S(x) = R(x)/S(x).
Therefore, e(x) ≥ e(0). ⇔ R(x) ≥ S(x). ⇔ LER(x) = 1 - R(x) ≤ 1 - F(x) = S(x).
Comment: For an Exponential Distribution, e(x) = e(0) = θ, and therefore F(x) = LER(x).
For a Pareto Distribution with α > 1, e(x) increases linearly, and therefore F(x) > LER(x).

33.14. D. The expected payment is the curtate expectation of life at zero.



e0 = Σ_{t=1}^∞ S(t) = S(1) + S(2) + S(3) + ... = e-1/5 + e-2/5 + e-3/5 + ... = e-1/5/(1 - e-1/5) = 1/(e0.2 - 1) = 4.517.


Comment: Approximately 1/2 less than the mean of 5.

33.15. C. e(x) = {E[X] - E[X ∧ x]} / S(x) = (αθ/(α−1) - {αθ/(α-1) - θα/((α−1)xα−1)}) / {(θ/x)α}
= x/(α-1). e(3000)/e(2000) = 3000/2000 = 3/2.
Comment: Similar to 4B, 5/95, Q.21.

33.16. The value of the scale parameter θ does not affect the behavior, for simplicity set θ = 1.

f(x) = x e-x, x > 0. S(x) = ∫_x^∞ f(t) dt = x e-x + e-x.
e(x) = ∫_x^∞ S(t) dt / S(x) = (x e-x + 2e-x) / (x e-x + e-x) = 1 + 1/(1+x).
Thus, as x approaches infinity, e(x) decreases to a constant.
Comment: In this case the limit of e(x) is one, while in general it is θ.
In general, for α > 1, e(x) decreases to a constant, while h(x) increases to a constant.
For α < 1, e(x) increases to a constant, while h(x) decreases to a constant.
For α = 1, we have an Exponential, and e(x) and h(x) are each constant.
For α = 2 and θ = 1, h(x) = f(x) / S(x) = x e-x / (x e-x + e-x) = x / (x + 1).

33.17. E[X - x | X > x] = e(x). For the Pareto Distribution, e(x) = (x+θ)/(α-1).

e(1000) / e(700) = (1000 + θ) / (700 + θ) = 1.3. ⇒ θ = 300.


Comment: Similar to SOA M, 5/05, Q.9 (2009 Sample Q.162).

33.18. C. The mean residual life increases approximately linearly, which indicates a Pareto.
Comment: The Pareto has a mean residual life that increases linearly. The Exponential has a constant
mean residual life. For a Gamma with α > 1 the mean residual life decreases towards a horizontal
asymptote. For a Weibull with τ > 1 the mean residual life decreases to zero.
For a Normal Distribution the mean residual life decreases to zero.

33.19. C. We want S(0) = 1. b = 1. ⇒ We want S(k) = 0. ⇒ k = a.

0.5 = S(75) = (1 - 75/a)1/2. ⇒ a = 100.


e(75) = ∫_75^100 S(x) dx / S(75) = ∫_75^100 (1 - x/100)1/2 dx / 0.5 = (25/3)/0.5 = 16.7.

33.20. E. 1. T. Mean Residual Life of the Pareto Distribution increases linearly.


2. T. The Weibull Distribution for τ > 1 has the mean residual life decrease (to zero.)
3. T. The mean residual life for the Exponential Distribution is constant.

33.21. C. For the Exponential Distribution, e(x) = mean = θ = 10.

33.22. C. The empirical mean residual life is calculated as:


e(k) = ($ excess of k) / (# claims > k) = {($ on claims > k) / (# claims > k)} - k =
(average size of those claims of size greater than k) - k.
Size k ($000) 1 3 5 7 9
# claims ≥ k 180 118 75 50 34
Sum of X ≥ k 990 882 713 576 459
average size of those claims of size > k 5.500 7.475 9.507 11.520 13.500
e(k) 4.500 4.475 4.507 4.520 4.500
Since the mean residual life is approximately constant, one would attempt first to fit an exponential
distribution, since it has a constant mean residual life.
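
A short Python sketch of the same arithmetic (a check only; the counts and sums are read off the table above):

    # For each limit k (in $000): number of claims of size greater than k and their total size.
    limits = [1, 3, 5, 7, 9]
    counts = [180, 118, 75, 50, 34]
    sums   = [990, 882, 713, 576, 459]

    for k, n, s in zip(limits, counts, sums):
        e_k = s / n - k   # e(k) = (average size of claims of size greater than k) - k
        print(k, round(s / n, 3), round(e_k, 3))
    # e(k) is roughly constant at about 4.5, suggesting an Exponential fit.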

33.23. C. For the Pareto, e(x) = (x+θ) / (α-1). e(2θ) = 3θ / (α-1). e(θ) = 2θ / (α-1).
e(2θ) / e(θ) = 3/2.
Comment: If one doesnʼt remember the formula for the mean residual life of the Pareto, it is a longer
question. In that case, one can compute: e(x) = (mean - E[X ∧ x]) / S(x).

33.24. B. The mean residual life of the Exponential is a constant equal to its mean, here λ.

33.25. A. 1. False. The empirical mean residual life is the ratio of the observed losses excess of
the limit divided by the number of observed claims greater than the limit. While the numerator is
continuous, the denominator is not. For example, assume you observe 3 claims of sizes 2, 6 and
20. Then e(5.999) = {(20-5.999) + (6-5.999)}/2 = 7.001, while e(6.001) = (20-6.001)/1 = 13.999.
The limit of e(x) as x approaches 6 from below is 7, while the limit of e(x) as x approaches 6 from
above is 14. Thus the empirical mean residual life is discontinuous at 6. 2. True. 3. False. For the
Pareto Distribution, the mean residual life increases (linearly).
Comment: A function is continuous at a point x, if and only if the limits approaching x from below and
above both exist and are each equal to the value of the function at x. The empirical mean residual life
is discontinuous at points at which there are observed claims, since so are the Empirical Distribution
Function and the tail probability. In contrast, the empirical Excess Ratio and empirical Limited
Expected Value are continuous. The numerator of the Excess Ratio is the observed losses excess
of the limit; the denominator is the total observed losses. This numerator is continuous, while this
denominator is independent of x. Thus the empirical Excess Ratio is continuous. The numerator of
the empirical Limited Expected Value is the observed losses limited by the limit; the denominator is
the total number of observed claims. This numerator is continuous, while this denominator is
independent of x. Thus the empirical Limited Expected Value is continuous.

33.26. B. The marginal distribution of X is obtained by integrating with respect to y:


f(x) = ∫_0^∞ exp(-2x - y/2) dy = e-2x ∫_0^∞ exp(-y/2) dy = e-2x (-2e-y/2) ]_{y=0}^{y=∞} = 2e-2x.

Thus the marginal distribution is an Exponential with a mean of 1/2. It has a mean residual life of 1/2,
regardless of x.

33.27. E. The mean excess loss for the Pareto only exists for α > 1.
For α ≤ 1 the relevant integral is infinite.

e(x) = ∫_x^∞ t f(t) dt / S(x) - x.
For a Pareto with α = 0.5: ∫_x^∞ t f(t) dt = ∫_x^∞ t (0.5 θ0.5) (θ + t)-1.5 dt.

For large t, the integrand is proportional to t t-1.5 = t-0.5, whose integral approaches infinity as the
upper limit of the integral approaches infinity. (The integral of t-0.5 is 2t0.5.)
Alternately, e(x) = (E[X] - E[X ∧ x]) / S(x). The limited expected value E[X ∧ x] is finite (it is less than
x), as is S(x). However, for α ≤ 1, the mean E[X] (does not exist or) is infinite. Therefore, so is the
mean excess loss.
Comment: While choice E is the best of those available, in my opinion a better answer might have
been that the mean excess loss does not exist.

33.28. D. For the Pareto Distribution e(k) = (k+θ)/(α-1) = k + 100.


Therefore, as k goes from zero to infinity, e(k) goes from 100 to infinity.

33.29. A. Z is for data censored at 500, corresponding to a maximum covered loss of 500.
eZ(k) = (dollars of loss excess of k) / S(k) = (E[X ∧ 500] - E[X ∧ k])/ S(k).
E[X ∧ x] = {θ/(α-1)} {1-(θ/(θ+x))α−1}, for the Pareto.
Thus (E[X ∧ 500] - E[X ∧ k])/S(k) = 100 {100/(100+k) - 100/600} {(100+k)/100}2 =
(100 + k) {600 - (100 + k)} /600 = (100 + k)(500 - k)/600.
eZ(0) = 83.33. eZ(500) = 0.
Setting the derivative equal to zero: (400-2k)/ 600 = 0. k = 200. eZ(200) = 150.
Thus the maximum over the interval is 150, while the minimum is 0.
Therefore, as k goes from zero to 500, eZ(k) is in the interval [0, 150].

33.30. D. E[X] = θ/(α-1) = 50/(3-1) = 25.

E[X ∧ 20] = {θ/(α-1)}{1 - (θ/(θ+x))α−1} = (25){1 - (50/(50 + 20))2 } = 12.245.


S(20) = {50/(50 + 20)}3 = 0.3644.
e(20) = (E[X] - E[X ∧ 20])/S(20) = (25 - 12.245) / 0.3644 = 35.
Alternately, for the Pareto, e(x) = (x + θ)/(α - 1). e(20) = (20 + 50)/(3 - 1) = 35.

33.31. D. Given someone incurred more than $1,000 in expenses, the expected
non-reimbursed amount of the claim is the mean residual life at e(1000).
For the Pareto, e(x) = (x + θ) / (α - 1). e(1000) = (1000 + 500)/(2 - 1) = 1500.
Alternately, (E[X] - E[X ∧ 1000]) / S(1000) = {500 - 500(1 - 500/1500)} / (500/1500)2 = 1500.
Alternately, a Pareto truncated and shifted from below is another Pareto, with parameters α and
θ + d. Therefore, the unreimbursed amounts follow a Pareto Distribution with parameters α = 2 and
θ = 500 + 1000 = 1500, with mean 1500/(2 - 1) = 1500.

33.32. B. e(d) = E[X - d | X > d] = (E[X] - E[X ∧ d]) / S (d) = {θ - θ(1 - θ/(θ + d))} / {θ/(θ + d)}2
= θ + d.

The given equation states e(100) = (5/3)e(50). ⇒ 100 + θ = (5/3)(50 + θ). ⇒ θ = 25.
E[X - 150 | X > 150] = e(150) = 150 + 25 = 175.
Comment: A Pareto truncated and shifted from below is another Pareto, with parameters α and

θ + d. ⇒ e(x) = (x + θ)/(α - 1).

33.33. A. e(25) = ∫_25^40 S(x) dx / S(25) = ∫_25^40 (1 - x/40) dx / (1 - 25/40) = 2.8125 / 0.375 = 7.5.
Alternately, the given survival function is a uniform distribution on 0 to 40.
At age 25, the future lifetime is uniform from 0 to 15, with an average of 7.5.
Comment: DeMoivreʼs Law with ω = 40.

33.34. D. The charge per call of length t is: 3 + 1(if t > 1) + 1(if t > 2) + 1(if t > 3) + 1(if t > 4) + ...
The expected charge per call is: 3 + S(1) + S(2) + S(3) + ... = 3 + e-1/4 + e-2/4 + e-3/4 + ...
= 3 + e-1/4/(1 - e-1/4) = 6.521. (100)(6.521) = 652.1.
Comment: Ignore the possibility that a call lasts exactly an integer, since the Exponential is a
continuous distribution. Then the cost of a call is: 3 + curtate lifetime of a call.
For example, if the call lasted 4.6 minutes, the cost is: 3 + 4 = 7.
The expected cost of a call is: 3 + curtate expected lifetime of a call.


e0 = Σ_{t=1}^∞ S(t) = Σ_{t=1}^∞ e-t/4 = e-1/4/(1 - e-1/4) = 1/(e1/4 - 1) = 3.521. (100)(3 + 3.521) = 652.1.

e0 ≅ e(0) - 1/2 = E[X] - 1/2 = 4 - 1/2 = 3.5. (100)(3 + 3.5) = 650, close to the exact answer.
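
The expected charge is easy to confirm with a few lines of Python (a check only; it sums S(t) = e-t/4 over the integers, stopping once the terms are negligible):

    from math import exp

    theta = 4.0                          # mean call length in minutes
    S = lambda t: exp(-t / theta)        # survival function of the Exponential

    expected_charge = 3 + sum(S(t) for t in range(1, 200))   # 3 + S(1) + S(2) + ...
    print(expected_charge, 100 * expected_charge)            # about 6.521 and 652.1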

Section 34, Hazard Rate

The hazard rate, force of mortality, or failure rate, is defined as: h(x) = f(x)/S(x), x ≥ 0.

h(x) can be thought of as the failure rate of machine parts. The hazard rate can also be interpreted as
the force of mortality = probability of death / chance of being alive.250 For a given age x, h(x) is the
density of the deaths, divided by the number of people still alive.

Exercise: F(x) = 1 - e-x/10. What is the hazard rate?


[Solution: h(x) = f(x)/S(x) = (e-x/10/10)/e-x/10 = 1/10.]

The hazard rate determines the survival (distribution) function and vice versa.
d ln(S(x)) / dx = dS(x)/dx / S(x) = -f(x) / S(x) = - h(x).
Thus h(x) = -d ln(S(x)) / dx.

S(x) = exp[-∫^x h(t) dt].251

S(x) = exp[-H(x)], where H(x) = ∫_0^x h(t) dt. H is called the cumulative hazard rate.252
Note that h(x) = f(x)/S(x) ≥ 0.


S(∞) = exp[-H(∞)] = 0 ⇔ H(∞) = ∫_0^∞ h(t) dt = ∞.

h(x) ≥ 0, thus H(x) is nondecreasing and therefore, S(x) = exp[-H(x)] is nonincreasing.


H(x) usually increases, while S(x) decreases, although H(x) and S(x) can be constant on an interval.
Since H(0) = 0, S(0) = exp[-0] = 1.

A function h(x) defined for x > 0 is a legitimate hazard rate, in other words it corresponds
to a legitimate survival function, if and only if h(x) ≥ 0 and the integral of h(x) from 0 to
infinity is infinite, in other words H(∞) = ∞.

250 This is equation 3.2.13 in Actuarial Mathematics by Bowers et al.
251 The lower limit of the integral should be the lower end of the support of the distribution.
252 See Mahlerʼs Guide to Survival Analysis.

As in Life Contingencies, one can write the distribution function and the density function in terms of
the force of mortality h(t):253

F(x) = 1 - exp[-∫_0^x h(t) dt].     f(x) = h(x) exp[-∫_0^x h(t) dt].

Exercise: h(x) = 1/10. What is the distribution function?


[Solution: F(x) = 1 - e-x/10, an Exponential Distribution with θ = 10.]

h constant ⇔ the Exponential Distribution, with constant hazard rate of 1/θ = 1/mean.
The Exponential is the only continuous distribution with a constant hazard rate, and therefore constant
mean excess loss.
The Force of Mortality for various distributions is given below:

Distribution                  Force of Mortality or Hazard Rate    Behavior as x approaches ∞

Exponential                   1/θ                                  h(x) constant

Weibull                       τxτ−1/θτ                             τ < 1, h(x) → 0.  τ > 1, h(x) → ∞.

Pareto                        α/(θ + x)                            h(x) → 0, as x → ∞

Burr254                       αγxγ−1/(θγ + xγ)                     h(x) → 0, as x → ∞

Single Parameter Pareto       α/x                                  h(x) → 0, as x → ∞

Gompertzʼs Law255             Bcx                                  h(x) → ∞, as x → ∞

Makehamʼs Law256              A + Bcx                              h(x) → ∞, as x → ∞


253 See equation 3.2.14 in Actuarial Mathematics. npx = chance of living n more years for those who have reached age x = {1 - F(x+n)} / {1 - F(x)} = S(x+n)/S(x) = exp(-integral from x to x+n of µt).
254 The Loglogistic is a special case of the Burr with α = 1.
255 As per Life Contingencies. See Actuarial Mathematics Section 3.7.
256 As per Life Contingencies. See Actuarial Mathematics Section 3.7.

Exercise: h(x) = 3 / (10 + x). What is the distribution function?


[Solution: F(x) = 1 - exp[-∫_0^x 3/(10 + t) dt] = 1 - exp[-3 {ln(10 + x) - ln(10)}] = 1 - {10/(10 + x)}3.

A Pareto Distribution with α = 3 and θ = 10.]
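
The relation S(x) = exp[-H(x)] can also be checked numerically. A small Python sketch (an illustration, not from the text) integrates the hazard rate of the last exercise, h(x) = 3/(10 + x), and compares the result to the Pareto with α = 3 and θ = 10:

    from math import exp

    def H(x, h, steps=100000):
        """Cumulative hazard rate: numerical integral of h from 0 to x (midpoint rule)."""
        dx = x / steps
        return sum(h((i + 0.5) * dx) for i in range(steps)) * dx

    h = lambda t: 3.0 / (10.0 + t)                  # the hazard rate from the exercise

    for x in (5.0, 20.0, 100.0):
        S_numeric = exp(-H(x, h))                   # S(x) = exp[-H(x)]
        S_pareto = (10.0 / (10.0 + x)) ** 3         # Pareto survival function, alpha = 3, theta = 10
        print(x, round(S_numeric, 6), round(S_pareto, 6))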

Relationship to Other Items of Interest:

One can obtain the Mean Excess Loss from the Hazard Rate:
S(t)/S(x) = exp[-∫_0^t h(s) ds] / exp[-∫_0^x h(s) ds] = exp[∫_0^x h(s) ds - ∫_0^t h(s) ds] = exp[-∫_x^t h(s) ds].

e(x) = ∫_x^∞ S(t) dt / S(x) = ∫_x^∞ {S(t)/S(x)} dt = ∫_x^∞ exp[-∫_x^t h(s) ds] dt.

Thus, e(x) = ∫_x^∞ exp[H(x) - H(t)] dt = exp[H(x)] ∫_x^∞ exp[-H(t)] dt.257

Exercise: Given a hazard rate of h(x) = 4 / (100+x), what is the mean excess loss, e(x)?
[Solution: H(x) = ∫_0^x h(t) dt = 4 {ln(100 + x) - ln(100)}.
e(x) = exp[H(x)] ∫_x^∞ exp[-H(t)] dt = {(100 + x)/100}4 ∫_x^∞ {100/(100 + t)}4 dt = (100 + x)4 ∫_x^∞ (100 + t)-4 dt = (100 + x)4 / {3(100 + x)3} = (100 + x)/3.

Comment: This is a Pareto Distribution with α = 4 and θ = 100; e(x) = (θ+x)/(α-1), h(x) = α/(θ + x).]
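
The same answer can be recovered numerically from e(x) = exp[H(x)] ∫_x^∞ exp[-H(t)] dt. A brief Python check (an illustration; the infinite upper limit is truncated at a point far enough out that the remaining tail is negligible):

    from math import exp, log

    def H(x):
        """Cumulative hazard rate for h(t) = 4/(100 + t): H(x) = 4{ln(100 + x) - ln(100)}."""
        return 4.0 * (log(100.0 + x) - log(100.0))

    def mean_excess(x, upper=1.0e5, steps=200000):
        """e(x) = exp[H(x)] times the integral of exp[-H(t)] from x to infinity (midpoint rule, truncated)."""
        dt = (upper - x) / steps
        integral = sum(exp(-H(x + (i + 0.5) * dt)) for i in range(steps)) * dt
        return exp(H(x)) * integral

    for x in (0.0, 100.0, 500.0):
        print(x, round(mean_excess(x), 2), round((100.0 + x) / 3.0, 2))   # numeric vs (100 + x)/3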

257 H(x) = ∫_0^x h(t) dt. H is called the cumulative hazard rate and is used in Survival Analysis. S(x) = exp[-H(x)].

One can obtain the Hazard Rate from the Mean Excess Loss as follows:

e(x) = ∫_x^∞ S(t) dt / S(x).

Thus e′(x) = {-S(x)2 + f(x) ∫_x^∞ S(t) dt} / S(x)2 = -1 + f(x) e(x)/S(x) = -1 + e(x) h(x).

Thus, h(x) = {1 + e′(x)} / e(x).

Exercise: Given a mean excess loss of e(x) = (100+x) / 3, what is the hazard rate, h(x)?
[Solution: e′(x) = 1/3. h(x) = {1 + e′(x)} / e(x) = (4/3) {3/(100 + x)} = 4/(100 + x).
Comment: This is a Pareto Distribution with α = 4 and θ = 100.
It has e(x) = (θ + x)/(α - 1) and h(x) = α/(θ + x).]

Finally, one can obtain the Survival Function from the Mean Excess Loss as follows:

h(x) = {1 + e′(x)} / e(x).

H(x) = ∫_0^x h(t) dt = ∫_0^x {1/e(t) + e′(t)/e(t)} dt = ∫_0^x 1/e(t) dt + ln[e(x)/e(0)].

Thus, S(x) = exp[-H(x)] = {e(0)/e(x)} exp[-∫_0^x 1/e(t) dt].

For example, for the Pareto, e(x) = (θ + x)/(α - 1).

∫_0^x 1/e(t) dt = (α - 1) ∫_0^x 1/(θ + t) dt = (α - 1) ln[(θ + x)/θ].

Thus, S(x) = {e(0)/e(x)} exp[-∫_0^x 1/e(t) dt] = [{θ/(α - 1)} / {(θ + x)/(α - 1)}] {(θ + x)/θ}-(α - 1) = {θ/(θ + x)}α.
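
Conversely, the recipe S(x) = {e(0)/e(x)} exp[-∫_0^x 1/e(t) dt] can be verified numerically. A short Python sketch (an illustration, using the Pareto mean excess loss e(x) = (θ + x)/(α - 1) with α = 4 and θ = 100):

    from math import exp

    alpha, theta = 4.0, 100.0

    def e(t):
        """Pareto mean excess loss, e(t) = (theta + t)/(alpha - 1)."""
        return (theta + t) / (alpha - 1.0)

    def survival_from_e(x, steps=100000):
        """S(x) = {e(0)/e(x)} exp[- integral from 0 to x of 1/e(t) dt] (midpoint rule)."""
        dt = x / steps
        integral = sum(1.0 / e((i + 0.5) * dt) for i in range(steps)) * dt
        return e(0.0) / e(x) * exp(-integral)

    for x in (50.0, 200.0, 1000.0):
        print(x, round(survival_from_e(x), 6), round((theta / (theta + x)) ** alpha, 6))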

Tail Behavior of the Hazard Rate:

As was derived previously, the limit as x approaches infinity of e(x) is equal to the limit as x
approaches infinity of: S(x) / f(x) = 1/h(x).

lim_{x→∞} e(x) = lim_{x→∞} 1/h(x).

Thus an increasing mean excess loss, e(x), is equivalent to a decreasing hazard or failure rate, h(x),
and vice versa.

Since the force of mortality for the Pareto, α / (θ+x), decreases with age,
the Mean Excess Loss increases with age.258
The more quickly the hazard rate declines, the faster the Mean Excess Loss increases and the
heavier the righthand tail of the distribution.

For the Weibull, if τ > 1 then the hazard rate, τxτ−1/θτ, increases and thus the Mean Excess Loss
decreases.
For the Weibull with τ < 1, the hazard rate decreases and thus the Mean Excess Loss increases.

Lighter Righthand Tail e(x) decreases h(x) increases

Heavier Righthand Tail e(x) increases h(x) decreases

Hazard Rate of the LogNormal Distribution:

S(x) = 1 - Φ[(ln(x) − µ)/σ].

For very large y, 1 - Φ[y] ≅ φ[y]/y = exp[-y2/2] / {y √(2π)}.259

Therefore, taking y = (ln(x) − µ)/σ, for very large x, S(x) ≅ exp[-(ln(x) − µ)2/(2σ2)] σ / {(ln(x) − µ) √(2π)}.

258 Unlike the situation for mortality of humans. For Gompertzʼs or Makehamʼs Law with B > 0 and c > 1, the force of mortality increases with age, so the Mean Excess Loss decreases with age. For the Pareto, if α ≤ 1, then the force of mortality is sufficiently small so that there exists no mean; for α ≤ 1 the mean lifetime is infinite.
259 See the Handbook of Mathematical Functions, by Abramowitz, et al., p. 932.

For the LogNormal Distribution, f(x) = exp[-(ln(x) - µ)2/(2σ2)] / {x σ √(2π)}.

Therefore, for very large x, h(x) = f(x)/S(x) ≅ {1/(xσ)} {(ln(x) - µ)/σ} = (ln(x) - µ)/(x σ2).

Thus, as x approaches infinity, the hazard rate h(x) approaches zero.260 However, this behavior may
start to take over only in the extreme righthand tail, beyond most of the probability. Prior to the
extreme righthand tail, the behavior of h(x) depends on σ.

For µ = 6 and σ = 0.1, here is a graph of h(x), out to the 99.9th percentile:

[Graph: hazard rate (vertical axis, up to about 0.06) versus size (horizontal axis, about 350 to 550); h(x) increases over this range.]

For small σ, h(x) increases, prior to the extreme righthand tail.

260 x increases more quickly than ln(x). As discussed previously, for the LogNormal Distribution, the mean excess loss e(x) approaches infinity as x approaches infinity. Therefore, it follows that h(x) approaches zero as x approaches infinity.

For µ = 6 and σ = 0.4, here is a graph of h(x), out to the 99.9th percentile:

[Graph: hazard rate (vertical axis, up to about 0.006) versus size (horizontal axis, about 400 to 1400); h(x) is relatively flat over most of this range.]

For medium σ, h(x) is relatively flat above the median, prior to the extreme righthand tail.261

For µ = 6 and σ = 1, here is a graph of h(x), out to the 99.9th percentile:

[Graph: hazard rate (vertical axis, up to about 0.0020) versus size (horizontal axis, up to 8000); h(x) decreases over most of this range.]

For large σ, beyond the very low values of x, h(x) decreases as x increases.
261 The median is: exp[µ] = exp[6] = 403.
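
These qualitative shapes can be reproduced numerically. A short Python sketch (an illustration; it assumes SciPy is available and evaluates h(x) = f(x)/S(x) at the quartiles and the 99th percentile for each σ):

    from math import exp
    from scipy.stats import lognorm

    mu = 6.0
    for sigma in (0.1, 0.4, 1.0):
        dist = lognorm(s=sigma, scale=exp(mu))        # SciPy's LogNormal with log-mean mu, log-sd sigma
        xs = [dist.ppf(p) for p in (0.25, 0.50, 0.75, 0.99)]
        hs = [dist.pdf(x) / dist.sf(x) for x in xs]   # hazard rate h(x) = f(x)/S(x)
        print(sigma, [round(x) for x in xs], [round(h, 5) for h in hs])
    # For sigma = 0.1 the hazard rate rises across these percentiles;
    # for sigma = 1 it falls, consistent with the three graphs above.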

Problems:

34.1 (1 point) You are given the following three loss distributions.
1. Gamma α = 1.5, θ = 0.5
2. LogNormal µ = 0.1, σ = 0.6
3. Weibull θ = 1.4, τ = 0.8
For which of these distributions does the hazard rate increase?
A. 1 B. 2 C. 3 D. 1,2,3 E. None of A, B, C, or D

Use the following information for the next 5 questions:


e(x) = 72 - 0.8x, 0 < x < 90.

34.2 (2 points) What is the force of mortality at 50?


A. less than 0.003
B. at least 0.003 but less than 0.004
C. at least 0.004 but less than 0.005
D. at least 0.005 but less than 0.006
E. at least 0.006

34.3 (3 points) What is the Survival Function at 60?


A. 72% B. 74% C. 76% D. 78% E. 80%

34.4 (2 points) What is 50p 30?


A. 62% B. 64% C. 66% D. 68% E. 70%

34.5 (1 point) What is the mean lifetime?


A. 72 B. 74 C. 76 D. 78 E. 80

34.6 (2 points) What is the probability density function at 40?


A. less than 0.002
B. at least 0.002 but less than 0.003
C. at least 0.003 but less than 0.004
D. at least 0.004 but less than 0.005
E. at least 0.005

34.7 (2 points) For a LogNormal distribution with parameters µ = 11.6, σ = 1.60,


what is the hazard rate at $100,000?
A. less than 4 x 10-6
B. at least 4 x 10-6 but less than 5 x 10-6
C. at least 5 x 10-6 but less than 6 x 10-6
D. at least 6 x 10-6 but less than 7 x 10-6
E. at least 7 x 10-6

34.8 (2 points) The hazard rate h(x) = 0.002 + 1.1x / 10,000, x > 0. What is S(50)?
(A) 0.76 (B) 0.78 (C) 0.80 (D) 0.82 (E) 0.84

34.9 (1 point) If the hazard rate of a certain machine part is a constant 0.10 for t > 0, what is the Mean
Excess Loss at age 25?
A. less than 10
B. at least 10 but less than 15
C. at least 15 but less than 20
D. at least 20 but less than 25
E. at least 25

34.10 (2 points) Losses follow a Weibull Distribution with θ = 25 and τ = 1.7.


What is the hazard rate at 100?
A. less than 0.05
B. at least 0.05 but less than 0.10
C. at least 0.10 but less than 0.15
D. at least 0.15 but less than 0.20
E. at least 0.20

34.11 (2 points) For a loss distribution where x ≥ 10, you are given:
i) The hazard rate function: h(x) = z/x, for x ≥ 10.
ii) A value of the survival function: S(20) = .015625.
Calculate z.
A. 2 B. 3 C. 4 D. 5 E. 6

34.12 (2 points) For a loss distribution where x ≥ 0, you are given:


i) The hazard rate function: h(x) = z x2 , for x ≥ 0.
ii) A value of the distribution function: F(5) = 0.1175.
Calculate z.
A. 0.002 B. 0.003 C. 0.004 D. 0.005 E. 0.006

Use the following information for the next four questions:


Ground up losses follow a Weibull Distribution with τ = 2 and θ = 10.

34.13 (3 points) There is an ordinary deductible of 5.


What is the hazard rate of the per loss variable?

34.14 (3 points) There is an ordinary deductible of 5.


What is the hazard rate of the per payment variable?

34.15 (3 points) There is a franchise deductible of 5.


What is the hazard rate of the per loss variable?

34.16 (3 points) There is a franchise deductible of 5.


What is the hazard rate of the per payment variable?

34.17 (3 points) X follows a Gamma Distribution with parameters α = 3 and θ.


Determine the form of the hazard rate h(x). What is the behavior of h(x) as x approaches infinity?

34.18 (2 points) The hazard rate h(x) = 4/(100 + x), x > 0. What is S(50)?
(A) 0.18 (B) 0.20 (C) 0.22 (D) 0.24 (E) 0.26

34.19 (2 points) Determine the hazard rate at 300 for a Loglogistic Distribution with γ = 2 and
θ = 100.
(A) 0.005 (B) 0.006 (C) 0.007 (D) 0.008 (E) 0.009

34.20 (1 point) F(x) is a Pareto Distribution.


If the hazard rate h(x) is doubled for all x, what is the new distribution function?

34.21 (2 points) You are using a Weibull Distribution to model the length of time workers remain
unemployed. Briefly discuss the implications of different values of the parameter τ.

34.22 (2 points) S(0) = 1. S(x) = h(x). Determine the form of S(x).

34.23 (2 points) F(x) = 1 / {1 + exp[-(x - µ)/s]}, -∞ < x < ∞, s > 0.
Find the form of the hazard rate as a function of x.

34.24 (5 points) The following data is from the mortality study of Edmond Halley published in 1693.
x 0 5 10 15 20 25 30
S(x) 1 0.710 0.653 0.622 0.592 0.560 0.523

x 35 40 45 50 55 60 65
S(x) 0.481 0.436 0.387 0.335 0.282 0.232 0.182

x 70 75 80 85
S(x) 0.131 0.078 0.034 0

Using this data, graph the hazard rate as a function of age.

34.25 (2 points) Robots can fail due to two independent decrements: Internal and External.
(Internal includes normal wear and tear. External includes accidents.)
Assuming no external events, a robotʼs time until failure is given by a Pareto Distribution with α = 2
and θ = 10.
Assuming no internal events, a robotʼs time until failure is given by a Pareto Distribution with α = 4
and θ = 10.
At time = 5, what is the hazard rate of the robotʼs time until failure?
A. 0.35 B. 0.40 C. 0.45 D. 0.50 E. 0.55

34.26 (1 point) F(x) is a Weibull Distribution.


If the hazard rate h(x) is doubled for all x, what is the new distribution function?

34.27 (2 points) Two independent variables X and Y have hazard rates as a function of time of hX(t)
and hY(t). Given that the minimum of X and Y is m, what is the probability that X < Y?

34.28 (2 points) For a Gamma Distribution, determine the behavior of the hazard rate h(x)
as x approaches infinity.

34.29 (160, 5/87, Q.2) (2.1 points)


The exponential distribution defined by S(t) = e-t/2, t ≥ 0, is truncated from above at t = 4.
Calculate the hazard rate of the truncated distribution at t = 2.
(A) 1/{2(e - 1)}     (B) e2/{2(e2 - 1)}     (C) 1/2     (D) e/{2(e - 1)}     (E) e2/{2(e - 1)}

34.30 (160, 11/87, Q.7) (2.1 points) Which of the following are true for all values of x > 0?
I. For every exponential survival model h(x) = {S(x) - S(x+1)} / ∫_x^(x+1) S(t) dt.

II. For every survival model f(x) ≤ h(x).


III. For every survival model f(x) ≤ f(x + 1).
(A) I and II only (B) I and III only (C) II and III only (D) I, II and III
(E) The correct answer is not given by (A), (B), (C), or (D).

34.31 (160, 11/87, Q.8) (2.1 points) The force of mortality for a survival distribution is given by:
h(x) = 1 / {2(100 - x)}, 0 < x < 100. Determine e(64).
(A) 16 (B)18 (C) 20 (D) 22 (E) 24

34.32 (160, 11/87, Q.15) (2.1 points) For a Weibull distribution as per Loss Models,
the hazard rate at the median age is 0.05. Determine the median age.
(A) τ ln(2) (B) τ ln(20) (C) 20τ ln(2) (D) 2ln(τ) (E) 2τ ln(20)

34.33 (160, 11/88, Q.2) (2.1 points) A survival model is represented by the following probability
density function: f(t) = (0.1)(25 - t)-1/2; 0 ≤ t ≤ 25. Calculate the hazard rate at 20.
(A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25

34.34 (160, 11/89, Q.1) (2.1 points) For a survival model, you are given:
(i) The hazard rate is h(t) = 2/(w - t), 0 ≤ t < w.
(ii) T is the random variable denoting time of failure.
Calculate Var(T).
(A) w2 /18 (B) w2 /12 (C) w2 /9 (D) w2 /6 (E) w2 /3

34.35 (160, 11/89, Q.2) (2.1 points) S(x) = 0.1(100 - x)1/2, 0 ≤ x ≤ 100.
Calculate the hazard rate at 84.
(A) 1/32 (B) 1/24 (C) 1/16 (D) 1/8 (E) 1/4

34.36 (160, 5/90, Q.4) (2.1 points) You are given that y is the median age for the survival function
S(x) = 1 - (x/100)2 , 0 ≤ x ≤ 100. Calculate the hazard rate at y.
(A) 0.013 (B) 0.014 (C) 0.025 (D) 0.026 (E) 0.028

34.37 (Course 160 Sample Exam #1, 1996, Q.2) (1.9 points)
X has a uniform distribution from 0 to 10.
Y = 4X2 . Calculate the hazard rate of Y at 4.
(A) 0.007 (B) 0.014 (C) 0.021 (D) 0.059 (E) 0.111

34.38 (Course 160 Sample Exam #2, 1996, Q.2) (1.9 points) You are given:
(i) A survival model has a hazard rate h(x) = 1 / {3(ω - x)}, 0 ≤ x ≤ ω.
(ii) The median age is 63.
Calculate the mean residual life at 63, e(63).
(A) 4.5 (B) 6.8 (C) 7.9 (D) 9.0 (E) 13.5

34.39 (Course 160 Sample Exam #3, 1997, Q.1) (1.9 points) You are given:
(i) For a Weibull distribution with parameters θ and τ, the median age is 22.
(ii) At the median age, the value of the Hazard Rate Function is 1.26.
Calculate τ.
(A) 37 (B) 38 (C) 39 (D) 40 (E) 41

34.40 (Course 160 Sample Exam #1, 1999, Q.19) (1.9 points)
Losses follow a Loglogistic Distribution, with parameters γ = 3 and θ = 0.1984.
For what value of x is the hazard rate, h(x), a maximum?
(A) 0.18 (B) 0.20 (C) 0.22 (D) 0.25 (E) 0.28

34.41 (Course 160 Sample Exam #1, 1999, Q.20) (1.9 points)
A sample of 10 batteries in continuous use is observed until all batteries fail. You are given:
(i) The times to failure (in hours) are 14.1, 21.3, 23.2, 26.2, 29.8, 31.3, 35.7, 39.4, 39.2, 45.3.
(ii) The composite hazard rate function for these batteries is defined by
h(t) = λ, 0 ≤ t < 27.9,
h(t) = λ + β(t - 27.9)2 , t ≥ 27.9.
(iii) S(15) = 0.7634, S(30) = 0.5788.
Calculate the absolute difference between the cumulative hazard rate at 34, H(34), based on the
assumed hazard rate, and the cumulative hazard rate at 34, HO(34), based on the observed data.
(A) 0.03 (B) 0.06 (C) 0.08 D) 0.11 (E) 0.14

34.42 (CAS3, 11/03, Q.19) (2.5 points) For a loss distribution where x ≥ 2, you are given:
i) The hazard rate function: h(x) = z2 / (2x), for x ≥ 2.
ii) A value of the distribution function: F(5) = 0.84.
Calculate z.
A. 2 B. 3 C. 4 D. 5 E. 6

34.43 (CAS3, 11/04, Q.7) (2.5 points)


Which of the following formulas could serve as a force of mortality?
1. µx = BCx, B > 0, C > 1

2. µx = a(b+x)-1, a > 0, b > 0

3. µx = (1+x)-3, x≥0
A. 1 only B. 2 only C. 3 only D. 1 and 2 only E. 1 and 3 only

34.44 (CAS3, 11/04, Q.27) (2.5 points) You are given:


• X has density f(x), where f(x) = 500,000 / x3 , for x > 500 (single-parameter Pareto with α = 2).
• Y has density g(y), where g(y) = y e-y/500 / 250,000 (gamma with α = 2 and θ = 500).
Which of the following are true?
1. X has an increasing mean residual life function.
2. Y has an increasing hazard rate.
3. X has a heavier tail than Y based on the hazard rate test.
A. 1 only. B. 2 only. C. 3 only. D. 2 and 3 only. E. All of 1, 2, and 3.
Note: I have rewritten this exam question.

34.45 (CAS3, 5/05, Q.30) (2.5 points) Acme Products will offer a warranty on their products for x
years, where x is the largest integer for which there is no more than a 1% probability of product
failure.
Acme introduces a new product with a hazard function for failure at time t of 0.002t.
Calculate the length of the warranty that Acme will offer on this new product.
A. Less than 3 years B. 3 years C. 4 years D. 5 years E. 6 or more years

34.46 (CAS3, 11/05, Q.11) (2.5 points) Individuals with Flapping Gum Disease are known to
have a constant force of mortality µ. Historically, 10% will die within 20 years.
A new, more serious strain of the disease has surfaced with a constant force of mortality equal to 2µ.
Calculate the probability of death in the next 20 years for an individual with this new strain.
A. 17% B. 18% C. 19% D. 20% E. 21%

34.47 (SOA M, 11/05, Q.13) (2.5 points) The actuarial department for the SharpPoint
Corporation models the lifetime of pencil sharpeners from purchase using a generalized DeMoivre
model with s(x) = (1 - x/ω)α, for α > 0 and 0 < x ≤ ω.
A senior actuary examining mortality tables for pencil sharpeners has determined that the
original value of α must change. You are given:
(i) The new complete expectation of life at purchase is half what it was previously.
(ii) The new force of mortality for pencil sharpeners is 2.25 times the previous force of
mortality for all durations.
(iii) ω remains the same.
Calculate the original value of α.
(A) 1 (B) 2 (C) 3 (D) 4 (E) 5

34.48 (CAS3, 5/06, Q.10) (2.5 points) The force of mortality is given as:
µ(x) = 2 / (110 - x), for 0 ≤ x < 110.
Calculate the expected future lifetime for a life aged 30.
A. Less than 20
B. At least 20, but less than 30
C. At least 30, but less than 40
D. At least 40, but less than 50
E. At least 50

34.49 (CAS3, 5/06, Q.11) (2.5 points) Eastern Digital uses a single machine to manufacture digital
widgets. The machine was purchased 10 years ago and will be used continuously until it fails.
The failure rate of the machine, u(x), is defined as:
u(x) = x2 / 4000, for x ≤ 4000 , where x is the number of years since purchase.
Calculate the probability that the machine will fail between years 12 and 14, given that the machine
has not failed during the first 10 years.
A. Less than 1.5%
B. At least 1.5%, but less than 3.5%
C. At least 3.5%, but less than 5.5%
D. At least 5.5%, but less than 7.5%
E. At least 7.5%

34.50 (CAS3, 5/06, Q.16) (2.5 points) The force of mortality is given as:
µ(x) = 1 / (100 - x), for 0 ≤ x < 100.
Calculate the probability that exactly one of the lives (40) and (50) will survive 10 years.
A. 9/30 B. 10/30 C. 19/30 D. 20/30 E. 29/30

Solutions to Problems:

34.1. A. For a Gamma with α > 1, hazard rate increases (toward a horizontal asymptote given by
an exponential.) For α > 1 the Gamma is lighter-tailed than an Exponential. For a LogNormal the
hazard rate decreases. For a Weibull with τ < 1, the hazard rate decreases; for τ < 1 the Weibull is
heavier-tailed than an Exponential. Alternately, the hazard rate increases if and only if the mean
excess loss decreases. For a Gamma with α > 1, mean excess loss decreases (toward a horizontal
asymptote given by an exponential.) For a LogNormal mean excess loss increases. For a Weibull
with τ < 1, mean excess loss increases.

34.2. E. h(x) = {1 + e′(x)} / e(x) = (1 - 0.8)/(72 - 0.8x) = 1/(360 - 4x). h(50) = 0.00625.

34.3. C. S(x) = exp[-∫_0^x h(t) dt] = exp[-∫_0^x 1/(360 - 4t) dt] = exp[{ln(360 - 4x) - ln(360)}/4] =
exp[(1/4) ln(1 - x/90)] = (1 - x/90)1/4. S(60) = 0.760.

34.4. B. 50p 30 = probability that a life aged 30 lives at least 50 more years = S(80)/S(30) =
(1/9)1/4 / (2/3)1/4 = 0.639.

34.5. A. mean lifetime = e(0) = 72. Alternately,
E[X] = ∫_0^90 S(t) dt = ∫_0^90 (1 - t/90)1/4 dt = -72(1 - t/90)5/4 ]_{t=0}^{t=90} = 72.

34.6. D. f(x) = -dS(x)/dx = (1 - x/90)-3/4 / 360. f(40) = 0.0043.
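
The chain of results in 34.2 through 34.6 can be checked numerically from e(x) = 72 - 0.8x alone, using the formulas derived in the section. A small Python sketch (a check only, not part of the original solutions):

    # From e(x) = 72 - 0.8x: h(x) = {1 + e'(x)}/e(x), S(x) = (1 - x/90)^(1/4), f(x) = -S'(x).
    def e(x): return 72.0 - 0.8 * x
    def h(x): return (1.0 - 0.8) / e(x)                  # = 1/(360 - 4x)
    def S(x): return (1.0 - x / 90.0) ** 0.25
    def f(x): return (1.0 - x / 90.0) ** (-0.75) / 360.0

    print(round(h(50), 5))            # 0.00625  (34.2)
    print(round(S(60), 3))            # 0.76     (34.3)
    print(round(S(80) / S(30), 3))    # 0.639    (34.4)
    print(round(f(40), 4))            # 0.0043   (34.6)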

34.7. B. For the LogNormal, F(x) = Φ[{ln(x) - µ}/σ]. F(100,000) = Φ[-0.0544] = 1 - 0.5217.
Therefore, S(100,000) = 0.5217. f(x) = exp[-0.5 ({ln(x) - µ}/σ)2] / {xσ √(2π)}.
f(100,000) = exp[-0.5 ({ln(100,000) - 11.6}/1.6)2] / {160,000 √(2π)} = 2.490 x 10-6.
h(100,000) = f(100,000) / S(100,000) = 2.490 x 10-6 / 0.5217 = 4.77 x 10-6.

34.8. C. S(x) = exp[-∫_0^x h(t) dt] = exp[-∫_0^x {0.002 + 1.1t/10,000} dt]
= exp[-0.002x + (1 - 1.1x)/(10,000 ln[1.1])].
S(50) = exp(-0.1 - 0.1221) = 0.80.
Comment: This is an example of Makehamʼs Law of mortality.

34.9. B. A constant rate of hazard implies an Exponential Distribution, with


θ = 1 / the hazard rate. The mean excess loss is θ at all ages. Thus the mean excess loss at age 25
(or any other age) is: 1 / 0.10 = 10.
Comment: Note, one can write down the equation:
hazard rate = chance of failure / probability of still working = F´(x) / S(x) = -S´(x) / S(x) = 0.10 and
solve the resulting differential equation: S´(x) = -0.1 S(x), for S(x) = e-0.1x or F(x) = 1 - e-0.1x.

34.10. D. h(x) = f(x) / S(x) = {τ(x/θ)τ exp[-(x/θ)τ] /x} / exp[-(x/θ)τ] = τxτ−1/θτ.

h(100) = (1.7)(1000.7)/(251.7) = 0.179.

34.11. E. S(x) = exp[-∫_10^x h(t) dt] = exp[-z {ln(x) - ln(10)}] = (10/x)z, for x ≥ 10.
0.015625 = S(20) = (1/2)z. ⇒ z = 6.
Comment: S(x) = (10/x)6 , for x ≥ 10. A Single Parameter Pareto, with α = 6 and θ = 10.
Similar to CAS3, 11/03, Q.19.

34.12. B. S(x) = exp[-∫_0^x h(t) dt] = exp[-z x3/3], for x ≥ 0.
0.8825 = S(5) = exp[-z 53/3]. ⇒ z = -ln(0.8825) (3/125) = 0.0030.
Comment: S(x) = exp[-(x/10)3 ], for x ≥ 0. A Weibull Distribution, with θ = 10 and τ = 3.

34.13. For the Weibull, f(x) = τ(x/θ)τ exp(-(x/θ)τ) / x = x exp(-(x/10)2 )/ 50.

S(x) = exp(-(x/θ)τ) = exp(-(x/10)2 ). h(x) = f(x)/S(x) = x/50.


The per loss variable Y is 0 for x ≤ 5, and is X - 5 for x > 5.
Y has a point mass of F(5) at 0. Thus fY(y) is undefined at zero. fY(y) = fX(y+5) for y > 0.
S Y(0) = SX(5). SY(y) = SX(y+5) for y > 0.
hY(y) undefined at zero. hY(y) = fY(y)/SY(y) = fX(y+5)/SX(y+5) = hX(y+5) = (y+5)/50 for y > 0.
Comment: Similar to Example 8.1 in Loss Models. Loss Models uses the notation YP for the per
payment variable and YL for the per loss variable.

34.14. The per payment variable Y is undefined for x ≤ 5, and is X - 5 for x > 5.
fY(y) = fX(y+5)/SX(5) for y > 0. SY(y) = SX(y+5)/SX(5) for y > 0.
hY(y) = fY(y)/SY(y) = fX(y+5)/SX(y+5) = hX(y+5) = (y+5)/50 for y > 0.

34.15. The per loss variable Y is 0 for x ≤ 5, and is X for x > 5.


Y has a point mass of FX(5) at 0. Thus fY(y) is undefined at zero.
fY(y) = 0 for 0 < y ≤ 5. fY(y) = fX(y) for y > 5. SY(y) = SX(5) for 0 < y ≤ 5. SY(y) = SX(x) for y > 5.
hY(y) undefined at zero. hY(y) = 0 for 0 < y ≤ 5.
hY(y) = fY(y)/SY(y) = fX(x)/SX(x) = hX(y) = y/50 for y > 5.

34.16. The per payment variable Y is undefined for x ≤ 5, and is X for x > 5.
fY(y) = fX(x)/SX(5) for y > 5. SY(y) = SX(x)/SX(5) for y > 5.
hY(y) = fY(y)/SY(y) = fX(y)/SX(y) = hX(y) = y/50 for y > 5.
Comment: Similar to Example 8.2 in Loss Models.

34.17. f(x) = 0.5 x2 e-x/θ / θ3. S(x) = 1 - Γ(3 ; x/θ) = e-x/θ + (x/θ)e-x/θ + (x/θ)2 e-x/θ/2.

h(x) = f(x)/S(x) = x2 /(2θ3 + 2θ2x + θx2 ).

h(x) = 1/(2θ3/x2 + 2θ2/x + θ), which increases to 1/θ as x approaches infinity.


Comment: I have used Theorem A.1 in Appendix A of Loss Models, in order to write out the
incomplete Gamma Function for an integer parameter. One can also verify that dS(x)/dx = -f(x).
A Gamma Distribution for α > 1 is lighter tailed than an Exponential (α = 1), and the hazard rate
increases to 1/θ, while the mean excess loss decreases to θ.

34.18. B. S(x) = exp[-∫_0^x h(t) dt] = exp[-∫_0^x 4/(100 + t) dt] = exp[-4{ln(100 + x) - ln(100)}] = {100/(100 + x)}4.
S(50) = (100/150)4 = 0.198.

Comment: This is a Pareto Distribution, with α = 4 and θ = 100.

34.19. B. F(x) = (x/θ)γ / {1 + (x/θ)γ}. F(300) = 32 / (1 + 32 ) = 0.9. S(300) = 0.1.

f(x) = γ (x/θ)γ / (x{1 + (x/θ)γ}2 ). f(300) = (2) (32 ) / {(300)(1 + 32 )2 } = 0.0006.


h(300) = f(300)/S(300) = 0.0006/0.1 = 0.006.
Comment: For the Loglogistic, h(x) = f(x)/S(x) = γ xγ−1 θ−γ / {1+ (x/θ)γ}.
For γ = 2 and θ = 100: h(x) = 0.0002 x / {1 + (x/100)2 }.
The hazard rate increases and then decreases:
[Graph: h(x) versus x from 0 to 1000; the hazard rate rises to a maximum of about 0.01 near x = 100 and then declines.]

34.20. For the Pareto the hazard rate is: h(x) = f(x)/S(x) = α/(θ + x). ⇒ 2h(x) = 2α/(θ + x).
This is the hazard rate for another Pareto Distribution with parameters 2α and θ.

34.21. If τ = 1, then we have an Exponential with constant hazard rate.


The probability of the period of unemployment ending is independent of how long the worker has
been out of work.
If τ < 1, then we have decreasing hazard rate. As a worker remains out of work for longer periods of
time, his chance of finding a job declines. This could be due to exhaustion of possible employment
opportunities, some employers being unwilling to hire the longterm unemployed, or the worker
becoming discouraged.
If τ > 1, then we have increasing hazard rate. As a worker remains out of work for longer periods of
time, his chance of going back to work increases. This could be due to worry about exhaustion of
unemployment benefits, the worker becoming more willing to settle for a less than desired job, or
the worker being more willing to relocate.
Comment: A Weibull with τ < 1 has a heavier righthand tail than an Exponential.
A Weibull with τ > 1 has a lighter righthand tail than an Exponential.

34.22. S(x) = h(x) = f(x)/S(x). ⇒ S(x)2 = f(x) = -dS/dx.
Let y = S(x). Then: y2 = -dy/dx. ⇒ -dy/y2 = dx. ⇒ 1/y = x + c. ⇒ S(x) = 1/(x + c).
Since S(0) = 1, c = 1, and S(x) = 1/(x + 1).
Comment: A Pareto Distribution with α = 1 and θ = 1.

34.23. S(x) = 1 - F(x) = exp[-(x - µ)/s] / {1 + exp[-(x - µ)/s]}.
f(x) = F′(x) = exp[-(x - µ)/s] / (s {1 + exp[-(x - µ)/s]}2).
h(x) = f(x)/S(x) = (1/s) / {1 + exp[-(x - µ)/s]}.

Comment: Logistic Distribution with scale parameter s and location parameter µ, not on the syllabus.

34.24. Take the difference of survival functions to get the probability in each interval.
For example, the height of the rectangle from 5 to 10 years old is: 0.710 - 0.653 = 0.057.
The estimate of the density at 7.5 years is: 0.057/5 = 0.0114.
The estimate of S(7.5) is: (0.710 + 0.653)/2 = 0.6815.
The estimate of hazard rate at 7.5 is: 0.0114/0.6815 = 0.0167.
h(12.5) = (2/5) (0.653 - 0.622) / (0.653 + 0.622) = 0.0097.
The vector of estimated hazard rates: 0.0678, 0.0167, 0.0097, 0.0099, 0.0111, 0.0137, 0.0167,
0.0196, 0.0238, 0.0288, 0.0344, 0.0389, 0.0483, 0.0652, 0.1014, 0.1571, 0.4000.
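These estimates are easy to reproduce. A short Python sketch (a check only; the survival values are Halley's, copied from the table in the problem):

    S = [1, 0.710, 0.653, 0.622, 0.592, 0.560, 0.523, 0.481, 0.436, 0.387,
         0.335, 0.282, 0.232, 0.182, 0.131, 0.078, 0.034, 0]    # S(x) at x = 0, 5, 10, ..., 85

    # In each 5-year interval estimate the density by (S1 - S2)/5 and the survival function at the
    # midpoint by (S1 + S2)/2, so that h(midpoint) = (2/5)(S1 - S2)/(S1 + S2).
    for i, (s1, s2) in enumerate(zip(S, S[1:])):
        print(5 * i + 2.5, round(0.4 * (s1 - s2) / (s1 + s2), 4))
    # Reproduces the vector above: 0.0678, 0.0167, 0.0097, ..., 0.1571, 0.4.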
[Graph: estimated hazard rate versus age, 0 to 85; the rate is high in infancy, low and fairly flat through the middle ages, and rises steeply at the oldest ages, reaching 0.4 in the final interval.]
Comment: This was the first published mortality study.
Note the higher hazard rates for young children and old people.
“A few products show a decreasing failure rate in the early life and an increasing failure rate in later life.
Reliability engineers call such a hazard function a bathtub curve. ... Some products, such as
high-reliability capacitors and semiconductor devices, are subjected to a burn-in to weed out infant
mortality before they are put into service, and they are removed from service before wear out
starts. Thus they are in service only in the low failure rate portion of their life. This increases their
reliability in service. While in service, such products may have an essentially constant failure rate, and
the exponential distribution may adequately describe their lives.”
Quoted from Applied Life Data Analysis by Wayne Nelson, not on the syllabus.

34.25. B. For the Pareto Distribution, h(x) = f(x)/S(x) = α/(θ + x).
Thus h1(x) = 2/(10 + x), and h2(x) = 4/(10 + x).
Since the decrements are independent, the hazard rates add, and h(x) = 6/(10 + x).
h(5) = 6/15 = 0.4.
Alternately, the probability of surviving past age x is the product of the probabilities of surviving
both of the independent decrements:
S(x) = S1(x) S2(x) = {10/(10 + x)}2 {10/(10 + x)}4 = {10/(10 + x)}6.
This is a Pareto Distribution with α = 6 and θ = 10. ⇒ h(x) = 6/(10 + x). ⇒ h(5) = 6/15 = 0.4.

34.26. For the Weibull the hazard rate is: h(x) = f(x)/S(x) = τxτ−1/θτ. ⇒ 2h(x) = 2τxτ−1/θτ = τxτ−1/(θ/21/τ)τ.

This is the hazard rate for another Weibull Distribution with parameters τ and θ / 21/τ.
Comment: If τ = 1 we have an Exponential, and if the hazard rate is doubled then the mean is
halved.

34.27. If the minimum is m and Y > X, then X = m and Y > m; this has probability: fX(m) SY(m).
If the minimum is m and Y < X, then Y = m and X > m; this has probability: fY(m) SX(m).
Thus the desired probability is:
fX(m) SY(m) / {fX(m) SY(m) + fY(m) SX(m)} = {fX(m)/SX(m)} / {fX(m)/SX(m) + fY(m)/SY(m)} = hX(m) / {hX(m) + hY(m)}.
Comment: See Exercise 5.7 in Introduction to Probability Models by Ross.
If both X and Y are Exponential, then hX(m) / {hX(m) + hY(m)} = λX/(λX + λY).

34.28. For α = 1, we have an Exponential with constant hazard rate.


For α > 1, the Gamma has a lighter righthand tail than the Exponential, and thus the hazard rate
increases. For α < 1, the Gamma has a heavier righthand tail than the Exponential, and thus the
hazard rate decreases.
Alternately, for α > 1, the mean excess loss decreases to a constant, and thus the hazard rate
increases to a constant. For α < 1, the mean excess loss increases to a constant, and thus the hazard
rate decreases to a constant.
Alternately, h(x) = f(x) / S(x). As x → ∞, both the numerator and the denominator approach 0.
Thus using LʼHopitalʼs rule, h(x) approaches:
f′(x) / {-f(x)} = -{(α - 1) xα-2 e-x/θ - xα-1 e-x/θ/θ} / {xα-1 e-x/θ} = 1/θ - (α - 1)/x (the θα Γ(α) factors cancel).

Therefore, for α > 1, the hazard increases to 1/θ as x approaches infinity.


For α < 1, the hazard decreases to 1/θ as x approaches infinity.
Comment: As α approaches ∞, the Gamma approaches a Normal, which has a lighter righthand tail.
Using a computer, here is the hazard rate for θ = 10 and α = 2 or 0.5:
[Graph: hazard rate versus x, 0 to 250; for α = 2 the hazard rate increases toward 0.1, while for α = 1/2 it decreases toward 0.1.]
In both cases, the hazard rate approaches 1/θ = 0.1 as x approaches infinity.
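
This limiting behavior can be confirmed numerically. A brief Python sketch (an illustration, assuming SciPy is available; it uses θ = 10 as in the graph just described):

    from scipy.stats import gamma

    theta = 10.0
    for alpha in (2.0, 0.5):
        dist = gamma(a=alpha, scale=theta)
        hs = [round(dist.pdf(x) / dist.sf(x), 4) for x in (10.0, 50.0, 250.0, 1000.0)]
        print(alpha, hs)
    # For alpha = 2 the hazard rate climbs toward 1/theta = 0.1; for alpha = 0.5 it falls toward 0.1.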

34.29. D. For truncation from above at 4, the truncated distribution is F(x) / F(4).
F(2) / F(4) = (1 - e-1) / (1 - e-2).
For truncation from above at 4, the truncated density is f(x) / F(4).
f(2) / F(4) = (e-1/2) / (1 - e-2).
Thus the hazard rate at 2 of the truncated distribution is the ratio of the density to the survival function:
{(e-1/2) / (1 - e-2)} / {1 - (1 - e-1)/(1 - e-2)} = (e-1/2) / {(1 - e-2) - (1 - e-1)} = (e-1/2) / (e-1 - e-2) = e / {2(e - 1)}.

34.30. A. For the Exponential, h(x) = 1/θ and S(x) = e-x/θ. {S(x) - S(x+1)} / ∫_x^(x+1) S(t) dt =
{e-x/θ - e-(x+1)/θ} / {θe-x/θ - θe-(x+1)/θ} = 1/θ. Statement I is true.
h(x) = f(x)/S(x) ≥ f(x), since S(x) ≤ 1. Therefore, Statement II is true.
While the density must go to zero as x approaches infinity, the density can either increase or
decrease over short periods. Statement III is not true.
Comment: {S(x) - S(x+1)} / ∫_x^(x+1) S(t) dt = mx = central death rate.
See page 70 of Actuarial Mathematics.

34.31. E. S(x) = exp[-∫_0^x h(t) dt] = exp[ln(100 - x)/2 - ln(100)/2] = (1 - x/100)1/2.
e(64) = ∫_64^100 S(t) dt / S(64) = (200/3)(1 - 64/100)3/2 / (1 - 64/100)1/2 = 24.

34.32. C. For the Weibull, S(x) = exp(-(x/θ)τ), and f(x) = τxτ−1exp(-(x/θ)τ)/θτ.

⇒ h(x) = f(x)/S(x) = τxτ−1/θτ.


Let m be the median. S(m) = 0.5. ⇒ lnS(m) = -(m/θ)τ = -ln(2). h(m) = 0.05. ⇒ τmτ−1/θτ = 0.05.

Dividing the two equations: m/τ = ln(2)/0.05. ⇒ m = 20τ ln(2).

34.33. B. Integrating f(t) from t to 25, S(t) = (0.2)(25 - t)1/2.


h(t) = f(t)/S(t) = 1/(50 - 2t). h(20) = 1/10.

34.34. A. h(t) = 2/(w - t). H(t) = ∫_0^t h(s) ds = 2 ln(w) - 2 ln(w - t).
S(t) = exp[-H(t)] = {(w - t)/w}2 = (1 - t/w)2. f(t) = 2(1 - t/w)/w = 2/w - 2t/w2.
E[T] = ∫_0^w t f(t) dt = w/3. E[T2] = ∫_0^w t2 f(t) dt = w2/6. Var(T) = w2/6 - (w/3)2 = w2/18.

34.35. A. S(x) = 0.1(100 - x)1/2. f(x) = 0.05(100 - x)-1/2. h(x) = f(x)/S(x) = 0.5/(100 - x).
h(84) = 0.5/16 = 1/32.

34.36. E. 0.5 = (y/100)2. ⇒ y = 70.71. f(x) = x/5,000.
f(y) = f(70.71) = 70.71/5,000 = 0.01414. h(y) = f(y)/S(y) = 0.01414/0.5 = 0.0283.

34.37. B. Y = 4X2. ⇒ X = √Y / 2.
SX(x) = 1 - x/10, 0 ≤ x ≤ 10. ⇒ SY(y) = 1 - √y / 20, 0 ≤ y ≤ 400.
fY(y) = 1/(40√y). fY(4) = 1/80. SY(4) = 0.9. hY(4) = (1/80)/0.9 = 1/72 = 0.0139.

34.38. B. H(t) = ∫_0^t h(x) dx = ln(ω)/3 - ln(ω - t)/3.
S(t) = exp[-H(t)] = exp[ln(ω - t)/3 - ln(ω)/3] = (ω - t)1/3/ω1/3 = (1 - t/ω)1/3.
Median age is 63. ⇒ 0.5 = S(63) = (1 - 63/ω)1/3. ⇒ ω = 63/0.875 = 72. ⇒ S(t) = (1 - t/72)1/3.
e(63) = ∫_63^72 S(t) dt / S(63) = {(3/4)(72)(1 - 63/72)4/3} / 0.5 = 6.75.

34.39. D. 0.5 = S(22) = exp[-(22/θ)τ]. ⇒ 0.69315 = (22/θ)τ.

h(x) = f(x)/S(x) = τxτ−1/θτ. We are given: 1.26 = h(22) = τ22τ−1/θτ.

Dividing the two equations: τ/22 = 1.8178. ⇒ τ = 40.


Comment: θ = 22.2.

34.40. D. S(x) = 1/(1 + (x/θ)γ). f(x) = γxγ−1 θ−γ / (1+ (x/θ)γ)2 .

h(x) = f(x)/S(x) = γxγ−1 θ−γ / (1+ (x/θ)γ) = {(3)x2 /(.19843 )}/{1 + (x/.1984)3 } = 3x2 /(.00781 + x3 ).

0 = hʼ(x) = {6x(0.00781 + x3 ) - (3x2 )(3x2 )} / (0.00781 + x3 )2 . ⇒ x3 = 0.01562. ⇒ x = 0.25.


Comment: Here is a graph of h(x) = 3x2 / (0.00781 + x3 ):
[Graph: h(x) versus x from 0 to 3; the hazard rate rises to a maximum of about 8 near x = 0.25 and then declines.]

34.41. E. For t < 27.9, H(t) = ∫_0^t h(x) dx = λt. S(t) = e−λt. S(15) = 0.7634. ⇒ λ = 0.018.
For t ≥ 27.9, H(t) = H(27.9) + ∫_27.9^t h(x) dx = (27.9)(0.018) + (t - 27.9)(0.018) + β(t - 27.9)3/3.

S(t) = exp[-.018t - β(t - 27.9)3 /3]. 0.5788 = S(30) = exp[-(.018)(30) - β(30 - 27.9)3 /3].

⇒ (0.018)(30) + β(30 - 27.9)3 /3 = .5468. ⇒ β = .00222.


H(34) = -lnS(34) = (0.018)(34) + (0.00222)(34 - 27.9)3 /3 = 0.780.
For the observed data, SO(34) = 4/10 = 0.4. HO(34) = -ln(.4) = 0.916.
|H(34) - HO(34)| = 0.916 - 0.780 = 0.14.

34.42. A. S(x) = exp[-∫_2^x h(t) dt] = exp[-(z2/2){ln(x) - ln(2)}] = (x/2)^(-z2/2), for x ≥ 2.
0.16 = S(5) = 2.5^(-z2/2). ⇒ ln 0.16 = (-z2/2) ln 2.5. ⇒ z2 = -2 ln 0.16 / ln 2.5 = 4. ⇒ z = 2.

Comment: S(x) = (2/x)2 , for x ≥ 2. A Single Parameter Pareto, with α = 2 and θ = 2.



34.43. D. µx corresponds to a legitimate survival function, if and only if it is nonnegative and its
integral from 0 to infinity is infinite. All the candidates are nonnegative.
∫_0^∞ BCx dx = (B/ln C) Cx ]_{x=0}^{x=∞} = ∞, since C > 1.
∫_0^∞ a(b + x)-1 dx = a ln(b + x) ]_{x=0}^{x=∞} = ∞.     ∫_0^∞ (1 + x)-3 dx = -(1 + x)-2/2 ]_{x=0}^{x=∞} = 1/2 ≠ ∞.

Thus 1 and 2 could serve as a force of mortality.


Comment: #1 is Gompertz Law. #2 is a Pareto Distribution with a = α and b = θ.
The Pareto is not useful for modeling human lives, since its force of mortality decreases to zero as x
approaches infinity.

34.44. E. 1. Single Parameter Pareto is heavy tailed with an increasing mean residual life.
2. A Gamma with α > 1 is lighter tailed than an Exponential; it has a decreasing mean residual life and
an increasing hazard rate.
3. Single Parameter Pareto has a heavier tail than a Gamma.
Comments: The mean residual life of a Single Parameter Pareto increases linearly as x goes to
infinity, e(x) = x/(α-1). The hazard rate of a Single Parameter Pareto goes to zero as x goes to
infinity, h(x) = α/x. For this Gamma Distribution, h(x) = f(x)/S(x) =
{y e-y/500/ 250,000} / {1 - Γ[2 ; y/500]} = {y e-y/500/ 250,000} / {e-y/500 + (y/500)e-y/500} =
1/(250000/y + 500), where I have used Theorem A.1 to write out the incomplete Gamma function
for integer parameter. h(x) increases to 1/500 = 1/θ as x approaches infinity.
A Single Parameter Pareto, which has a decreasing hazard rate, has a heavier righthand tail than a
Gamma, which has an increasing hazard rate.
A Gamma with α = 1 is an Exponential with constant hazard rate. For α integer, a Gamma is a sum of

α independent, identically distributed Exponentials. Therefore, as α → ∞, the Gamma Distribution


approaches a Normal Distribution. The Normal Distribution is very light-tailed and has an increasing
hazard rate. This is one way to remember that for α > 1, the Gamma Distribution has an increasing
hazard rate. For α < 1, the Gamma Distribution has a decreasing hazard rate.

34.45. B. h(t) = 0.002t. H(t) = ∫_0^t h(x) dx = 0.001t2. S(t) = exp[-H(t)] = exp[-0.001t2].
We want F(t) ≤ 1%. F(3) = 1 - exp[-0.009] = 0.009 ≤ 1%, so 3 is OK.
F(4) = 1 - exp[-0.016] = 0.016 > 1%, so 4 is not OK.
Comment: A Weibull Distribution with τ = 2. 99% = S(t) = exp[-0.001t2 ]. ⇒ t = 3.17.

34.46. C. For a constant force of mortality (hazard rate) one has an Exponential Distribution. For the
original strain of the disease: 10% = 1 - e-20µ. ⇒ µ = 0.005268.
For the new strain, the probability of death in the next 20 years is:
1 - exp[-(20)(2µ)] = 1 - exp[-(20)(2)(0.005268)] = 1 - e-0.21072 = 19.0%.
Alternately, for twice the hazard rate, the survival function is squared.
For the original strain, S(20) = 1 - 0.010 = 0.90.
For the new strain, S(20) = 0.92 = 0.81.
For the new strain, the probability of death in the next 20 years is: 1 - 0.81 = 19%.

34.47. D. e(x) = ∫_x^ω S(t) dt / S(x) = ∫_x^ω (1 - t/ω)α dt / (1 - x/ω)α = {ω(1 - x/ω)α+1/(α + 1)} / (1 - x/ω)α
= (ω - x)/(α + 1). ⇒ e(0) = ω/(α + 1).

By differentiating, f(x) = -dS(x)/dx = α(1 - x/ω)α−1/ω.

h(x) = f(x)/S(x) = {α(1 - x/ω)α−1/ω}/(1 - x/ω)α = α/(ω - x).


Let α be the original value and αʼ be the new value of this parameter.

From bullet i: ω/(αʼ + 1) = 0.5ω/(α + 1). ⇒ αʼ = 2α + 1.

From bullet ii: αʼ/(ω - x) = 2.25α/(ω - x). ⇒ αʼ = 2.25α.

Therefore, 2.25α = 2α + 1. ⇒ α = 4.
Alternately, H(x) = -ln S(x) = -α ln(1 - x/ω).
h(x) = d H(x) / dx = (α/ω)/(1 - x/ω) = α/(ω - x). Proceed as before.
Comment: If α = 1, then one has DeMoivreʼs Law, the uniform distribution.
A Modified DeMoivre model has α times the hazard rate of DeMoivreʼs Law for all ages.

34.48. B. H(x) = ∫_0^x h(t) dt = ∫_0^x 2/(110 - t) dt = -2{ln(110 - x) - ln(110)}.
S(x) = exp[-H(x)] = {(110 - x)/110}2 = (1 - x/110)2, for 0 ≤ x < 110.
e(30) = ∫_30^110 S(t) dt / S(30) = ∫_30^110 (1 - t/110)2 dt / (1 - 30/110)2
= (110/3)(1 - 30/110)3 / (1 - 30/110)2 = (110 - 30)/3 = 26.67.
Comment: Generalized DeMoivreʼs Law with ω = 110 and α = 2. µ(x) = α/(ω - x), 0 ≤ x < ω.
e(x) = (ω - x)/(α+1) = (110 - x)/3.
The remaining lifetime at age 30 is a Beta Distribution with a = 1, b = α = 2, and θ = ω - 30 = 80.

34.49. E. H(x) = ∫_0^x h(t) dt = ∫_0^x t2/4000 dt = x3/12,000.
S(x) = exp[-H(x)] = exp[-x3/12,000], for 0 ≤ x < 4000.
S(10) = 0.9200. S(12) = 0.8659. S(14) = 0.7956.
Prob[fail between 12 and 14 | survive until 10] = {S(12) - S(14)} / S(10) =
(0.8659 - 0.7956) / 0.9200 = 0.0764.
Comment: Without the restriction, x ≤ 4000 , this would be a Weibull Distribution with τ = 3.

34.50. A. H(x) = ∫_0^x h(t) dt = ln(100) - ln(100 - x). S(x) = exp[-H(x)] = (100 - x)/100.
Prob[life aged 40 survives at least 10 years] = S(50)/S(40) = 0.5/0.6 = 5/6.
Prob[life aged 50 survives at least 10 years] = S(60)/S(50) = 0.4/0.5 = 4/5.
Prob[exactly one survives 10 years] = (5/6)(1 - 4/5) + (1 - 5/6)(4/5) = 9/30.
Comment: DeMoivreʼs Law with ω = 100.

Section 35, Loss Elimination Ratios and Excess Ratios

As discussed previously, the Loss Elimination Ratio (LER) is defined as the ratio of the losses
eliminated by a deductible to the total losses prior to imposition of the deductible. The losses
eliminated by a deductible d are E[X ∧ d], the Limited Expected Value at d.262

LER(x) = E[X ∧ x] / E[X].

The excess ratio R(x), is defined as the ratio of loss dollars excess of x divided by the total loss
dollars.263 It is the complement of the Loss Elimination Ratio; they sum to unity.
R(x) = {E[X] - E[X ∧ x]} / E[X] = 1 - E[X ∧ x]/E[X] = 1 - LER(x).

Using the formulas in Appendix A of Loss Models for the Limited Expected Value, one can use the
relationship R(x) = 1 - E[X ∧ x]/E[X] to compute the Excess Ratio.
For various distributions, here are the resulting formulas for the excess ratios, R(x):

Distribution                  Excess Ratio, R(x)

Exponential                   e^(-x/θ)

Pareto                        {θ/(θ + x)}^(α-1), α > 1

LogNormal                     1 - Φ[(ln(x) - µ - σ^2)/σ] - x {1 - Φ[(ln(x) - µ)/σ]} / exp[µ + σ^2/2]

Gamma                         1 - Γ(α+1 ; x/θ) - x {1 - Γ(α ; x/θ)} / (αθ)

Weibull                       1 - Γ[1 + 1/τ ; (x/θ)^τ] - (x/θ) exp(-(x/θ)^τ) / Γ[1 + 1/τ]

Single Parameter Pareto       (1/α) (x/θ)^(1-α), α > 1, x > θ


262
The losses eliminated are paid by the insured rather than the insurer. The insured would generally pay less for its
insurance in exchange for accepting a deductible. By estimating the percentage of losses eliminated the insurer can
price how much of a credit to give the insured for selecting various deductibles. How the LER is used to price
deductibles is beyond the scope of this exam, but generally the higher the loss elimination ratio, the greater the
deductible credit.
263
The excess ratio is used by actuaries to price reinsurance, workers compensation excess loss factors, etc.

Recall that the mean and thus the Excess Ratio fails to exist for: Pareto with α ≤ 1,
Generalized Pareto with α ≤ 1, and Burr with αγ ≤ 1. Except where the formula could be simplified,
there is a term in the Excess Ratio which is: -x(S(x)) / mean.264

Due to the computational length, exam questions involving the computation of Loss Elimination or
Excess Ratios are most likely to involve the Exponential, Pareto, Single Parameter Pareto, or
LogNormal Distributions.265

Exercise: Compute the excess ratios at $1 million and $5 million for a Pareto with parameters
α = 1.702 and θ = 240,151.

[Solution: For the Pareto R(x) = {θ/(θ+x)}α−1. R($1 million) = (240,151/1,240,151)0.702 = 0.316.
R($5 million) = (240,151/5,240,151)0.702 = 0.115.]

Since LER(x) = 1 - R(x), one can use the formulas for the Excess Ratio to get the Loss Elimination
Ratio and vice-versa.

Exercise: Compute the loss elimination ratio at 10,000 for the Pareto with parameters:
α = 1.702 and θ = 240,151.

[Solution: For the Pareto, R(x) = {θ/(θ+x)}α−1. Therefore, LER(x) = 1 - {θ/(θ+x)}α−1.


LER(10,000) = 1 - (240,151/250,151)0.702 = 2.8%.
Comment: One could get the same result by using LER(x) = E[X ∧ x] / mean.]
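The two exercises above can be checked numerically. Here is a minimal Python sketch (the helper name pareto_excess_ratio is an illustrative choice, not from the text):

    def pareto_excess_ratio(x, alpha, theta):
        # R(x) = {theta/(theta + x)}^(alpha - 1), valid for alpha > 1
        return (theta / (theta + x)) ** (alpha - 1)

    alpha, theta = 1.702, 240151
    for x in (10000, 1e6, 5e6):
        r = pareto_excess_ratio(x, alpha, theta)
        print(x, "R(x) =", round(r, 3), "LER(x) =", round(1 - r, 3))
    # Reproduces R(1 million) = 0.316, R(5 million) = 0.115, and LER(10,000) = 0.028.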

Loss Elimination Ratio and Excess Ratio in Terms of the Survival Function:

As discussed previously, for a distribution with support starting at zero, the Limited Expected Value
can be written as an integral of the Survival Function from 0 to the limit:
E[X ∧ x] = ∫0^x S(t) dt.

LER(x) = E[X ∧ x] / E[X], therefore: LER(x) = ∫0^x S(t) dt / E[X] = ∫0^x S(t) dt / ∫0^∞ S(t) dt.
264
This term comes from the second part of - E[X ∧ x] in the numerator, -xS(x). For example for the Gamma
Distribution, the excess ratio has a term -xλ{1-Γ(α ; λx)}/α = -xS(x)/( α/λ) = - xS(x)/mean.
265
The name “Excess Ratio” is not on the syllabus.

Thus, for a distribution with support starting at zero, the Loss Elimination Ratio is the
integral from zero to the limit of S(x) divided by the mean.

Since R(x) = 1 - LER(x) = (E[X] - E[X ∧ x]) / E[X], the Excess Ratio can be written as:
R(x) = ∫x^∞ S(t) dt / E[X] = ∫x^∞ S(t) dt / ∫0^∞ S(t) dt.

So the excess ratio is the integral of the survival function from the limit to infinity, divided by the mean.266

For example, for the Pareto Distribution, S(x) = θ^α (θ+x)^(-α), so that:

R(x) = {θ^α (θ + x)^(1-α) / (α - 1)} / {θ / (α - 1)} = {θ/(θ+x)}^(α-1).

This matches the formula given above for the Excess Ratio of the Pareto Distribution.
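As a rough numerical check of these integral expressions (a sketch only; the step size and the use of a midpoint Riemann sum are arbitrary choices), one can approximate the integral of the survival function for this Pareto and recover the loss elimination ratio at 1 million:

    alpha, theta = 1.702, 240151
    S = lambda t: (theta / (theta + t)) ** alpha      # Pareto survival function
    mean = theta / (alpha - 1)
    x, step = 1e6, 100.0
    n = int(x / step)
    limited = sum(S((i + 0.5) * step) for i in range(n)) * step   # approximates E[X ∧ x]
    print(limited / mean)   # about 0.684 = 1 - 0.316, the LER at 1 million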

LER(x) = ∫0^x S(t) dt / E[X]. ⇒ d LER(x)/dx = S(x) / E[X].

Since S(x)/E[X] ≥ 0, the loss elimination ratio is an increasing function of x.267

d LER(x)/dx = S(x) / E[X]. ⇒ d^2 LER(x)/dx^2 = -f(x) / E[X].

Since f(x)/E[X] ≥ 0, the loss elimination ratio is a concave downwards function of x.

The loss elimination ratio as a function of x is increasing, concave downwards, and approaches one
as x approaches infinity.

266
This result is used extremely often by property/casualty actuaries. See for example, “The Mathematics of Excess
of Loss Coverage and Retrospective Rating -- A Graphical Approach,” by Y.S. Lee, PCAS LXXV, 1988.
267
If S(x) = 0, in other words there is no possibility of a loss of size greater than x, then the loss elimination is a
constant 1, and therefore, more precisely the loss elimination is nondecreasing.

For example, here is a graph of the loss elimination ratio for a Pareto Distribution with parameters
α = 1.702 and θ = 240,151:268

[Graph: Loss Elimination Ratio (vertical axis, 0 to about 0.8) versus size in millions (horizontal axis, 0 to 5); the LER increases and is concave downwards.]

Since the loss elimination ratio is increasing and concave downwards, the excess ratio is decreasing
and concave upwards (convex).

For a distribution with support starting at zero:


d LER(x)/dx = S(x) / E[X]. ⇒ d LER(0)/dx = 1 / E[X]. ⇒ S(x) = {d LER(x)/dx} / {d LER(0)/dx}.

Therefore, the loss elimination ratio function determines the distribution function, as well as vice-versa.

Layers of Loss:

As discussed previously, layers can be thought of in terms of the difference of loss elimination ratio
or the difference of excess ratios in the opposite order.

Exercise: Compute the percent of losses in the layer from $1 million to $5 million for a Pareto
Distribution with parameters α = 1.702 and θ = 240,151.
[Solution: For this Pareto Distribution, R($1 million) - R($5 million) = 0.316 - 0.115 = 0.201.
Thus for this Pareto, 20.1% of the losses are in the layer from $1 million to $5 million.]
268
As x approaches infinity, the loss elimination ratio approaches one. In this case it approaches the limit slowly.

Moments in Terms of Integrals of Excess Ratios:269

R(x) = ∫x^∞ S(t) dt / E[X] = ∫x^∞ S(t) dt / ∫0^∞ S(t) dt.

Thus, Rʼ(x) = -S(x) / E[X].

As has been discussed previously: ∫0^∞ R(x) dx = E[X^2] / (2 E[X]).

Assuming they exist, the higher moments can be written in terms of integrals of R(x) x^k.
Using integration by parts:270

∫0^∞ R(x) x dx = R(x) x^2/2 ] evaluated from x = 0 to x = ∞  -  ∫0^∞ {-S(x)/E[X]} x^2/2 dx = {1/(2 E[X])} ∫0^∞ S(x) x^2 dx

= {1/(2 E[X])} { S(x) x^3/3 ] evaluated from x = 0 to x = ∞  -  ∫0^∞ -f(x) x^3/3 dx } = E[X^3] / (6 E[X]).

In a similar manner one can show that, provided the moments exist:

∫0^∞ R(x) x^k dx = E[X^(k+2)] / {(k+2) (k+1) E[X]}, k ≥ 0.

269
See for example “The Mathematics of Excess Losses,” by Leigh J. Halliwell, Variance Volume 6 Issue 1.
270
Where the terms at infinity vanish, since R(x) and S(x) must go to zero sufficiently quickly as x approaches infinity
in order for the third moment to exist.

Exercise: Verify that the above equation holds for the Exponential Distribution.
[Solution: R(x) = e^(-x/θ). ∫0^∞ R(x) x^k dx = ∫0^∞ e^(-x/θ) x^k dx.

This is a Gamma type integral with α = k + 1.

Thus, ∫0^∞ R(x) x^k dx = ∫0^∞ e^(-x/θ) x^k dx = θ^(k+1) Γ[k+1] = θ^(k+1) k!.

E[X^(k+2)] / {(k+2) (k+1) E[X]} = (k+2)! θ^(k+2) / {(k+2) (k+1) θ} = k! θ^(k+1).

Showing that in this case: ∫0^∞ R(x) x^k dx = E[X^(k+2)] / {(k+2) (k+1) E[X]}.]


Rewriting this equation: E[X^(k+2)] = (k+1) (k+2) E[X] ∫0^∞ R(x) x^k dx, k ≥ 0.
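A quick numerical spot-check of this identity (a sketch; the truncation of the integral at 30 means is an arbitrary choice) for an Exponential with θ = 100 and k = 1, where both sides equal θ^2 = 10,000:

    import math

    theta, k = 100.0, 1
    R = lambda x: math.exp(-x / theta)          # excess ratio of the Exponential
    step, top = 0.01, 30 * theta                # truncate the integral at 30 means
    lhs = sum(R((i + 0.5) * step) * ((i + 0.5) * step) ** k * step
              for i in range(int(top / step)))
    rhs = math.factorial(k + 2) * theta ** (k + 2) / ((k + 2) * (k + 1) * theta)
    print(round(lhs), round(rhs))               # both print 10000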

Problems:

35.1 (2 points) Assume you have a Pareto distribution with α = 5 and θ = $1000.
What is the Loss Elimination Ratio for $500?
A. less than 78%
B. at least 78% but less than 79%
C. at least 79% but less than 80%
D. at least 80% but less than 81%
E. at least 81%

35.2 (2 points) Assume you have a Pareto distribution with α = 5 and θ = $1000.
What is the Excess Ratio for $2000?
A. less than 1%
B. at least 1% but less than 2%
C. at least 2% but less than 3%
D. at least 3% but less than 4%
E. at least 4%

35.3 (3 points) You observe the following 35 losses: 6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34,
40, 41, 48, 49, 53, 60, 63, 78, 85, 103, 124, 140, 192, 198, 227, 330, 361, 421, 514 ,546, 750,
864, 1638.
What is the (empirical) Loss Elimination Ratio at 50?
A. less than 0.14
B. at least 0.14 but less than 0.16
C. at least 0.16 but less than 0.18
D. at least 0.18 but less than 0.20
E. at least 0.20

35.4 (2 points) The size of losses follows a LogNormal distribution with parameters µ = 11 and
σ = 2.5. What is the Excess Ratio for 100 million?
A. less than 5%
B. at least 5% but less than 10%
C. at least 10% but less than 15%
D. at least 15% but less than 20%
E. at least 20%

35.5 (2 points) The size of losses follows a Gamma distribution with parameters
α = 3, θ = 100,000. What is the excess ratio for 500,000?
Hint: Use Theorem A.1 in Appendix A of Loss Models:
Γ(n; x) = 1 - Σ_{j=0}^{n-1} x^j e^(-x) / j!, for n a positive integer.

A. less than 5.6%


B. at least 5.6% but less than 5.8%
C. at least 5.8% but less than 6.0%
D. at least 6.0% but less than 6.2%
E. at least 6.2%

35.6 (2 points) The size of losses follows a LogNormal distribution with parameters µ = 10, σ = 3.
What is the Loss Elimination Ratio for 7 million?
A. less than 10%
B. at least 10% but less than 15%
C. at least 15% but less than 20%
D. at least 20% but less than 25%
E. at least 25%

Use the following information for the next two questions:


• Accident sizes for Risk 1 follow an Exponential distribution, with mean θ.
• Accident sizes for Risk 2 follow an Exponential distribution, with mean 1.2θ.
• The insurer pays all losses in excess of a deductible of d.
• 10 accidents are expected for each risk each year.

35.7 (1 point) Determine the expected amount of annual losses paid by the insurer for Risk 1.
A. 10dθ B. 10 / (dθ) C. 10θ D. 10θe-d/θ E. 10e-d/θ

35.8 (1 point) Determine the limit as d goes to infinity of the ratio of the expected amount of annual
losses paid by the insurer for Risk 2 to the expected amount of annual losses paid by the insurer for
Risk 1.
A. 0 B. 1/1.2 C. 1 D. 1.2 E. ∞

35.9 (2 points) For a Beta Distribution with b = 1 and θ = 1, determine the form of the
Loss Elimination Ratio as a function of x, for 0 ≤ x ≤ 1.

35.10 (1 point) You have the following estimates of integrals of the Survival Function.
∫0^1000 S(x) dx ≅ 400.  ∫1000^∞ S(x) dx ≅ 2300.

Estimate the Loss Elimination Ratio at 1000.


A. less than 15%
B. at least 15% but less than 16%
C. at least 16% but less than 17%
D. at least 17% but less than 18%
E. at least 18%

35.11 (4 points) For a LogNormal distribution with coefficient of variation equal to 3,


what is the Loss Elimination Ratio at twice the mean?
A. less than 50%
B. at least 50% but less than 55%
C. at least 55% but less than 60%
D. at least 60% but less than 65%
E. at least 65%

35.12 (3 points) The loss elimination ratio at x, for 0 ≤ x ≤ 1, is: {ln[a + b^x] - ln[a + 1]} / {ln[a + b] - ln[a + 1]}, with 1 > b > 0 and a > 0.
Determine the form of the distribution function.

35.13 (3 points) f(x) = 0.0001 x for 0 < x ≤ 100; f(x) = 0.0001 (200 - x) for 100 < x ≤ 200; f(x) = 0 for x > 200.
Calculate the loss elimination ratio for an ordinary deductible of 50.


A. 44% B. 46% C. 48% D. 50% E. 52%

35.14 (3 points) Sizes of loss follow a Poisson Distribution with mean 6.


What is the loss elimination ratio at 5?
A. 55% B. 60% C. 65% D. 70% E. 75%

35.15 (3 points) For a Beta Distribution with a = 1 and θ = 1, determine the form of the
Loss Elimination Ratio as a function of x, for 0 ≤ x ≤ 1.

35.16 (4, 5/86, Q.59) (2 points) Assume that losses follow the probability density function
f(x) = x/18 for 0 ≤ x ≤ 6 with f(x) = 0 otherwise.
What is the loss elimination ratio (LER) for a deductible of 2?
A. Less than 0.35
B. At least 0.35, but less than 0.40
C. At least 0.40, but less than 0.45
D. At least 0.45, but less than 0.50
E. 0.50 or more.

35.17 (4, 5/87, Q.57) (2 points) Losses are distributed with a probability density function
f(x) = 2/x3 , 1 < x < ∞. What is the expected loss eliminated by a deductible of d = 5?
A. Less than 0.5
B. At least 0.5, but less than 1
C. At least 1, but less than 1.5
D. At least 1.5, but less than 2
E. 2 or more.

35.18 (4B, 5/92, Q.25) (2 points) You are given the following information:
• Deductible $250
• Expected size of loss with no deductible $2,500
• Probability of a loss exceeding deductible 0.95
• Mean Excess Loss value of the deductible $2,375
Determine the loss elimination ratio.
A. Less than 0.035
B. At least 0.035 but less than 0.070
C. At least 0.070 but less than 0.105
D. At least 0.105 but less than 0.140
E. At least 0.140

35.19 (4B, 11/92, Q.18) (2 points) You are given the following information:
• Deductible, d $ 500
• Expected value limited to d, E[x ∧ d] $ 465
• Probability of a loss exceeding deductible, 1-F(d) 0.86
• Mean Excess Loss value of the deductible, e(d) $5,250
Determine the loss elimination ratio.
A. Less than 0.035
B. At least 0.035 but less than 0.055
C. At least 0.055 but less than 0.075
D. At least 0.075 but less than 0.095
E. At least 0.095

35.20 (4B, 5/94, Q.10) (2 points) You are given the following:
• The amount of a single loss has a Pareto distribution with parameters α = 2 and θ = 2000.
Calculate the Loss Elimination Ratio (LER) for a $500 deductible.
A. Less than 0.18
B. At least 0.18, but less than 0.23
C. At least 0.23, but less than 0.28
D. At least 0.28, but less than 0.33
E. At least 0.33

35.21 (4B, 5/96, Q.9 & Course 3 Sample Exam, Q.17) (2 points)
You are given the following:
• Losses follow a lognormal distribution, with parameters µ = 7 and σ = 2.
• There is a deductible of 2,000.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
Determine the loss elimination ratio (LER) for the deductible.
A. Less than 0.10
B. At least 0.10, but less than 0.15
C. At least 0.15, but less than 0.20
D. At least 0.20, but less than 0.25
E. At least 0.25

35.22 (4B, 11/96, Q.13) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = k and α = 2, where k is a constant.
• There is a deductible of 2k.
What is the loss elimination ratio (LER)?
A. 1/3 B. 1/2 C. 2/3 D. 4/5 E. 1

35.23 (4B, 5/97, Q.19) (3 points) You are given the following:
• Losses follow a distribution with density function
f(x) = (1/1000) e-x/1000, 0 < x < ∞.
• There is a deductible of 500.
• 10 losses are expected to exceed the deductible each year.
Determine the amount to which the deductible would have to be raised to double the loss
elimination ratio (LER).
A. Less than 550
B. At least 550, but less than 850
C. At least 850, but less than 1,150
D. At least 1,150, but less than 1,450
E. At least 1,450

Use the following information for the next two questions:


• Loss sizes for Risk 1 follow a Pareto distribution, with parameters θ and α, α > 2 .
• Loss sizes for Risk 2 follow a Pareto distribution, with parameters θ and 0.8α, α > 2 .
• The insurer pays losses in excess of a deductible of k.
• 1 loss is expected for each risk each year.

35.24 (4B, 11/97, Q.22) (2 points) Determine the expected amount of annual losses paid by the
insurer for Risk 1.
A. (θ + k)/(α - 1)      B. θ^α / (θ + k)^α      C. α θ^α / (θ + k)^(α+1)

D. θ^(α+1) / {(α - 1) (θ + k)^α}      E. θ^α / {(α - 1) (θ + k)^(α-1)}

35.25 (4B, 11/97, Q.23) (1 point) Determine the limit of the ratio of the expected amount of annual
losses paid by the insurer for Risk 2 to the expected amount of annual losses paid by the insurer for
Risk 1 as k goes to infinity.
A. 0 B. 0.8 C. 1 D. 1.25 E. ∞

35.26 (4B, 5/99, Q.20) (2 points) Losses follow a lognormal distribution, with parameters
µ = 6.9078 and σ = 1.5174. Determine the ratio of the loss elimination ratio (LER) at 10,000 to the
loss elimination ratio (LER) at 1,000.
A. Less than 2
B. At least 2, but less than 4
C. At least 4, but less than 6
D. At least 6, but less than 8
E. At least 8

35.27 (SOA3, 11/03, Q.29 & 2009 Sample Q.87) (2.5 points)
The graph of the density function for losses is:

[Graph: density f(x) versus loss amount x; f(x) is constant at 0.010 for x from 0 to 80, then decreases linearly to 0 at x = 120.]

Calculate the loss elimination ratio for an ordinary deductible of 20.


(A) 0.20 (B) 0.24 (C) 0.28 (D) 0.32 (E) 0.36

35.28 (SOA M, 11/06, Q.29 & 2009 Sample Q.284) (2.5 points)
A risk has a loss amount which has a Poisson distribution with mean 3.
An insurance covers the risk with an ordinary deductible of 2.
An alternative insurance replaces the deductible with coinsurance α, which is the proportion of the
loss paid by the insurance, so that the expected insurance cost remains the same.
Calculate α.
(A) 0.22 (B) 0.27 (C) 0.32 (D) 0.37 (E) 0.42

Solutions to Problems:

35.1. D. LER(x) = E[X ∧ x] / mean. For the Pareto: mean = θ/(α-1) = $250, and
E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)} = $200.6. LER(500) = 200.6/250 = 80.24%.

35.2. B. Excess Ratio = 1 - { E[X ∧ x] / mean } = {θ/(θ +x)}α−1 = 1.23%.


Comment: E[X ∧ x] = 246.925, mean = 250.

35.3. D. E[X ∧ 50] = { 6 + 7 + 11+ 14 +15 + 17 + 18 + 19 + 25+ 29+ 30+ 34 + 40 + 41+ 48 +


49 + (19)(50)} /35 = (403 + 950)/35 = 38.66. E[X] = {6 + 7 + 11 +14 + 15 + 17 + 18 +19 + 25 +
29 + 30 + 34 + 40 + 41+ 48 + 49 + 53 + 60 + 63 + 78 + 85 + 103 + 124 + 140 + 192 + 198 +
227 + 330 + 361 + 421 + 514 + 546 + 750 + 864 + 1638}/35 = 7150 /35 = 204.29.
LER(50) = E[X ∧ 50] / E[X] = 38.66 / 204.29 = 0.189.

35.4. E. mean = exp(µ + σ2/2) = exp(11 + 2.52 /2) = 1362729.

E{X ∧ x] = exp(µ + σ2/2)Φ[(lnx - µ − σ2)/σ] + x {1 − Φ[(lnx - µ)/σ]}.


E{X ∧ 100 million] = 1362729Φ[(18.421 - 11 - 2.52 )/2.5] − (100 million){1 - Φ[(18.421 - 11)/2.5]} =
1362729Φ[0.47] - (100 million){1 - Φ[2.97]} = 1,362,729(0.6808) - (100 million){1 - 0.9985) =
1,077,745. Then R(100 million) = 1 - 1,077,745 / 1,362,729 = 1 - .791 = 20.9%.

35.5. B. Γ[3 ; 5] = 1 - e-5(1 + 5 + 52 /2) = 0.875.


Γ[4 ; 5] = 1 - e-5(1 + 5 + 52 /2 +53 /6) = 0.735.
For Gamma E[X] = αθ = 300,000.
E[X ∧ 500,000] = (αθ)Γ[α+1 ; 500,000/θ] + 500,000 (1 - Γ[α ; 500,000/θ]) =
300000Γ[4 ; 5] + 500,000(1 - Γ[3 ; 5]) = 283,000.
Excess Ratio = 1 - { E[X ∧ 500,000] / E[X] } = 1 - 283,000/300,000 = 1 - 0.943 = 5.7%.

35.6. D. mean = exp(µ + σ2/2) = exp(10 + 32 /2) = 1,982,759.

E{X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}.


E{X ∧ 7 million] = 1982759Φ[(15.761 - 10 - 32 )/3] + (7 million){1 - Φ[(15.761 - 10)/3]}
= 1,982,759 Φ[-1.08] + (7 million){1 - Φ[1.92]} =
1,982,759(0.1401) + (7 million)(1 - 0.9726) = 469,585.
Then LER(7 million) = E{X ∧ 7 million]/ E[X] = 469,585 / 1,982,759 = 0.237.

35.7. D. The expected amount paid by the insurer is: 10{E[X] - E[X ∧ d]} =
10{θ - θ(1−e-d/θ)} = 10 θ e-d/θ .

Alternately, per claim the losses excess of the deductible d are: e(d) {1 - F(d)} = θ e^(-d/θ).

Thus for 10 claims we expect the insurer to pay: 10 θ e-d/θ .

Alternately, per claim the losses excess of the deductible d are: R(d) E[X] = e^(-d/θ) θ = θ e^(-d/θ).

Thus for 10 claims we expect the insurer to pay: 10 θ e-d/θ .

35.8. E. Using the solution to the previous question, the expected amount paid by the insurer for
Risk 1 is: 10θ e-d/θ.

Similarly, the expected amount paid by the insurer for Risk 2 is: 12θ e-d/1.2θ.
Therefore, the ratio of the expected amount of annual losses paid by the insurer for Risk 2 to the
expected amount of annual losses paid by the insurer for Risk 1 is:
{12θ e^(-d/(1.2θ))} / {10θ e^(-d/θ)} = 1.2 e^(0.167 d/θ). As d goes to infinity, this goes to infinity.

35.9. E[X] = a/(a+b) = a/(a+1). f(x) = a x^(a-1), 0 ≤ x ≤ 1. ⇒ S(x) = 1 - x^a.
E[X ∧ x] = ∫0^x t f(t) dt + x S(x) = ∫0^x a t^a dt + x (1 - x^a) = x^(a+1) a/(a+1) + x - x^(a+1) = x - x^(a+1)/(a+1).
LER[x] = E[X ∧ x] / E[X] = x (a+1)/a - x^(a+1)/a.

35.10. A. LER(1000) = ∫0^1000 S(t) dt / ∫0^∞ S(t) dt ≅ 400/(400 + 2300) = 14.8%.

Comment: The estimated mean is 400 + 2300 = 2700.


The estimated limited expected value at 1000 is 400.

35.11. D. 1 + CV2 = E[X2 ]/E[X]2 = exp[2µ + 2σ2]/exp[µ + σ2/2]2 = exp[σ2].

1 + 3^2 = exp[σ^2]. ⇒ σ^2 = ln(10). ⇒ σ = 1.5174.

LER[x] = E[X ∧ x] / mean = {exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}} / exp(µ + σ2/2) =

Φ[(lnx − µ)/σ - σ] + (x/mean) {1 - Φ[(lnx − µ)/σ]}. x = 2 mean. ⇒ ln(x) = ln(2) + µ + σ2/2.

⇒ (lnx − µ)/σ = ln(2)/σ + σ/2 = 0.69315/1.5174 + 1.5174/2 = 1.2155.


LER[x] = Φ[1.2155 - 1.5174] + (2){1 - Φ[1.2155]} = Φ[-0.30] + 2(1 - Φ[1.22]) =
0.3821 + 2(1 - 0.8888) = 60.5%.
Comment: See Table I in “Estimating Pure Premiums by Layer - An Approach” by Robert J.
Finger, PCAS 1976. Finger calculates excess ratios, which are one minus the loss elimination ratios.
Here is a graph of Loss Elimination Ratios as a function of the ratio to the mean, for LogNormal
Distributions with some different values of the coefficient of variation:

[Graph: Loss Elimination Ratio (vertical axis, 0 to 1) versus ratio to the mean (horizontal axis, 0 to 5), with separate curves for CV = 1, CV = 2, and CV = 3.]

35.12. LER(x) = ∫0^x S(t) dt / E[X]. ⇒ d LER(x)/dx = S(x)/E[X].
⇒ d LER(0)/dx = 1/E[X]. ⇒ S(x) = {d LER(x)/dx} / {d LER(0)/dx}. ⇒ F(x) = 1 - {d LER(x)/dx} / {d LER(0)/dx}.
d LER(x)/dx = {1 / (ln[a + b] - ln[a + 1])} ln[b] b^x / (a + b^x).
d LER(0)/dx = {1 / (ln[a + b] - ln[a + 1])} ln[b] / (a + 1).
⇒ F(x) = 1 - (a+1) b^x / (a + b^x), 0 ≤ x < 1.
S(1-) = (a+1) b / (a + b) > 0.
Thus there is a point mass of probability at 1 of size: (a+1) b / (a + b).
Comment: Note that F(1-) = a (1 - b) / (a + b) < 1.
This is a member of the MBBEFD Distribution Class, not on the syllabus of your exam.
See “Swiss Re Exposure Curves and the MBBEFD Distribution Class,” by Stefan Bernegger,
ASTIN Bulletin, Vol. 27, No. 1, May 1997, pp. 99-111, on the syllabus of CAS Exam 8.
Here is a graph of the loss elimination ratio for b = 0.2 and a = 3:
[Graph: the loss elimination ratio (0 to 1) versus size (0 to 1); the curve rises from 0 at x = 0 toward 1 as x approaches 1.]
As it should, the LER is increasing, concave downwards, and approaches 1 as x approaches 1.
Here is a graph of the Survival Function for b = 0.2 and a = 3:

[Graph: the Survival Function (0 to 1) versus size (0 to 1); S(x) decreases from 1 at x = 0 to 0.25 as x approaches 1.]
There is a point mass of probability at x = 1 of size: (a+1)b / (a + b) = (4)(0.2)/3.2 = 25%.

35.13. C. E[X] = ∫0^100 x (0.0001 x) dx + ∫100^200 x (0.0001)(200 - x) dx
= 0.0001 ∫0^100 x^2 dx + 0.0001 ∫100^200 (200x - x^2) dx
= (0.0001) {100^3/3 + (100)(200^2 - 100^2) - (200^3/3 - 100^3/3)} = 100.
Alternately, the density is symmetric around 100, so the mean is 100.
E[X ∧ 50] = ∫0^50 x (0.0001 x) dx + 50 ∫50^100 0.0001 x dx + 50 ∫100^200 0.0001 (200 - x) dx
= 0.0001 ∫0^50 x^2 dx + 0.005 ∫50^100 x dx + 0.005 ∫100^200 (200 - x) dx
= (0.0001)(50^3/3) + (0.005)(100^2/2 - 50^2/2) + (0.005) {(100)(200) - (200^2/2 - 100^2/2)} = 47.9167.
Loss Elimination Ratio at 50 is: E[X ∧ 50]/E[X] = 47.9167/100 = 47.9%.
Comment: Similar to SOA3, 11/03, Q.29 & (2009 Sample Q.87).

35.14. E. f(0) = e-6 = 0.0025. f(1) = 6e-6 = 0.0149. f(2) = 62 e-6/2 = 0.0446.
f(3) = 63 e-6/6 = 0.0892. f(4) = 64 e-6/24 = 0.1339.
1 - f(0) - f(1) - f(2) - f(3) - f(4) =
1 - 0.0025 - 0.0149 - 0.0446 - 0.0892 - 0.1339 = 0.7149.
E[X ∧ 5] = 0 f(0) + 1 f(1) + 2 f(2) + 3 f(3) + 4 f(4) + 5{1 - f(0) - f(1) - f(2) - f(3) - f(4)} =
0.0149 + (2)(0.0446) + (3)(0.0892) + (4)(0.1339) + (5)(0.7149) = 4.4818.
Loss Elimination Ratio at 5 is: E[X ∧ 5]/E[X] = 4.4818/6 = 74.7%.
Comment: Similar to SOA M, 11/06, Q.29 (2009 Sample Q.284).

35.15. E[X] = a/(a+b) = 1/(1+b). f(x) = b (1-x)^(b-1), 0 ≤ x ≤ 1. ⇒ S(x) = (1-x)^b.
E[X ∧ x] = ∫0^x t f(t) dt + x S(x) = ∫0^x t b (1-t)^(b-1) dt + x (1-x)^b
= b ∫0^x (1-t)^(b-1) dt - b ∫0^x (1-t)^b dt + x (1-x)^b = -{(1-x)^b - 1} + {(1-x)^(b+1) - 1} b/(b+1) + x (1-x)^b.
LER[x] = E[X ∧ x] / E[X] = -{(1-x)^b - 1}(b+1) + {(1-x)^(b+1) - 1} b + x (1-x)^b (b+1)
= 1 + b (1-x)^(b+1) - (b+1)(1-x)^b + x (1-x)^b (b+1) = 1 + b (1-x)^(b+1) - (b+1)(1-x)^(b+1) = 1 - (1-x)^(b+1).

35.16. D. F(x) = x^2/36.
LER(2) = {∫0^2 x f(x) dx + 2 S(2)} / ∫0^6 x f(x) dx = {(2^3)/54 + 2 (1 - 2^2/36)} / {(6^3)/54} = 1.926/4 = 0.481.

35.17. D. Integrating f(x) from 1 to x, F(x) = 1 - 1/x^2. A deductible of d eliminates the size of the
loss for small losses and d per large loss. The expected losses eliminated by a deductible of d are:
∫1^d x f(x) dx + d S(d) = ∫1^d 2 x^(-2) dx + d (1/d^2) = (2 - 2/d) + 1/d = 2 - 1/d.
For d = 5, the expected losses eliminated are: 2 - 1/5 = 1.8.
Comment: A Single Parameter Pareto with α = 2 and θ = 1.

35.18. C. e(x) = {mean - E[X ∧ x]} / S(x). Therefore: 2375 = (2500 - E[X ∧ x]) / 0.95.
Thus E[X ∧ x] = 243.75. Then, LER(x) = E[X ∧ x] / E[X] = 243.75 / 2500 = 0.0975.
Alternately, LER(x) = 1 - e(x) S(x) / E[X] = 1 - (2375)(1 - 0.05)/2500 = 0.0975.

35.19. D. LER(d) = E[x ∧ d] / Mean. e(d) = (Mean - E[X ∧ d]) / S(d).


Therefore, 5250 = (Mean - 465) / 0.86. Therefore Mean = 4515 + 465 = 4980.
Thus LER(d) = 465/4980 = 0.093.
Comment: One does not use the information that the deductible amount is $500.
However, note that E[x ∧ d] ≤ d, as it should.

35.20. B. For the Pareto distribution LER(x) = 1 - (1 + x/θ)^(1-α). For α = 2 and
θ = 2000, LER(500) = 1 - 1/1.25 = 0.2.
Alternately, E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}. E[X ∧ 500] = (2000)(1 - 0.8) = 400.
The mean is: θ/(α-1) = 2000. LER(x) = E[X ∧ x] / mean = 400/2000 = 0.2.

35.21. B. Mean = exp(µ + 0.5 σ2) = exp(7 + 2) = e9 = 8103.

E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.


E[X ∧ 2000] = 8103Φ[(ln2000 - 7 - 4)/2] + 2000 {1 - Φ[(ln2000 − 7)/2]} =
8103 Φ(-1.7) + 2000{1 - Φ(0.3)} = (8103)(1 - 0.9554) + (2000)(1 - 0.6179) = 361 +764 = 1125.
LER(2000) = E[X ∧ 2000] / Mean = 1125 / 8103 = 0.139.

35.22. C. For the Pareto Distribution, the Loss Elimination Ratio is: 1 - (θ/(θ+x))α−1.
For θ = k and α = 2, LER(x) = 1 - (k/(k+x)) = x / (k+x). Thus LER(2k) = 2k/ 3k = 2/3.
Comment: If one does not remember the formula for the LER for the Pareto, one can use the
formula for the limited expected value and the fact that LER(x) = E[X ∧ x] / E[X].

35.23. E. For the Exponential Distribution: LER(x) = 1 - e-x/θ. For θ = 1000,


LER(500) = 1 - e^(-0.5) = 0.3935. In order to double the LER, we need (2)(0.3935) = 1 - e^(-x/1000).
Thus e^(-x/1000) = 0.213. ⇒ x = -1000 ln(0.213) = 1546.

Comment: For the Exponential, e(x) = θ, and thus R(x) = e(x) S(x) / mean = (θ)(e-x/ θ)/(θ) = e-x/ θ.

Thus LER(X) = 1 - R(x) = 1 - e-x/ θ.



35.24. E. Since there is 1 loss expected per risk per year, the expected amount paid by the
insurer is: E[X] - E[X ∧ k] = θ/(α-1) - {θ/(α-1)} {1 - θ^(α-1)/(θ+k)^(α-1)} =
{θ/(α-1)} θ^(α-1)/(θ + k)^(α-1) = θ^α / {(α-1) (θ + k)^(α-1)}.
Alternately, the losses excess of the limit k are e(k) S(k) = {(k + θ)/(α-1)} θ^α/(θ+k)^α =
θ^α / {(α-1) (θ + k)^(α-1)}.
Alternately, the losses excess of the limit k are R(k) E[X] =
{θ/(θ + k)}^(α-1) {θ/(α-1)} = θ^α / {(α-1) (θ + k)^(α-1)}.

35.25. E. Using the solution to the prior question, but with 0.8α rather than α, the expected amount
of annual losses paid by the insurer for Risk 2 is: θ^(0.8α) / {(0.8α - 1) (θ + k)^(0.8α - 1)}.
That for Risk 1 is: θ^α / {(α - 1) (θ + k)^(α - 1)}. The ratio is: {(α - 1)/(0.8α - 1)} (θ + k)^(0.2α) / θ^(0.2α).
As k goes to infinity, this ratio goes to infinity.
Comment: The loss distribution of Risk 2 has a heavier tail than Risk 1. The pricing of very large
deductibles is very sensitive to the value of the Pareto shape parameter, α.

35.26. B. LER = E[X ∧ x] / E[X]. ⇒ LER(10,000) / LER (1000) = E[X ∧ 10000] / E[X ∧ 1000].

E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.


E[X ∧ 10,000] = e8.059Φ[0] + 10000 {1 − Φ[1.52]} = (3162)(0.5) + 10000(1 - 0.9357) = 2224.
E[X ∧ 1000] = e8.059Φ[-1.52] + 1000 {1−Φ[0]} = (3162)(0.0643) + 1000(0.5) = 703.
E[X ∧ 10000] / E[X ∧ 1000] = 2224 / 703 = 3.16.

35.27. E. F(20) = (20)(0.010) = 0.2. S(20) = 1 - 0.2 = 0.8.
f(x) = 0.01, 0 ≤ x ≤ 80.
From 80 to 120 the graph is linear, and it is 0 at 120 and 0.010 at 80.
⇒ f(x) = (0.01)(120 - x)/40 = 0.03 - 0.00025x, 80 ≤ x ≤ 120.
E[X] = ∫0^80 0.01x dx + ∫80^120 x (0.03 - 0.00025x) dx
= [0.01 x^2/2] from x = 0 to 80 + [0.03 x^2/2] from x = 80 to 120 - [0.00025 x^3/3] from x = 80 to 120 = 50.67.
E[X ∧ 20] = ∫0^20 0.01x dx + 20 S(20) = 2 + (20)(0.8) = 18.
LER(20) = E[X ∧ 20]/E[X] = 18/50.67 = 35.5%.

35.28. E. E[X ∧ 2] = 0f(0) + 1f(1) + 2{1 - f(0) - f(1)} = 2 - 2f(0) - f(1) = 2 - 2e-3 -3e-3 = 2 - 5e-3.
Loss Elimination Ratio = E[X ∧ 2]/E[X] = (2 - 5e-3)/3 = 0.584.
Setting the expected costs equal, α E[X] = E[X] - E[X ∧ 2], so α = 1 - LER = 1 - 0.584 = 0.416.
Comment: α is set equal to the excess ratio for a deductible of 2.

Section 36, The Effects of Inflation

Inflation is a very important consideration when pricing Health Insurance and Property/Casualty
Insurance. Important ideas include the effect of inflation when there is a maximum covered loss
and/or deductible, in particular the effect on the average payment per loss and the average
payment per payment, the effect on other quantities of interest, and the effect on size of loss
distributions. Memorize the formulas for the average sizes of payment including inflation,
discussed in this section!

On this exam, we deal with the effects of uniform inflation, meaning that a single inflation factor is
applied to all sizes of loss.271 For example, if there is 5% inflation from 1999 to the year 2000, we
assume that a loss of size x in 1999 would have been of size 1.05x if it had instead occurred in the
year 2000.

Effect of a Maximum Covered Loss:

Exercise: You are given the following:


• For 1999 the amount of a single loss has the following distribution:
Amount Probability
$500 20%
$1,000 30%
$5,000 25%
$10,000 15%
$25,000 10%
• An insurer pays all losses after applying a $10,000 maximum covered loss to each loss.
• Inflation of 5% impacts all losses uniformly from 1999 to 2000.
Assuming no change in the maximum covered loss, what is the inflationary impact on dollars paid by
the insurer in the year 2000 as compared to the dollars the insurer paid in 1999?
[Solution: One computes the average amount paid by the insurer per loss in each year:
1999 Amount 1999 2000 Amount 2000
Probability of Loss Insurer Payment of Loss Insurer Payment
0.20 500 500 525 525
0.30 1,000 1,000 1,050 1,050
0.25 5,000 5,000 5,250 5,250
0.15 10,000 10,000 10,500 10,000
0.10 25,000 10,000 26,250 10,000
Average 5650.00 4150.00 5932.50 4232.50
4232.50 / 4150 = 1.020, therefore the insurerʼs payments increased 2.0%.]

271
Over a few years inflation can often be assumed to be approximately uniform by size of loss. However, over
longer periods of time the larger losses often increase at a different rate than the smaller losses.

Inflation on the limited losses is 2%, less than that of the total losses. Prior to the application of the
maximum covered loss, the average loss increased by the overall inflation rate of 5%, from 5650 to
5932.5. In general, for a fixed limit, limited losses increase more slowly than the overall
rate of inflation.
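A short Python sketch (illustrative only, not part of the original exercise) that reproduces the table above for the fixed $10,000 maximum covered loss:

    losses = [(0.20, 500), (0.30, 1000), (0.25, 5000), (0.15, 10000), (0.10, 25000)]
    limit, r = 10000, 0.05

    limited_1999 = sum(p * min(x, limit) for p, x in losses)
    limited_2000 = sum(p * min((1 + r) * x, limit) for p, x in losses)
    total_1999 = sum(p * x for p, x in losses)
    total_2000 = (1 + r) * total_1999

    print(limited_1999, limited_2000, limited_2000 / limited_1999 - 1)   # 4150, 4232.5, about 2.0%
    print(total_1999, total_2000, total_2000 / total_1999 - 1)           # 5650, 5932.5, 5.0%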

Effect of a Deductible:

Exercise: You are given the following:


• For 1999 the amount of a single loss has the following distribution:
Amount Probability
$500 20%
$1,000 30%
$5,000 25%
$10,000 15%
$25,000 10%
• An insurer pays all losses after applying a $1000 deductible to each loss.
• Inflation of 5% impacts all losses uniformly from 1999 to 2000.
Assuming no change in the deductible, what is the inflationary impact on dollars paid by the insurer in
the year 2000 as compared to the dollars the insurer paid in 1999?
[Solution: One computes the average amount paid by the insurer per loss in each year:
1999 Amount 1999 2000 Amount 2000
Probability of Loss Insurer Payment of Loss Insurer Payment
0.20 500 0 525 0
0.30 1,000 0 1,050 50
0.25 5,000 4,000 5,250 4,250
0.15 10,000 9,000 10,500 9,500
0.10 25,000 24,000 26,250 25,250
Average 5650.00 4750.00 5932.50 5027.50
5027.5 / 4750 = 1.058, therefore the insurerʼs payments increased 5.8%.]

Inflation on the losses excess of the deductible is 5.8%, greater than that of the total losses. Prior to
the application of the deductible, the average loss increased by the overall inflation rate of 5%, from
5650 to 5932.5. In general, for a fixed deductible, losses paid excess of the deductible
increase more quickly than the overall rate of inflation.

The Loss Elimination Ratio in 1999 is: (5650 - 4750) / 5650 = 15.9%.
The Loss Elimination Ratio in 2000 is: (5932.5 -5027.5) / 5932.5 = 15.3%.

In general, under uniform inflation for a fixed deductible amount the LER declines.
The effect of a fixed deductible decreases over time.

Similarly, under uniform inflation the Excess Ratio over a fixed amount increases.272
If a reinsurer were selling reinsurance excess of a fixed limit such as $1 million, then over time
the losses paid by the reinsurer would be expected to increase faster than the overall rate of
inflation, in some cases much faster.

Limited Losses increase slower than the total losses.


Excess Losses increase faster than total losses.
Limited Losses plus Excess Losses = Total Losses.

Graphical Examples:

Assume for example that losses follow a Pareto Distribution with α = 3 and θ = 5000 in the earlier
year.273 Assume that there is 10% inflation and the same limit in both years. Then the increase in
limited losses as a function of the limit is:

[Graph: annual inflation rate (%) of limited losses versus the limit, for limits from 10,000 to 50,000; the curve increases with the limit.]

As the limit increases, so does the rate of inflation. For no limit the rate is 10%.

272
See 3, 11/00, Q.42.
273
How to work with the Pareto and other continuous size of loss distributions under uniform inflation will be
discussed subsequently.

If instead there were a fixed deductible, then the increase in losses paid excess of the deductible as
a function of the deductible is:

[Graph: annual inflation rate (%) of losses excess of the deductible, from about 12% up to about 24%, versus the deductible, for deductibles from 2,000 to 10,000; the curve increases with the deductible.]

For no deductible the rate of inflation is 10%. As the size of the deductible increases, the losses
excess of the deductible becomes “more excess”, and the rate of inflation increases.

Effect of a Maximum Covered Loss and Deductible, Layers of Loss:

Exercise: You are given the following:


• For 1999 the amount of a single loss has the following distribution:
Amount Probability
$500 20%
$1,000 30%
$5,000 25%
$10,000 15%
$25,000 10%
• An insurer pays all losses after applying a $10,000 maximum covered loss
and then a $1000 deductible to each loss.
• Inflation of 5% impacts all loss uniformly from 1999 to 2000.
Assuming no change in the deductible or maximum covered loss, what is the inflationary impact on
dollars paid by the insurer in the year 2000 as compared to 1999?

[Solution: One computes the average amount paid by the insurer per loss in each year:
1999 Amount 1999 2000 Amount 2000
Probability of Loss Insurer Payment of Loss Insurer Payment
0.20 500 0 525 0
0.30 1,000 0 1,050 50
0.25 5,000 4,000 5,250 4,250
0.15 10,000 9,000 10,500 9,000
0.10 25,000 9,000 26,250 9,000
Average 5650.00 3250.00 5932.50 3327.50
3327.5 / 3250 = 1.024, therefore the insurerʼs payments increased 2.4%.]

In this case, the layer of loss from 1000 to 10,000 increased more slowly than the overall rate of
inflation. However, there were two competing effects. The deductible made the rate of increase
larger, while the maximum covered loss made the rate of increase smaller. Which effect dominates
depends on the particulars of a given situation.

For example, for the ungrouped data in Section 1, the dollars of loss in various layers are as follows
for the original data and the revised data after 50% uniform inflation:
LAYER ($ million) Dollars of Loss ($000)
Bottom Top Original Data Revised Data Ratio
0 0.5 24277 30174 1.24
0.5 1 6424 10239 1.59
1 2 4743 8433 1.78
2 3 2441 4320 1.77
3 4 1961 2661 1.36
4 5 802 2000 2.49
5 6 0 1942 infinite
6 7 0 1000 infinite
7 8 0 203 infinite
0 infinity 40648 60972 1.50
We observe that the inflation rate for higher layers is usually higher than the uniform inflation rate, but
not always. For the layer from 3 to 4 million dollars the losses increase by 36%, which is less than
the overall rate of 50%.

A layer (other than the very first) “gains” dollars as loss sizes increase and are thereby pushed
above the bottom of the layer. For example, a loss of size 0.8 million would contribute nothing to
the layer from 1 to 2 million prior to inflation, while after 50% inflation it would be of size 1.2 million,
and would contribute 0.2 million. In addition, losses which were less than the top of the layer and
more than the bottom of the layer, now contribute more dollars to the layer. For example, a loss of
size 1.1 million would contribute 0.1 million to the layer from 1 to 2 million prior to inflation, while after
50% inflation it would be of size 1.65 million, and would contribute 0.65 million to this layer. Either of
these two types of increases can be very big compared to the dollars that were in the layer prior to
the effects of inflation.

On the other hand, a loss whose size was bigger than the top of a given layer, contributes no more
to that layer no matter how much it grows. For example, a loss of size 3 million would contribute
1 million to the layer from 1 to 2 million prior to inflation, while after 50% inflation it would be of size
4.5 million, and would still contribute 1 million. A loss of size 3 million has already contributed the
width of the layer, and that is all that any single loss can contribute to that layer. So for such losses
there is no increase to this layer.

Thus for an empirical sample of losses, how inflation impacts a particular layer depends on how the
varying effects from the various sizes of losses contribute to the combined effect.
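The three cases just described can be illustrated with a small sketch (the individual loss sizes are simply the examples used above):

    def layer(x, bottom, top):
        # dollars a single loss of size x contributes to the layer from bottom to top
        return min(max(x - bottom, 0.0), top - bottom)

    for size in (0.8e6, 1.1e6, 3.0e6):
        before = layer(size, 1e6, 2e6)
        after = layer(1.5 * size, 1e6, 2e6)
        print(size, before, after)
    # 0.8 million: 0 before and 0.2 million after; 1.1 million: 0.1 million before and
    # 0.65 million after; 3 million: the full 1 million width both before and after.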

Manners of Expressing the Amount of Inflation:

There are a number of different ways to express the amount of inflation:


1. State the total amount of inflation from the earlier year to the later year.
2. Give a constant annual inflation rate.
3. Give the different amounts of inflation during each annual period between the earlier and
later year.
4. Give the value of some consumer price index in the earlier and later year.

In all cases, you want to determine the total inflation factor, (1+r), to get from the earlier
year to the later year.

For example, from the year 2001 to 2004, inflation might be:

1. A total of 15%; 1 + r = 1.15.

2. 4% per year; 1 + r = (1.04)3 = 1.125.

3. 7% between 2001 and 2002, 4% between 2002 and 2003, and 5% between 2003 and 2004;
1 + r = (1.07)(1.04)(1.05) = 1.168.

4. The CPI (Consumer Price Index) was 327 in 2001 and is 366 in 2004; 1 + r = 366/327 = 1.119.
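Expressed in code, these four statements each reduce to a single total inflation factor (a trivial sketch of the arithmetic above):

    factor_1 = 1.15                  # 1. total inflation of 15%
    factor_2 = 1.04 ** 3             # 2. 4% per year for three years: 1.125
    factor_3 = 1.07 * 1.04 * 1.05    # 3. three different annual rates: 1.168
    factor_4 = 366 / 327             # 4. ratio of CPI values: 1.119
    print(factor_1, factor_2, factor_3, factor_4)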

Moments, etc.:

If one multiplies all of the loss sizes by 1.1, then the mean is also multiplied by 1.1.
E[1.1X] = 1.1 E[X].

Since each loss is multiplied by the inflation factor, (1+r), so are the Mean, Mode and
Median of the distribution.

Any percentile of the distribution is also multiplied by (1+r); in fact this is the definition of
inflation uniform by size of loss.

Any quantity in dollars is expected to be multiplied by the inflation factor, 1+ r.

If one multiplies all of the loss sizes by 1.1, then the second moment is multiplied by 1.1^2:
E[(1.1X)^2] = 1.1^2 E[X^2].

E[{(1+r)X}^n] = (1+r)^n E[X^n].

In general, under uniform inflation the nth moment is multiplied by (1+r)^n.

Exercise: In 2003 the mean loss is 100 and the second moment is 50,000. Between 2003 and
2004 there is 5% inflation. What is the variance of the losses in 2004?
[Solution: In 2004, the mean is: (1.05)(100) = 105, and the second moment is: (1.05^2)(50,000) = 55,125.
Thus in 2004, the variance is: 55,125 - 105^2 = 44,100.]

The variance in 2003 was 50,000 - 100^2 = 40,000. The variance increased by a factor of:
44,100/40,000 = 1.1025 = 1.05^2 = (1+r)^2.
Var[(1+r)X] = E[{(1+r)X}^2] - E[(1+r)X]^2 = (1+r)^2 E[X^2] - (1+r)^2 E[X]^2 = (1+r)^2 Var[X].

Under uniform inflation, the Variance is multiplied by (1+r)2 . Any quantity in dollars squared
is expected to the multiplied by the square of the inflation factor, (1+r)2 .

Since the Variance is multiplied by (1+r)2 , the Standard Deviation is multiplied by (1+r).
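A quick check of the exercise above (a sketch; the numbers are those of the exercise):

    mean_2003, second_2003, r = 100.0, 50000.0, 0.05
    mean_2004 = (1 + r) * mean_2003                  # 105
    second_2004 = (1 + r) ** 2 * second_2003         # 55,125
    var_2003 = second_2003 - mean_2003 ** 2          # 40,000
    var_2004 = second_2004 - mean_2004 ** 2          # 44,100 = (1.05^2)(40,000)
    print(mean_2004, second_2004, var_2003, var_2004)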

CV, Skewness, and Kurtosis:

Exercise: In 2003 the mean loss is 100 and the second moment is 50,000. Between 2003 and
2004 there is 5% inflation. What is the coefficient of variation of the losses in 2004?
[Solution: In 2004, the mean is 105, and the standard deviation is √44,100 = 210.
The Coefficient of Variation is: 210 / 105 = 2.]

In this case, the CV for 2003 is: √40,000 / 100 = 200/100 = 2. Thus the coefficient of variation remained the
same. CV = standard deviation / mean, and in general both the numerator and denominator are
multiplied by (1+r), and therefore the CV remains the same.

Skewness = (3rd central moment) / (standard deviation)^3.

Both the numerator and denominator are in dollars cubed, and under uniform inflation they are each
multiplied by (1+r)^3. Thus the skewness is unaffected by uniform inflation.

Kurtosis = (4th central moment) / (standard deviation)^4.

Both the numerator and denominator are in dollars to the fourth power, and under uniform inflation
they are each multiplied by (1+r)^4. Thus the kurtosis is unaffected by uniform inflation.

The Coefficient of Variation, the Skewness, and the Kurtosis are each unaffected by
uniform inflation. Each is a dimensionless quantity, which helps to describe the shape of a
distribution and is independent of the scale of the distribution.
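A tiny sketch confirming this scale invariance for an arbitrary sample (the sample values are illustrative only):

    def cv_and_skewness(sample):
        n = len(sample)
        m = sum(sample) / n
        var = sum((x - m) ** 2 for x in sample) / n
        third = sum((x - m) ** 3 for x in sample) / n
        return var ** 0.5 / m, third / var ** 1.5

    data = [500, 1000, 5000, 10000, 25000]
    print(cv_and_skewness(data))
    print(cv_and_skewness([1.05 * x for x in data]))   # same CV and skewness, up to rounding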

Limited Expected Values:

As discussed previously, losses limited by a fixed limit increase slower than the rate of inflation. For
example, if the expected value limited to $1 million is $300,000 in the prior year, then after uniform
inflation of 10%, the expected value limited to $1 million is less than $330,000 in the later year.

Exercise: You are given the following:


• For 1999 the amount of a single loss has the following distribution:
Amount Probability
$500 20%
$1,000 30%
$5,000 25%
$10,000 15%
$25,000 10%
• Inflation of 5% impacts all losses uniformly from 1999 to 2000.
An insurer pays all losses after applying a maximum covered loss to each loss.
The maximum covered loss in 1999 is $10,000.
The maximum covered loss in 2000 is $10,500, 5% more than that in 1999.
What is the inflationary impact on dollars paid by the insurer in the year 2000 as compared to the
dollars the insurer paid in 1999?
[Solution: One computes the average amount paid by the insurer per loss in each year:
1999 Amount 1999 2000 Amount 2000
Probability of Loss Insurer Payment of Loss Insurer Payment
0.20 500 500 525 525
0.30 1,000 1,000 1,050 1,050
0.25 5,000 5,000 5,250 5,250
0.15 10,000 10,000 10,500 10,500
0.10 25,000 10,000 26,250 10,500
Average 5650.00 4150.00 5932.50 4357.50
4357.50 / 4150 = 1.050, therefore the insurerʼs payments increased 5.0%.]

On exam questions, the maximum covered loss would usually be the same in the two years.
In that case, as discussed previously, the insurerʼs payments would increase at 2%, less than the
overall rate of inflation. In this exercise, instead the maximum covered loss was increased in order to
keep up with inflation. The result was that the insurerʼs payments, the limited expected value,
increased at the overall rate of inflation.

Provided the limit keeps up with inflation, the Limited Expected Value is multiplied by
the inflation factor.274 If we increase the limit at the rate of inflation, then the Limited Expected
Value, which is in dollars, also keeps up with inflation.
274
As discussed previously, if rather than being increased in order to keep up with inflation the limit is kept fixed,
then the limited losses increase slower than the overall rate of inflation.

Exercise: The expected value limited to $1 million is $300,000 in the 2007.


There is 10% uniform inflation between 2007 and 2008.
What is the expected value limited to $1.1 million in 2008?
[Solution: Since the limit kept up with inflation, ($300,000)(1.1) = $330,000.]

Proof of the Result for Limited Expected Values:

The Limited Expected Value is affected for two reasons by uniform inflation. Each of the losses
entering into its computation is multiplied by (1+r), but in addition the relative effect of the limit has
been affected. Due to the combination of these two effects it turns out that if Z = (1+r) X,
then E[Z ∧ u(1+r)] = (1+r) E[X ∧ u].

In terms of the definition of the Limited Expected Value:


E[Z ∧ u(1+r)] = ∫0^(u(1+r)) z fZ(z) dz + {SZ(u(1+r))} {u(1+r)} = (1+r) ∫0^u x fX(x) dx + {SX(u)} {u(1+r)} = (1+r) E[X ∧ u].

Here we have applied the change of variables z = (1+r) x, so that FZ(u(1+r)) = FX(u), SZ(u(1+r)) = SX(u),
and z fZ(z) dz = (1+r) x fX(x) dx.

We have shown that E[(1+r)X ∧ u(1+r)] = (1+r) E[X ∧ u]. The left hand side is the Limited
Expected Value in the later year, with a limit of u(1+r); we have adjusted u, the limit in the prior year,
in order to keep up for inflation via the factor 1+r. This yields the Limited Expected Value in the prior
year, except multiplied by the inflation factor to put them in terms of the subsequent yearʼs dollars,
which is the right hand side.

Mean Excess Loss:

Exercise: You are given the following:


• For 1999 the amount of a single loss has the following distribution:
Amount Probability
$500 20%
$1,000 30%
$5,000 25%
$10,000 15%
$25,000 10%
• Inflation of 5% impacts all losses uniformly from 1999 to 2000.
Compute the mean excess loss at $3000 in 1999.
Compute the mean excess loss at $3000 in 2000.
Compute the mean excess loss at $3150 in 2000.
[Solution: In 1999, e(3000) = {(2000)(.25) + (7000)(.15) + (22,000)(.1)}/(.25 + .15 + .1) = 7500.
In 2000, e(3000) = {(5250 - 3000)(.25) + (10,500 - 3000)(.15) + (26,250 - 3000)(.1)}/.5 = 8025.
In 2000, e(3150) = {(5250 - 3150)(.25) + (10,500 - 3150)(.15) + (26,250 - 3150)(.1)}/.5 = 7875.]

In this case, if the limit is increased for inflation, from 3000 to (1.05)(3000) = $3150 in 2000, then the
mean excess loss increases by the rate of inflation; (1.05)(7500) = 7875.

The mean excess loss in the later year is multiplied by the inflation factor, provided the
limit has been adjusted to keep up with inflation.

Exercise: The mean excess loss beyond $1 million is $3 million in 2007.


There is 10% uniform inflation between 2007 and 2008.
What is the mean excess loss beyond $1.1 million in 2008?
[Solution: Since the limit kept up with inflation, ($3 million)(1.1) = $3.3 million.]

If the limit is fixed, then the behavior of the mean excess loss, depends on the particular size of loss
distribution.275

Proof of the Result for the Mean Excess Loss:

The Mean Excess Loss or Mean Residual Life at L in the prior year is given by
eX(L) = {E[X] - E[X ∧ L]} / S(L).
Letting Z = (1+r)X, the mean excess loss at L(1+r) in the latter year is given by
eZ(L(1+r)) = { E[Z] - E[Z ∧ L(1+r)] } / SZ(L(1+r)) =
{(1+r)E[X] - (1+r)E[X ∧ L] } / {SX(L)} = (1+r)eX(L).
275
As was discussed in a previous section, different distributions have different behaviors of the mean excess loss
as a function of the limit.

Loss Elimination Ratio:

As discussed previously, for a fixed deductible, the Loss Elimination Ratio declines under uniform
inflation. For example, if the LER(1000) = 13% in the prior year, then after uniform inflation,
LER(1000) is less than 13% in the latter year.

Exercise: You are given the following:


• For 1999 the amount of a single loss has the following distribution:
Amount Probability
$500 20%
$1,000 30%
$5,000 25%
$10,000 15%
$25,000 10%
• Inflation of 5% impacts all losses uniformly from 1999 to 2000.
An insurer pays all losses after applying a deductible to each loss.
The deductible in 1999 is $1000.
The deductible in 2000 is $1050, 5% more than that in 1999.
Compare the loss elimination ratio in the year 2000 to that in the year 1999.
[Solution: One computes the average amount paid by the insurer per loss in each year:
1999 Amount 1999 2000 Amount 2000
Probability of Loss Insurer Payment of Loss Insurer Payment
0.20 500 0 525 0
0.30 1,000 0 1,050 0
0.25 5,000 4,000 5,250 4,200
0.15 10,000 9,000 10,500 9,450
0.10 25,000 24,000 26,250 25,200
Average 5650.00 4750.00 5932.50 4987.50
The Loss Elimination Ratio in 1999 is: 1 - 4750/5650 = 15.9%.
The Loss Elimination Ratio in 2000 is: 1 - 4987.5/5932.5 = 15.9%.
Comment: 4987.50 / 4750 = 1.050, therefore the insurerʼs payments increased 5.0%.]

On exam questions, the deductible would usually be the same in the two years. In that case, as
discussed previously, the loss elimination ratio would decrease from 15.9% to 15.3%. In this
exercise, instead the deductible was increased in order to keep up with inflation. The result was that
the insurerʼs payments increased at the overall rate of inflation, and the loss elimination ratio stayed
the same.

The Loss Elimination Ratio in the later year is unaffected by uniform inflation, provided
the deductible has been adjusted to keep up with inflation.276
276
As discussed above, for a fixed deductible the Loss Elimination Ratio decreases under uniform inflation.

Exercise: The Loss Elimination Ratio for a deductible of $1000 is 13% in 2007.
There is 10% uniform inflation between 2007 and 2008.
What is the Loss Elimination ratio for a deductible of $1100 in 2008?
[Solution: Since the deductible keeps up with inflation, the Loss Elimination Ratio is the same in
2008 as in 2007, 13%.]

Since the Excess Ratio is just unity minus the LER, the Excess Ratio in the latter year is unaffected
by uniform inflation, provided the limit has been adjusted to keep up with inflation.277

Proof of the Result for Loss Elimination Ratios:

The Loss Elimination Ratio at d in the prior year is given by LERX(d) = E[X ∧ d] / E[X].
Letting Z = (1+r)X, the Loss Elimination Ratio at d(1+r) in the latter year is given by LERZ(d(1+r)) =
E[Z ∧ d(1+r)] / E[Z] = (1+r)E[X ∧ d] / {(1+r)E[X]} = E[X ∧ d] / E[X] = LERX(d).

Using Theoretical Loss Distributions:

It would also make sense to use continuous distributions, obtained perhaps from fitting to a data set,
in order to estimate the impact of inflation. We could apply a factor of 1+r to every loss in the data
set and then fit a distribution to the altered data. In most cases, it would be a waste of time fitting new
distributions to the data modified by the uniform effects of inflation. For most size of loss
distributions, after uniform inflation one gets the same type of distribution with the scale parameter
revised by the inflation factor. For example, for a Pareto Distribution with parameters α = 1.702 and
θ = 240,151, under uniform inflation of 50% one would get another Pareto Distribution with
parameters: α = 1.702, θ = (1.5)(240,151) = 360,227.278

Behavior of Specific Distributions under Uniform Inflation of (1+r):

For the Pareto, θ becomes θ(1+r). The Burr and Generalized Pareto have the same
behavior. Not coincidentally, for these distributions the mean is proportional to θ. As discussed in a
previous section, theta is the scale parameter for these distributions; everywhere x appears in the
Distribution Function it is divided by θ. In general under inflation, scale parameters are transformed
under inflation by being multiplied by (1+r). For the Pareto the shape parameter α remains the
same. For the Burr the shape parameters α and γ remain the same. For the Generalized Pareto
the shape parameters α and τ remain the same.

277
As discussed above, for a fixed limit the Excess Ratio increases under uniform inflation.
278
Prior to inflation,this the Pareto fit by maximum likelihood to the ungrouped data in Section 1.

Similarly, for the Gamma, and Weibull, θ becomes θ(1+r). The Transformed Gamma has the
same behavior. As parameterized in Loss Models, theta is the scale parameter for the Gamma,
Weibull, and Transformed Gamma distributions. For the Gamma the shape parameter α remains the
same. For the Weibull the shape parameter τ remains the same. For the Transformed Gamma the
shape parameters α and τ remain the same. Since the Exponential is a special case of the
Gamma, for the Exponential θ becomes θ(1+r), under uniform inflation of 1+r.

Exercise: In 2001 losses follow a Gamma Distribution with parameters α = 2 and θ = 100.
There is 10% inflation in total between 2001 and 2004. What is loss distribution in 2004?
[Solution: Gamma with α = 2 and θ = (1.1)(100) = 110.]

The behavior of the LogNormal under uniform inflation is explained by noting that multiplying each
loss by a factor of (1+r) is the same as adding a constant amount ln(1+r) to the log of each loss.
Adding a constant amount to a Normal distribution, gives another Normal Distribution, with the same
variance but with the mean shifted. µʼ = µ + ln(1+r), and σʼ = σ.
X ~ LogNormal(µ , σ). ⇔ ln(X) ~ Normal(µ , σ). ⇒
ln[(1+r)X] = ln(X) + ln(1+r) ~ Normal(µ , σ) + ln(1+r) = Normal(µ + ln(1+r), σ).
⇔ (1+r)X ~ LogNormal(µ + ln(1+r), σ). Thus under uniform inflation for the LogNormal, µ
becomes µ + ln(1+r). The other parameter, σ, remains the same.
The behavior of the LogNormal under uniform inflation can also be explained by the fact that
e µ is the scale parameter and σ is a shape parameter. Therefore, eµ is multiplied by (1+r);

eµ becomes eµ(1+r) = eµ+ln(1+r). Therefore, µ becomes µ + ln(1+r).

Exercise: In 2001 losses follow a LogNormal Distribution with parameters µ = 5 and σ = 2.


There is 10% inflation in total between 2001 and 2004. What is loss distribution in 2004?
[Solution: LogNormal with µ = 5 + ln(1.1) = 5.095, and σ = 2.]
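A minimal numerical confirmation of this rule (a sketch; it simply compares the LogNormal mean computed two ways):

    import math

    mu, sigma, r = 5.0, 2.0, 0.10
    mean_before = math.exp(mu + sigma ** 2 / 2)
    mean_scaled = (1 + r) * mean_before                            # multiply every loss by 1.1
    mean_new_mu = math.exp(mu + math.log(1 + r) + sigma ** 2 / 2)  # shift mu by ln(1.1)
    print(mean_scaled, mean_new_mu)                                # the two agree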

Note that in each case, the behavior of the parameters under uniform inflation depends on the
particular way in which the distribution is parameterized. For example, in Loss Models the
Exponential distribution is given as: F(x) = 1 - e-x/θ. Thus in this parameterization of the Exponential,
θ acts as a scale parameter, and under uniform inflation θ becomes θ(1+r). This contrasts with the

parameterization of the Exponential in Life Contingencies, F(x) = 1 - e-λx, where 1/λ acts as a scale

parameter, and under uniform inflation λ becomes λ/(1+r). θ ⇔ 1/λ.



Conveniently, most of the distributions in Loss Models have a scale parameter, which is multiplied
by (1+r), while the shape parameters are unaffected. Exceptions are the LogNormal Distribution
and the Inverse Gaussian.279

Note that all of the members of “Transformed Beta Family” all act similarly.280 The scale parameter θ
is multiplied by the inflation factor and all of the shape parameters remain the same. All of the
members of the “Transformed Gamma Family” all act similarly.281 The scale parameter θ is multiplied
by the inflation factor and all of the shape parameters remain the same.

Distribution Parameters Prior to Inflation Parameters After Inflation

Pareto α θ α θ(1+r)
Generalized Pareto α θ τ α θ(1+r) τ

Burr α θ γ α θ(1+r) γ

Inverse Burr τ θ γ τ θ(1+r) γ

Transformed Beta α θ γ τ α θ(1+r) γ τ


Inverse Pareto τ θ τ θ(1+r)

Loglogistic γ θ γ θ(1+r)
Paralogistic α θ α θ(1+r)
Inverse Paralogistic τ θ τ θ(1+r)

Exponential θ θ(1+r)
Gamma α θ α θ(1+r)

Weibull θ τ θ(1+r) τ

Inverse Gamma α θ α θ(1+r)

Inverse Weibull θ τ θ(1+r) τ


Trans. Gamma α θ τ α θ(1+r) τ
Inv. Trans. Gamma α θ τ α θ(1+r) τ

279 This is discussed along with the behavior under uniform inflation of the LogNormal and Inverse Gaussian, in
Appendix A of Loss Models. However it is not included in the Tables attached to the exam.
280 See Figures 5.2 and 5.4, and Appendix A of Loss Models.
281 See Figure 5.3 and Appendix A of Loss Models.

Distribution              Parameters Prior to Inflation     Parameters After Inflation

Normal                    µ, σ                              µ(1+r), σ(1+r)
LogNormal                 µ, σ                              µ + ln(1+r), σ
Inverse Gaussian          µ, θ                              µ(1+r), θ(1+r)
Single Par. Pareto        α, θ                              α, θ(1+r)
Uniform Distribution      a, b                              a(1+r), b(1+r)
Beta Distribution         a, b, θ                           a, b, θ(1+r)
Generalized Beta Dist.    a, b, θ, τ                        a, b, θ(1+r), τ

Note that all the distributions in the above tables are preserved under uniform inflation.
After uniform inflation we get the same type of distribution, but some or all of the parameters have
changed. If, whenever X follows a given type of distribution, cX for any c > 0 also follows the same
type of distribution, then that type of distribution is defined as a scale family.

So for example, the Inverse Gaussian is a scale family of distributions, even though it does not have
a scale parameter. If X follows an Inverse Gaussian, then Y = cX also follows an Inverse Gaussian.
Any distribution with a scale parameter is a scale family.
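
As a minimal Python sketch (an added illustration, not from Loss Models), the survival function of 1.1X, where X is Exponential with θ = 100, matches the survival function of an Exponential with θ = 110; the values 100, 10%, and 250 are arbitrary choices.

import math

theta, r, x = 100.0, 0.10, 250.0
s_inflated_loss = math.exp(-(x / (1 + r)) / theta)   # P[(1+r)X > x] = P[X > x/(1+r)]
s_rescaled_theta = math.exp(-x / (theta * (1 + r)))  # Exponential with theta multiplied by 1+r
print(s_inflated_loss, s_rescaled_theta)             # identical: the Exponential is a scale family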

In order to compute the effects of uniform inflation on a loss distribution, one can adjust the
parameters as in the above table. Then one can work with the loss distribution revised by inflation in
the same manner one would work with any loss distribution.

Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and
θ = 240,151. Losses increase uniformly by 50%.
What are the means prior to and subsequent to inflation?
[Solution: For the Pareto Distribution, E[X] = θ/(α−1).
Prior to inflation, E[X] = 240,151 / 0.702 = 342,095.
After inflation, the losses follow a Pareto with parameters α = 1.702 and θ = (1.5)(240,151) = 360,227,
so E[X] = 360,227 / 0.702 = 513,143.
Alternately, inflation increases the mean by 50% to: (1.5)(342,095) = 513,143.]

Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and
θ = 240,151. Losses increase uniformly by 50%.
What are the limited expected values at 1 million prior to and subsequent to inflation?
[Solution: For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)} {1 - (θ/(θ+x))^(α−1)}.
Prior to inflation, E[X ∧ 1 million] = (240,151 / 0.702) {1 - (240,151/1,240,151)^0.702} = 234,044.
After inflation, E[X ∧ 1 million] = (360,227 / 0.702) {1 - (360,227/1,360,227)^0.702} = 311,232.]

Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and
θ = 240,151. Losses increase uniformly by 50%. Excess Ratio = 1 - LER.
What are the excess ratios at 1 million prior to and subsequent to inflation?
[Solution: Excess ratio = R(x) = (E[X] - E[X ∧ x]) / E[X] = 1 - E[X ∧ x] / E[X].
Prior to inflation, R(1 million) = 1 - 234,044 / 342,095 = 31.6%.
After inflation, R(1 million) = 1 - 311,232 / 513,143 = 39.3%.
Comment: As expected, for a fixed limit the Excess Ratio increases under uniform inflation.
For the Pareto the excess ratio is given by R(x) = (θ/(θ+x))^(α−1).]

Behavior in General of Distributions under Uniform Inflation of (1+r):

For distributions in general, including those not discussed in Loss Models, one can determine the
behavior under uniform inflation as follows. One makes the change of variables Z = (1+r) X.
For the Distribution Function one just sets FZ(z) = FX(x); one substitutes x = z / (1+r).
Alternately, for the density function fZ(z) = fX(x) / (1+r).282
For example, for the Normal Distribution f(x) = exp[-(x - µ)² / (2σ²)] / (σ√(2π)).
Under uniform inflation, x = z/(1+r) and
fZ(z) = fX(x) / (1+r) = exp[-(z/(1+r) - µ)² / (2σ²)] / {(1+r) σ√(2π)}
= exp[-{z - µ(1+r)}² / (2{σ(1+r)}²)] / {(1+r) σ√(2π)}.

This is a Normal density function with sigma and mu each multiplied by (1+r). Thus under inflation, for
the Normal µ becomes µ(1+r) and σ becomes σ(1+r). The location parameter µ has been
multiplied by the inflation factor, as has the scale parameter σ.
282 Under change of variables you need to divide by dz/dx = 1+r, since fZ(z) = dF/dz = (dF/dx) / (dz/dx) = fX(x) / (1+r).

Alternately, the Distribution Function for the Normal is Φ[(x-µ)/σ]. Therefore, FZ(z) = FX(x) =

Φ[(x-µ)/σ] = Φ[({z/(1+r)}-µ)/σ] = Φ[{z- µ(1+r)}/{σ(1+r)}]. This is the Distribution Function for a Normal
with sigma and mu each multiplied by (1+r), which matches the previous result.

Exercise: What is the behavior under inflation of the distribution function:
F(x) = x^a / (x^a + b^a), x > 0?
[Solution: Under uniform inflation, FZ(z) = FX(x) = x^a / (x^a + b^a) = {z/(1+r)}^a / ({z/(1+r)}^a + b^a) =
z^a / (z^a + {b(1+r)}^a). This is the same type of distribution, where b has become b(1+r).
The scale parameter b has been multiplied by the inflation factor (1+r).
Alternately, one can work with the density function f(x) = a b^a x^(a-1) / (x^a + b^a)² =
(a/b) (x/b)^(a-1) / {1 + (x/b)^a}². Then under uniform inflation: x = z/(1+r) and fZ(z) = fX(x) / (1+r) =
(a/b) (x/b)^(a-1) / [(1+r) {1 + (x/b)^a}²] = (a / {b(1+r)}) (z / {b(1+r)})^(a-1) / {1 + (z / {b(1+r)})^a}²,
which is the same type of density, where b has become b(1+r), as was shown previously.
Alternately, you can recognize that b is a scale parameter, since F(x) = (x/b)^a / {(x/b)^a + 1}.
Or alternately, you can recognize that this is a Loglogistic Distribution with a = γ and b = θ.]

Exercise: What is the behavior under uniform inflation of the density function:
f(x) = √θ exp[-θ (x/µ - 1)² / (2x)] / {√(2π) x^1.5}.
[Solution: In general one substitutes x = z/(1+r), and for the density function fZ(z) = fX(x) / (1+r).
fZ(z) = fX(x) / (1+r) = √θ exp[-θ (z/{µ(1+r)} - 1)² / (2z/(1+r))] / {√(2π) (1+r) (z/(1+r))^1.5}
= √(θ(1+r)) exp[-θ(1+r) (z/{µ(1+r)} - 1)² / (2z)] / {√(2π) z^1.5}.
This is of the same form, but with parameters (1+r)µ and (1+r)θ, rather than µ and θ.]

Thus we have shown that under uniform inflation for the Inverse Gaussian Distribution µ and θ
become (1+r)µ and (1+r)θ.

Behavior of the Domain of Distributions:

For all of the distributions discussed so far, the domain has been 0 to ∞. For the Single Parameter
Pareto distribution the domain is x > θ. Under uniform inflation the domain becomes x > (1+r)θ.

In general, the domain [a, b] becomes under uniform inflation [(1+r)a, (1+r)b].
If a = 0, multiplying by 1+r has no effect; if b = ∞, multiplying by 1+r has no effect. So for
distributions like the Gamma, the domain remains (0, ∞) after uniform inflation.

For the Single Parameter Pareto F(x) = 1 - (x/θ)^(-α), x > θ, under uniform inflation α is unaffected
and θ becomes (1+r)θ.
The uniform distribution on [a , b] becomes under uniform inflation the uniform
distribution on [a(1+r) , b(1+r)].

Working in Either the Earlier or Later Year:

Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and
θ = 240,151. Losses increase uniformly by 50%. What is the average contribution per loss to the
layer from 1 million to 5 million, both prior to and subsequent to inflation?
[Solution: For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)} {1 - (θ/(θ+x))^(α−1)}.
Prior to inflation, E[X ∧ 5 million] = (240,151 / 0.702){1 - (240,151/5,240,151)^0.702} = 302,807.
After inflation, losses follow a Pareto with α = 1.702 and θ = (1.5)(240,151) = 360,227, and
E[X ∧ 5 million] = (360,227 / 0.702){1 - (360,227/5,360,227)^0.702} = 436,041.
Prior to inflation, the average loss contributes: E[X ∧ 5 million] - E[X ∧ 1 million] =
302,807 - 234,045 = 68,762, to this layer.
After inflation, the average loss contributes: 436,041 - 311,232 = 124,809, to this layer.]

The contribution to this layer has increased by 82%, in this case more than the overall rate of inflation.

There are two alternative ways to solve many problems involving inflation. In the above solution,
one adjusts the size of loss distribution in the earlier year to the later year based on the amount of
inflation. Then one calculates the quantity of interest in the later year. However, there is an alternative,
which many people will prefer. Instead one calculates the quantity of interest in the earlier year at its
deflated value, and then adjusts it to the later year for the effects of inflation. Hereʼs how this alternate
method works for this example.

A limit of 1 million in the later year corresponds to a limit of 1 million /1.5 = 666,667 in the earlier year.
Similarly, a limit of 5 million in the later year corresponds to 5 million /1.5 = 3,333,333 in the earlier
year. Using the Pareto in the earlier year, with α = 1.702 and θ = 240,151,
E[X ∧ 666,667] = (240,151 / 0.702){1 - (240,151/906,818)^0.702} = 207,488, and
E[X ∧ 3,333,333] = (240,151 / 0.702){1 - (240,151/3,573,484)^0.702} = 290,694. In terms of the
earlier year dollars, the contribution to the layer is: 290,694 - 207,488 = 83,206.
However, one has to inflate back up to the level of the later year: (1.5)(83,206) = 124,809, matching
the previous solution.
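
Here is an illustrative Python sketch (not part of the Study Guide) that reproduces this layer calculation both ways: working with the inflated Pareto in the later year, and deflating the limits to the earlier year and then reinflating.

def pareto_lev(x, alpha, theta):
    # E[X ^ x] for the Pareto Distribution
    return (theta / (alpha - 1)) * (1 - (theta / (theta + x)) ** (alpha - 1))

alpha, theta, r = 1.702, 240151.0, 0.50
# Work in the later year, with theta multiplied by 1.5:
later = pareto_lev(5e6, alpha, theta * (1 + r)) - pareto_lev(1e6, alpha, theta * (1 + r))
# Work in the earlier year with deflated limits, then reinflate:
earlier = (1 + r) * (pareto_lev(5e6 / (1 + r), alpha, theta) - pareto_lev(1e6 / (1 + r), alpha, theta))
print(later, earlier)  # both approximately 124,800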

This type of question can also be answered using the formula discussed subsequently for the
average payment per loss. This formula for the average payment per loss is just an application of
the technique of working in the earlier year, by deflating limits and deductibles. However, this
technique of working in the earlier year is more general, and also applies to other quantities of
interest, such as the survival function.

Exercise: Losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5.


Between 2003 and 2009 there is a total of 35% inflation.
Determine the percentage of the total number of losses in 2009 that would be expected to exceed
a deductible of 1000.
[Solution: The losses in year 2009 follow a LogNormal Distribution with parameters
µ = 3 + ln(1.35) = 3.300 and σ = 5. Thus in 2009, S(1000) = 1 - F(1000) =
1 - Φ[{ln(1000) - 3.300} / 5] = 1 - Φ[0.72] = 1 - 0.7642 = 0.2358.
Alternately, we first deflate to 2003. A deductible of 1000 in 2009 is equivalent to a deductible of
1000/1.35 = 740.74 in 2003. The losses in 2003 follow a LogNormal Distribution with parameters
µ = 3 and σ = 5. Thus in 2003, S(740.74) = 1 - F(740.74) =
1 - Φ[{ln(740.74) - 3}/ 5] = 1 - Φ[0.72] = 1 - 0.7642 = 0.2358.]
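
The two methods can also be checked numerically; the following Python sketch (an added illustration, using the error function for the standard Normal distribution function) computes S(1000) in 2009 both with the inflated LogNormal and with the deflated deductible.

import math

def phi(x):
    # standard Normal distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mu, sigma, r, d = 3.0, 5.0, 0.35, 1000.0
s_2009 = 1 - phi((math.log(d) - (mu + math.log(1 + r))) / sigma)  # inflated parameters
s_2003 = 1 - phi((math.log(d / (1 + r)) - mu) / sigma)            # deflated deductible
print(s_2009, s_2003)  # both about 0.235 (the 0.2358 above uses Φ rounded at 0.72)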

Of course both methods of solution produce the same answer. One can work either in terms of 2003
or 2009 dollars. In this case, the survival function is a dimensionless quantity. However, when
working with quantities in dollars, such as the limited expected value, if one works in the earlier year,
in this case 2003, one has to remember to reinflate the final answer back to the later year, in this case
2009.

Formulas for Average Payments:

The ideas discussed above can be put in terms of formulas:283

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered
Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the
insurerʼs average payment per loss in the later year is:

(1+r) c {E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]}.

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered
Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the
average payment per (non-zero) payment by the insurer in the later year is:

(1+r) c {E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]} / S[d/(1+r)].

In each case we have deflated the Maximum Covered Loss and the Deductible back to the earlier
year, computed the average payment in the earlier year, and then reinflated back to the later year.
Important special cases are: d = 0 ⇔ no deductible, u = ∞ ⇔ no maximum covered loss,
c = 1 ⇔ no coinsurance, r = 0 ⇔ no inflation or prior to the effects of inflation.

For example, assume losses in 2001 follow an Exponential distribution with θ = 1000.
There is a total of 10% inflation between 2001 and 2004. In 2004 there is a deductible of 500, a
maximum covered loss of 5000, and a coinsurance factor of 80%. Then the average payment per
(non-zero) payment in 2004 is computed as follows, using that for the Exponential Distribution,
E[X ∧ x] = θ(1 - e^(-x/θ)).

Take d = 500, u = 5000, c = 0.8, and r = 0.1.


average payment per (non-zero) payment in 2004 =
(1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) / S(d/(1+r)) =
(1.1)(0.8)(E[X ∧ 4545] - E[X ∧ 455]) / S(455) =
(0.88){1000(1 - e^(-4545/1000)) - 1000(1 - e^(-455/1000))} / e^(-455/1000) = (0.88)(989 - 366)/0.634 = 865.
Note that all computations use the original Exponential Distribution in 2001, with θ = 1000.
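
A minimal Python sketch of this calculation (an added illustration, not from Loss Models) follows; it applies the average payment per payment formula using the 2001 Exponential with θ = 1000.

import math

def exp_lev(x, theta):
    # E[X ^ x] for an Exponential Distribution
    return theta * (1 - math.exp(-x / theta))

theta, r, c, d, u = 1000.0, 0.10, 0.80, 500.0, 5000.0
per_loss = (1 + r) * c * (exp_lev(u / (1 + r), theta) - exp_lev(d / (1 + r), theta))
per_payment = per_loss / math.exp(-(d / (1 + r)) / theta)  # divide by S(d/(1+r))
print(per_payment)  # approximately 865
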
283 See Theorem 8.7 in Loss Models.

Exercise: For a LogNormal Distribution with parameters µ = 3 and σ = 5, determine


E[X ∧ 100,000], E[X ∧ 1,000,000], E[X ∧ 74,074], and E[X ∧ 740,740].
[Solution: E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 100,000] = exp(3 + 25/2)Φ[(ln(100,000) - 3 - 25)/5] + (100,000){1 - Φ[{ln(100,000) - 3} / 5]} =
5,389,670 Φ[-3.30] + (100,000){1 - Φ[1.70]} = (5,389,670)(0.0005) + (100,000)(1 - 0.9554) = 7155.
E[X ∧ 1,000,000] =
exp(3 + 25/2)Φ[(ln(1,000,000) - 3 - 25)/5] + (1,000,000){1 - Φ[{ln(1,000,000) - 3} / 5]} =
5,389,670 Φ[-2.84] + (1,000,000){1 - Φ[2.16]} = (5,389,670)(0.0023) + (1,000,000)(1 - 0.9846) = 27,796.
E[X ∧ 74,074] = exp(3 + 25/2)Φ[(ln(74,074) - 3 - 25)/5] + (74,074){1 - Φ[{ln(74,074) - 3} / 5]} =
5,389,670 Φ[-3.36] + (74,074){1 - Φ[1.64]} = (5,389,670)(0.0004) + (74,074)(1 - 0.9495) = 5897.
E[X ∧ 740,740] = exp(3 + 25/2)Φ[(ln(740,740) - 3 - 25)/5] + (740,740){1 - Φ[{ln(740,740) - 3}/ 5]} =
5,389,670 Φ[-2.90] + (740,740){1 - Φ[2.10]} = (5,389,670)(0.0019) + (740,740)(1 - 0.9821) = 23,500.]
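
These limited expected values can also be computed without rounding the Normal distribution function; here is an illustrative Python helper (added for this purpose, not from the Study Guide). Its results differ slightly from the values above, purely because the exercise rounds Φ to table values.

import math

def phi(x):
    # standard Normal distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def lognormal_lev(x, mu, sigma):
    # E[X ^ x] = exp(mu + sigma^2/2) Phi[(ln(x) - mu - sigma^2)/sigma] + x {1 - Phi[(ln(x) - mu)/sigma]}
    return (math.exp(mu + sigma ** 2 / 2) * phi((math.log(x) - mu - sigma ** 2) / sigma)
            + x * (1 - phi((math.log(x) - mu) / sigma)))

for limit in (100000, 1000000, 74074, 740740):
    print(limit, lognormal_lev(limit, 3.0, 5.0))
# approximately 7,060; 27,540; 5,850; 23,290 - close to the 7155, 27,796, 5897, and 23,500 above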

Exercise: Losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5.


Between 2003 and 2009 there is a total of 35% inflation.
In 2009 there is a deductible of $100,000 and a maximum covered loss of $1 million. Determine
the increase between 2003 and 2009 in the insurerʼs average payment per loss to the insured.
[Solution: In 2003, take r = 0, d = 100,000, u = 1 million, and c = 1.
Average payment per loss = E[X ∧ 1 million] - E[X ∧ 100,000] = 27,796 - 7155 = 20,641.
In 2009, take r = 0.35, d = 100,000, u = 1 million, and c = 1.
Average payment per loss = 1.35 (E[X ∧ 1 million/1.35] - E[X ∧ 100000/1.35]) =
1.35 (E[X ∧ 740,740] - E[X ∧ 74074]) = 1.35 (23,500 - 5897) = 23,764.
The increase is: 23,764/20,641 - 1 = 15%.
Comment: Using a computer, the exact answer without rounding is: 23,554/20,481 - 1 = 15.0%.
Using the formula in order to get the average payment per loss in 2009 is equivalent to deflating to
2003, working in the year 2003, and then reinflating to the year 2009. The 2009 LogNormal has
parameters µ = 3 + ln(1.35) = 3.300 and σ = 5. For this LogNormal, E[X ∧ 100,000] =
exp(3.3 + 25/2)Φ[(ln(100,000) - 3.3 - 25)/5] + (100,000){1 - Φ[{ln(100,000) - 3.3} / 5]} =
7,275,332 Φ[-3.36] + (100,000){1 - Φ[1.64]} = (7,275,332)(0.0004) + (100,000)(1 - 0.9495) = 7960.
For this LogNormal, E[X ∧ 1,000,000] =
exp(3.3 + 25/2)Φ[(ln(1,000,000) - 3.3 - 25)/5] + (1,000,000){1 - Φ[{ln(1,000,000) - 3.3} / 5]} =
7,275,332 Φ[-2.90] + (1,000,000){1 - Φ[2.10]} = (7,275,332)(0.0019) + (1,000,000)(1 - 0.9821) =
31,723. 31,723 - 7960 = 23,763, matching the 23,764 obtained above except for rounding.]

Exercise: Losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5.


Between 2003 and 2009 there is a total of 35% inflation.
In 2009 there is a deductible of $100,000 and a maximum covered loss of $1 million.
Determine the increase between 2003 and 2009 in the insurerʼs average payment per
(non-zero) payment to the insured.
[Solution: In 2003, take r = 0, d = 100,000, u = 1 million, and c = 1.
Average payment per non-zero payment = (E[X ∧ 1 million] - E[X ∧ 100,000])/S(100,000) =
(27,796 - 7155) / {1 - Φ[{ln(100,000) - 3} / 5]} = 20,641 / {1 - Φ[1.70]} = 20,641/0.0446 = 462,803.
In 2009, take r = 0.35, d = 100,000, u = 1 million, and c = 1.
Average payment per non-zero payment =
1.35 (E[X ∧ 1 million/1.35] - E[X ∧ 100,000/1.35])/S(100,000/1.35) =
1.35 (E[X ∧ 740,740] - E[X ∧ 74074])/S(74074) = 1.35 (23,500 - 5897)/{1 - Φ[1.64]} =
23,764/0.0505 = 470,574. The increase is: 470,574/462,803 - 1 = 1.7%.
Comment: Using a computer, the exact answer without rounding is: 468,852/462,085 - 1 = 1.5%.]

Formulas for Second Moments:284

We have previously discussed second moments of layers. We can incorporate inflation in a manner
similar to the formulas for first moments. However, since the second moment is in dollars squared,
we reinflate back by multiplying by (1+r)2 . Also we multiply by the coinsurance factor squared.

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered
Loss u, and coinsurance factor c, then in terms of the values in the earlier year,
the second moment of the insurerʼs payment per loss in the later year is:

(1+r)² c² { E[(X ∧ u/(1+r))²] - E[(X ∧ d/(1+r))²] - 2 (d/(1+r)) (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) }.

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered
Loss u, and coinsurance factor c, then in terms of the values in the earlier year,
the second moment of the insurerʼs payment per (non-zero) payment in the later year is:

(1+r)² c² { E[(X ∧ u/(1+r))²] - E[(X ∧ d/(1+r))²] - 2 (d/(1+r)) (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) } / S[d/(1+r)].

One can combine the formulas for the first and second moments in order to calculate the variance.
284 See Theorem 8.8 in Loss Models. If r = 0, these reduce to formulas previously discussed.

Exercise: Losses in 2005 follow a Single Parameter Pareto Distribution with α = 3 and θ = 200.
Between 2005 and 2010 there is a total of 25% inflation.
In 2010 there is a deductible of 300, a maximum covered loss of 900, and a coinsurance of 90%.
In 2010, determine the variance of YP, the per payment variable.
[Solution: From the Tables attached to the exam, for the Single Parameter Pareto, for x ≥ θ:
E[X ∧ x] = αθ/(α-1) - θ^α / {(α-1) x^(α-1)}.
E[(X ∧ x)²] = αθ²/(α-2) - 2θ^α / {(α-2) x^(α-2)}.
Thus E[X ∧ 300/1.25] = E[X ∧ 240] = (3)(200)/2 - 200³ / {(2)(240²)} = 230.556.
E[X ∧ 900/1.25] = E[X ∧ 720] = (3)(200)/2 - 200³ / {(2)(720²)} = 292.284.
S(300/1.25) = S(240) = (200/240)³ = 0.5787.
Thus the mean payment per payment is: (1.25)(90%)(292.284 - 230.556) / 0.5787 = 120.00.
E[(X ∧ 240)²] = (3)(200²)/1 - (2)(200³)/240 = 53,333.
E[(X ∧ 720)²] = (3)(200²)/1 - (2)(200³)/720 = 97,778.
Since the second moment is in dollars squared, we multiply by the square of the coinsurance factor,
and the square of the inflation factor.
Thus the second moment of the non-zero payments is:
(1.25²)(90%)² {97,778 - 53,333 - (2)(240)(292.284 - 230.556)} / 0.5787 = 32,402.
Thus the variance of the non-zero payments is: 32,402 - 120.00² = 18,002.
Alternately, work with the 2010 Single Parameter Pareto with α = 3, and θ = (200)(1.25) = 250.
E[X ∧ 300] = (3)(250)/2 - 250³ / {(2)(300²)} = 288.194.
E[X ∧ 900] = (3)(250)/2 - 250³ / {(2)(900²)} = 365.355.
S(300) = (250/300)³ = 0.5787.
Thus the mean payment per payment is: (90%)(365.355 - 288.194) / 0.5787 = 120.00.
E[(X ∧ 300)²] = (3)(250²)/1 - (2)(250³)/300 = 83,333.
E[(X ∧ 900)²] = (3)(250²)/1 - (2)(250³)/900 = 152,778.
Since the second moment is in dollars squared, we multiply by the square of the coinsurance factor.
Thus the second moment of the non-zero payments is:
(90%)² {152,778 - 83,333 - (2)(300)(365.355 - 288.194)} / 0.5787 = 32,401.
Thus the variance of the non-zero payments is: 32,401 - 120.00² = 18,001.]
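
The same calculation can be checked with a short Python sketch (an added illustration); it works with the inflated 2010 Single Parameter Pareto, α = 3 and θ = 250, exactly as in the alternate solution above.

def spp_lev(x, alpha, theta):
    # E[X ^ x] for the Single Parameter Pareto, x >= theta
    return alpha * theta / (alpha - 1) - theta ** alpha / ((alpha - 1) * x ** (alpha - 1))

def spp_lev2(x, alpha, theta):
    # E[(X ^ x)^2] for the Single Parameter Pareto, x >= theta
    return alpha * theta ** 2 / (alpha - 2) - 2 * theta ** alpha / ((alpha - 2) * x ** (alpha - 2))

alpha, theta, c, d, u = 3.0, 250.0, 0.90, 300.0, 900.0
s_d = (theta / d) ** alpha
mean = c * (spp_lev(u, alpha, theta) - spp_lev(d, alpha, theta)) / s_d
second = c ** 2 * (spp_lev2(u, alpha, theta) - spp_lev2(d, alpha, theta)
                   - 2 * d * (spp_lev(u, alpha, theta) - spp_lev(d, alpha, theta))) / s_d
print(mean, second - mean ** 2)  # approximately 120 and 18,000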

Mixed Distributions:285

If one has a mixed distribution, then under uniform inflation each of the component distributions acts
as it would under uniform inflation.

Exercise: The size of loss distribution is: F(x) = 0.7{1 - e^(-x/130)} + 0.3{1 - (250/(250+x))²}.
After uniform inflation of 20%, what is the size of loss distribution?
[Solution: After uniform inflation of 20%, the Exponential component becomes another Exponential
Distribution, but with θ = (1.2)(130) = 156: 1 - e^(-x/156). The Pareto component becomes another
Pareto Distribution, but with α = 2 and θ = (1.2)(250) = 300: 1 - {300/(300+x)}².
Therefore, the mixed distribution becomes: 0.7{1 - e^(-x/156)} + 0.3{1 - (300/(300+x))²}.]

285 Mixed Distributions are discussed in a subsequent section.

Non-Uniform Rates of Inflation by Size of Loss:

On the exam, inflation is assumed to be uniform by size of loss. What would one expect to see if for
example large losses were inflating at a higher rate than smaller losses? Then we would expect for
example the 90th percentile to increase at a faster rate than the median.286

Exercise: In 2001 the losses follow a Pareto Distribution with parameters α = 3 and θ = 1000.
In 2004 the losses follow a Pareto Distribution with parameters α = 2.5 and θ = 1100.
What is the increase from 2001 to 2004 in the median (50th percentile)?
Also, what is the increase from 2001 to 2004 in the 90th percentile?
[Solution: For the Pareto, at the 90th percentile: 0.9 = 1 - {θ/(θ+x)}^α. ⇒ x = θ{10^(1/α) - 1}.
In 2001 the 90th percentile is: 1000{10^(1/3) - 1} = 1154.
In 2004 the 90th percentile is: 1100{10^(1/2.5) - 1} = 1663.
For the Pareto, the median is: θ{2^(1/α) - 1}.
In 2001 the median is: 1000{2^(1/3) - 1} = 260.
In 2004 the median is: 1100{2^(1/2.5) - 1} = 351.
The median increased by: (351/260) - 1 = 35.0%,
while the 90th percentile increased by: (1663/1154) - 1 = 44.1%.]
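
The percentiles in this exercise can be computed directly from the Pareto distribution function; a brief Python sketch (an added illustration) follows, using x_p = θ{(1 - p)^(-1/α) - 1} for the pth percentile.

def pareto_percentile(p, alpha, theta):
    # solve p = 1 - {theta/(theta + x)}^alpha for x
    return theta * ((1 - p) ** (-1 / alpha) - 1)

for p in (0.50, 0.90):
    x_2001 = pareto_percentile(p, 3.0, 1000.0)
    x_2004 = pareto_percentile(p, 2.5, 1100.0)
    print(p, round(x_2001), round(x_2004), x_2004 / x_2001 - 1)
# medians 260 and 351 (up 35%); 90th percentiles 1154 and 1663 (up 44%)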

In this case, the 90th percentile increased more than the median did. The shape parameter of the
Pareto decreased, resulting in a heavier-tailed distribution in 2004 than in 2001. If the higher
percentiles increase at a different rate than the lower percentiles, then inflation is not uniform by size
of loss. When inflation is uniform by size of loss, all percentiles increase at the same rate.287

Over the past few decades, the median wage in the United States has been increasing
much more slowly than the 90th percentile wage. Therefore, the distribution of wages has been
changing shape; this is not an example of uniform inflation.

286 If the larger losses are inflating at a lower rate than the smaller losses then the situation is reversed and the higher
percentiles will inflate more slowly than the lower percentiles. Which situation applies may be determined by
graphing selected percentiles over time, with the size of loss on a log scale. In practical applications this analysis
would be complicated by taking into account the impacts of any deductible and/or maximum covered loss.
287 In practical situations, the estimated rates of increase in different percentiles based on data will differ somewhat,
even if the underlying inflation is uniform by size of loss.

Fixed Exchange Rates of Currency:288

Finally it is useful to note that the mathematics of changes in currency is the same as that for inflation.
Thus if loss sizes are expressed in dollars and you wish to convert to some other currency one
multiplies each loss size by the appropriate exchange rate.

Assuming each loss is paid at (approximately) the same time one can apply (approximately) the
same exchange rate to each loss. This is mathematically identical to applying the same inflation factor
under uniform inflation.

If the exchange rate is 80 yen per dollar, then the Loss Elimination Ratio at 80,000 yen is the same
as that at $1000.

Exercise: The Limited Expected Value at $1000 is $600.


The exchange rate is 80 yen per dollar.
Determine the Limited Expected Value at 80,000 yen.
[Solution: 80,000 yen ⇔ $1000.
Limited Expected Value at 80,000 yen is: (600)(80) = 48,000 yen.
Comment: The Loss Elimination Ratio is dimensionless, while the Limited Expected Value is in
dollars or yen. Thus here we need to multiply by the exchange rate of 80 yen per dollar.]

The Coefficient of Variation, Skewness, and Kurtosis, which describe the shape of the size of loss
distribution, are unaffected by converting to yen.

Exercise: The size of loss distribution in dollars is Gamma with α = 3 and θ = 2000.
The exchange rate is 0.80 euros per dollar.
Determine the size of loss distribution in euros.
[Solution: Gamma with α = 3 and θ = (0.80)(2000) = 1600.
Comment: The mean in euros is: (3)(1600) = €4800.
The mean in dollars is: (3)(2000) = $6000. ⇔ (.8)(6000) = €4800.

0.80 euros per dollar. ⇔ 1.25 dollars per euro.


Going from euros to dollars would be mathematically equivalent to 25% inflation.
Going from dollars to euros is mathematically equivalent to deflating back to the earlier year from the
later year with 25% inflation. $6000/1.25 = €4800.]

288 See CAS3, 5/06, Q.26.

Problems:

36.1 (1 point) The size of losses in 1994 follow a Pareto Distribution,


with parameters α = 3, θ = 5000.
Assume that inflation uniformly increases the size of losses between 1994 and 1997 by 20%.
What is the average size of loss in 1997?
A. 2500 B. 3000 C. 3500 D. 4000 E. 4500

36.2 (1 point) The size of losses in 2004 follow an Exponential Distribution: F(x) = 1 - e^(-x/θ), with

θ = 200. Assume that inflation uniformly increases the size of losses between 2004 and 2009 by
3% per year. What is the variance of the loss distribution in 2009?
A. 48,000 B. 50,000 C. 52,000 D. 54,000 E. 56,000

36.3 (2 points) The size of losses in 1992 follows a Burr Distribution, F(x) = 1 - {1 / (1 + (x/θ)^γ)}^α,
with parameters α = 2, θ = 19,307, γ = 0.7.


Assume that inflation uniformly increases the size of losses between 1992 and 1996 by 30%.
What is the probability of a loss being greater than 10,000 in 1996?
A. 39% B. 41% C. 43% D. 45% E. 47%

36.4 (1 point) The size of losses in 1994 follow a Gamma Distribution, with parameters α = 2,
θ = 100. Assume that inflation uniformly increases the size of losses between 1994 and 1996 by
10%. What are the parameters of the loss distribution in 1996?
A. α = 2, θ = 100 B. α = 2, θ = 110 C. α = 2, θ = 90.9 D. α = 2, θ = 82.6
E. None of A, B, C, or D.

36.5 (2 points) The size of losses in 1995 follow a Pareto Distribution, with α = 1.5, θ = 15000.
Assume that inflation uniformly increases the size of losses between 1995 and 1999 by 25%.
In 1999, what is the average size of the non-zero payments excess of a deductible of 25,000?
A. 72,000 B. 76,000 C. 80,000 D. 84,000 E. 88,000

36.6 (2 points) The size of losses in 1992 follow the density function f(x) = 2.5x^(-2) for 2 < x < 10.
Assume that inflation uniformly increases the size of losses between 1992 and 1996 by 20%.
What is the probability of a loss being greater than 6 in 1996?
A. 23% B. 25% C. 27% D. 29% E. 31%

Use the following information for the next 4 questions:


A size of loss distribution has been fit to certain data in terms of dollars. The loss sizes have been
converted to yen. Assume the exchange rate is 80 yen per dollar.

36.7 (1 point) In terms of dollars the sizes of loss are given by a Loglogistic, with parameters γ = 4,
and θ = 100.
Which of the following are the parameters of the distribution in terms of yen?
A. γ = 4, and θ = 100 B. γ = 320, and θ = 100 C. γ = 4, and θ = 8000
D. γ = 320, and θ = 8000 E. None of A, B, C, or D.

36.8 (1 point) In terms of dollars the sizes of loss are given by a LogNormal Distribution, with
parameters µ = 10 and σ = 3.
Which of the following are the parameters of the distribution in terms of yen?
A. µ = 10 and σ = 3 B. µ = 800 and σ = 3 C. µ = 10 and σ = 240
D. µ = 800 and σ = 240 E. None of A, B, C, or D.

36.9 (1 point) In terms of dollars the sizes of loss are given by a Weibull Distribution, with
parameters θ = 625 and τ = 0.5.
Which of the following are the parameters of the distribution in terms of yen?
A. θ = 625 and τ = 0.5 B. θ = 69.9 and τ = 0.5 C. θ = 5,590 and τ = 0.5
D. θ = 50,000 and τ = 0.5 E. None of A, B, C, or D.

36.10 (1 points) In terms of dollars the sizes of loss are given by a Paralogistic Distribution, with
α = 4, θ = 100. Which of the following are the parameters of the distribution in terms of yen?
A. α = 4, and θ = 100 B. α = 320, and θ = 100 C. α = 4, and θ = 8000
D. α = 320, and θ = 8000 E. None of A, B, C, or D.

36.11 (1 point) The size of losses in 1994 follows a distribution F(x) = Γ[α; λ ln(x)], x > 1, with
parameters α = 40, λ = 10. Assume that inflation uniformly increases the size of losses between
1994 and 1996 by 10%. What are the parameters of the loss distribution in 1996?
A. α = 40, λ = 10 B. α = 40, λ = 9.1 C. α = 40, λ = 11
D. α = 40, λ = 12.1 E. None of A, B, C, or D.

36.12 (1 point) X1 , X2 , ... X50, are independent, identically distributed variables, each with an
Exponential Distribution with mean 800. What is the distribution of X̄, their average?

36.13 (1 point) Assume that inflation uniformly increases the size of losses between 1994 and
1996 by 10%. Which of the following statements is true regarding the size of loss distribution?
1. If the skewness in 1994 is 10, then the skewness in 1996 is 13.31.
2. If the 70th percentile in 1994 is $10,000, then the 70th percentile in 1996 is $11,000.
3. If in 1994 the Loss Elimination Ratio for a deductible of $1000 is 10%,
then in 1996 the Loss Elimination Ratio for a deductible of $1100 is 11%.
A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A, B, C, or D

36.14 (3 points) The size of losses in 1995 follow the density function:
f(x) = 375x² e^(-10x) + 20x³ exp(-20x⁴).
Assume that inflation uniformly increases the size of losses between 1995 and 1999 by 25%.
Which of the following is the density in 1999?
A. 468.75x² e^(-12.5x) + 16x³ exp(-16x⁴)   B. 192x² e^(-8x) + 8.192x³ exp(-8.192x⁴)
C. 468.75x² e^(-12.5x) + 8.192x³ exp(-8.192x⁴)   D. 192x² e^(-8x) + 16x³ exp(-16x⁴)
E. None of the above.

36.15 (3 points) You are given the following:


• Losses follow a distribution with density function
f(x) = exp[-0.5 {(ln(x) - 7) / 3}²] / {3x √(2π)}, 0 < x < ∞.

• There is a deductible of 1000.

• 173 losses are expected to exceed the deductible each year.


Determine the expected number of losses that would exceed the deductible each year if all loss
amounts increased by 40%, but the deductible remained at 1000.
A. Less than 175
B. At least 175, but less than 180
C. At least 180, but less than 185
D. At least 185, but less than 190
E. At least 190

36.16 (3 points) Losses in the year 2001 have a Pareto Distribution with parameters α = 3 and

θ = 40. Losses are uniformly 6% higher in the year 2002 than in the year 2001. In both 2001 and
2002, an insurance policy has a deductible of 5 and a maximum covered loss of 25.
What is the ratio of expected payments in 2002 over expected payments in the year 2001?
(A) 104% (B) 106% (C) 108% (D) 110% (E) 112%

36.17 (2 points) You are given the following:


• 1000 observed losses occurring in 1993 for a group of risks have been recorded
and are grouped as follows:
Interval Number of Losses
(0, 100] 341
(100, 500] 202
(500, 1000] 131
(1000, 5000] 151
(5000, 10000] 146
(10000, ∞ ) 29
• Inflation of 8% per year affects all losses uniformly from 1993 to 2002.
What is the expected proportion of losses for this group of risks that will be greater than 1000 in the
year 2002?
A. 38% B. 40% C. 42% D. 44% E. 46%

36.18 (2 points) The probability density function of losses in 1996 is:
f(x) = µ exp[-(x - µ)² / (2βx)] / √(2β π x³), x > 0.

Between 1996 and 2001 there is a total of 20% inflation. What is the density function in 2001?
A. Of the same form, but with parameters 1.2µ and β, rather than µ and β.
B. Of the same form, but with parameters µ and 1.2β, rather than µ and β.
C. Of the same form, but with parameters 1.2µ and 1.2β, rather than µ and β.
D. Of the same form, but with parameters µ/1.2 and β, rather than µ and β.
E. Of the same form, but with parameters µ and β/1.2, rather than µ and β.

Use the following information for the next two questions:


• The losses in 1998 prior to any deductible follow a Distribution: F(x) = 1 - e^(-x/5000).
• Assume that losses increase uniformly by 40% between 1998 and 2007.
• In 1998, an insurer pays for losses excess of a 1000 deductible.

36.19 (2 points) If in 2007 this insurer pays for losses excess of a 1000 deductible, what is the
increase between 1998 and 2007 in the dollars of losses that this insurer expects to pay?
A. 44% B. 46% C. 48% D. 50% E. 52%

36.20 (2 points) If in 2007 this insurer pays for losses excess of a 1400 deductible, what is the
increase between 1998 and 2007 in the dollars of losses that this insurer expects to pay?
A. 38% B. 40% C. 42% D. 44% E. 46%

36.21 (3 points) You are given the following:


• In 1990, losses follow a LogNormal Distribution, with parameters µ = 3 and σ.
• Between 1990 and 1999 there is uniform inflation at an annual rate of 4%.
• In 1990, 5% of the losses exceed the mean of the losses in 1999.
Determine σ.
A. 0.960 or 2.960
B. 0.645 or 2.645
C. 0.546 or 3.374
D. 0.231 or 3.059
E. None of the above

36.22 (3 points) You are given the following:


• The losses in 1995 follow a Weibull Distribution with parameters θ = 1 and τ = 0.3.
• A relevant Consumer Price Index (CPI) is 170.3 in 1995 and 206.8 in 2001.
• Assume that losses increase uniformly by an amount based on the increase in the CPI.
What is the increase between 1995 and 2001 in the expected number of losses exceeding a 1000
deductible?
A. 45% B. 48% C. 51% D. 54% E. 57%

36.23 (3 points) You are given the following:

• The losses in 1994 follow a LogNormal Distribution, with parameters µ = 3 and σ = 4.


• Assume that losses increase by 5% from 1994 to 1995, 3% from 1995 to 1996,
7% from 1996 to 1997, and 6% from 1997 to 1998.
• In both 1994 and 1998, an insurer sells policies with a $25,000 maximum covered loss.
What is the increase due to inflation between 1994 and 1998 in the dollars of losses that the insurer
expects to pay?
A. 9% B. 10% C. 11% D. 12% E. 13%

36.24 (3 points) You are given the following:


• The losses in 1994 follow a Distribution: F(x) = 1 - (100,000/x)³ for x > $100,000.

• Assume that inflation is a total of 20% from 1994 to 1999.

• In each year, a reinsurer pays for the layer of loss from $500 thousand to $2 million.
What is the increase due to inflation between 1994 and 1999 in the dollars that the reinsurer expects
to pay?
A. 67% B. 69% C. 71% D. 73% E. 75%

36.25 (2 points) You are given the following:


• The losses in 2001 follow an Inverse Gaussian Distribution,
with parameters µ = 3 and θ =10.
• There is uniform inflation from 2001 to 2009 at an annual rate of 3%.
What is the variance of the distribution of losses in 2009?
A. Less than 2
B. At least 2, but less than 3
C. At least 3, but less than 4
D. At least 4, but less than 5
E. At least 5

36.26 (1 point) In the year 2002 the size of loss distribution is a Pareto with α = 3 and θ = 5000.
During the year 2002 what is the median of those losses of size greater than 10,000?
A. 13,900 B. 14,000 C. 14,100 D. 14,200 E. 14,300

36.27 (2 points) In the year 2002 the size of loss distribution is a Pareto with α = 3 and θ = 5000.
You expect a total of 15% inflation between the years 2002 and 2006.
During the year 2006 what is the median of those losses of size greater than 10,000?
A. 13,900 B. 14,000 C. 14,100 D. 14,200 E. 14,300

36.28 (1 point) The size of losses in 1992 follow the density function f(x) = 1000e^(-1000x).
Assume that inflation uniformly increases the size of losses between 1992 and 1998 by 25%.
Which of the following is the density in 1998?
A. 800e^(-800x)   B. 1250e^(-1250x)   C. 17,841 x^0.5 e^(-1000x)
D. 1000e^(-1000x)   E. None of the above.

36.29 (2 points) You are given the following:


• For 2003 the amount of a single loss has the following distribution:
Amount Probability
$1,000 1/6
$2,000 1/3
$5,000 1/3
$10,000 1/6
• An insurer pays all losses after applying a $2000 deductible to each loss.
• Inflation of 4% per year impacts all claims uniformly from 2003 to 2006.
Assuming no change in the deductible, what is the inflationary impact on losses paid by the insurer in
2006 as compared to the losses the insurer paid in 2003?
A. 9% B. 12% C. 15% D. 18% E. 21%

36.30 (2 points) You are given the following:


• The size of loss distribution in 2007 is LogNormal Distribution with µ = 5 and σ = 0.7.
• Assume that losses increase by 4% per year from 2007 to 2010.
What is the second moment of the size of loss distribution in 2010?
A. Less than 70,000
B. At least 70,000, but less than 75,000
C. At least 75,000, but less than 80,000
D. At least 80,000, but less than 85,000
E. At least 85,000

36.31 (3 points) In 2005 sizes of loss follow a distribution F(x), with survival function S(x) and
density f(x). Between 2005 and 2008 there is a total of 10% inflation.
In 2008 there is a deductible of 1000.
Which of the following does not represent the expected payment per loss in 2008?
A. 1.1 E[X] - 1.1 E[X ∧ 909]
B. 1.1 ∫_909^∞ x f(x) dx - 1000 ∫_909^∞ f(x) dx
C. 1.1 ∫_909^∞ S(x) dx
D. 1.1 ∫_909^∞ (x - 1000) f(x) dx
E. 1.1 ∫_909^∞ x f(x) dx + 1.1 ∫_0^909 {x f(x) - S(x)} dx

36.32 (3 points) For Actuaries Professional Liability insurance, severity follows a Pareto Distribution
with α = 2 and θ = 500,000.
Excess of loss reinsurance covers the layer from R to $1 million.
Annual unlimited ground up inflation is 10% per year.
Determine R, less than $1 million, such that the annual loss trend for the reinsured layer is exactly
equal to the overall inflation rate of 10%.

36.33 (3 points) In 2011, the claim severity distribution is exponential with mean 5000.
In 2013, an insurance company will pay the amount of each claim in excess of a deductible of 1000.
There is a total of 8% inflation between 2011 and 2013.
In 2013, calculate the variance of the amount paid by the insurance company for one claim,
including the possibility that the amount paid is 0.
(A) 24 million (B) 26 million (C) 28 million (D) 30 million (E) 32 million

36.34 (3 points) In 2005 sizes of loss follow a certain distribution, and you are given the following
selected values of the distribution function and limited expected value:
x F(x) Limited Expected Value at x
3000 0.502 2172
3500 0.549 2409
4000 0.590 2624
4500 0.624 2820
5000 0.655 3000
5500 0.681 3166
6000 0.705 3319
6500 0.726 3462
7000 0.744 3594
Between 2005 and 2010 there is a total of 25% inflation.
In both 2005 and 2010 there is a deductible of 5000.
In 2010 the average payment per payment is 15% more than it was in 2005.
Determine E[X] in 2005.
A. 5000 B. 5500 C. 6000 D. 6500 E. 7000

Use the following information for the next 2 questions:


• In 2010, losses follow a Pareto Distribution with α = 5 and θ = 40.
• There is a total of 25% inflation between 2010 and 2015.
• In 2015, there is a deductible of 10.

36.35 (2 points) In 2015, determine the variance of YP, the per-payment variable.
A. 300 B. 325 C. 350 D. 375 E. 400

36.36 (3 points) In 2015, determine the variance of YL , the per-loss variable.


A. 160 B. 180 C. 200 D. 220 E. 240

Use the following information for the next four questions:


• Losses in 2002 follow a LogNormal Distribution with parameters µ = 9.7 and σ = 0.8.
• In 2007, the insured has a deductible of 10,000, maximum covered loss of 50,000,
and a coinsurance factor of 90%.
• Inflation is 3% per year from 2002 to 2007.

36.37 (3 points) In 2007, what is the average payment per loss?


A. less than 12,100
B. at least 12,100 but less than 12,200
C. at least 12,200 but less than 12,300
D. at least 12,300 but less than 12,400
E. at least 12,400

36.38 (1 point) In 2007, what is the average payment per non-zero payment?
A. less than 15,500
B. at least 15,500 but less than 15,600
C. at least 15,600 but less than 15,700
D. at least 15,700 but less than 15,800
E. at least 15,800

36.39 (3 points) In 2007, what is the standard deviation of YL , the per-loss variable?
A. less than 12,100
B. at least 12,100 but less than 12,200
C. at least 12,200 but less than 12,300
D. at least 12,300 but less than 12,400
E. at least 12,400

36.40 (1 point) In 2007, what is the standard deviation of YP, the per-payment variable?
A. less than 12,100
B. at least 12,100 but less than 12,200
C. at least 12,200 but less than 12,300
D. at least 12,300 but less than 12,400
E. at least 12,400

36.41 (1 point) Determine the distribution followed by the average of n independent, identically
distributed Gamma Distributions.

36.42 (3 points) In 2011, losses prior to the effect of a deductible follow a Pareto Distribution
with α = 2 and θ = 250.
There is deductible of 100 in both 2011 and 2016.
The ratio of the expected aggregate payments in 2016 to 2011 is 1.26.
Determine the total amount of inflation between 2011 and 2016.
A. 19% B. 20% C. 21% D. 22% E. 23%

36.43 (3 points) In 2004 losses follow a LogNormal Distribution with mean 30,000 and
coefficient of variation 4. Inflation is 5% per year.
In 2015, what percent of total losses are excess of 200,000?

36.44 (3 points) In 2012 losses follow an Exponential Distribution with mean 1000.
Inflation is 3% per year.
In 2016, there is deductible of size 500, maximum covered loss of 2000, and a coinsurance of 80%.
Determine the average payment per payment in 2016.
A. less than 500
B. at least 500 but less than 550
C. at least 550 but less than 600
D. at least 600 but less than 650
E. at least 650

36.45 (2 points) The changes in the hourly real wages of men from 1973 to 2012 in the United
States, at different percentiles of the wage distribution:
Percentile Change
10th -10.9%
20th -17.9%
30th -13.7%
40th -10.5%
50th -5.8%
60th -1.9%
70th 6.3%
80th 16.9%
90th 25.0%
95th 35.5%
Briefly discuss what this data implies.

36.46 (2 points) Severity in 2014 follows an Inverse Gamma with α = 6 and θ = 10.
There is a total of 20% inflation between 2014 and 2020.
What is the fourth moment of the distribution of severity in 2020?
A. less than 100
B. at least 110 but less than 130
C. at least 130 but less than 150
D. at least 150 but less than 170
E. at least 170

Use the following information for the next two questions:


• In 2015 a risk has a two-parameter Pareto distribution with α = 3 and θ = 200.
• An insurance on the risk has a deductible of 50 in each year.
• Pi, the insurance premium in year i, equals 1.3 times the expected losses excess of the
deductible of 50.
• The risk is reinsured with a deductible that stays the same in each year.
The reinsurer pays for any payments by the insurer excess of 250 per claim;
in other words the reinsurer pays for the ground up layer excess of 300.
• Ri, the reinsurance premium in year i, equals 1.1 times the expected reinsured claims.

36.47 (2 points) Determine the ratio R2015 / P2015.


(A) 0.21 (B) 0.22 (C) 0.23 (D) 0.24 (E) 0.25

36.48 (2 points) In 2020 losses inflate by 25%.


Calculate R2020 / P2020.
(A) 0.21 (B) 0.22 (C) 0.23 (D) 0.24 (E) 0.25

36.49 (2 points) In 2014 losses follow an Exponential Distribution with mean 3000.
Inflation is 6% per year.
In 2015, there is deductible of size d, and the average payment per payment is 3180.
Determine d.
(A) 100 (B) 250 (C) 500 (D) 1000 (E) Cannot be determined.

36.50 (3 points) In 2010, prior to the application of any deductible, losses follow
a Pareto Distribution with α = 4 and θ = 500.
Between 2010 and 2020 there is a total of 30% inflation. In 2020, there is a deductible of 200.
In 2020, what is the variance of amount paid by the insurer for one loss, including the possibility that
the amount paid is zero?
A. less than 70,000
B. at least 70,000 but less than 75,000
C. at least 75,000 but less than 80,000
D. at least 80,000 but less than 85,000
E. at least 85,000

36.51 (4, 5/86, Q.61 & 4, 5/87, Q.59) (1 point) Let there be a 10% rate of inflation over the
period of concern. Let X be the uninflated losses and Z be the inflated losses.
Let Fx be the distribution function (d.f.) of X, and fx be the probability density function (p.d.f.) of X.
Similarly, let Fz and fz be the d.f. and p.d.f. of Z. Then which of the following statements are true?
1. fz(Z) = fx(Z / 1.1)
2. If Fx is a Pareto, then Fz is also a Pareto.
3. If Fx is a LogNormal, then Fz is also a LogNormal.
A. 2 B. 3 C. 1, 2 D. 1, 3 E. 2, 3

36.52 (4, 5/89, Q.58) (2 points) The random variable X with distribution function FX(x) is distributed
according to the Burr distribution, F(x) = 1 - {1 / (1 + (x/θ)^γ)}^α,
with parameters α > 0, θ > 0, and γ > 0.


If Z = (1 + r)X where r is an inflation rate over some period of concern, find the parameters for the
distribution function Fz(z) of the random variable z.
A. α, θ, γ B. α(1+r), θ, γ C. α, θ(1+r), γ

D. α, θ, γ(1+r)γ E. None of the above

36.53 (4, 5/90, Q.37) (2 points) Liability claim severity follows a Pareto distribution with a mean of
$25,000 and parameter α = 3. If inflation increases all claims by 20%, the probability of a claim
exceeding $100,000 increases by:
A. less than 0.02
B. at least 0.02 but less than 0.03
C. at least 0.03 but less than 0.04
D. at least 0.04 but less than 0.05
E. at least 0.05

36.54 (4, 5/91, Q.27) (3 points) The Pareto distribution with parameters θ = 12,500 and α = 2
appears to be a good fit to 1985 policy year liability claims.
Assume that inflation has been a constant 10% per year.
What is the estimated claim severity for a policy issued in 1992 with a $200,000 limit of liability?
A. Less than 22,000
B. At least 22,000 but less than 23,000
C. At least 23,000 but less than 24,000
D. At least 24,000 but less than 25,000
E. At least 25,000

36.55 (4, 5/91, Q.44) (2 points) Inflation often requires one to modify the parameters of a
distribution fitted to historical data. If inflation has been at the same rate for all sizes of loss, which of
the sets of parameters shown in Column C would be correct?
The form of the distributions is as given in the Appendix A of Loss Models.
(A) Distribution Family     (B) Distribution Function                                  (C) Parameters of z = (1+r)(x)

1. Inverse Gaussian         Φ[(x/µ - 1) √(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1) √(θ/x)]        µ, θ(1+r)

2. Generalized Pareto       β[τ, α; x/(θ+x)]                                           α, θ/(1+r), τ

3. Weibull                  1 - exp[-(x/θ)^τ]                                          θ(1+r), τ

A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of the above

36.56 (4B, 5/92, Q.7) (2 points) The random variable X for claim amounts with distribution function
Fx(x) is distributed according to the Erlang distribution with parameters b and c.
The density function for X is as follows: f(x) = (x/b)^(c-1) e^(-x/b) / {b (c-1)!}; x > 0, b > 0, c > 1.
Inflation of 100r% acts uniformly over a one year period.
Determine the distribution function Fz(Z) of the random variable Z = (1+r)X.
A. Erlang with parameters b and c(1+r)
B. Erlang with parameters b(1+r) and c
C. Erlang with parameters b/(1+r) and c
D. Erlang with parameters b/(1+r) and c/(1+r)
E. No longer an Erlang distribution

36.57 (4B, 11/92, Q.20) (3 points) Claim severity follows a Burr distribution,
F(x) = 1 - {1 / (1 + (x/θ)^γ)}^α, with parameters α = 3, γ = 0.5 and θ. The mean is 10,000.
If inflation increases all claims uniformly by 44%, determine the probability of a claim exceeding
$40,000 after inflation.
Hint: The nth moment of a Burr Distribution is: θ^n Γ(1 + n/γ) Γ(α − n/γ) / Γ(α), αγ > n.
A. Less than 0.01
B. At least 0.01 but less than 0.03
C. At least 0.03 but less than 0.05
D. At least 0.05 but less than 0.07
E. At least 0.07

36.58 (4B, 5/93, Q.11) (1 point) You are given the following:
• The underlying distribution for 1992 losses is given by a lognormal distribution with
parameters µ = 17.953 and σ = 1.6028.
• Inflation of 10% impacts all claims uniformly the next year.
What is the underlying loss distribution after one year of inflation?
A. lognormal with µ´ = 19.748 and σ´ = 1.6028
B. lognormal with µ´ = 18.048 and σ´ = 1.6028
C. lognormal with µ´ = 17.953 and σ´ = 1.7631
D. lognormal with µ´ = 17.953 and σ´ = 1.4571
E. no longer a lognormal distribution

36.59 (4B, 5/93, Q.12) (3 points) You are given the following:
• The underlying distribution for 1992 losses is given by f(x) = e^(-x), x > 0,
where losses are expressed in millions of dollars.
• Inflation of 10% impacts all claims uniformly from 1992 to 1993.
• Under a basic limits policy, individual losses are capped at $1.0 (million).
What is the inflation rate from 1992 to 1993 on the capped losses?
A. less than 2%
B. at least 2% but less than 3%
C. at least 3% but less than 4%
D. at least 4% but less than 5%
E. at least 5%

36.60 (4B, 5/93, Q.28) (3 points) You are given the following:
• The underlying loss distribution function for a certain line of business in 1991 is:
F(x) = 1 - x^(-5), x > 1.
• From 1991 to 1992, 10% inflation impacts all claims uniformly.
Determine the 1992 Loss Elimination Ratio for a deductible of 1.2.
A. Less than 0.850
B. At least 0.850 but less than 0.870
C. At least 0.870 but less than 0.890
D. At least 0.890 but less than 0.910
E. At least 0.910

36.61 (4B, 11/93, Q.5) (3 points) You are given the following:
• The underlying distribution for 1993 losses is given by
f(x) = e^(-x), x > 0, where losses are expressed in millions of dollars.
• Inflation of 5% impacts all claims uniformly from 1993 to 1994.
• Under a basic limits policy, individual losses are capped at $1.0 million in each year.
What is the inflation rate from 1993 to 1994 on the capped losses?
A. Less than 1.5%
B. At least 1.5%, but less than 2.5%
C. At least 2.5%, but less than 3.5%
D. At least 3.5%, but less than 4.5%
E. At least 4.5%

36.62 (4B, 11/93, Q.15) (3 points) You are given the following:
• X is the random variable for claim severity with probability distribution function F(x).
• During the next year, uniform inflation of r% impacts all claims.
Which of the following are true of the random variable Z = X(1+r), the claim severity one year later?
1. The coefficient of variation for Z equals (1+r) times the coefficient of variation for X.
2. For all values of d > 0, the mean excess loss of Z at d(1+r) equals (1+r) times
the mean excess loss of X at d.
3. For all values of d > 0, the limited expected value of Z at d equals (1+r) times
the limited expected value of X at d.
A. 2 B. 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C or D

36.63 (4B, 11/93, Q.27) (3 points) You are given the following:
• Losses for 1991 are uniformly distributed on [0, 10,000].
• Inflation of 5% impacts all losses uniformly from 1991 to 1992 and from 1992 to 1993
(5% each year).
Determine the 1993 Loss Elimination Ratio for a deductible of $500.
A. Less than 0.085
B. At least 0.085, but less than 0.090
C. At least 0.090, but less than 0.095
D. At least 0.095, but less than 0.100
E. At least 0.100

36.64 (4B, 5/94, Q.16) (1 point) You are given the following:
• Losses in 1993 follow the density function
f(x) = 3x^(-4), x ≥ 1,
where x = losses in millions of dollars.
• Inflation of 10% impacts all claims uniformly from 1993 to 1994.
Determine the probability that losses in 1994 exceed $2.2 million.
A. Less than 0.05
B. At least 0.05, but less than 0.10
C. At least 0.10, but less than 0.15
D. At least 0.15, but less than 0.20
E. At least 0.20

36.65 (4B, 5/94, Q.21) (2 points) You are given the following:
• For 1993 the amount of a single claim has the following distribution:
Amount Probability
$1,000 1/6
$2,000 1/6
$3,000 1/6
$4,000 1/6
$5,000 1/6
$6,000 1/6
• An insurer pays all losses AFTER applying a $1,500 deductible to each loss.
• Inflation of 5% impacts all claims uniformly from 1993 to 1994.
Assuming no change in the deductible, what is the inflationary impact on losses paid by the insurer in
1994 as compared to the losses the insurer paid in 1993?
A. Less than 5.5%
B. At least 5.5%, but less than 6.5%
C. At least 6.5%, but less than 7.5%
D. At least 7.5%, but less than 8.5%
E. At least 8.5%

36.66 (4B, 5/94, Q.24) (3 points) You are given the following:
• X is a random variable for 1993 losses, having the density function f(x) = 0.1e^(-0.1x), x > 0.
• Inflation of 10% impacts all losses uniformly from 1993 to 1994.
• For 1994, a deductible, d, is applied to all losses.
• P is a random variable representing payments of losses truncated and shifted by
the deductible amount.
Determine the value of the cumulative distribution function at p = 5, FP(5), in 1994.
A. 1 - e^(-0.1(5+d)/1.1)
B. {e^(-0.1(5/1.1)) - e^(-0.1(5+d)/1.1)} / {1 - e^(-0.1(5/1.1))}
C. 0
D. At least 0.25, but less than 0.35
E. At least 0.35, but less than 0.45

36.67 (4B, 11/94, Q.8) (3 points) You are given the following:
In 1993, an insurance companyʼs underlying loss distribution for an individual claim amount is a
lognormal distribution with parameters µ = 10.0 and σ² = 5.0.
From 1993 to 1994, an inflation rate of 10% impacts all claims uniformly.
In 1994, the insurance company purchases excess-of-loss reinsurance that caps the insurerʼs loss at
$2,000,000 for any individual claim. Determine the insurerʼs 1994 expected net claim amount for a
single claim after application of the $2,000,000 reinsurance cap.
A. Less than $150,000
B. At least $150,000, but less than $175,000
C. At least $175,000, but less than $200,000
D. At least $200,000, but less than $225,000
E. At least $225,000

36.68 (4B, 11/94, Q.28) (2 points) You are given the following:
In 1993, the claim amounts for a certain line of business were normally distributed with mean
µ = 1000 and variance σ² = 10,000: f(x) = exp[-(x - µ)² / (2σ²)] / (σ√(2π)).

Inflation of 5% impacted all claims uniformly from 1993 to 1994. What is the distribution for claim
amounts in 1994?
A. No longer a normal distribution
B. Normal with µ = 1000.0 and σ = 102.5
C. Normal with µ = 1000.0 and σ = 105.0
D. Normal with µ = 1050.0 and σ = 102.5
E. Normal with µ = 1050.0 and σ = 105.0

36.69 (CAS9, 11/94, Q.37) (2 points)


The following first-dollar claims have been observed for a certain class of business:
Claim Number Claim Amount
000500 26
000501 115
000502 387
000503 449
000504 609
000505 774
000506 2,131
000507 5,791
000508 7,499
000509 12,526
Total 30,307

a. (1/2 point) What are the empirical loss elimination ratios for deductibles of $1,000 and $5,000?
b. (1/2 point) Assume that losses can be modeled by an exponential distribution,
with hazard rate = 0.00033.
What are the indicated loss elimination ratios from the model for deductibles of $1,000 and $5,000?
c. (1 point) Assume that losses can be modeled by an exponential distribution,
with hazard rate = 0.00033.
For a fixed deductible of $5,000, what is the leveraged inflation rate under the model, if first dollar
inflation is 10% per year?

36.70 (Course 160 Sample Exam #3, 1994, Q.2) (1.9 points) You are given:
(i) The random variable X has an exponential distribution.
(ii) px = 0.95, for all x.
(iii) Y = 2X.
(iv) fY(Y) is the probability density function of the random variable Y.
Calculate fY(1).
(A) 0.000 (B) 0.025 (C) 0.050 (D) 0.075 (E) 0.100

36.71 (4B, 5/95, Q.6) (3 points) You are given the following:
• For 1994, loss sizes follow a uniform distribution on [0, 2500].
• In 1994, the insurer pays 100% of all losses.
• Inflation of 3.0% impacts all losses uniformly from 1994 to 1995.
• In 1995, a deductible of $100 is applied to all losses.
Determine the Loss Elimination Ratio (L.E.R.) of the $100 deductible on 1995 losses.
A. Less than 7.3%
B. At least 7.3%, but less than 7.5%
C. At least 7.5%, but less than 7.7%
D. At least 7.7%, but less than 7.9%
E. At least 7.9%

36.72 (4B, 5/95, Q.23) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = 1000 and α = 2.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
• For each loss that occurs, the insurer's payment is equal to the entire amount of the
loss if the loss is greater than 100. The insurer makes no payment if the loss is less
than or equal to 100.
Determine the insurer's expected number of annual payments if all loss amounts increased uniformly
by 10%.
A. Less than 7.9
B. At least 7.9, but less than 8.1
C. At least 8.1, but less than 8.3
D. At least 8.3, but less than 8.5
E. At least 8.5

36.73 (4B, 11/95, Q.6) (2 points) You are given the following:
• In 1994, losses follow a Pareto distribution, with parameters θ = 500 and α = 1.5.
• Inflation of 5% impacts all losses uniformly from 1994 to 1995.
What is the median of the portion of the 1995 loss distribution above 200?
A. Less than 600
B. At least 600, but less than 620
C. At least 620, but less than 640
D. At least 640, but less than 660
E. At least 660

36.74 (CAS9, 11/95, Q.11) (1 point) Which of the following are true?
1. A franchise deductible provides incentive for the insured to reduce the magnitude of losses.
2. The expected pure premium for a policy with a straight deductible of d is the expected pure
premium prior to the deductible times the loss elimination ratio at d.
3. Pure premiums on policies with a constant deductible increase faster than the rate of inflation.
A. 1 only B. 3 only C. 1 and 2 D. 2 and 3 E. 1, 2, and 3

36.75 (CAS9, 11/95, Q.35) (4 points)


Using the information given below on a claim size distribution, compute the following:
a. (0.5 point) Pure premium for a $1,000,000 policy limit.
b. (1.5 points) Frequency, severity, and pure premium for a $1,000,000 maximum covered loss
with a deductible of $1,000.
c. (2 points) The proportional increase in pure premium caused by 10% inflation,
assuming the maximum covered loss remains at $1,000,000
and the deductible remains at $1,000.

Claim frequency is 0.15

Claim Size x Distribution Function F(x) Limited Expected Value E[X ∧ x]


909 0.0672 878
1,000 0.0734 962
1,100 0.0801 1,055
909,091 0.9731 73,493
1,000,000 0.9751 75,845
1,100,000 0.9769 78,243

Show all work.

36.76 (4B, 5/96, Q.10 & Course 3 Sample Exam, Q.18) (2 points)
You are given the following:
• Losses follow a lognormal distribution, with parameters µ = 7 and σ = 2.
• There is a deductible of 2,000.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
Determine the expected number of annual losses that exceed the deductible if all loss amounts
increased uniformly by 20%, but the deductible remained the same.
A. Less than 4.0
B. At least 4.0, but less than 5.0
C. At least 5.0, but less than 6.0
D. At least 6.0, but less than 7.0
E. At least 7.0

36.77 (4B, 11/96, Q.1) (1 point) Using the information in the following table, determine the total
amount of losses from 1994 and 1995 in 1996 dollars.
Year Actual Losses Cost Index
1994 10,000,000 0.8
1995 9,000,000 0.9
1996 --- 1.0
A. Less than 16,000,000
B. At least 16,000,000, but less than 18,000,000
C. At least 18,000,000, but less than 20,000,000
D. At least 20,000,000, but less than 22,000,000
E. At least 22,000,000

36.78 (4B, 11/96, Q.14) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = k and α = 2, where k is a constant.
• There is a deductible of 2k.
Over a period of time, inflation has uniformly affected all losses, causing them to double, but the
deductible remains the same. What is the new loss elimination ratio (LER)?
A. 1/6 B. 1/3 C. 2/5 D. 1/2 E. 2/3

36.79 (4B, 11/96, Q.25) (1 point)


The random variable X has a lognormal distribution, with parameters µ and σ.
If the random variable Y is equal to 1.10X what is the distribution of Y?
A. Lognormal with parameters 1.10µ and σ
B. Lognormal with parameters µ and 1.10σ
C. Lognormal with parameters µ + ln1.10 and σ
D. Lognormal with parameters µ and σ + ln1.10
E. Not lognormal

36.80 (4B, 5/97, Q.17) (2 points) You are given the following:
• The random variable X has a Weibull distribution, with parameters θ = 625 and τ = 0.5.
• Z is defined to be 0.25X.
Determine the distribution of Z.
A. Weibull with parameters θ = 10,000 and τ = 0.5
B. Weibull with parameters θ = 2500 and τ = 0.5
C. Weibull with parameters θ = 156.25 and τ = 0.5
D. Weibull with parameters θ = 39.06 and τ = 0.5
E. Not Weibull

36.81 (4B, 5/97, Q.20) (2 points) You are given the following:
•  Losses follow a distribution with density function f(x) = (1/1000) e-x/1000, 0 < x < ∞.
•  There is a deductible of 500.
•  10 losses are expected to exceed the deductible each year.
Determine the expected number of losses that would exceed the deductible each year if all loss
amounts doubled, but the deductible remained at 500.
A. Less than 10
B. At least 10, but less than 12
C. At least 12, but less than 14
D. At least 14, but less than 16
E. At least 16

36.82 (4B, 11/97, Q.4) (1 point) You are given the following:
• The random variable X has a distribution that is a mixture of a Burr distribution,
F(x) = 1 - [1 / {1 + (x/θ)^γ}]^α, with parameters θ = 1,000, α = 1 and γ = 2,
and a Pareto distribution, with parameters θ = 1,000 and α = 1.


• Each of the two distributions in the mixture has equal weight.
Y is defined to be 1.10 X, which is also a mixture of a Burr distribution and a Pareto distribution.
Determine θ for the Burr distribution in this mixture.
A. Less than 32
B. At least 32, but less than 33
C. At least 33, but less than 34
D. At least 34, but less than 35
E. At least 35

36.83 (4B, 11/97, Q.26) (3 points) You are given the following:
• In 1996, losses follow a lognormal distribution, with parameters µ and σ.
• In 1997, losses follow a lognormal distribution with parameters µ+ ln k and σ,
where k is greater than 1.
• In 1996, 100p% of the losses exceed the mean of the losses in 1997.
Determine σ.
Note: zp is the 100pth percentile of a normal distribution with mean 0 and variance 1.
A. 2 ln k B. -zp ± zp2 - 2 ln k C. zp ± zp2 - 2 ln k

D. -zp ± zp 2 - 2 ln k E. zp ± zp 2 - 2 ln k

36.84 (4B, 5/98, Q.25) (2 points) You are given the following:
• 100 observed claims occurring in 1995 for a group of risks have been recorded and are
grouped as follows:
Interval Number of Claims
(0, 250) 36
[250, 300) 6
[300, 350) 3
[350, 400) 5
[400, 450) 5
[450, 500) 0
[500, 600) 5
[600, 700) 5
[700, 800) 6
[800, 900) 1
[900, 1000) 3
[1000, ∞ ) 25
• Inflation of 10% per year affects all claims uniformly from 1995 to 1998.
Using the above information, determine a range for the expected proportion of claims for this group
of risks that will be greater than 500 in 1998.
A. Between 35% and 40% B. Between 40% and 45%
C. Between 45% and 50% D. Between 50% and 55%
E. Between 55% and 60%

36.85 (4B, 11/98, Q.13) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with
cumulative distribution function and limited expected values as follows:
Loss Size (x) F(x) E[X ∧ x]
10,000 0.60 6,000
15,000 0.70 7,700
22,500 0.80 9,500
∞ 1.00 20,000
• There is a deductible of 15,000 per loss and no maximum covered loss.
• The insurer makes a nonzero payment p.
After several years of inflation, all losses have increased in size by 50%, but the deductible has
remained the same. Determine the expected value of p.
A. Less than 15,000
B. At least 15,000, but less than 30,000
C. At least 30,000, but less than 45,000
D. At least 45,000, but less than 60,000
E. At least 60,000

36.86 (4B, 5/99, Q.17) (2 points) You are given the following:
• In 1998, claim sizes follow a Pareto distribution, with parameters θ (unknown) and α = 2.
• Inflation of 6% affects all claims uniformly from 1998 to 1999.
• r is the ratio of the proportion of claims that exceed d in 1999
to the proportion of claims that exceed d in 1998.
Determine the limit of r as d goes to infinity.
A. Less than 1.05
B. At least 1.05, but less than 1.10
C. At least 1.10, but less than 1.15
D. At least 1.15, but less than 1.20
E. At least 1.20

36.87 (4B, 5/99, Q.21) (2 points) Losses follow a lognormal distribution,with parameters
µ = 6.9078 and σ = 1.5174. Determine the percentage increase in the number of losses that
exceed 1,000 that would result if all losses increased in value by 10%.
A. Less than 2%
B. At least 2%, but less than 4%
C. At least 4%, but less than 6%
D. At least 6%, but less than 8%
E. At least 8%

36.88 (CAS6, 5/99, Q.39) (2 points) Use the information shown below to determine the one-year
severity trend for the loss amounts in the following three layers of loss:
$0 - $50 $50 - $100 $100 - $200
• Losses occur in multiples of $40, with equal probability, up to $200, i.e., if a loss occurs,
it has an equal chance of being $40, $80, $120, $160, or $200.
• For the next year, the severity trend will uniformly increase all losses by 10%.

36.89 (4B, 11/99, Q.26) (1 point) You are given the following:
• The random variable X follows a Pareto distribution, as per Loss Models, with parameters
θ = 100 and α = 2 .
• The mean excess loss function, eX(k), is defined to be E[X - k | X ≥ k].
• Y = 1.10 X.
Determine the range of the function eY(k)/eX(k) over its domain of [0, ∞).
A. (1, 1.10] B. (1, ∞) C. 1.10 D. [1.10, ∞) E. ∞

36.90 (CAS9, 11/99, Q.38) (1.5 points) Assume a ground-up claim frequency of 0.05.
Based on the following claim size distribution, answer the following questions. Show all work.
Claim size (d) Fx(d) E(X ∧ d)
$909 0.075 $870
$1,000 0.090 $945
$1,100 0.100 $1,040
Unlimited 1.000 $10,000
a. (1 point) For a $1,000 franchise deductible, what is the frequency of payments and the average
payment per payment?
b. (0.5 point) Assuming a constant annual inflation rate of 10% across all loss amounts, what is the
pure premium one year later if there is a $1,000 franchise deductible?

36.91 (Course 151 Sample Exam #1, Q.7) (1.7 points)


For a certain insurance, individual losses in 1994 were uniformly distributed over (0, 1000).
A deductible of 100 is applied to each loss.
In 1995, individual losses have increased 5%, and are still uniformly distributed.
A deductible of 100 is still applied to each loss.
Determine the percentage increase in the standard deviation of amount paid.
(A) 5.00% (B) 5.25% (C) 5.50% (D) 5.75% (E) 6.00%

36.92 (Course 1 Sample Exam, Q.17) (1.9 points) An actuary is reviewing a study she
performed on the size of claims made ten years ago under homeowners insurance policies.
In her study, she concluded that the size of claims followed an exponential distribution and that the
probability that a claim would be less than $1,000 was 0.250.
The actuary feels that the conclusions she reached in her study are still valid today with one
exception: every claim made today would be twice the size of a similar claim made ten years ago as
a result of inflation.
Calculate the probability that the size of a claim made today is less than $1,000.
A. 0.063 B. 0.125 C. 0.134 D. 0.163 E. 0.250

36.93 (3, 5/00, Q.30) (2.5 points) X is a random variable for a loss.
Losses in the year 2000 have a distribution such that:
E[X ∧ d] = -0.025d² + 1.475d - 2.25, d = 10, 11, 12, ..., 26
Losses are uniformly 10% higher in 2001.
An insurance policy reimburses 100% of losses subject to a deductible of 11 up to a maximum
reimbursement of 11.
Calculate the ratio of expected reimbursements in 2001 over expected reimbursements in the year
2000.
(A) 110.0% (B) 110.5% (C) 111.0% (D) 111.5% (E) 112.0%

Use the following information for the next two questions:


An insurer has excess-of-loss reinsurance on auto insurance. You are given:
(i) Total expected losses in the year 2001 are 10,000,000.
(ii) In the year 2001 individual losses have a Pareto distribution with
F(x) = 1 - (2000 / (2000 + x))², x > 0.

(iii) Reinsurance will pay the excess of each loss over 3000.
(iv) Each year, the reinsurer is paid a ceded premium, Cyear, equal to 110% of
the expected losses covered by the reinsurance.
(v) Individual losses increase 5% each year due to inflation.
(vi) The frequency distribution does not change.

36.94 (3, 11/00, Q.41 & 2009 Sample Q.119) (1.25 points) Calculate C2001.
(A) 2,200,000 (B) 3,300,000 (C) 4,400,000 (D) 5,500,000 (E) 6,600,000

36.95 (3, 11/00, Q.42 & 2009 Sample Q.120) (1.25 points) Calculate C2002 / C2001.
(A) 1.04 (B) 1.05 (C) 1.06 (D) 1.07 (E) 1.08

36.96 (3, 11/01, Q.6 & 2009 Sample Q.97) (2.5 points) A group dental policy has a negative
binomial claim count distribution with mean 300 and variance 800.
Ground-up severity is given by the following table:
Severity Probability
40 0.25
80 0.25
120 0.25
200 0.25
You expect severity to increase 50% with no change in frequency.
You decide to impose a per claim deductible of 100.
Calculate the expected total claim payment after these changes.
(A) Less than 18,000
(B) At least 18,000, but less than 20,000
(C) At least 20,000, but less than 22,000
(D) At least 22,000, but less than 24,000
(E) At least 24,000

36.97 (CAS3, 5/04, Q.17) (2.5 points) Payfast Auto insures sub-standard drivers.
• Each driver has the same non-zero probability of having an accident.
• Each accident does damage that is exponentially distributed with θ = 200.
• There is a $100 per accident deductible and insureds only "report" claims that are larger
than the deductible.
• Next year each individual accident will cost 20% more.
• Next year Payfast will insure 10% more drivers.
What will be the percentage increase in the number of “reported” claims next year?
A. Less than 15%
B. At least 15%, but less than 20%
C. At least 20%, but less than 25%
D. At least 25%, but less than 30%
E. At least 30%

36.98 (CAS3, 5/04, Q.29) (2.5 points) Claim sizes this year are described by a
2-parameter Pareto distribution with parameters θ = 1,500 and α = 4. What is the expected claim
size per loss next year after 20% inflation and the introduction of a $100 deductible?
A. Less than $490
B. At least $490, but less than $500
C. At least $500, but less than $510
D. At least $510, but less than $520
E. At least $520

36.99 (CAS3, 5/04, Q.34) (2.5 points) Claim severities are modeled using a continuous
distribution and inflation impacts claims uniformly at an annual rate of i.
Which of the following are true statements regarding the distribution of claim severities after the effect
of inflation?
1. An Exponential distribution will have scale parameter (1+i)θ.
2. A 2-parameter Pareto distribution will have scale parameters (1+i)α and (1+i)θ.
3. A Paralogistic distribution will have scale parameter θ /(1+i).
A. 1 only B. 3 only C. 1 and 2 only D. 2 and 3 only E. 1, 2, and 3

36.100 (CAS3, 11/04, Q.33) (2.5 points)


Losses for a line of insurance follow a Pareto distribution with θ = 2,000 and α = 2.
An insurer sells policies that pay 100% of each loss up to $5,000. The next year the insurer changes
the policy terms so that it will pay 80% of each loss after applying a $100 deductible.
The $5,000 limit continues to apply to the original loss amount. That is, the insurer will pay 80% of
the loss amount between $100 and $5,000. Inflation will be 4%.
Calculate the decrease in the insurer's expected payment per loss.
A. Less than 23%
B. At least 23%, but less than 24%
C. At least 24%, but less than 25%
D. At least 25%, but less than 26%
E. At least 26%

36.101 (SOA3, 11/04, Q.18 & 2009 Sample Q.127) (2.5 points)
Losses in 2003 follow a two-parameter Pareto distribution with α = 2 and θ = 5.
Losses in 2004 are uniformly 20% higher than in 2003.
An insurance covers each loss subject to an ordinary deductible of 10.
Calculate the Loss Elimination Ratio in 2004.
(A) 5/9 (B) 5/8 (C) 2/3 (D) 3/4 (E) 4/5

36.102 (CAS3, 11/05, Q.21) (2.5 points) Losses during the current year follow a Pareto
distribution with α = 2 and θ = 400,000. Annual inflation is 10%.
Calculate the ratio of the expected proportion of claims that will exceed $750,000 next year to the
proportion of claims that exceed $750,000 this year.
A Less than 1.105
B. At least 1.105, but less than 1.115
C. At least 1.115, but less than 1.125
D. At least 1.125, but less than 1.135
E. At least 1.135

36.103 (CAS3, 11/05, Q.33) (2.5 points)


In year 2005, claim amounts have the following Pareto distribution: F(x) = 1 - (800 / (800 + x))³.

The annual inflation rate is 8%. A franchise deductible of 300 will be implemented in 2006.
Calculate the loss elimination ratio of the franchise deductible.
A. Less than 0.15
B. At least 0.15, but less than 0.20
C. At least 0.20, but less than 0.25
D. At least 0.25, but less than 0.30
E. At least 0.30

36.104 (SOA M, 11/05, Q.28 & 2009 Sample Q.209) (2.5 points)
In 2005 a risk has a two-parameter Pareto distribution with α = 2 and θ = 3000.
In 2006 losses inflate by 20%. An insurance on the risk has a deductible of 600 in each year.
Pi, the premium in year i, equals 1.2 times the expected claims.
The risk is reinsured with a deductible that stays the same in each year.
Ri, the reinsurance premium in year i, equals 1.1 times the expected reinsured claims.
R2005/P2005 = 0.55. Calculate R2006/P2006.
(A) 0.46 (B) 0.52 (C) 0.55 (D) 0.58 (E) 0.66

36.105 (CAS3, 5/06, Q.26) (2.5 points) The aggregate losses of Eiffel Auto Insurance are
denoted in euro currency and follow a Lognormal distribution with µ = 8 and σ = 2.
Given that 1 euro = 1.3 dollars, which set of lognormal parameters describes the distribution of
Eiffelʼs losses in dollars?
A. µ = 6.15, σ = 2.26 B. µ = 7.74, σ = 2.00 C. µ = 8.00, σ = 2.60
D. µ = 8.26, σ = 2.00 E. µ = 10.40, σ = 2.60

36.106 (CAS3, 5/06, Q.39) (2.5 points) Prior to the application of any deductible, aggregate claim
counts during 2005 followed a Poisson distribution with λ = 14. Similarly, individual claim sizes
followed a Pareto distribution with α = 3 and θ = 1000. Annual severity inflation is 10%.
If all policies have a $250 ordinary deductible in 2005 and 2006, calculate the expected increase in
the number of claims that will exceed the deductible in 2006.
A. Fewer than 0.41 claims
B. At least 0.41, but fewer than 0.45
C. At least 0.45, but fewer than 0.49
D. At least 0.49, but fewer than 0.53
E. At least 0.53

36.107 (CAS3, 11/06, Q.30) (2.5 points) An insurance company offers two policies.
Policy R has no deductible and no limit. Policy S has a deductible of $500 and a limit of $3,000; that
is, the company will pay the loss amount between $500 and $3,000.
In year t, severity follows a Pareto distribution with parameters α = 4 and θ = 3,000.
The annual inflation rate is 6%.
Calculate the difference in expected cost per loss between policies R and S in year t+4.
A. Less than $500
B. At least $500, but less than $550
C. At least $550, but less than $600
D. At least $600, but less than $650
E. At least $650

36.108 (CAS5, 5/07, Q.46) (2.0 points) You are given the following information:
Claim Ground-up Uncensored Loss Amount
A $35,000
B 125,000
C 180,000
D 206,000
E 97,000
If all claims experience an annual ground-up severity trend of 8.0%, calculate the effective trend in the
layer from $100,000 to $200,000 ($100,000 in excess of $100,000.) Show all work.

Solutions to Problems:

36.1. B. For the Pareto, the new theta is the old theta multiplied by the inflation factor of 1.2.
Thus the new theta = (1.2)(5000) = 6000. Alpha is unaffected.
The average size of claim for the Pareto is: θ / (α-1). In 1997, this is: 6000/(3-1) = 3000.
Alternately, the mean in 1994 is 5000 / (3-1) = 2500. The mean increases by the inflation factor of
1.2; therefore the mean in 1997 is (1.2)(2500) = 3000.

36.2. D. The inflation factor is (1.03)^5 = 1.1593. For the Exponential, the new θ is the old θ
multiplied by the inflation factor. Thus the new θ is: (200)(1.1593) = 231.86.
The variance for the Exponential Distribution is θ², which in 2009 is: 231.86² = 53,758.
Alternately, the variance in 2004 is: 200² = 40,000. The variance increases by the square of the
inflation factor; therefore the variance in 2009 is: (1.1593²)(40,000) = 53,759.
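The scale-parameter rule used in 36.1 and 36.2 is easy to verify numerically. The following is a minimal Python sketch, not part of the original solution; the small difference from 53,758 comes only from how far the inflation factor is rounded.

factor = 1.03 ** 5                       # five years of 3% inflation
theta_2004 = 200.0
theta_2009 = factor * theta_2004         # about 231.86
print(theta_2009 ** 2)                   # variance in 2009, about 53,757
print(factor ** 2 * theta_2004 ** 2)     # same answer, scaling the 2004 variance of 200 squared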

36.3. C. For a Burr, the new theta = θ(1+r) = 19307(1.3) = 25,099. (Alpha and gamma are
unaffected.) Thus in 1996, 1 - F(10,000) = {1/(1 + (10000/25099)^0.7)}² = 43.0%.
Alternately, $10,000 in 1996 corresponds to $10,000 / 1.3 = $7692 in 1992.
Then in 1992, 1 - F(7692) = {1/(1 + (7692/19307)^0.7)}² = 43.0%.
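As a quick numeric check of 36.3, here is a hypothetical Python sketch (the helper name burr_survival is mine, not from the tables); it computes the survival probability both ways, inflating θ or deflating the 10,000 limit.

def burr_survival(x, alpha, theta, gamma):
    return (1.0 / (1.0 + (x / theta) ** gamma)) ** alpha

print(burr_survival(10000, 2, 1.3 * 19307, 0.7))   # about 0.430, working in 1996
print(burr_survival(10000 / 1.3, 2, 19307, 0.7))   # same value, working in 1992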

36.4. B. For the Gamma Distribution, θ is multiplied by the inflation factor of 1.1, while α is
unaffected. Thus the parameters in 1996 are: α = 2, θ = 110.

36.5. E. For the Pareto, the new theta is the old theta multiplied by the inflation factor of 1.25.
Thus the new theta = (1.25)(15000) = 18750. Alpha is unaffected. The average size of claim for
data truncated and shifted at 25,000 in 1999 is the mean excess loss, e(25000), in 1999.
For the Pareto e(x) = (x+θ) / (α-1).
In 1999, e(25000) = (25000 + 18750) / (1.5 - 1) = 87,500.
Alternately, $25,000 in 1999 corresponds to $25,000 / 1.25 = $20,000 in 1995.
The average size of claim for data truncated and shifted at 20,000 in 1995 is the mean excess loss,
e(20000), in 1995. For the Pareto e(x) = (x+θ) / (α-1).
In 1995, e(20,000) = (20000 + 15000) / (1.5 - 1) = 70,000. However, we need to inflate this back
up to get the average size in 1999 dollars: (70,000)(1.25) = 87,500.
Comment: The alternate solution uses the fact that the effect of a deductible keeps up with inflation
provided the limit keeps up with inflation, or equivalently if the limit keeps up with inflation, then the
mean excess loss increases by the inflation rate.
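A minimal Python sketch of the two approaches in 36.5 (not from the Study Guide; the helper pareto_e is my own name) shows that inflating θ or deflating the truncation point gives the same answer in 1999 dollars.

def pareto_e(x, alpha, theta):
    # mean excess loss of a Pareto: e(x) = (x + theta)/(alpha - 1)
    return (x + theta) / (alpha - 1)

print(pareto_e(25000, 1.5, 1.25 * 15000))          # work in 1999: 87,500
print(1.25 * pareto_e(25000 / 1.25, 1.5, 15000))   # work in 1995, then reinflate: 87,500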

36.6. B. Integrating the density function, FX(x) = (2.5)(1/2 - 1/x), 2 < x < 10.
FZ(z) = FX(x) = (2.5)(1/2 - 1/x) = (2.5)(1/2 - 1.2/z).
Alternately, FZ(z) = FX(z/(1+r)) = FX(z/1.2) = (2.5)(1/2 - 1.2/z).
FZ(6) = (2.5)(1/2 - 1.2/6 ) = 0.75. 1 - FZ(6) = 0.25.
Alternately, 6 in 1996 is 6 /1.2 = 5 in terms of 1992.
In 1992, FX(5) = (2.5)(1/2 - 1/5 ) = 0.75. 1 - FX(5) = 0.25.
Comment: Note that the domain becomes [2.4, 12] in 1996.

36.7. C. For the Loglogistic, θ is multiplied by 80 (the inflation factor), while the other parameter γ is
unaffected.

36.8. E. For the LogNormal Distribution µ has ln(80) added to it, while σ is unaffected.
New µ = 10 + ln(80) = 14.38 and σ = 3.

36.9. D. For the Weibull Distribution, θ is multiplied by 80, while τ is unaffected.

New θ = (625) (80) = 50,000.

36.10. C. For the Paralogistic, θ is multiplied by 80 (the inflation factor), while the other parameter α
is unaffected.

36.11. E. In 1996 x becomes 1.1x. z = 1.1x. ln(z) = ln(1.1x) = ln(x) + ln(1.1).


Thus in 1996 the distribution function is F(z) = Γ[α; λln(z)] = Γ[α; λ ln(x) + λln(1.1)].
This is not of the same form, so the answer is none of the above.
Comment: This is called the LogGamma Distribution. If ln(x) follows a Gamma Distribution, then x
follows a LogGamma Distribution. Under uniform inflation, ln(x) becomes
ln(x) + ln(1+r). If you add a constant amount to a Gamma distribution, then you no longer have a
Gamma distribution. Which is why under uniform inflation a LogGamma distribution is not
reproduced.

36.12. The sum of 50 independent, identically distributed Exponentials each with θ = 800 is
Gamma with α = 50 and θ = 800. The average is 1/50 times the sum, and has a Gamma
Distribution with α = 50 and θ = 800/50 = 16.

36.13. B. 1. F. The skewness is unaffected by uniform inflation. (The numerator of the skewness is
the third central moment, which would be multiplied by 1.1³; the denominator, the cube of the
standard deviation, is also multiplied by 1.1³.) 2. T. Since each claim size is
increased by 10%, the place where 70% of the claims are less and 30% are more is also increased
by 10%. Under uniform inflation, each percentile is increased by the inflation factor.
3. F. Under uniform inflation, if the deductible increases to keep up with inflation, then the Loss
Elimination Ratio is unaffected. So in 1996 the Loss Elimination Ratio at $1100 is 10% not 11%.

36.14. B. This is a mixed Gamma-Weibull Distribution.


The Gamma has parameters α = 3 and θ = 1/10, and density: θ^(-α) x^(α-1) e^(-x/θ) / Γ(α) = 500x² e^(-10x).
The Weibull has parameters θ = 1/20^(1/4) and τ = 4, and density:
τ(x/θ)^τ exp[-(x/θ)^τ] / x = 80x³ exp(-20x⁴). In the mixed distribution, the Gamma is given a weight of
0.75, while the Weibull is given a weight of 0.25. Note that (0.75)(500) = 375, and (0.25)(80) = 20.
Under uniform inflation of 25%, the Gamma has parameters:
α = 3 and θ = (1/10)(1.25) = 1/8, and density: θ^(-α) x^(α-1) e^(-x/θ) / Γ(α) = 256x² e^(-8x).
Under uniform inflation of 25%, the Weibull has parameters: θ = 1.25/20^(1/4), so that
(1/θ)⁴ = 20/(1.25)⁴ = 8.192, and τ = 4, with density: τ(x/θ)^τ exp[-(x/θ)^τ] / x = 32.768x³ exp(-8.192x⁴).
Therefore, the mixed distribution after inflation has a density of:
(0.75){256x² e^(-8x)} + (0.25){32.768x³ exp(-8.192x⁴)} = 192x² e^(-8x) + 8.192x³ exp(-8.192x⁴).
Comment: For a mixed distribution under uniform inflation, the weights are unaffected, while the each
separate distribution is affected as usual.

36.15. D. This is a LogNormal Distribution with parameters (prior to inflation) of µ = 7 and


σ = 3. Thus posterior to inflation of 40%, one has a LogNormal Distribution with parameters of
µ = 7 + ln(1.4) = 7.336 and σ = 3. For the LogNormal, S(x) = 1 - Φ[(ln(x) - µ)/σ]. Prior to inflation,
S(1000) = 1 - Φ[(ln(x) - µ)/σ] = 1 - Φ[(ln(1000) - 7)/3] = 1 - Φ(-0.03) = Φ(0.03) = 0.5120.
After inflation, S(1000) = 1 - Φ[(ln(x) - µ)/σ] = 1 - Φ[(ln(1000) - 7.336)/3] =
1 - Φ(-0.143) = Φ(0.14) = 0.5557. Prior to inflation, 173 losses are expected to exceed the
deductible each year. The survival function increased from 0.5120 to 0.5557 after inflation.
Thus after inflation one expects to exceed the deductible per year:
173(0.5557)/0.5120 = 187.8 claims.
Alternately, a limit of 1000 after inflation is equivalent to 1000/1.4 = 714.29 prior to inflation. Thus the
tail probability after inflation at 1000 is the same as the tail probability at 714.29 prior to inflation.
Prior to inflation, 1 - F(714.29) = 1 - Φ[(ln(x) - µ)/σ] = 1 - Φ[(ln(714.3) - 7)/3] = 1 - Φ[-0.14] = Φ[0.14]
= 0.5557. Proceed as before.
Comment: The expected number of claims over a fixed deductible increases under uniform inflation.
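Here is a hypothetical Python sketch of the 36.15 calculation using the exact standard normal distribution function rather than the rounded table values; it reproduces roughly 188 expected claims over the deductible after inflation.

from math import log, erf, sqrt

def Phi(x):                                   # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_survival(x, mu, sigma):
    return 1.0 - Phi((log(x) - mu) / sigma)

s_before = lognormal_survival(1000, 7.0, 3.0)              # about 0.512
s_after  = lognormal_survival(1000, 7.0 + log(1.4), 3.0)   # about 0.557
print(173 * s_after / s_before)                            # about 188 claims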

36.16. A. For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)} {1−(θ/(θ+x))α−1} = 20{1 - (1 + x/40)-2}.
E[X ∧ 5/1.06] = 20{1 - (1 + 4.717/40)-2} = 3.997.
E[X ∧ 5] = 20{1 - (1+ 5/40)-2} = 4.198. E[X ∧ 25/1.06] = 20(1-(1+ 23.585/40)-2) = 12.085.
E[X ∧ 25] = 20{1 - (1+ 25/40)-2} = 12.426.
In 2001 the expected payments are: E[X ∧ 25] - E[X ∧ 5] = 12.426 - 4.198 = 8.228.
A deductible of 5 and maximum covered loss of 25 in the year 2002, when deflated back to the
year 2001, correspond to a deductible of: 5/1.06 = 4.717, and a maximum covered loss of: 25/1.06
= 23.585. Therefore, reinflating back to the year 2002, the expected payments in the year 2002
are: (1.06)(E[X ∧ 23.585] - E[X ∧ 4.717]) = (1.06)(12.085 - 3.997) = 8.573.
The ratio of expected payments in 2002 over the expected payments in the year 2001 is:
8.573/ 8.228 = 1.042.
Alternately, the insurerʼs average payment per loss is: (1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]).
c = 100%, u = 25, d = 5. r = .06 for the year 2002 and r = 0 for the year 2001.
Then proceed as previously.
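A minimal Python sketch of 36.16 (the helper pareto_lev is my own name, not from the tables): it deflates the 2002 deductible and maximum covered loss, works with the 2001 Pareto, and reinflates, reproducing the ratio of about 1.042.

def pareto_lev(x, alpha, theta):
    # limited expected value of a Pareto
    return theta / (alpha - 1) * (1 - (theta / (theta + x)) ** (alpha - 1))

layer_2001 = pareto_lev(25, 3, 40) - pareto_lev(5, 3, 40)
layer_2002 = 1.06 * (pareto_lev(25 / 1.06, 3, 40) - pareto_lev(5 / 1.06, 3, 40))
print(layer_2002 / layer_2001)       # about 1.042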

36.17. E. Inflation is 8% per year for 9 years, thus the inflation factor is 1.08^9 = 1.999.
Thus 1000 in the year 2002 is equivalent to 1000/1.999 = 500 in 1993. There are 457 claims
excess of 500 in 1993; this is 457/1000 = 45.7%.
Comment: Note the substantial increase in the proportion of claims over a fixed limit. In 1993 there
are 32.6% of the claims excess of 1000, while in 2002 there are 45.7%.

36.18. C. In general one substitutes for x = z / (1+r), and for the density function
fZ(z) = fX(x) / (1+r). In this case, 1 + r = 1.2.
Thus, fZ(z) = fX(x) / 1.2 = µ exp[-(x - µ)² / (2βx)] / {1.2 √(2βπx³)}
= µ exp[-({z/1.2} - µ)² / (2β{z/1.2})] / {1.2 √(2βπ{z/1.2}³)}
= (1.2µ) exp[-(z - 1.2µ)² / {2(1.2β)z}] / √(2(1.2β)π z³).


This is of the same form, but with parameters 1.2µ and 1.2β, rather than µ and β.
Comment: This is an Inverse Gaussian Distribution. Let β = µ2 / θ and one has the parameterization
in Loss Models, with parameters µ and θ. Since under uniform inflation, for the Inverse Gaussian
each of µ and θ are multiplied by the inflation factor, so is β = µ2 / θ.
Recall that under change of variables, when working with the density you need to divide by
dz/dx = 1+ r, since dF/dz = (dF /dx) / (dz/dx).

36.19. C. This is an Exponential Distribution with θ = 5000. ⇒ new θ = (5000)(1.4) = 7000.

For the Exponential Distribution, E[X ∧ x] = θ(1-e−x/θ). The mean is θ.

The losses excess of a limit are proportional to E[X] - E[X ∧ x] = θe−x/θ.


In 1998 this is for x = 1000: 5000e-1000/5000 = 4094.
In 2007 this is for x = 1000: 7000e-1000/7000 = 6068.
The increase is: (6068/4094) - 1 = 48.2%.
Comment: In general excess losses over a fixed limit, increase faster than the rate of inflation.
Note that E[X] - E[X ∧ x] = S(x)e(x) = R(x)E[X] = θe−x/θ.

36.20. B. This is an Exponential Distribution with θ = 5000.


Therefore, the new theta = (5000) 1.4 = 7000.
For the Exponential Distribution, E[X ∧ x] = θ(1-e−x/θ). The mean is θ.

The losses excess of a limit are proportional to E[X] - E[X ∧ x] = θe−x/θ.


In 1998 this is for x = 1000: 5000e-1000/5000 = 5000e-0.2.
In 2007 this is for x = 1400: 7000e-1400/7000 = 7000e-0.2.
The increase is: (7000/5000) - 1 = 40.0%.
Comment: If the limit keeps up with inflation, then excess losses increase at the rate of inflation.

36.21. D. The inflation factor from 1990 to 1999 is: (1.04)^9 = 1.423. Thus the parameters of the
1999 LogNormal are: 3 + ln(1.423) and σ. Therefore, the mean of the 1999 LogNormal is:

Mean99 = exp(3 + ln(1.423) + σ2 /2) = 1.423 exp(3 + σ2 /2) = 1.423 Mean90.

Therefore, ln(Mean99) = 3 + ln(1.423) + σ²/2. F90(Mean99) = Φ[(ln(Mean99) - µ) / σ] =
Φ[(3 + ln 1.423 + σ²/2 - 3) / σ] = Φ[(ln 1.423 + σ²/2) / σ].


We are given that in 1990 5% of the losses exceed the mean of the losses in 1999. Thus,
F90(Mean99) = 0.95. Therefore, Φ( (ln 1.423 + σ2 /2) / σ) = 0.95.

Φ(1.645) = 0.95. ⇒ (ln 1.423 + σ²/2) / σ = 1.645. ⇒ σ²/2 - 1.645σ + ln 1.423 = 0. ⇒
σ = 1.645 ± √(1.645² - 2 ln 1.423) = 1.645 ± √2.000 = 1.645 ± 1.414 = 0.231 or 3.059.



36.22. E. The inflation factor from 1995 to 2001 is 206.8/170.3 = 1.214.


For the Weibull Distribution, θ is multiplied by 1+ r, while τ is unaffected.

Thus in 2001 the new θ is: (1) (1.214) = 1.214.

The survival function of the Weibull is S(x) = exp[-(x/θ)τ].


In 1995, S(1000) = exp[-(1000)^0.3] = 0.000355.
In 2001, S(1000) = exp[-(1000/1.214)^0.3] = 0.000556.
The ratio of survival functions is: 0.000556/0.000355 = 1.57, or a 57% increase in the expected
number of claims excess of the deductible.
Comment: Generally for the Weibull, the ratio of the survival functions at x is:
exp[-(x/{(1+r)θ})^τ] / exp[-(x/θ)^τ] = exp[(x/θ)^τ {1 - 1/(1+r)^τ}].

36.23. B. The inflation factor from 1994 to 1998 is: (1.05)(1.03)(1.07)(1.06) = 1.2266.
For the LogNormal Distribution, µ has ln(1+r) added, while σ is unaffected.
Thus in 1998 the new µ is: 3 + ln(1.2266) = 3.2042.
The Limited Expected Value for the LogNormal Distribution is:
E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.
In 1994, E[X ∧ 25000] = e^11 Φ[(ln(25000) - 19)/4] + 25000 {1 - Φ[(ln(25000) - 3)/4]} =
59874 Φ[-2.22] + 25000 {1 - Φ[1.78]} = 59,874(1 - 0.9868) + 25,000(1 - 0.9625) = 1728.
In 1998, E[X ∧ 25000] =
e^11.2042 Φ[(ln(25000) - 19.2042)/4] + 25000 {1 - Φ[(ln(25000) - 3.2042)/4]} =
73438 Φ[-2.27] + 25000 {1 - Φ[1.73]} = 73438(1 - 0.9884) + 25,000(1 - 0.9582) = 1897.
The ratio of Limited Expected Values is: 1897/1728 = 1.098 or a 9.8% increase in the expected
dollars of claims between 1994 and 1998.
Alternately, in 1994, E[X ∧ 20382] = e11Φ[(ln(20382) - 19)/4] + 20382{1 - Φ[(ln((20382) - 3)/4]} =
59874Φ[-2.27] + 20382{1 - Φ[1.73]} = 59,874(1 - 0.9884) + 20382(1 - 0.9582) = 1547.
In 1998 the average payment per loss is:
(1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]) = 1.2266 E[X ∧ 25000/1.2266] =
1.2266 E[X ∧ 20382] = (1.2266)(1547) = 1898. Proceed as before.
Comment: For a fixed limit, basic limit losses increase at less than the overall rate of inflation.
Here unlimited losses increase 22.7%, but limited losses increase only 9.8%.
When using the formula for the average payment per loss, use the original LogNormal for 1994.
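The 36.23 arithmetic can be checked with the following hypothetical Python sketch using exact values of Φ; it gives a ratio of about 1.097, versus the 9.8% increase above obtained from the rounded normal table.

from math import log, exp, erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_lev(x, mu, sigma):
    # limited expected value of a LogNormal
    return exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma) \
           + x * (1 - Phi((log(x) - mu) / sigma))

lev_1994 = lognormal_lev(25000, 3.0, 4.0)
lev_1998 = lognormal_lev(25000, 3.0 + log(1.2266), 4.0)
print(lev_1998 / lev_1994)           # about 1.097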

36.24. D. In general a layer of loss is proportional to the integral of the survival function.
In 1994, S(x) = 10^15 x^(-3). The integral from 500,000 to 2,000,000 of S(x) is:
10^15 (500,000^(-2) - 2,000,000^(-2))/2 = 1875.
In 1999, the Distribution Function is gotten by substituting x = z/1.2.
F(z) = 1 - (100000/(z/1.2))³ = 1 - (120000/z)³ for z > $120,000.
Thus in 1999, the integral from 500,000 to 2,000,000 of the survival function is:
(120000)³ (500,000^(-2) - 2,000,000^(-2))/2 = 3240.
3240 / 1875 = 1.728, representing a 72.8% increase.
Alternately, this is a Single Parameter Pareto Distribution, with α = 3 and θ = 100,000.
Under uniform inflation of 20%, θ becomes 120,000 while α is unaffected.

E[X ∧ x] = θ [{α − (x/θ)1−α} / (α − 1)] for α > 1.


Then the layer from 500,000 to 2,000,000 is proportional to:
E[X ∧ 2000000] -E[X ∧ 500000] = (θ/(α−1)) {(α −(2000000/θ)1−α) - (α −(500000/θ)1−α)} =

(θ/(α − 1)){(500000/θ)1−α - (2000000/θ)1−α} .


In 1994, E[X ∧ 2000000] -E[X ∧ 500000] = (100000/2){5-2 - 20-2) = 1875.
In 1999, E[X ∧ 2000000] -E[X ∧ 500000] = (120000/2){4.1667-2 - 16.6667-2) = 3240.
3240 / 1875 = 1.728, representing a 72.8% increase.
Comment: As shown in “A Practical Guide to the Single Parameter Pareto Distribution,” by Stephen
W. Philbrick, PCAS LXXII, 1985, pp. 44, for the Single Parameter Pareto Distribution, a layer of
losses is multiplied by (1+r)^α. In this case 1.2³ = 1.728.

36.25. D. The total inflation factor is (1.03)^8 = 1.2668.
Under uniform inflation both parameters of the Inverse Gaussian are multiplied by 1 + r = 1.2668.
Thus in 2009 the parameters are: µ = 3(1.2668) = 3.8003 and θ = 10(1.2668) = 12.668.
Thus the variance in 2009 is: µ³/θ = 3.8003³ / 12.668 = 4.33.
Alternately, the variance in 2001 is: µ³/θ = 3³/10 = 2.7. Under uniform inflation, the variance is
multiplied by (1+r)². Thus in 2009 the variance is: (2.7)(1.2668²) = 4.33.

36.26. A. S(10000) = (5000/(5000 + 10000))³ = 1/27.
Thus we want x such that S(x) = (1/2)(1/27) = 1/54.
(5000/(5000 + x))³ = 1/54 ⇒ x = 5000(54^(1/3) - 1) = 13,899.

36.27. C. During the year 2006, the losses are Pareto with α = 3 and θ = (1.15)(5000) = 5750.
S(10000) = {5750/(5750 + 10000)}³ = 0.04866.
Thus we want x such that S(x) = (1/2)(0.04866) = 0.02433.
{5750/(5750 + x)}³ = 0.02433 ⇒ x = 5750(1/0.02433^(1/3) - 1) = 14,094.

36.28. A. The new theta = (1/1000)1.25 = 1/800. Thus in 1998 the density is:
e-x /θ/θ = 800e-800x.

36.29. E. The inflation factor is 1.04³ = 1.1249.


Probability   2003 Loss Amount   2003 Insurer Payment   2006 Loss Amount   2006 Insurer Payment
0.1667 1000 0 1124.9 0.0
0.3333 2000 0 2249.7 249.7
0.3333 5000 3000 5624.3 3624.3
0.1667 10000 8000 11248.6 9248.6
Average 4166.7 2333.3 4686.9 2832.8
2832.8 / 2333.3 = 1.214, therefore the insurerʼs expected payments increased 21.4%.
Comment: Similar to 4B, 5/94, Q.21.
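The table in 36.29 can be reproduced with a short Python sketch (a hypothetical check, not part of the original solution):

probs  = [1/6, 1/3, 1/3, 1/6]
losses = [1000, 2000, 5000, 10000]
factor = 1.04 ** 3                   # inflation from 2003 to 2006

pay_2003 = sum(p * max(x - 2000, 0) for p, x in zip(probs, losses))
pay_2006 = sum(p * max(x * factor - 2000, 0) for p, x in zip(probs, losses))
print(pay_2006 / pay_2003)           # about 1.214, a 21.4% increase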

36.30. B. The second moment of the LogNormal in 2007 is exp[(2)(5) + (2)(0.7²)] = 58,689.
The second moment increases by the square of the inflation factor: (1.04³)²(58,689) = 74,260.
Alternately, the LogNormal in 2010 has parameters of: µ = 5 + ln[1.04³] = 5.1177, and σ = 0.7.
The second moment of the LogNormal in 2010 is exp[(2)(5.1177) + (2)(0.7²)] = 74,265.

36.31. D. Deflating, the 1000 deductible in 2008 is equivalent to a deductible of 1000/1.1 = 909 in
2005. Work in 2005 and then reinflate back up to the 2008 level by multiplying by 1.1.
Average payment per loss is: 1.1 E[(X - 909)+] = 1.1 {E[X] - E[X ∧ 909]}
= 1.1 {∫_0^∞ x f(x) dx - ∫_0^909 x f(x) dx - 909 S(909)} = 1.1 ∫_909^∞ x f(x) dx - 1000 ∫_909^∞ f(x) dx.
Equivalently, 1.1 {E[X] - E[X ∧ 909]} = 1.1 {∫_0^∞ S(x) dx - ∫_0^909 S(x) dx} = 1.1 ∫_909^∞ S(x) dx.
Equivalently, 1.1 {E[X] - E[X ∧ 909]} = 1.1 ∫_0^∞ x f(x) dx - 1.1 ∫_0^909 S(x) dx
= 1.1 ∫_909^∞ x f(x) dx + 1.1 ∫_0^909 {x f(x) - S(x)} dx.

36.32. For convenience put everything in millions of dollars.


Prior to inflation, E[X ∧ x] = 0.5 - 0.52 / (0.5 + x).
Thus prior to inflation the average payment per loss is:
E[X ∧ 1] - E[X ∧ R] = 0.52 / (0.5 + R) - 0.52 / (0.5 + 1) = 0.52 / (0.5 + R) - 0.166667.
After inflation the average payment per loss is:
1.1(E[X ∧ 1/1.1] - E[X ∧ R/1.1]) =
(1.1)(0.52 ) / (0.5 + R/1.1) - (1.1)(0.52 ) / (0.5 + 1/1.1) = 0.552 / (0.55 + R) - 0.195,161.
Setting the ratio of the two average payments per loss equal to 1.1:
(1.1){0.52 / (0.5 + R) - 0.166667} = 0.552 / (0.55 + R) - 0.195161. ⇒

0.011827(0.5 + R) (0.55 + R) + (1.1)(0.52 )(0.55 + R) - 0.552 (0.5 + R) = 0. ⇒


0.011827R² - 0.015082R + 0.0032524 = 0.
R = {0.015082 ± √(0.015082² - (4)(0.011827)(0.0032524))} / {(2)(0.011827)} = 0.6376 ± 0.3627.

R = 0.275 or 1.000. ⇒ R = $275,000.


Comment: A rewritten version of CAS9, 11/99, Q.39.

36.33. C. During 2013, the losses follow an Exponential with mean: (1.08)(5000) = 5400.
An Exponential distribution truncated and shifted from below is the same Exponential Distribution,
due to the memoryless property of the Exponential. Thus the nonzero payments are Exponential
with mean 5400. The probability of a nonzero payment is the probability that a loss is greater than
the deductible of 1000; S(1000) = e-1000/5400 = 0.8310. Thus the payments of the insurer
can be thought of as a compound distribution, with Bernoulli frequency with mean 0.8310 and
Exponential severity with mean 5400. The variance of this compound distribution is:
(Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) =
(0.8310)(54002 ) + (5400)2 {(0.8310)(1 - 0.8310)} = 28.3 million.
Equivalently, the payments of the insurer in this case are a two point mixture of an Exponential with
mean 5400 and a distribution that is always zero, with weights 0.8310 and 0.1690.
This has a first moment of: (5400)(0.8310) + (0)(0.1690) = 4487.4,
and a second moment of: {(2)(54002 )}(0.8310) + (02 )(0.1690) = 48,463,920.
Thus the variance is: 48,463,920 - 4487.42 = 28.3 million.
Comment: Similar to 3, 11/00, Q.21, which does not include inflation.
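A minimal Python sketch of the compound-distribution variance in 36.33 (hypothetical, not part of the original solution):

from math import exp

theta = 1.08 * 5000                  # Exponential mean in 2013
q = exp(-1000 / theta)               # probability of a nonzero payment, about 0.831
var_per_loss = q * theta**2 + theta**2 * q * (1 - q)
print(var_per_loss)                  # about 28.3 million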

36.34. E. In 2005, the average payment per payment is:


(E[X] - E[X ∧ 5000])/S(5000) = (E[X] - 3000)/(1 - 0.655) = 2.8986E[X] - 8695.7.
After 25% uniform inflation, the average payment per payment is:
1.25(E[X] - E[X ∧ 5000/1.25])/S(5000/1.25) = 1.25(E[X] - E[X ∧ 4000])/(1 - F(4000))
= 1.25(E[X] - 2624)/(1 - 0.590) = 3.0488E[X] - 8000.
Set 1.15(2.8986E[X] - 8695.7) = 3.0488E[X] - 8000. ⇒ E[X] = 2000/.2846 = 7027.

36.35. D. In 2015, the losses are Pareto with α = 5 and θ = (1.25)(40) = 50.
With a deductible of 10, the non-zero payments are Pareto with α = 5 and θ = 50 + 10 = 60.
The mean of this Pareto is: 60/4 = 15.
The second moment of this Pareto is: (2)(60²) / {(5 - 1)(5 - 2)} = 600.
The variance of this Pareto is: 600 - 15² = 375.

36.36. C. The non-zero payments are Pareto with α = 5 and θ = 50 + 10 = 60,


with mean: 15, second moment: 600, and variance: 600 - 15² = 375.
The probability of a non-zero payment is the survival function at 10 of the original Pareto:
{50 / (50 + 10)}^5 = 0.4019.

Thus YL is a two-point mixture of a Pareto distribution α = 5 and θ = 60, and a distribution that is
always zero, with weights 0.4019 and 0.5981.
The mean of the mixture is: (0.4019)(15) + (0.5981)(0) = 6.029.
The second moment of the mixture is: (0.4019)(600) + (0.5981)(02 ) = 241.14.
The variance of this mixture is: 241.14 - 6.0292 = 205.
Alternately, YL can be thought of as a compound distribution,
with Bernoulli frequency with mean 0.4019 and Pareto distribution α = 5 and θ = 60.
The variance of this compound distribution is:
(Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) =
(0.4019)(375) + (15)2 {(0.4019)(0.5981)} = 205.

36.37. B. inflation factor = (1+r) = 1.035 = 1.1593. coinsurance factor = c = 0.90.


Maximum Covered Loss = u = 50,000. Deductible amount = d = 10,000.
L/(1+r) = 43,130. d/(1+r) = 8626.
E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.
E[X ∧ u/(1+r)] = E[X ∧ 43,130] =
exp(10.02)Φ[(ln(43,130) - 9.7 - 0.64 )/0.8] + (43,130) {1 - Φ[(ln(43,130) - 9.7)/0.8]} =
(22,471)Φ[0.41] + (43,130){1 - Φ[1.21]} = (22,471)(0.6591) + (43,130){1 - 0.8869} = 19,689.
E[X ∧ d/(1+r)] = E[X ∧ 8626] =
exp(10.02)Φ[(ln(8626) - 9.7 - 0.64)/0.8] + (8626) {1 - Φ[(ln(8626) - 9.7)/0.8]} =
(22,471)Φ[-1.60] + (8626) {1 - Φ[-0.80]} = (22,471)(0.0548) + (8626) (0.7881) = 8030.
The average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]) =
(1.1593)(.9)(19,689 - 8030) = 12,165.
Comment: In 2002, the average payment per loss is: (0.9)(E[X ∧ 50000] - E[X ∧ 10000]) ≅
(0.9)(20,345 - 9073) = 10,145. Thus it increased from 2002 to 2007 by: 12165/10145 - 1 =
19.9%. The maximum covered loss would cause the increase to be less than the rate of inflation of
15.9%, while the deductible would cause it to be greater. In this case the deductible had a bigger
impact than the maximum covered loss on the rate of increase. When using the formula for the
average payment for loss, use the parameters of the original LogNormal for 2002. This formula is
equivalent to deflating the 2007 values back to 2002, working in 2002, and then reinflating back up
to 2007. One could instead inflate the LogNormal to 2007 and work in 2007.

36.38. A. Inflation factor = (1+r) = 1.035 = 1.1593. coinsurance factor = c = 0.90.


Maximum Covered Loss = u = 50,000. Deductible amount = d = 10,000.
u/(1+r) = 43,130. d/(1+r) = 8626. S(d/(1+r)) = 1 - Φ[(ln(8626) -9.7)/.8] = 1 - Φ[-0.80] = 0.7881.
The average payment per non-zero payment is:
(1+r)c(E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) / S(d/(1+r)) = (1.1593)(0.9)(19,689 - 8030)/0.7881 = 15,435.
Comment: The average payment per non-zero payment is:
average payment per loss / S(d/(1+r)) = 12,165 /0.7881 = 15,436.
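For 36.37 and 36.38, the following hypothetical Python sketch evaluates the average payment per loss and per payment with exact Φ values; it gives about 12,165 and 15,450 (the 15,435 above reflects the rounded normal table).

from math import log, exp, erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_lev(x, mu, sigma):
    return exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma) \
           + x * (1 - Phi((log(x) - mu) / sigma))

mu, sigma, c, u, d = 9.7, 0.8, 0.9, 50000, 10000
r = 1.03**5 - 1                                    # inflation from 2002 to 2007
per_loss = (1 + r) * c * (lognormal_lev(u / (1 + r), mu, sigma)
                          - lognormal_lev(d / (1 + r), mu, sigma))
s_d = 1 - Phi((log(d / (1 + r)) - mu) / sigma)     # probability of a nonzero payment
print(per_loss, per_loss / s_d)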

36.39. D. E[(X ∧ x)²] = exp[2µ + 2σ²] Φ[(ln(x) − µ − 2σ²)/σ] + x² {1 - Φ[(ln(x) − µ)/σ]}.
E[(X ∧ u/(1+r))2 ] = E[(X ∧ 43,130)2 ] =
exp(20.68)Φ[{ln(43,130) - 9.7 - (2)(0.82 )}/0.8] + (43,130)2 {1 - Φ[(ln(43,130) − 9.7)/0.8]} =
e20.68 Φ[-0.39] + (43,1302 ) {1 - Φ[1.21]} = (e20.68)(0.3483) + (43,1302 ) {(1 - 0.8869) =
543,940,124.
E[(X ∧ d/(1+r))2 ] = E[(X ∧ 8626)2 ] =
exp(20.68)Φ[{ln(8626) - 9.7 - (2)(0.82 )}/0.8] + (86262 ) {1 - Φ[(ln(8626) - 9.7)/0.8]} =
(e20.68)Φ[-2.40] + (86262 ) {1 - Φ[-0.80]} = (e20.68)(0.0082) + (86262 ) (0.7881) = 66,493,633.
From previous solutions: E[X ∧ 43,130] = 19,689, E[X ∧ 8626] = 8030, S(8626) = 0.7881.
Thus the second moment of the per-loss variable is:
(1.15932 ) (90%2 ) {543,940,124 - 66,493,633 - (2)(8626)(19,689 - 8030)} = 300,791,874.
From a previous solution, the average payment per loss is 12,165.
Thus the variance of the per-loss variable is: 300,791,874 - 12,1652 = 152,804,649.
The standard deviation of the per-loss variable is 12,361.
Comment: One could instead inflate the LogNormal to 2007 and work in 2007.
The 2007 LogNormal has parameters µ = 9.7 + ln[1.035 ] = 9.848, and σ = 0.8

36.40. A. The second moment of the per-payment variable is:


(second moment of the per-loss variable) / S(d/(1+r)) = 300,791,874 / 0.7881 = 381,667,141.
From a previous solution, the average payment per payment is 15,435.
Thus the variance of the per-payment variable is: 381,667,141 - 15,4352 = 143,427,916.
The standard deviation of the per-payment variable is 11,976.

36.41. For independent Gammas with the same θ, the alphas add.
Thus the sum of n independent, identically distributed Gamma Distributions is
a Gamma Distribution with θ and nα.
The average is the sum multiplied by 1/n.
Multiplying a Gamma Distribution by a constant gives another Gamma Distribution with theta
multiplied by that constant.
Thus the average of n independent, identically distributed Gamma Distributions is
a Gamma Distribution with θ/n and nα.
Comment: The average has mean αθ, the same as the original Gamma, as it should be.
The average has variance: (nα) (θ/n)2 = αθ/n, 1/n times the variance of the original Gamma, as it
should be.

36.42. B. In 2016 the ground up losses are Pareto with α = 2 and θ = (1+r)250.
The average payment per loss is: E[X] - E[X ∧ 100].
For a Pareto with α = 2: E[X] - E[X ∧ 100] = θ - θ{1 - θ/(θ + 100)} = θ²/(θ + 100).
In 2011, E[X] - E[X ∧ 100] = 250²/(250 + 100) = 178.57.
In 2016, E[X] - E[X ∧ 100] = {250(1+r)}² / {250(1+r) + 100}.
Setting the ratio of the 2016 amount to the 2011 amount equal to 1.26:
(1.26)(178.57) = {250(1+r)}² / {250(1+r) + 100}.
⇒ (225)(250)(1+r) + 22,500 = 62,500(1 + 2r + r²).
⇒ r² + 1.1r - 0.26 = 0. ⇒ r = 0.2, taking the positive root of the quadratic equation.
In other words, there is a total of 20% inflation between 2011 and 2016.
Comment: The mean ground up loss increases by 20%, but the losses excess of the deductible
increase at a faster rate of 26%.
The average payment per loss is 2011: 2502 / (250 + 100) = 178.57.
The average payment per loss is 2016: 3002 / (300 + 100) = 225.00.
Their ratio is: 225.00/178.57 = 1.260.
The average payment per payment is: e(100) = (θ + 100) / (α - 1) = θ + 100.
In 2011, e(100) = 250 + 100 = 350. In 2016, e(100) = (1.2)(250) + 100 = 400.
Their ratio is: 400/350 = 1.143.
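A two-line Python check of 36.42 (hypothetical; excess_per_loss is my own name): with 20% total inflation the expected payment per loss excess of the 100 deductible grows by 26%.

def excess_per_loss(theta):
    # Pareto with alpha = 2: E[X] - E[X ∧ 100] = theta²/(theta + 100)
    return theta**2 / (theta + 100)

print(excess_per_loss(250))                                 # 178.57 in 2011
print(excess_per_loss(1.2 * 250) / excess_per_loss(250))    # 1.26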

36.43. 1 + CV² = E[X²]/E[X]² = exp[2µ + 2σ²] / exp[µ + σ²/2]² = exp[σ²].
1 + 4² = exp[σ²]. ⇒ σ = √(ln 17) = 1.6832.
30,000 = exp[µ + σ²/2] = exp[µ + 1.6832²/2]. ⇒ µ = 8.8924.
Deflating the limit: 200,000 / 1.05^11 = 116,936.
E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.
E[X ∧ 116,936] =
30,000 Φ[(ln116,936 - 8.8924 - 1.68322 ) / 1.6832]
+ 116,936 {1 - Φ[(ln116,936 - 8.8924) / 1.6832]}
= 30,000 Φ[-0.03] + 116,936 {1 - Φ[1.65]} = (30,000)(0.4880) + (116,936)(1 - 0.9505) = 20,428.
Excess ratio is: 1 - 20,428/30,000 = 31.9%.
Alternately, if the 2004 LogNormal has µ = 8.8924, and σ = 1.6832,
then the 2015 LogNormal has µ = 8.8924 + 11 ln(1.05) = 9.4291, and σ = 1.6832.
Mean in 2015 is: (30,000)(1.05^11) = 51,310.
E[X ∧ 200,000] =
51,310 Φ[(ln 200,000 - 9.4291 - 1.6832²) / 1.6832]
+ 200,000 {1 - Φ[(ln 200,000 - 9.4291) / 1.6832]}
= 51,310 Φ[-0.03] + 200,000 {1 - Φ[1.65]} = (51,310)(0.4880) + (200,000)(1 - 0.9505) = 34,939.
Excess ratio is: 1 - 34,939/51,310 = 31.9%.

36.44. E. In 2016 the losses are Exponential with θ = (1.034 )(1000) = 1125.51.
E[X ∧ 500] = (1125.51)(1 - e-500/1125.51) = 403.71.
E[X ∧ 2000] = (1125.51)(1 - e-2000/1125.51) = 935.13.
Average payment per loss is:
(0.8) (E[X ∧ 2000] - E[X ∧ 500]) = (0.8) (935.13 - 403.71) = 425.14.
S(500) = e-500/1125.51 = 0.6413.
Average payment per payment is: 425.14 / 0.6413 = 663.
Alternately use the original Exponential and the formula for the average payment per payment:
(1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) / S(d/(1+r)) =
(1.034 ) (0.8) (E[X ∧ 2000/1.034 ] - E[X ∧ 500/1.034 ]) / S(500/1.034 ) =
(1.12551)(0.8) {1000(1 - e-1776.97/1000) - 1000(1 - e-444.24/1000)} / e-444.24/1000 = 663.
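A hypothetical Python sketch of 36.44 (exp_lev is my own helper name) reproduces the average payment per loss of about 425 and per payment of about 663.

from math import exp

def exp_lev(x, theta):
    # limited expected value of an Exponential
    return theta * (1 - exp(-x / theta))

theta = 1.03**4 * 1000                                   # 1125.51 in 2016
per_loss = 0.8 * (exp_lev(2000, theta) - exp_lev(500, theta))
per_payment = per_loss / exp(-500 / theta)
print(per_loss, per_payment)                             # about 425 and 663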

36.45. Under uniform inflation, all of the percentiles would increase by the same amount.
Since here the percentiles increase by very different amounts, the mathematics of uniform inflation
are not a good model.
Since the higher percentiles have increased (much) faster than the lower percentiles, in fact the lower
percentiles have decreased, the shape of the distribution is changing so as to have a (significantly)
heavier righthand tail in 2012 than in 1973.
Comment: If for example the distribution of wages were Single Parameter Pareto, then the
parameter α would have gotten smaller from 1973 to 2012.

36.46. E. In 2020 severity is Inverse Gamma with α = 6 and θ = (1.2)(10) = 12.


Fourth moment is: θ⁴ / {(α - 1)(α - 2)(α - 3)(α - 4)} = 12⁴ / {(5)(4)(3)(2)} = 172.8.
Alternately, in 2014 the fourth moment is: θ⁴ / {(α - 1)(α - 2)(α - 3)(α - 4)} = 10⁴ / {(5)(4)(3)(2)} = 83.333.
Under inflation, the fourth moment is multiplied by (1+r)⁴.
(1.2⁴)(83.333) = 172.8.

36.47. A. & 36.48. E. For a Pareto Distribution:


E[X] - E[X ∧ d] = θ/(α-1) - {θ/(α-1)} {1 - (θ/(d + θ))α −1} = θα / {(α-1)(d + θ)α−1}.

The insurance premium in each year is: 1.3(E[X] - E[X ∧ 50]) = 1.3θα / {(α-1)(50 + θ)α−1}.
The reinsurance covers the layer excess of 300 in ground up loss, and the reinsurance premium is:
1.1(E[X] - E[X ∧ 300]) = 1.1θα / {(α-1)(300 + θ)α−1}.

Thus Ri/Pi = (1.1/1.3) (50 + θ)α−1 / (300 + θ)α−1.

In 2015 this ratio is: (1.1/1.3) (50 + 200)2 / (300 + 200)2 = 0.212.
In 2020, the Pareto has α = 3 and θ = (1.25)(200) = 250.
Thus in 2020 this ratio is: (1.1/1.3) (50 + 250)2 / (300 + 250)2 = 0.252.
Comment: Similar to SOA M, 11/05, Q.28 (2009 Sample Q.209).

36.49. E. The losses in 2015 follow an Exponential with mean: (1.06)(3000) = 3180.
The payments excess of deductible d follow the same Exponential with mean 3180.
Thus we cannot determine d.
Alternately, deflate the deductible to 2014: d/1.06.
The payments excess of d/1.06 in 2014 follow the same Exponential with mean 3000.
Inflating to 2015, the payments excess of d in 2015 follow an Exponential with mean:
(1.06)(3000) = 3180. Thus we cannot determine d.
Alternately, the average payment per payment is:
(1.06) (E[X] - E[X ∧ d/1.06]) / S(d/1.06) = (1.06) {3000 - (3000)(1 - e^(-(d/1.06)/3000))} / e^(-(d/1.06)/3000) = 3180.
Thus we cannot determine d.

36.50. B. Applying inflation, in 2020 the Pareto distribution has α = 4 and θ = (1.3)(500) = 650.
After truncating and shifting from below, one gets another Pareto Distribution with
α = 4 and θ = 650 + 200 = 850.
Thus the nonzero payments are Pareto with α = 4 and θ = 850.

This has mean: θ/(α - 1) = 850/3 = 283.33, second moment: 2θ2 / {(α - 1)(α - 2)} = 240,833,
and variance: 240,833 - 283.332 = 160,557.
The probability of a nonzero payment is the probability that a loss is greater than the deductible of
200; for the inflated Pareto, S(200) = {650/(650+200)}4 = 0.342.
Thus the payments of the insurer can be thought of as an aggregate distribution,
with Bernoulli frequency with mean 0.342 and Pareto severity with α = 4 and θ = 850.
The variance of this aggregate distribution is:
(Mean Frequency)(Variance of Severity) + (Mean Severity)2 (Variance of Frequency) =
(0.342)(160,557) + (283.332 ) {(0.342)(1 - 0.342)} = 72,975.
One can also think of this as a two-point mixture between a severity that is always zero and a
severity that is the truncated and shifted Pareto, with the former with weight 1 - 0 342 and the latter
with weight 0.342. The mean of this mixture is: (0.658)(0) + (0.342)(283.33) = 96.90.
The second moment of this mixture is: (0.658)(0) + (0.342)(240,833) = 82,365.
The variance of this mixture is: 82,365 - 96.902 = 72,975.
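A minimal Python sketch of 36.50 (hypothetical, using exact rather than rounded intermediate values; it gives about 72,980 versus the 72,975 above):

alpha, theta, d = 4, 1.3 * 500, 200
theta_p = theta + d                                        # nonzero payments: Pareto(4, 850)
mean_p = theta_p / (alpha - 1)                             # 283.33
second_p = 2 * theta_p**2 / ((alpha - 1) * (alpha - 2))    # 240,833
q = (theta / (theta + d)) ** alpha                         # probability of a nonzero payment
print(q * second_p - (q * mean_p) ** 2)                    # variance of the per-loss variable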

36.51. E. 1. False. FZ(Z) = FX(Z / 1.1) so that fZ(Z) = fX(Z / 1.1) / 1.1. 2. True. 3. True.

36.52. C. For the Burr distribution θ is transformed by inflation to θ (1+r) .

This follows from the fact that θ is the scale parameter for the Burr distribution.
The shape parameters α and γ remain the same.
From first principles, one makes the change of variables Z = (1+r) X. For the Distribution Function
one just sets FZ(z) = FX(x); one substitutes for x = z / (1+r).

FZ(z) = FX(x) = 1 - [1/{1 + (x/θ)^γ}]^α = 1 - [1/{1 + (z/{(1+r)θ})^γ}]^α.
This is a Burr Distribution with parameters: α, θ(1+r), and γ.

36.53. A. The mean of a Pareto is θ /(α-1).


Therefore, θ = (α-1) (mean) = (3-1) (25,000) = 50,000.

Prior to the impact of inflation: 1 - F(100000) = {50000 / (50000 + 100000)}^α = (1/3)³ = 0.0370.
Under uniform inflation for the Pareto, θ is multiplied by 1.2 and α is unchanged.
Thus the new θ is (50000)(1.2) = 60000.
Thus after inflation: 1 - F(100000) = {θ / (θ + 100000)}^α = (6/16)³ = 0.0527.


The increase is: 0.0527 - 0.0370 = 0.0157.

36.54. A. Inflation has been 10% per year for 7 years. Thus the inflation factor is 1.1^7 = 1.949.
Under uniform inflation, the Pareto has θ increase to (1+r)θ, while α remains the same.
Thus in 1992 the Pareto Distribution has parameters: 2 and (12500)(1.949) = 24,363.
For the Pareto E[X ∧ x] = {θ/(α−1)} {1 - (θ/(θ+x))α−1}. Thus in 1992,
E[X ∧ 200000] = {24363/(2-1)} {1-(24363/(24363+200000))2-1} = 21,717.
Alternately, the 200,000 limit in 1992 corresponds to 200,000 / 1.949 = 102,617 limit in 1985.
In 1985, E[X ∧ 102,617] = {12500/(2-1)} {1-(12500/(12500+102,617))2-1} = 11143.
In order to inflate to 1992, multiply by 1.949: (1.949)(11143) = 21,718.

36.55. C. 1. False. For the Inverse Gaussian, both µ and θ are multiplied by 1+r.
2. False. θ becomes θ(1+r). (The Generalized Pareto acts like the Pareto under inflation. The scale
parameter is multiplied by the inflation factor.) 3. True.

36.56. B. Since b divides x everywhere that x appears in the density function, b is a scale
parameter. Therefore, under uniform inflation we get a Erlang Distribution with b multiplied by (1+r).
Alternately, one can substitute for x = z / (1+r).
For the density function fZ(z) = fX(x) / (1+r).
Thus f(z) = (z/(1+r)b)c-1 e-z/(1+r)b / (1+r){ b (c-1)! }, which is an Erlang Distribution with b multiplied
by (1+r) and with c unchanged.
Comment: The Erlang Distribution is a special case of the Gamma Distribution, with c integral.
c ⇔ α, and b ⇔ θ.
Recall that under change of variables applied to the density you need to divide by dz/dx = 1 + r,
since dF/dy = (dF /dx) / (dy/dx).

36.57. D. The mean of a Burr distribution is: θ Γ(1+ 1/γ) Γ(α − 1/γ) / Γ(α) =

θ Γ(1+2 )Γ(3 - 2)/Γ(3) = θ Γ(1) Γ(3 )/Γ(3) = θ. Under uniform inflation the mean increases from
10,000 to (10,000)(1.44) = 14,400. After inflation, the chance that a claim exceeds $40,000 is:
S(40000) = {1 / (1 + (40000/θ)γ)}α = {1 / (1 + (40000/14400)0.5)}3 = 0.0527.
Alternately, one can compute the chance of exceeding 40000 / 1.44 = 27778 prior to inflation:
S(27778) = {1 / (1 + (27778/10000)0.5)}3 = 0.0527.

36.58. B. Under uniform inflation for the LogNormal we get another LogNormal,
but µ becomes µ + ln(1+r) while σ stays the same.
Thus in this case µ'= 17.953 + ln(1.1) = 18.048, while σ remains 1.6028.

36.59. C. For the Exponential Distribution E[X ∧ x] = θ (1- e-x/θ).


During 1992 the distribution is an Exponential Distribution with θ = 1 and the average value of the
capped losses is E[X ∧ 1] = 1 - e-1 = 0.6321.
During 1993 the distribution is an Exponential Distribution with θ = 1.1.
Thus in 1993, E[X ∧ 1] = 1.1{1- e-1/1.1} = 0.6568.
The increase in capped losses between 1993 and 1992 is: 0.6568 / 0.6321 = 1.039.
Comments: The rate of inflation of 3.9% for the capped losses with a fixed limit is less than the
overall rate of inflation of 10%.

36.60. B. Prior to inflation in 1991, F(x) = 1 - x^(-5), x > 1. After inflation in 1992,
F(x) = 1 - (x/1.1)^(-5), x > 1.1. f(x) = 5(1.1^5) x^(-6). LER(1.2) = E[X ∧ 1.2] / E[X].
E[X ∧ 1.2] = ∫_1.1^1.2 x f(x) dx + (1.2) S(1.2) = (1.1^5)(5/4){1.1^(-4) - 1.2^(-4)} + (1.2)(1.2/1.1)^(-5)
= 0.404 + 0.777 = 1.181.
E[X] = ∫_1.1^∞ x f(x) dx = (1.1^5) ∫_1.1^∞ 5 x^(-5) dx = (1.1^5)(5/4)(1.1^(-4)) = 1.375.
LER(1.2) = E[X ∧ 1.2] / E[X] = 1.181 / 1.375 = 0.859.
Comment: Remember that under uniform inflation the domain of the Distribution Function also
changes; in 1992 x > 1.1. This is a Single Parameter Pareto with α = 5 and θ = 1.1.
E[X ∧ x] = θ [{α − (x/θ)1−α} / (α − 1)]. E[X ∧ 1.2] = 1.1[{5 - (1.2/1.1)1-5} / (5 - 1)] = 1.181.
E[X] = θ α / (α − 1) = 1.1(5/4) = 1.375.

LER(x) = 1 - (1/α) (x/θ)1−α. LER(1.2) = 1 - (1/5)(1.2/1.1)-4 = 1 - (0.2)(0.7061) = 0.859.


Note that one could instead deflate the 1.2 deductible in 1992 to a 1.2/1.1 = 1.0909 deductible in
1991 and then work with the 1991 distribution function.
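A similar quick check in Python, working with the 1992 distribution directly and assuming scipy is available (the density and survival functions below are just the Single Parameter Pareto forms written out):

```python
# Sketch verifying 36.60: LER(1.2) in 1992 for a Single Parameter Pareto with alpha = 5, theta = 1.1.
from scipy.integrate import quad

alpha, theta = 5, 1.1
f = lambda x: alpha * theta**alpha * x**(-alpha - 1)     # density for x > theta
S = lambda x: (theta / x) ** alpha                       # survival function
lev = quad(lambda x: x * f(x), theta, 1.2)[0] + 1.2 * S(1.2)
mean = alpha * theta / (alpha - 1)                       # 1.375
print(lev, mean, lev / mean)                             # about 1.181, 1.375, 0.859
```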

36.61. B. The distribution for the 1993 losses is an exponential distribution F(x) = 1 - e-x.
In order to convert into 1994 dollars, the parameter of 1 is multiplied by 1 plus the inflation rate of
5%; thus the revised parameter is 1.05. The capped losses which are given by the Limited
Expected Value are for the exponential: E[X ∧ x] = (1 - e−x/θ)θ.
Thus in 1993 the losses capped to 1 ($million) is E[X ∧ 1] = (1- e-1) / 1 = 0.6321.
In 1994 with θ = 1.05, E[X ∧ 1] = (1 - e-0.9524)(1.05) = 0.6449.
The increase in capped losses is: 0.6449 / 0.6321 = 1.019, or 1.9% inflation.
Alternately rather than working with the 1994 distribution one can translate everything back to 1993
dollars and use the 1993 distribution. In 1993 dollars the 1994 limit of 1 is only 1/1.05 = 0.9524.
Thus the capped losses in 1994 are in 1993 dollars E[X ∧ 0.9524] = (1 - e-0.9524).
In 1994 dollars the 1994 capped losses are therefore 1.05E[X ∧ 0.9524] = 0.6449.
The solution is therefore 0.6449 / 0.6321 = 1.019, or 1.9% inflation.

36.62. A. Statement 1 is false. In fact the coefficient of variation as well as the skewness are
dimensionless quantities which are unaffected by a change in scale and are therefore unchanged
under uniform inflation. Specifically in this case the new mean is the prior mean times (1 + r), the new
variance is the prior variance times (1+r)2 .
Therefore, the new coefficient of variation = new standard deviation / new mean =
(1+ r) prior standard deviation / (1 + r) prior mean = prior standard deviation / prior mean =
prior coefficient of variation.
Statement 3 is false. In fact, E[Z ∧ d(1+r)] = (1+r) E[X ∧ d]. The left hand side is the Limited
Expected Value in the later year, with a limit of d(1+r); we have adjusted d, the limit in the prior year,
in order to keep up with inflation via the factor 1+r. This yields the Limited Expected Value in the prior
year, except multiplied by the inflation factor to put it in terms of the subsequent year dollars, which is
the right hand side. For example, if the expected value limited to $1 million is $300,000 in the prior
year, then after uniform inflation of 10%, the expected value limited to $1.1 million is $330,000 in the
later year. In terms of the definition of the Limited Expected Value:
E[Z ∧ d(1+r)] = ∫_0^{d(1+r)} z fZ(z) dz + SZ(d(1+r)) d(1+r) =
∫_0^d (1+r) x fX(x) dx + SX(d) d(1+r) =
(1+r) E[X ∧ d],
where we have applied the change of variables z = (1+r) x, and thus FZ(d(1+r)) = FX(d)
and fX(x) dx = fZ(z) dz.
Statement 2 is true. The mean residual life at d in the prior year is given by eX(d) =
{ mean of X - E[X ∧ d] } / {1 - FX(d)}. Similarly, the mean residual life at d(1+r) in the later year is
given by eZ(d(1+r)) = {mean of Z - E[Z ∧ d(1+r)]} / {1 - FZ(d(1+r))} =
{ (1+r) E[X] - (1+r)E[X ∧ d] } / {1 - FX(d)} = (1+r)eX(d) . Thus the mean residual life in the later year is
multiplied by the inflation factor of (1+r), provided the limit has been adjusted to keep up with
inflation. For example, if the mean residual life beyond $1 million is $3 million in the prior year, then
after uniform inflation of 10%, the mean residual life beyond $1.1 million is $3.3 million in the
subsequent year.

36.63. B. Losses uniform on [0, 10000] in 1991 become
uniform on [0, (1.05^2)(10000)] = [0, 11025] in 1993.
LER(500) = { ∫_0^500 x f(x) dx + S(500)(500) } / ∫_0^11,025 x f(x) dx.
We have f(x) = 1/11025 for 0 ≤ x ≤ 11025. F(500) = 500 / 11025 = 0.04535.
Thus, LER(500) = {(1/11025)(500^2)/2 + (1 - 0.04535)(500)} / (11025 / 2) = 0.0886.
Alternately, the LER(500) in 1993 is the LER(500/1.1025) = LER(453.51) in 1991.
In 1991: E[X ∧ 453.51] = ∫_0^453.51 x / 10,000 dx + S(453.51)(453.51) =
10.28 + 432.96 = 443.24. Mean in 1991 = 10000 / 2 = 5000.
In 1991: LER(453.51) = E[X ∧ 453.51] / mean = 443.24 / 5000 = 0.0886.

36.64. C. F(x) = 1 - x^-3, x ≥ 1, in 1993 dollars.
A loss exceeding $2.2 million in 1994 dollars is equivalent to a loss exceeding
$2.2 million / 1.1 = $2 million in 1993 dollars.
The probability of the latter is: 1 - F(2) = 2^-3 = 1/8 = 0.125.
Alternately, the distribution function in 1994 dollars is: G(x) = 1 - (x/1.1)^-3, x ≥ 1.1.
Therefore, 1 - G(2.2) = (2.2/1.1)^-3 = 1/8 = 0.125.
Comment: Single Parameter Pareto Distribution.

36.65. D.
Probability   1993 Loss Amount   1993 Insurer Payment   1994 Loss Amount   1994 Insurer Payment
0.1667        1000               0                      1050               0
0.1667        2000               500                    2100               600
0.1667        3000               1500                   3150               1650
0.1667        4000               2500                   4200               2700
0.1667        5000               3500                   5250               3750
0.1667        6000               4500                   6300               4800
Average       3500.00            2083.33                3675.00            2250
2250 / 2083 = 1.080, therefore the insurerʼs payments increased 8%.
Comment: Inflation on the losses excess of the deductible is greater than that of the ground up
losses.
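Since this solution is just a discrete average, it is easy to reproduce with a few lines of Python; the 1500 deductible below is the one implied by the payment column:

```python
# Sketch verifying 36.65: payments under a 1500 deductible before and after 5% inflation.
losses = [1000, 2000, 3000, 4000, 5000, 6000]           # equally likely loss amounts
pay = lambda x, d: max(x - d, 0)
avg_1993 = sum(pay(x, 1500) for x in losses) / len(losses)
avg_1994 = sum(pay(1.05 * x, 1500) for x in losses) / len(losses)
print(avg_1993, avg_1994, avg_1994 / avg_1993)          # 2083.33, 2250.0, about 1.08
```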

36.66. E. The distribution for the 1993 losses is an exponential distribution F(x) = 1 - e-.1x. In order
to convert into 1994 dollars, the parameter of 1/.1 is multiplied by 1 plus the inflation rate of 10%;
thus the revised parameter is 1.1/.1 = 1/.0909. Thus the 1994 distribution function is
G(x) = 1 - e-0.0909x , where x is in 1994 dollars. The next step is to write down (in 1994 dollars) the
Truncated and Shifted distribution function for a deductible of d:
FP(x) = {G(x+d) - G(d)} / {1 - G(d)} = {e-0.0909d - e-0.0909 (x+d)} / e-0.0909d = 1 - e-0.0909x.

Fp (5) = 1- e-(0.0909)(5) = 0.3653.


Alternately, $5 in 1994 dollars corresponds to $5 / 1.1 = $4.545 in 1993 dollars.
In 1993, the Truncated and Shifted distribution function for a deductible of d:
G(x) = {F(x+d) - F(d)} / {1 - F(d)} = {e-0.1d - e-0.1(x+d)} / e-0.1d = 1 - e-0.1x.
G(4.545) = 1 - e-0.1(4.545) = 0.3653.
Comment: Involves two separate questions: how to adjust for the effects of inflation and how to
adjust for the effects of truncated and shifted data. Note that for the exponential distribution, after
truncating and shifting the new distribution function does not depend on the deductible amount d.

36.67. B. Under uniform inflation, the parameters of a LogNormal become:
µ′ = µ + ln(1.1) = 10 + 0.09531 = 10.09531, σ′ = σ = √5.
Using the formula for the limited expected value of the LogNormal: E[X ∧ $2,000,000] =
exp(10.09531 + 5/2) Φ[(ln(2,000,000) - 10.09531 - 5)/√5] +
(2,000,000){1 - Φ[(ln(2,000,000) - 10.09531)/√5]} = 295,171 Φ[-0.26] + (2,000,000)(1 - Φ[1.97])
= (295,171)(0.3974) + (2,000,000)(0.0244) = $166 thousand.
Alternately, using the original LogNormal Distribution, the average payment per loss in 1994 is:
1.1 E1993[X ∧ 2 million / 1.1] = 1.1 E1993[X ∧ 1,818,182] =
1.1 { exp(10 + 5/2) Φ[(ln(1,818,182) - 10 - 5)/√5] + (1,818,182){1 - Φ[(ln(1,818,182) - 10)/√5]} } =
1.1 {268,337 Φ[-0.26] + (1,818,182)(1 - Φ[1.97])} =
(1.1){(268,337)(0.3974) + (1,818,182)(0.0244)} = $166 thousand.
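Here is a short Python sketch of the same LogNormal limited expected value formula; it uses the exact Normal CDF via erf rather than the rounded table values, so the last digits differ slightly, and the helper name lognormal_lev is just illustrative:

```python
# Sketch verifying 36.67: LogNormal limited expected value after 10% inflation.
from math import exp, log, sqrt, erf

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))             # standard Normal CDF
def lognormal_lev(x, mu, sigma):
    # E[X ∧ x] = exp(mu + sigma^2/2) Phi[(ln x - mu - sigma^2)/sigma] + x {1 - Phi[(ln x - mu)/sigma]}
    return (exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma)
            + x * (1 - Phi((log(x) - mu) / sigma)))

print(lognormal_lev(2e6, 10 + log(1.1), sqrt(5)))        # about $166 thousand
print(1.1 * lognormal_lev(2e6 / 1.1, 10, sqrt(5)))       # same via the original distribution
```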

36.68. E. One can put y = 1.05x, where y is the claim size in 1994 and x is the claim size in 1993.
Then let g(y) be the p.d.f. for y, g(y)dy = f(x)dx = f(y/1.05) dy/1.05.
g(y) = exp[-0.5 ({(y/1.05) - 1000} / 100)^2] / {√(2π) (100)(1.05)} = exp[-0.5 {(y - 1050) / 105}^2] / {√(2π) (105)}.

This is again a Normal Distribution with both µ and σ multiplied by the inflation factor of 1.05.
Comment: As is true in general, under uniform inflation, both the mean and the standard deviation
have been multiplied by the inflation factor of 1.05. Assuming you remember that a Normal
Distribution is reproduced under uniform inflation, you can use this general result to arrive at the
solution to this particular problem, since for the Normal, µ is the mean and σ is the standard deviation.

36.69. (a) LER(1000) = {26 + 115 + 387 + 449 + 609 + 774 + (4)(1000)} / 30,307 =
6360 / 30,307 = 21.0%.
LER(5000) = {26 + 115 + 387 + 449 + 609 + 774 + 2131 + (3)(5000)} / 30,307
= 19,491 / 30,307 = 64.3%.
(b) E[X] = 1/λ. E[X ∧ x] = (1 - e-λx)/λ. ⇒ LER(x) = 1 - e-λx.
LER(1000) = 1 - exp[-(0.00033)(1000)] = 28.1%.
LER(5000) = 1 - exp[-(0.00033)(5000)] = 80.8%.
(c) Average payment per loss is: E[X] - E[X ∧ 5000] = e-λ5000 / λ.
Prior to inflation the average payment per loss is: exp[-(0.00033)(5000)] / 0.00033 = 581.97.
After inflation, losses are Exponential with hazard rate = 0.00033/1.1 = 0.00030.
After inflation the average payment per loss is: exp[-(0.0003)(5000)] / 0.0003 = 743.77.
743.77 / 581.97 = 1.278. ⇔ 27.8% inflation.
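Parts (b) and (c) can be checked with a few lines of Python:

```python
# Sketch verifying parts (b) and (c) of 36.69 for the Exponential with hazard rate 0.00033.
from math import exp

lam = 0.00033
LER = lambda x, lam: 1 - exp(-lam * x)                  # E[X ∧ x] / E[X] for an Exponential
print(LER(1000, lam), LER(5000, lam))                   # 28.1%, 80.8%
pay_per_loss = lambda lam: exp(-5000 * lam) / lam       # E[X] - E[X ∧ 5000]
print(pay_per_loss(lam / 1.1) / pay_per_loss(lam))      # about 1.278, i.e. 27.8% inflation
```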

36.70. B. 0.95 = p0 = S(1) = e-1/θ. ⇒ θ = -1/ln(0.95) = 19.5.

Y is also Exponential with twice the mean of X: (2)(19.5) = 39. fY(1) = e-1/39/39 = 0.0250.

36.71. C. If the losses are uniformly distributed on [0, 2500] in 1994, then they are uniform on
[0, 2575] in 1995. (Each boundary is multiplied by the inflation factor of 1.03.)
LER(100) = { ∫_0^100 x f(x) dx + S(100)(100) } / ∫_0^2575 x f(x) dx =
{ ∫_0^100 x (1/2575) dx + {1 - (100/2575)}(100) } / ∫_0^2575 x (1/2575) dx =
{(1/2575)(100^2)/2 + 100 - (1/2575)(100^2)} / (2575/2) = 7.62%.
Alternately, $100 in 1995 is equivalent to 100/1.03 = $97.09 in 1994. In 1994:
LER(97.09) = { ∫_0^97.09 x f(x) dx + (1 - F(97.09))(97.09) } / ∫_0^2500 x f(x) dx =
{ ∫_0^97.09 x (1/2500) dx + (1 - (97.09/2500))(97.09) } / ∫_0^2500 x (1/2500) dx =
{(1/2500)(97.09^2)/2 + 97.09 - (1/2500)(97.09^2)} / (2500/2) = 7.62%.

36.72. D. Under uniform inflation the parameters of the Pareto become 2 and 1000(1.1) = 1100.
The expected number of insurer payments is 10 losses per year times the percent of losses
greater than 100: 10S(100) = 10 {1100/(1100+100)}2 = 8.40.
Alternately, after inflation the $100 deductible is equivalent to 100/1.1 = 90.91.
For the original Pareto with α = 2 and θ = 1000, 10{1-F(90.91)} = 10 {1000/1090.91}2 = 8.40.

36.73. C. Under uniform inflation the scale parameter θ of the Pareto is multiplied by the inflation
factor, while the shape parameter α remains the same. Therefore the size of loss distribution in 1995
has parameters: θ = (500)(1.05) = 525, α = 1.5.
F(x) = 1 - {525/(525+x)}^1.5. The distribution function of the data truncated from below at 200 is:
G(x) = {F(x) - F(200)} / {1 - F(200)} =
{(525/(525+200))^1.5 - (525/(525+x))^1.5} / (525/(525+200))^1.5 = 1 - (725/(525+x))^1.5.
At the median m of the distribution truncated from below, G(m) = 0.5.
Therefore, 1 - (725/(525+m))^1.5 = 0.5. ⇒ (725/(525+m))^1.5 = 0.5.
Thus (525+m)/725 = 2^(1/1.5) = 1.587. Solving, m = (725)(1.587) - 525 = 626.



36.74. B. 1. Above the franchise deductible, the insured is paid the whole loss amount.
Thus there is no incentive for the insured to reduce the magnitude of a loss.
In fact, if the loss is close to the franchise deductible, the insured has an incentive to make sure the
loss exceeds the franchise deductible.
2. Should have multiplied by one minus the loss elimination ratio.
3. True.

36.75. a) Pure premium is: (mean frequency) E[X ∧ 1 million] = (0.15) (75,845) = $11,377.
b) Pure premium is:
(mean frequency) (E[X ∧ 1 million] - E[X ∧ 1000]) = (0.15)(75,845 - 962) = 11,232.
The frequency of (non-zero) claims is: (0.15)(1 - 0.0734) = 0.13899.
Thus the average severity of (non-zero) claims is: 11,232 / 0.13899 = $80,812.
c) After inflation, the pure premium is: (0.15)(1.1) (E[X ∧ 1 million / 1.1] - E[X ∧ 1000 /1.1]) =
(0.15)(1.1)(73,493 - 878) = 11,981.
The increase in the pure premium due to inflation is: 11,981 / 11,232 - 1 = 6.7%.
Comment: In taking the ratio in part (c), the claim frequency drops out.
The rate of inflation for losses excess of a fixed deductible is greater than the overall rate of inflation,
while the rate of inflation for losses capped by a fixed limit is less than the overall rate of inflation.
Thus the rate of inflation in part (c) could turn out to be either greater than or less than 10%.

36.76. B. After 20% uniform inflation, the parameters of the LogNormal are:
µ′ = µ + ln(1+r) = 7 + ln (1.2) = 7.18, while σ is unchanged at 2.
F(2000) = Φ[{ln(2000) − 7.18} / 2] = Φ[0.21] = 0.5832.
Thus the expected number of claims per year greater than 2000 is:
10{1 - F(2000)} = (10)(1 - 0.5832) = 4.17.
Alternately, one can deflate the deductible amount of 2000, which is then 2000/ 1.2 = 1667, and use
the original LogNormal Distribution.
The expected number of claims per year greater than 1667 in the original year is:
10(1 - F(1667)) = (10)(1 - Φ[{ln(1667) - 7}/ 2]) = (10)(1 - Φ[0.21]) = (10)(1 - 0.5832) = 4.17.
Comment: Prior to inflation, the expected number of claims per year greater than 2000 is:
10(1 - F(2000)) = (10)(1 - Φ[{ln(2000) - 7} / 2]) = (10)(1 - Φ[0.30]) = 3.82.

36.77. E. (10 million / 0.8) + (9 million / 0.9) = 22.5 million.


Comment: Prior to working with observed losses, they are commonly brought to one common level
of inflation.

36.78. D. For the Pareto Distribution, LER(x) = E[X ∧ x] / E[X] = 1 - (θ/(θ+x))α−1.


In the later year, losses have doubled, so the scale parameter of the Pareto has doubled, so θ = 2k,
rather than k.
For θ = 2k and α = 2: LER(x) = 1 - {2k/(2k+x)} = x / (2k + x). Thus LER(2k) = 2k/ (4k) = 1/2.

36.79. C. The behavior of the LogNormal under uniform inflation is explained by noting that
multiplying each claim by a factor of 1.1 is the same as adding a constant amount ln(1.1) to the log of
each claim. (For the LogNormal, the log of the sizes of the claims follow a Normal distribution.)
Adding a constant amount to a normal distribution, gives another normal distribution, with the same
variance but with the mean shifted. Thus under uniform inflation for the LogNormal, µ becomes
µ + ln(1.1). The parameter σ remains the same.

36.80. C. An inflation factor of .25 applied to a Weibull Distribution, gives another Weibull with
scale parameter: (0.25)θ = (0.25)(625) = 156.25, while the shape parameter τ is unaffected.
Thus Z is a Weibull with parameters θ = 156.25 and τ = 0.5.

36.81. C. For the Exponential Distribution, under uniform inflation θ is multiplied by the inflation
factor. In this case, the inflation factor is 2, so the new theta is (1000)(2) = 2000.
Prior to inflation the percent of losses that exceed the deductible of 500 is:
e-500/1000 = e-0.5 = 0.6065.
After inflation the percent of losses that exceed the deductible of 500 is: e-500/2000 = e-0.25 =
0.7788. Thus the number of losses that exceed the deductible increased by a factor of
0.7788/0.6065 = 1.284. Since there were 10 losses expected prior to inflation, there are
(10)(1.284) = 12.8 claims expected to exceed the 500 deductible after inflation.
Comment: One can also do this question by deflating the 500 deductible to 250. Prior to inflation,
S(250) = e-250/1000 = e-0.25 and S(500) = e-500/1000 = e-0.5. Thus if 10 claims are expected to
exceed 500, then there are a total of 10/e-0.5 claims. Thus the number of claims expected to exceed
250 is: (10e0.5)(e-0.25) = 10e0.25 = 12.8.

36.82. D. Under uniform inflation, for the Burr, theta is multiplied by (1+r); thus theta becomes:
(1.1)√1000 = 34.8.
Comment: For a mixed distribution, under uniform inflation each of the individual distributions is
transformed just as it would be if it were an individual distribution. In this case, the Pareto has new
parameters α = 1 and θ = (1000)(1.1) = 1100, while the Burr has new parameters α = 1,
θ = (1.1)√1000 = 34.8 (so that θ^2 = (1.1^2)(1000) = 1210), and γ = 2. The weights applied to the distributions remain the same.

36.83. B. The mean of the 1997 LogNormal is: exp((µ+ ln k) + σ2 /2).

F96[Mean97] = Φ[(ln(Mean97) - µ) / σ] = Φ[(µ+ ln k + σ2 /2 - µ) / σ] = Φ[(ln k + σ2 /2) / σ].


Since we are given that in 1996 100p% of the losses exceed the mean of the losses in 1997,
1 - F96[Mean97] = p. Thus F96[Mean97] = 1 - p.

Thus Φ[(ln k + σ2 /2) / σ] = 1 - p. Since the Normal Distribution is symmetric,

Φ[-(ln k + σ2 /2) / σ] = p. Thus by the definition of zp, -(ln k + σ2 /2) / σ = zp.

Therefore, σ2 /2 + zp σ + ln k = 0. ⇒ σ = -zp ± √(zp^2 - 2 ln k).


Comment: The 1997 distribution is the result of applying uniform inflation, with an inflation factor of k,
to the 1996 distribution. Thus the mean of the 1997 distribution is: k exp(µ+ σ2 /2), k times the
mean of the 1996 distribution. One could take for example p = 0.05, in which case zp = -1.645, and
then solve for σ in that particular case.

36.84. D. At 10% per year for three years, the inflation factor is 1.1^3 = 1.331. Thus greater than
500 in 1998 corresponds to greater than 500/1.331 = 376 in 1995. At least 45 and at most 50
claims are less than 376 in 1995. Therefore, between 50% and 55% of the total of 100 claims are
greater than 376 in 1995. Therefore, between 50% and 55% of the total of 100 claims are greater
than 500 in 1998.
Comment: One could linearly interpolate that about 52% or 53% of the claims are greater than 500
in 1998.

36.85. D. Deflate the 15,000 deductible in the later year back to the prior year:
15,000/1.5 = 10,000. In the prior year, the average non-zero payment is:
(E[X] - E[X ∧ 10000]) / S(10000) = (20,000 - 6000) / (1 - 0.6) = 14000 / 0.4 = 35,000.
Inflating to the subsequent year: (1.5)(35,000) = 52,500.
Comment: If the limit keeps up with inflation, so does the mean residual life.

36.86. C. In 1999, one has a Pareto with parameters 2 and 1.06θ.
S1999(d) = {1.06θ / (1.06θ + d)}^2. S1998(d) = {θ / (θ + d)}^2.
r = S1999(d) / S1998(d) = 1.06^2 {(θ + d) / (1.06θ + d)}^2 = 1.1236 {(1 + θ/d) / (1 + 1.06θ/d)}^2.
As d goes to infinity, r goes to 1.1236.
Comment: Alternately, S1999(d) = S1998(d/1.06) = {θ / (θ + d/1.06)}^2.

36.87. C. After 10% inflation, the survival function at 1000 is what it was originally at 1000/1.1 =
909.09. S(1000) = 1 - F(1000) = 1 - Φ[{ln(1000)−µ} / σ] = 1 - Φ[0] = 0.5.
S(909.09) = 1 - Φ[{ln(909.09)−µ} / σ] = 1 - Φ[-0.06] = 0.5239.
S(909.09) / S(1000) = 0.5239 / 0.5 = 1.048. An increase of 4.8%.
Comment: After inflation one has a LogNormal with µ = 6.9078 + ln(1.1), σ = 1.5174.

36.88. Assume for simplicity that the expected frequency is 5. ⇔ One loss of each size.
Loss    Contribution to Layer:  0-50    50-100    100-200    200-∞    Total
40                              40      0         0          0        40
80                              50      30        0          0        80
120                             50      50        20         0        120
160                             50      50        60         0        160
200                             50      50        100        0        200
Total                           240     180       180        0        600
For the next year, increase each size of loss by 10%:
Loss    Contribution to Layer:  0-50    50-100    100-200    200-∞    Total
44                              44      0         0          0        44
88                              50      38        0          0        88
132                             50      50        32         0        132
176                             50      50        76         0        176
220                             50      50        100        20       220
Total                           244     188       208        20       660
Trend for layer from 0 to 50 is: 244/240 - 1 = 1.7%.
Trend for layer from 50 to 100 is: 188/180 - 1 = 4.4%.
Trend for layer from 100 to 200 is: 208/180 - 1 = 15.6%.
Comment: The limited losses in the layer from 0 to 50 increase slower than the overall rate of
inflation 10%, while the excess losses in the layer from 200 to ∞ increase faster. The losses in
middle layers, such as 50 to 100 and 100 to 200, can increase either slower or faster than the overall
rate of inflation, depending on the particulars of the situation.
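The two layer tables above amount to a small calculation that is easy to reproduce in Python:

```python
# Sketch verifying 36.88: contributions of each loss to the layers before and after 10% inflation.
def layer(x, a, b):
    # dollars of the loss x falling in the layer from a to b
    return min(max(x - a, 0), b - a)

losses = [40, 80, 120, 160, 200]
for a, b in [(0, 50), (50, 100), (100, 200)]:
    before = sum(layer(x, a, b) for x in losses)
    after = sum(layer(1.1 * x, a, b) for x in losses)
    print(a, b, after / before - 1)        # 1.7%, 4.4%, 15.6%
```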

36.89. A. Y follows a Pareto Distribution with parameters: α = 2 and θ = (1.10)(100) = 110.


Thus eY(k) = (k+θ)/(α-1) = k + 110. eX(k) = k + 100.
eY(k) / eX(k) = (k+110) / (k+100) = 1 + 10/(k+110).
Therefore, as k goes from zero to infinity, eY(k) / eX(k) goes from 1.1 to 1.

36.90. a. The frequency of claims exceeding the franchise deductible is:


(0.05) S(1000) = (0.05)(1 - 0.09) = 4.55%.
The average payment per payment for an ordinary deductible of 1000 is:
(E[X] - E[X ∧ 1000]) / {1 - F(1000)} = (10,000 - 945) / (1 - 0.09) = 9950.55.
The average payments per payment for a franchise deductible of 1000 is 1000 more: $10,950.55.
b. After inflation, the average payment per payment for an ordinary deductible of 1000 is:
(1.1)(E[X] - E[X ∧ 909]) / S(909) = (1.1)(10,000 - 870) / (1 - 0.075) = 10,857.30.
The average payments per payment for a franchise deductible of 1000 is 1000 more: $11,857.30.
Subsequent to inflation, the frequency of payments is: (1 - 0.075) (0.05) = 4.625%,
and the pure premium is: (4.625%)($11,857.30) = $548.40.
Comment: Prior to inflation, the pure premium is: (4.55%)($10,950.55) = $498.25.

36.91. B. During 1994, there is a 100/1000 = 10% chance that nothing is paid.
If there is a non-zero payment, it is uniformly distributed on (0, 900).
Thus the mean amount paid is: (90%)(450) = 405.
The second moment of the amount paid is: (90%)(900^2)/3 = 243,000.
Thus in 1994, the standard deviation of the amount paid is: √(243,000 - 405^2) = 281.02.
In 1995, the losses are uniformly distributed on (0, 1050). During 1995, there is a 100/1050
chance that nothing is paid. If there is a non-zero payment it is uniformly distributed on (0, 950).
Thus the mean amount paid is: (950/1050)(950/2) = 429.76.
The second moment of the amount paid is: (950/1050)(950^2)/3 = 272,182.5.
Thus in 1995, the standard deviation of the amount paid is: √(272,182.5 - 429.76^2) = 295.78.
% increase in the standard deviation of amount paid is: 295.78/281.02 - 1 = 5.25%.
Alternately, the variance of the average payment per loss under a maximum covered loss of u and
a deductible of d is: E[(X ∧ u)^2] - E[(X ∧ d)^2] - 2d{E[X ∧ u] - E[X ∧ d]} - {E[X ∧ u] - E[X ∧ d]}^2.
With no maximum covered loss (u = ∞), this is:
E[X^2] - E[(X ∧ d)^2] - 2d{E[X] - E[X ∧ d]} - {E[X] - E[X ∧ d]}^2.
For the uniform distribution on (a, b), the limited moments are, for a ≤ x ≤ b:
E[(X ∧ x)^n] = ∫_a^x y^n / (b - a) dy + x^n S(x) = (x^(n+1) - a^(n+1))/{(n+1)(b-a)} + x^n (b-x)/(b-a) =
{(n+1) x^n b - a^(n+1) - n x^(n+1)} / {(n+1)(b-a)}.
In 1994, a = 0, b = 1000, and d = 100. E[X ∧ 100] = {(2)(100)(1000) - 100^2}/2000 = 95.
E[(X ∧ 100)^2] = {(3)(100^2)(1000) - (2)(100^3)}/3000 = 9333.33. The variance of the average
payment per loss is: 1000^2/3 - 9333.33 - (2)(100)(500 - 95) - (500 - 95)^2 = 78,975.
Similarly, in 1995, E[X ∧ 100] = {(2)(100)(1050) - 100^2} / {(2)(1050)} = 95.238.
E[(X ∧ 100)^2] = {(3)(100^2)(1050) - (2)(100^3)} / {(3)(1050)} = 9365.08.
In 1995, the variance of the average payment per loss is:
1050^2/3 - 9365.08 - (2)(100)(525 - 95.238) - (525 - 95.238)^2 = 87,487.
% increase in the standard deviation of amount paid is: √(87,487 / 78,975) - 1 = 5.25%.
Comment: The second moment of the uniform distribution on (a, b) is: (b^3 - a^3) / {3(b-a)}.
When a = 0, this is b^2/3. The amount paid is a mixture of two distributions, one always zero and the
other uniform. For example, in 1994, the amount paid is a 10%-90% mixture of a distribution that is
always zero and a uniform distribution on (0, 900). The second moments of these distributions are
zero and 900^2/3 = 270,000. Thus the second moment of the amount paid is:
(10%)(0) + (90%)(270,000) = 243,000.
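Both routes to the 5.25% answer can be confirmed numerically; here is a minimal Python sketch of the first route (integrating the payment directly), assuming scipy is available:

```python
# Sketch verifying 36.91: standard deviation of the amount paid, uniform losses, 100 deductible.
from scipy.integrate import quad

def sd_paid(b, d):
    # losses uniform on (0, b); the insurer pays x - d on losses above d, and 0 otherwise
    f = lambda x: 1.0 / b
    mean = quad(lambda x: (x - d) * f(x), d, b)[0]
    second = quad(lambda x: (x - d) ** 2 * f(x), d, b)[0]
    return (second - mean ** 2) ** 0.5

print(sd_paid(1000, 100), sd_paid(1050, 100))           # about 281.0 and 295.8
print(sd_paid(1050, 100) / sd_paid(1000, 100) - 1)      # about 5.25%
```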

36.92. C. Prior to inflation, 0.250 = F(1000) = 1 - e-1000/θ. ⇒ θ = 3476.


After inflation, θ = (2)(3476) = 6952. F(1000) = 1 - e-1000/6952 = 0.134.

36.93. D. E[X ∧ 10] = -0.025(102 ) + (1.475)(10) - 2.25 = 10.


E[X ∧ 11] = -0.025(112 ) + (1.475)(11) - 2.25 = 10.95.
E[X ∧ 20] = -0.025(202 ) + (1.475)(20) - 2.25 = 17.25
E[X ∧ 22] = -0.025(222 ) + (1.475)(22) - 2.25 = 18.10.
In 2000, there is a deductible of 11 and a maximum covered loss of 22, so the expected payments
are: E[X ∧ 22] - E[X ∧ 11] = 18.10 - 10.95 = 7.15.
A deductible of 11 and maximum covered loss of 22 in the year 2001, when deflated back to the
year 2000 correspond to a deductible of 11/1.1 = 10 and a maximum covered loss of 22/1.1 = 20.
Therefore, reinflating back to the year 2001, the expected payments in the year 2001 are:
(1.1)(E[X ∧ 20] - E[X ∧ 10]) = (1.1)(17.25 - 10) = 7.975.
The ratio of expected payments in 2001 over the expected payments in the year 2000 is:
7.975/ 7.15 = 1.115.
Alternately, the insurerʼs average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]).
c = 100%, L = 22, d = 11. r = 0.1 for the year 2001 and r = 0 for the year 2000.
Then proceed as previously.

36.94. C. The expected frequency is: 10,000,000 / 2000 = 5000.


E[X ∧ x] = {θ/(α-1)}{1 - (θ/(x+θ))α−1} = (2000){1 - (2000/(x + 2000)} = 2000 x/(x + 2000).
E[X] - E[X ∧ 3000] = 2000 - 1200 = 800.
⇒ Expected losses paid by reinsurer: (5000)(800) = 4 million.
The ceded premium is: (1.1)(4 million) = 4.4 million.
Alternately, the Excess Ratio, R(x) = 1 - E[X ∧ x]/E[X]. For the Pareto, E[X] = θ/(α-1) and

E[X ∧ x] = {θ/(α-1)} {1- (θ/(x+θ))α−1}. Therefore R(x) = (θ/(x+θ))α−1.


In this case, θ = 2000 and α = 2, so R(x) = 2000/(x+2000). R(3000) = 40%.
The expected excess losses are: (40%)(10,000,000) = 4 million.
The ceded premium is: (1.1)(4 million) = 4.4 million.

36.95. E. In 2001 the Pareto has θ = 2000 and α = 2, so in 2002 the Pareto has parameters
θ = (1.05)(2000) = 2100 and α = 2. In 2002, R(3000) = 2100/(3000+2100) = 41.18%.
The expected losses in 2002 are: (1.05)(10 million) = 10.5 million.
The expected excess losses in 2002 are: (41.18%)(10.5 million) = 4.324 million.
The ceded premium in 2002 is: (1.1)(4.324 million) = 4.756 million.
C 2002 / C2001 = 4.756/4.4 = 1.08.
Alternately, the excess ratio at 3000 in 2002 is the same as the excess ratio at 3000/1.05 = 2857 in
2001. In 2001, R(2857) = 2000/(2857+2000) = 41.18%. Proceed as before.
Alternately, the average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]).
In 2001 this is: E[X] - E[X ∧ 3000]. In 2002 this is: (1.05)(E[X] - E[X ∧ 3000/1.05]).
C 2002 / C2001 = (avg. payment per loss in 2002)/(average payment per loss in 2001) =
(1.05)(E[X] - E[X ∧ 2857])/(E[X] - E[X ∧ 3000]) = (1.05)(2000 - 1176)/(2000 - 1200) = 1.08.
Comment: Over a fixed limit the excess losses increase more quickly than the overall inflation rate of
5%.

36.96. D. After severity increases by 50%:


Probability Severity Payment with 100 deductible
0.25 60 0
0.25 120 20
0.25 180 80
0.25 300 200
Average payment per loss: (0 + 20 + 80 + 200)/4 = 75.
Expected total payment = (300)(75) = 22,500
Comment: Expect 300 payments: 75@0, 75@ 20, 75@80, and 75@ 200, for a total of 22,500.

36.97. B. Prior to inflation, S(100) = e-100/200 = 0.6065.


After inflation, severity is Exponential with θ = (1.2)(200) = 240.
S(100) = e-100/240 = 0.6592.
Percentage increase in the number of reported claims next year:
(1.1)(0.6592/0.6065) - 1 = 19.6%.

36.98. D. After inflation, severity is Pareto with θ = (1.2)(1500) = 1800, and α = 4.

Expected payment per loss: E[X] - E[X ∧ 100] = {θ/(α-1)} - {θ/(α-1)}{1 - (θ/(θ+100))α−1}
= {1800/(4 - 1)}(1800/(1800 +100))4-1 = 510.16.
Alternately, the average payment per loss in the later year is:
(1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) = (1.2)(1)(E[X] - E[X ∧ 100/1.2]) =
1.2{500 - 500(1 - (1500/1583.33)3 )} = 510.16.

36.99. A. 1. True.
2. False. Should be α and (1+i)θ. In any case, α is a shape not a scale parameter.
3. False. Should be (1+i)θ; the scale parameter is multiplied by one plus the rate of inflation.

36.100. B. First year: E[X ∧ 5000] = (2000/(2-1)) {1 - (2000/(2000 + 5000))2-1} = 1429.


Next year, d = 100, u = 5000, c = 80% and r = 4%, and the average payment per loss is:
(1.04)(80%){E[X ∧ 5000/1.04] - E[X ∧ 100/1.04]} = 0.832{E[X ∧ 4807.7] - E[X ∧ 96.15]} =
(0.832)(2000){2000/(2000 + 96.15) - 2000/(2000 + 4807.7)} = 1099.
Reduction is: 1 - 1099/1429 = 23.1%.
Alternately, in the next year we have a Pareto with α = 2, and θ = (1.04)(2000) = 2080.
Thus in the next year, E[X ∧ 5000] = (2080/(2-1)) {1 - (2080/(2080 + 5000))2-1} = 1468.9.
E[X ∧ 100] = (2080/(2-1)) {1 - (2080/(2080 + 100))2-1} = 95.4.
Thus the average payment per loss is: (80%)(1468.9 - 95.4) = 1098.8. Proceed as before.

36.101. B. In 2004 losses follow a Pareto with α = 2 and θ = (1.2)(5) = 6.


E[X] = 6/(2 - 1) = 6. E[X ∧ 10] = {θ/(α-1)} {1 - (θ/(θ+x))α−1} = (6){1 - 6/(6 + 10)} = 3.75.
LER(10) = E[X ∧ 10]/E[X] = 3.75/6 = .625 = 5/8.

36.102. D. Sthisyear(750000) = {400000/(400000 + 750000)}2 = 0.12098.


After inflation, the losses follow a Pareto distribution with α = 2 and θ = (1.1)(400,000) = 440,000.
S nextyear(750000) = {440000/(440000 + 750000)}2 = 0.13671.
S nextyear(750000) / Sthisyear(750000) = 0.13671/0.12098 = 1.130.
Alternately, one can calculate the survival function at 750,000 next year, by deflating the 750000 to
this year. 750000/1.1 = 681,818. Snextyear(750000) = Sthisyear(681,818) =
{400000/(400000 + 681,818)}2 = 0.13671. Proceed as before.

36.103. B. This is Pareto Distribution with α = 3 and θ = 800.


In 2006, losses follow a Pareto Distribution with α = 3 and θ = (1.08)(800) = 864.
With a franchise deductible, the total amount of those losses of size 300 or less are eliminated, while
the full amount of all losses of size greater than 300 is paid.
Losses eliminated = ∫_0^300 x f(x) dx = E[X ∧ 300] - 300 S(300) =
(864/2){1 - (864/(300 + 864))^2} - (300){864/(300 + 864)}^3 = 71.30.


Loss elimination ratio = Losses eliminated / mean = 71.30/(864/2) = 16.5%.
Alternately, the expected payment per loss with an ordinary deductible would be:
E[X] - E[X ∧ 300] = (864/2) - (864/2){1 - (864/(300 + 864))2 } = 238.02.
With the franchise deductible one pays 300 more on each loss of size exceeding 300 than under
the ordinary deductible: 238.02 + 300S(300) = 238.02 + (300){864/(300 + 864)}3 = 360.71.
E[X] = 864/2 = 432. Loss Elimination Ratio is: 1 - 360.71/432 = 16.5%.

36.104. D. For a Pareto Distribution, E[X] - E[X ∧ d] = θ/(α-1) - {θ/(α-1)} {1 - (θ/(d + θ))^(α−1)} =
θ^α / {(α-1)(d + θ)^(α−1)}.
The premium in each year is: 1.2(E[X] - E[X ∧ 600]) = 1.2 θ^α / {(α-1)(600 + θ)^(α−1)}.
If the reinsurance covers the layer excess of d in ground up loss, then the reinsurance premium is:
1.1(E[X] - E[X ∧ d]) = 1.1 θ^α / {(α-1)(d + θ)^(α−1)}.
In 2005, the Pareto has α = 2 and θ = 3000.
R2005 = 0.55 P2005. ⇒ (1.1) 3000^2 /(d + 3000) = (0.55)(1.2) 3000^2 /3600.
⇒ (0.66)(d + 3000) = (1.1)(3600). ⇒ d = 3000.
In 2006, the losses follow a Pareto Distribution with α = 2 and θ = (3000)(1.2) = 3600.
P2006 = (1.2) 3600^2 /4200. R2006 = (1.1) 3600^2 /6600.
R2006/P2006 = (1.1)(4200)/{(1.2)(6600)} = 0.583.
Comment: The higher layer increases more due to inflation, and therefore the ratio of R/P has to
increase, thereby eliminating choices A, B, and C.
One could have instead described the reinsurance as covering the layer excess of 600 + d in
ground up loss, in which case d = 2400 and one obtains the same final answer.
P2005 = 3000. R2005 = 1650. P2006 = 3703. R2006 = 2160.
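A quick numerical check of these premiums and ratios, using the Pareto excess formula derived above (the helper name pareto_excess is just illustrative):

```python
# Sketch verifying 36.104: ratio of reinsurance premium to primary premium in 2005 and 2006.
def pareto_excess(d, alpha, theta):
    # E[X] - E[X ∧ d] for a Pareto
    return theta**alpha / ((alpha - 1) * (d + theta) ** (alpha - 1))

P2005 = 1.2 * pareto_excess(600, 2, 3000)
R2005 = 1.1 * pareto_excess(3000, 2, 3000)
P2006 = 1.2 * pareto_excess(600, 2, 3600)
R2006 = 1.1 * pareto_excess(3000, 2, 3600)
print(P2005, R2005, P2006, R2006)            # 3000, 1650, about 3703, 2160
print(R2005 / P2005, R2006 / P2006)          # 0.55 and about 0.583
```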

36.105. D. The losses in dollars are all 30% bigger.


For example, 10,000 euros ⇔ 13,000 dollars.
This is mathematically the same as 30% uniform inflation.
We get another Lognormal with σ = 2 the same, and µʼ = µ + ln(1+r) = 8 + ln(1.3) = 8.26.
Comment: The mean in dollars should be 1.3 times the mean in euros. The mean in euros is:
exp[8 + 2^2/2] = 22,026. For the choices we get means of: exp[6.15 + 2.26^2/2] = 6026,
exp[7.74 + 2^2/2] = 16,984, exp[8 + 2.6^2/2] = 87,553, exp[8.26 + 2^2/2] = 28,567, and
exp[10.4 + 2.6^2/2] = 965,113. Eliminating all but choice D.

36.106. A. In 2005, S(250) = (1000/1250)3 = 0.512.


In 2006, the Pareto distribution has α = 3 and θ = (1.1)(1000) = 1100.
In 2006, S(250) = (1100/1350)3 = 0.54097.
Increase in expected number of claims: (14)(0.54097 - 0.512) = 0.406 claims.
Alternately, deflate 250 in 2006 back to 2005: 250/1.1 = 227.27.
In 2005, S(227.27) = (1000/1227.27)3 = 0.54097. Proceed as before.
Comment: We make no use of the fact that frequency is Poisson.

36.107. D. In 4 years, severity is Pareto with parameters α = 4 and θ = (1.06^4)(3000) = 3787.
Under Policy R, the expected cost per loss is the mean: 3787/(4 - 1) = 1262.
Under Policy S, the expected cost per loss is: E[X ∧ 3000] - E[X ∧ 500] =
{θ/(α-1)} {(θ/(θ+500))^(α−1) - (θ/(θ+3000))^(α−1)} = (1262) {(3787/4287)^3 - (3787/6787)^3} = 651.
Difference is: 1262 - 651 = 611.
Alternately, under Policy S, the expected cost per loss, using the Pareto for year t, is:
(1.06^4) {E[X ∧ 3000/1.06^4] - E[X ∧ 500/1.06^4]} =
(1.2625) {E[X ∧ 2376] - E[X ∧ 396]} =
(1.2625) [{3000/(4-1)}{1 - (3000/5376)^(4-1)} - {3000/(4-1)}{1 - (3000/3396)^(4-1)}] =
(1.2625) {826 - 311} = 650.
Difference is: 1262 - 650 = 612.
Comment: This exam question should have said: “Policy S has a deductible of $500 and a
maximum covered loss of $3,000.”
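A short Python check of both policies, using the limited expected value formula for the Pareto (the helper name pareto_lev is just illustrative):

```python
# Sketch verifying 36.107: expected cost per loss under Policies R and S in 4 years.
def pareto_lev(x, alpha, theta):
    # E[X ∧ x] for a Pareto
    return (theta / (alpha - 1)) * (1 - (theta / (theta + x)) ** (alpha - 1))

alpha, theta = 4, 3000 * 1.06 ** 4                       # theta is about 3787 in 4 years
mean = theta / (alpha - 1)                               # Policy R pays the full loss
policy_s = pareto_lev(3000, alpha, theta) - pareto_lev(500, alpha, theta)
print(mean, policy_s, mean - policy_s)                   # about 1262, 651, 611
```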

36.108. Compare the contributions to the layer from 100,000 to 200,000 before and after inflation:
Claim   Loss      Contribution to Layer   Inflated Loss   Contribution to Layer
A       35,000    0                       37,800          0
B       125,000   25,000                  135,000         35,000
C       180,000   80,000                  194,400         94,400
D       206,000   100,000                 222,480         100,000
E       97,000    0                       104,760         4,760
Total             205,000                                 234,160
234,160/205,000 = 1.142. 14.2% effective trend on this layer.

Section 37, Lee Diagrams

A number of important loss distribution concepts can be displayed graphically.


Important concepts will be illustrated using this graphical approach of Lee.289

While this material is not on your exam, graphically oriented students may benefit from looking at this
material. You may find that it helps you to remember formulas that are used on your exam.

Below is shown a conventional graph of a Pareto Distribution with α = 4 and θ = 2400:

Exercise: For a Pareto Distribution with α = 4 and θ = 2400, what is F(1000)?


[Solution: F(x) = 1 - {θ/(θ + x)}^α. F(1000) = 1 - (2400/3400)^4 = 0.752.]

In the conventional graph, the x-axis correspond to size of loss, while the y-axis corresponds to
probability. Thus for example, the above graph of a Pareto includes the point (1000, 0.752).

In contrast, “Lee Diagrams” have the x-axis correspond to probability, while the y-axis
corresponds to size of loss.

289
“The Mathematics of Excess of Loss Coverage and Retrospective Rating --- A Graphical Approach”, by
Y.S. Lee, PCAS LXXV, 1988. Currently on the syllabus of CAS Exam 9. Lee cites “A Practical Guide to the Single
Parameter Pareto Distribution”, by Steven Philbrick, PCAS 1985. Philbrick in turn points out the similarity to the
treatment of Insurance Charges and Savings in “Fundamentals of Individual Risk Rating and Related Topics”, by
Richard Snader.

Here is the Lee Diagram of a Pareto Distribution with α = 4 and θ = 2400:

For example, since F(1000) = 0.752, the point (0.752, 1000) is on the curve. Note the way that the
probability approaches a vertical asymptote of unity as the claim size increases towards infinity.290

Advantages of this representation of Loss Distributions include the intuitively appealing features:291

1. The mean is the area under the curve.292

2. A loss limit is represented by a horizontal line, and excess losses lie above the line.

3. Losses eliminated by a deductible lie below the horizontal line represented by the deductible
amount.

4. After the application of a trend factor, the new loss distribution function lies above the prior
distribution.

290
F(x) →1, as x →∞.
291
“A Practical Guide to the Single Parameter Pareto Distribution”, by Steven Philbrick, PCAS 1985.
292
As discussed below. In a conventional graph, the area under a distribution function is infinite, the area under the
survival function is the mean, and the area under the density is one.

Means:

One can compute the mean of this size of loss distribution as the integral from 0 to ∞
of y f(y)dy. As shown below, this is represented by summing narrow vertical strips, each of width
f(y)dy and each of height y, the size of loss.293

[Lee Diagram: the area under the curve summed as vertical strips, each of width f(y)dy and height y.]

Summing over all y would give the area under the curve. Thus the mean is the area under the
curve.294

293
y = size of loss. x = F(y) = Prob[Y ≤ y]. f(y) = dF/dy = dx/dy. dx = (dx/dy)dy = f(y)dy.
The width of each vertical strip is dx = f(y)dy.
294
In this case, the mean = θ/(α-1) = 2400/3 = 800.

Alternately as shown below, one can compute the area under the curve by summing narrow
horizontal strips, each of height dy and each of width 1 - F(y) = S(y), the survival function.295
Summing over all y would give the area under the curve.

[Lee Diagram: the area under the curve summed as horizontal strips, each of height dy and width S(y).]

Two integrals can be gotten from each other via integration by parts:
∫_0^∞ S(y) dy = ∫_0^∞ y f(y) dy = mean claim size.

Thus the area under the curve can be computed in either of two ways. The mean is either the integral
of the survival function, via summing horizontal strips, or the integral of y f(y), via summing vertical
strips .

In the Pareto example:


∫_0^∞ {2400/(y + 2400)}^4 dy = 800 = ∫_0^∞ y (4)(2400^4)/(y + 2400)^5 dy = mean claim size.

295
Each horizontal strip goes from x = F(y) to x = 1, and thus has width 1 - F(y) = S(y).
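For the graphically inclined who also like numerical checks, the two ways of computing the mean can be confirmed with a short Python sketch, assuming scipy is available (this is only an illustration, not exam material):

```python
# Sketch verifying that both integrals give the mean for the Pareto with alpha = 4, theta = 2400.
from scipy.integrate import quad

alpha, theta = 4, 2400
S = lambda y: (theta / (theta + y)) ** alpha
f = lambda y: alpha * theta**alpha / (theta + y) ** (alpha + 1)
print(quad(S, 0, float('inf'))[0])                   # 800: sum of horizontal strips
print(quad(lambda y: y * f(y), 0, float('inf'))[0])  # 800: sum of vertical strips
```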

Mean vs. Median:

The Lee Diagram below illustrates why, for loss distributions skewed to the right, i.e., with positive
coefficient of skewness, the mean is usually greater than the median.296

The area under the curve is the mean; since the width is one, the average height of the curve is the
mean. On the other hand, the median is the height at which the curve reaches a probability of one
half, which in this case is less than the average height of the curve. The diagram would be similar for
most distributions significantly skewed to the right, and the median is less than the mean.

296
For distributions with skewness close to zero, the mean is usually close to the median. (For symmetric
distributions, the mean equals the median.) Almost all loss distributions encountered in practical applications by
casualty actuaries have substantial positive skewness, with the median significantly less than the mean.

Limited Expected Value:

A loss of size less than 1000 contributes its size to the Limited Expected Value at 1000,
E[X ∧ 1000]. A loss of size greater than or equal to 1000 contributes 1000 to E[X ∧ 1000]. These
contributions to E[X ∧ 1000] correspond to vertical strips:
[Lee Diagram: vertical strips contributing to E[X ∧ 1000]; a loss of size 500 contributes a strip of width f(500)dy and height 500, while losses of size 1000 or more contribute strips of height 1000.]

The contribution to E[X ∧ 1000] of the small losses, as a sum of vertical strips, is the integral from 0 to
1000 of y f(y) dy. The contribution to E[X ∧ 1000] of the large losses, is the area of the rectangle of
height 1000 and width S(1000): 1000 S(1000).

These 2 pieces correspond to the 2 terms: E[X ∧ 1000] = ∫_0^1000 y f(y) dy + 1000 S(1000).

Adding up these two types of contributions, E[X ∧ 1000] corresponds to the area under both the
distribution curve and the horizontal line y = 1000:

In general, E[X ∧ L] ⇔ the area below the curve and also below the horizontal line at L.
Summing horizontal strips, the Limited Expected Value also is equal to the integral of the survival
function from zero to the limit:
[Lee Diagram: E[X ∧ 1000] summed as horizontal strips of height dy and width S(y), from y = 0 up to y = 1000.]

E[X ∧ 1000] = ∫_0^1000 S(t) dt.

Losses Eliminated:

A loss of size less than 1000 is totally eliminated by a deductible of 1000. For a loss of size greater
than or equal to 1000, 1000 is eliminated by a deductible of size 1000. Summing these
contributions as vertical strips, as shown below the losses eliminated by a 1000 deductible
correspond to the area under both the curve and y = 1000.

Losses Eliminated by 1000 Deductible ⇔ area under both the curve and y = 1000

⇔ E[X ∧ 1000].

In general, the losses eliminated by a deductible d ⇔ the area below the curve and also
below the horizontal line at d.

Loss Elimination Ratio (LER) = losses eliminated / total losses.


The Loss Elimination Ratio is represented by the ratio of the area under both the curve and
y = 1000 to the total area under the curve.297 One can either calculate the area corresponding to the
losses eliminated by summing horizontal strips of width S(t) or vertical strips of height t limited to
x. Therefore: LER(x) = ∫_0^x S(t) dt / µ = { ∫_0^x t f(t) dt + x S(x)} / µ = E[X ∧ x] / E[X].
297
In the case of a Pareto with α = 4 and θ = 2400, the Loss Elimination Ratio at 1000 is 518.62 / 800 = 64.8% .
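Similarly, E[X ∧ 1000] and LER(1000) can be confirmed numerically, both as a sum of horizontal strips and as a sum of vertical strips, assuming scipy is available:

```python
# Sketch verifying E[X ∧ 1000] and LER(1000) both ways for the Pareto with alpha = 4, theta = 2400.
from scipy.integrate import quad

alpha, theta = 4, 2400
S = lambda y: (theta / (theta + y)) ** alpha
f = lambda y: alpha * theta**alpha / (theta + y) ** (alpha + 1)
lev_horizontal = quad(S, 0, 1000)[0]                               # integral of S from 0 to 1000
lev_vertical = quad(lambda y: y * f(y), 0, 1000)[0] + 1000 * S(1000)
print(lev_horizontal, lev_vertical, lev_horizontal / 800)          # 518.6, 518.6, LER = 0.648
```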

Excess Losses:

A loss of size y > 1000, contributes y - 1000 to the losses excess of a 1000 deductible. Losses of
size less than or equal to 1000 contribute nothing. Summing these contributions, as shown below
the losses excess of a 1000 deductible corresponds to the area under the distribution curve but
above y = 1000.

This area under the distribution curve but above y = 1000, are the losses excess of 1000,
E[(X - 1000)+], the numerator of the Excess Ratio.298 The denominator of the Excess Ratio is the
mean, the whole area under the distribution curve. Thus the Excess Ratio, R(1000) =
E[(X - 1000)+] / E[X], corresponds to the ratio of the area under the curve but above y = 1000, to
the total area under the curve.299

Since the Losses Eliminated by a 1000 deductible and the Losses Excess of 1000 sum to the total
losses, E[(X - 1000)+] = Losses Excess of 1000 = total losses - losses limited to 1000 ⇔
E[X] - E[X ∧ 1000]. LER(1000) + R(1000) = 1.

298
In the case of a Pareto with α = 4 and θ = 2400, the area under the curve and above the line y = 1000 is:
θ^α (θ+y)^(1−α) / (α−1) = 2400^4 / {(3400^3)(3)} = 281.38.
299
In the case of a Pareto with α = 4 and θ = 2400, the Excess Ratio at 1000 is: 281.38 / 800 = 35.2% =
(2400/3400)^3 = {θ/(θ+x)}^(α−1). R(1000) = 1 - LER(1000) = 1 - 0.648 = 0.352.

E[(X - 1000)+] = ∫_1000^∞ S(t) dt = ∫_1000^∞ (t - 1000) f(t) dt.

The first integral ⇔ summing of horizontal strips.

The second integral ⇔ summing of vertical strips.

E[(X - 1000)+] ⇔ area under the distribution curve but above y = 1000.

In general, the losses excess of u ⇔ the area below the curve and also above the
horizontal line at u.

As was done before, one can divide the limited losses into two pieces, Areas A and B.

Area A = ∫_0^1000 t f(t) dt = 271.

Area B = (1000)S(1000) = (1000)(.248) = 248.


Area A + Area B = E[X ∧ 1000] = 519.
Area C = E[X] - E[X ∧ 1000] = 800 - 519 = 281.

Excess Ratio at 1000 = C / (A + B + C) = 281/ 800 = 35%.



Mean Excess Loss (Mean Residual Life):

The Mean Excess Loss can be written as the excess losses divided by the survival function:
e(x) = ∫_x^∞ S(t) dt / S(x) = ∫_x^∞ (t - x) f(t) dt / S(x).

For example, the losses excess of a 1000 deductible corresponds to the area under the distribution
curve but above y = 1000:

[Lee Diagram: the losses excess of 1000 are the area under the curve but above the horizontal line at 1000; the width of this area along the horizontal axis is S(1000) = 0.248.]

This area under the distribution curve but above y = 1000 is the numerator of the Mean Excess
Loss.300 The first integral above corresponds to the summing of horizontal strips. The second integral
above corresponds to the summing of vertical strips. The denominator of the Mean Excess Loss is the
survival function S(1000) = 0.248, which is the width along the horizontal axis of the area
corresponding to the numerator.

Thus the Mean Excess Loss, e(1000), corresponds to the average height of the area under the
curve but above y = 1000.301 For example, in this case that average height is e(1000) = 1133.
However, since the curve extends vertically to infinity, it is difficult to use this type of diagram to
distinguish the mean excess loss, particularly for heavy-tailed distributions such as the Pareto.
300
The numerator of the Mean Excess Loss is the same as the numerator of the Excess Ratio. In the case of a Pareto
with α = 4 and θ = 2400, the area under the curve and above the line y=1000 is:
θα (θ+y)1−α / (α−1) = 24004 / {(34003 )3} = 281.38.
301
In the case of a Pareto with α = 4 and θ = 2400, the Mean Excess Loss at 1000 is 281.38 / .2483 = 1133, since
S(x) = {θ/(θ+x)}α = (2400 / 3400)4 = .2483. Alternately, for the Pareto e(x) = (θ+x)/(α−1) = 3400/3 = 1133.

Layers of Loss:

Below is shown the effect of imposing a 3000 maximum covered loss:

Amount paid with a 3000 maximum covered loss ⇔ the losses censored from above at 3000

⇔ E[X ∧ 3000] ⇔ the area under both the curve and the horizontal line y = 3000.

The Layer of Loss between 1000 and 3000 would be those dollars paid in the presence of both a
1000 deductible and a 3000 maximum covered loss.
As shown below, the layer of losses from 1000 to 3000 is the area under the curve but between
the horizontal lines y = 1000 and y = 3000 ⇔ Area B.

Area A ⇔ Losses excess of 3000 ⇔ Losses not paid due to 3000 maximum covered loss.

Area C ⇔ Losses limited to 1000 ⇔ Losses not paid due to 1000 deductible.

Area B ⇔ Layer from 1000 to 3000

⇔ Losses paid with 1000 deductible and 3000 maximum covered loss.

Summing horizontal strips, Area B can be thought of as the integral of the survival function from 1000
to 3000:

Layer of losses from 1000 to 3000 = ∫_1000^3000 S(t) dt = E[X ∧ 3000] - E[X ∧ 1000].

This area is also equal to the difference of two limited expected values: the area below the curve
and y = 3000, E[X ∧ 3000], minus the area below the curve and y = 1000, E[X ∧ 1000].

In general, the layer from d to u ⇔ the area under the curve but also between the
horizontal line at d and the horizontal line at u.

Summing vertical strips this same area can also be expressed as the sum of an integral and the area
of a rectangle:

[Lee Diagram: the layer from 1000 to 3000 summed as vertical strips, each of width f(t)dt and height t - 1000 for losses between 1000 and 3000, plus a rectangle of height 2000 and width S(3000) for larger losses.]



Layer of losses from 1000 to 3000 = ∫_1000^3000 (t - 1000) f(t) dt + (3000 - 1000) S(3000).

For the Pareto example, losses excess of 3000 = E[X] - E[X ∧ 3000] =
2400/3 - (2400/3){1 - (2400/(2400 + 3000))^3} = 800 - 729.767 = 70.233.
Losses limited to 1000 = E[X ∧ 1000] = (2400/3){1 - (2400/(2400 + 1000))^3} = 518.624.
Layer from 1000 to 3000 = E[X ∧ 3000] - E[X ∧ 1000] = 729.767 - 518.624 = 211.143.
(Losses limited to 1000) + (Layer from 1000 to 3000) + (Losses excess of 3000) =
518.624 + 211.143 + 70.233 = 800 = Mean ⇔ total losses.

Alternately, the layer from 1000 to 3000 = ∫_1000^3000 (t - 1000) f(t) dt + (3000 - 1000) S(3000) =
∫_1000^3000 (t - 1000) (4)(2400^4)/(t + 2400)^5 dt + (2000){2400/(3000 + 2400)}^4
= 133.106 + 78.037 = 211.143.
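The two pieces of this layer can be confirmed numerically, assuming scipy is available:

```python
# Sketch verifying the layer from 1000 to 3000 for the Pareto with alpha = 4, theta = 2400.
from scipy.integrate import quad

alpha, theta = 4, 2400
S = lambda t: (theta / (theta + t)) ** alpha
f = lambda t: alpha * theta**alpha / (theta + t) ** (alpha + 1)
piece1 = quad(lambda t: (t - 1000) * f(t), 1000, 3000)[0]   # about 133.1
piece2 = 2000 * S(3000)                                     # about 78.0
print(piece1, piece2, quad(S, 1000, 3000)[0])               # the layer is about 211.1 either way
```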



The area below the Pareto distribution curve, which equals the mean of 800, can be divided into four
pieces, where the layer from 1000 to 3000 has been divided into the two terms calculated above:

Exercise: For a Pareto Distribution with α = 4 and θ = 2400, determine the expected losses in the
following layers: 0 to 500, 500 to 1000, 1000 to 1500, and 1500 to 2000.
[Solution: E[X ∧ x] = {θ/(α − 1)} {1 - (θ/(θ + x))^(α−1)} = 800 {1 - (2400/(2400 + x))^3}.
E[X ∧ 500] = 347. E[X ∧ 1000] = 519. E[X ∧ 1500] = 614. E[X ∧ 2000] = 670.
Expected Losses in layer from 0 to 500: E[X ∧ 500] = 347.
Expected Losses in layer from 500 to 1000: E[X ∧ 1000] - E[X ∧ 500] = 519 - 347 = 172.
Expected Losses in layer from 1000 to 1500: E[X ∧ 1500] - E[X ∧ 1000] = 614 - 519 = 95.
Expected Losses in layer from 1500 to 2000: E[X ∧ 2000] - E[X ∧ 1500] = 670 - 614 = 56.]

For a given width, lower layers of loss are larger than higher layers of loss.302

This is illustrated in the following Lee Diagram:

[Lee Diagram: the layers 0 to 500 (Area A), 500 to 1000 (Area B), 1000 to 1500 (Area C), and 1500 to 2000 (Area D), each as the area under the curve between the corresponding horizontal lines.]

Area A > Area B > Area C > Area D.

302
Therefore, incremental increased limits factors decrease as the limit increases.
These ideas are discussed in “On the Theory of Increased Limits and Excess of Loss Pricing,” by Robert Miccolis,
with discussion by Sheldon Rosenberg, PCAS 1997.

Various Formulas for a Layer of Loss:303

In the above diagram, the layer from a to b is: Area F + Area G.


There are various different formulas for the layer from a to b.

Algebraic Expression for Layer from a to b              Corresponding Areas on the Lee Diagram

E[X ∧ b] - E[X ∧ a]                                     (C + D + E + F + G) - (C + D + E)

E[(X - a)+] - E[(X - b)+]                               (F + G + H) - H

∫_a^b (y - a) f(y) dy + (b-a) S(b)                      F + G

∫_a^b y f(y) dy + (b-a) S(b) - a{F(b) - F(a)}           (D + F) + G - D

∫_a^b y f(y) dy + b S(b) - a S(a)                       (D + F) + (E + G) - (D + E)

303
See pages 58-59 of “The Mathematics of Excess of Loss Coverage and Retrospective Rating --- A Graphical
Approach” by Y.S. Lee, PCAS LXXV, 1988.

Average Sizes of Losses in an Interval:

As shown below, the dollars of loss on losses of size less than 1000 correspond to the area under
the curve and to the left of the vertical line x = F(1000) = .7517:

Summing up vertical strips, this left hand area corresponds to the integral of y f(y) from 0 to 1000.
As discussed previously, this area is the contribution of the small losses, one of the two pieces
making up E[X ∧ 1000]. The other piece of E[X ∧ 1000] was the contribution of the large losses,
1000S(1000). Thus the dollars of loss on losses of size less than 1000 can also be calculated as
E[X ∧ 1000] - 1000S(1000) or as the difference of the corresponding areas.

In this case, the area below the curve and to the left of x = F(1000) is:
∫_0^1000 y f(y) dy = E[X ∧ 1000] - 1000 S(1000) = 518.62 - 248.27 = 270.35.304

Therefore, the average size of the losses of size less than 1000 is:
270.35 / F(1000) = 270.35 / 0.75173 = 359.64.

In general, the losses of size a to b ⇔ the area below the curve and also between the
vertical line at a and the vertical line at b.
304
In the case of a Pareto with α = 4 and θ = 2400, E[X ∧ 1000] = (2400/3){1 - (2400/3400)^3} = 518.62, and
S(1000) = (2400/3400)^4 = 0.24827.



As shown below, the dollars of loss on losses of size between 1000 and 2000 correspond to the
area under the curve and between the vertical line x = F(1000) = 0.7517 and
x = F(2000) = 0.9115:

Summing up vertical strips, this area corresponds to the integral of y f(y) from 1000 to 2000.
This area between the two vertical lines can also be thought of as the difference between the area to
the left of x = F(2000) and that to the left of x = F(1000). In other words the dollars of loss on losses
between 1000 and 2000 are the difference between the dollars of loss on losses of size less than
2000 and dollars of loss on losses of size less than 1000.

In this case, the area below the curve and to the left of x = F(2000) is:
E[X ∧ 2000] - 2000S(2000) = 670.17 - (2000)(.088519) = 493.13.
The area below the curve and to the left of x = F(1000) is:
E[X ∧ 1000] - 1000S(1000) = 518.62 - 248.27 = 270.35.

The area between the vertical lines is the difference:
∫_1000^2000 y f(y) dy = ∫_0^2000 y f(y) dy - ∫_0^1000 y f(y) dy = 493.13 - 270.35 = 222.78.

The average size of the losses of size between 1000 and 2000 is:
222.78 / {F(2000) -F(1000)} = 222.78 / (0.9115 - 0.7517) = 1394.

The numerator is the area between the vertical lines at F(2000) and F(1000) and below the curve,
while the denominator is the width of this area. The ratio is the average height of this area, the
average size of the losses of size between 1000 and 2000.

This is all summarized in the following Lee Diagram.


Area A = Dollars of loss on losses of size < 1000.
Area C = Dollars of loss on losses of size > 2000.
Area B = Dollars of loss on losses of size between 1000 and 2000.

[Lee Diagram: Area A below the curve to the left of the vertical line at 0.75, Area B between the vertical lines at 0.75 and 0.91, and Area C to the right of 0.91.]

The average size of the losses of size between 1000 and 2000 is:
Area B / {F(2000) - F(1000)} = 222.78 / (0.9115 - 0.7517) = 1394.

E[X ∧ 1000] = A + D = A + 1000S(1000). ⇒ Area A = E[X ∧ 1000] - 1000S(1000).

[Lee Diagram: Area A below the curve to the left of 0.75, and Area D the rectangle of height 1000 and width S(1000) to the right of 0.75.]

E[X ∧ 2000] = A + B + G = A + B + 2000S(2000).


⇒ Area A + Area B = E[X ∧ 2000] - 2000S(2000).

[Lee Diagram: Areas A and B below the curve to the left of 0.91, and Area G the rectangle of height 2000 and width S(2000) to the right of 0.91.]

⇒ Dollars of loss on losses of size between 1000 and 2000 = Area B =


(A+B) - A = {E[X ∧ 2000] - 2000S(2000)} - {E[X ∧ 1000] - 1000S(1000)}.

Expected Amount by Which Losses are Less than a Given Value:

Assume we take 1000 - x for x ≤ 1000 and 0 for x > 1000. This is the amount by which X is less
than 1000, or (1000 - X)+. As shown below for various x < 1000, (1000 - X)+ is the height of a
vertical line from the curve to 1000, or from x up to 1000:

The expected amount by which X is less than 1000, E[(1000 - X)+], is Area A below:

Area B = E[X ∧ 1000]. Area A + Area B = area of a rectangle of width 1 and height 1000 = 1000.
E[(1000 - X)+] = the expected amount by which losses are less than 1000
= Area A = 1000 - Area B = 1000 - E[X ∧ 1000].

We can also write Area A as a sum of horizontal strips: ∫_0^1000 F(x) dx.

In general, E[(d - X)+] = ∫_0^d F(x) dx = ∫_0^d {1 - S(x)} dx = d - E[X ∧ d].

E[(X - d)+] versus E[(d - X)+]:

In the following Lee Diagram, Area A = E[(1000 - X)+] and Area C = E[(X - 1000)+].

Area A + Area B = a rectangle of height 1000 and width 1.


Therefore, A + B = 1000.
B = 1000 - A = 1000 - E[(1000 - X)+].

Area B + Area C = area under the curve.


Therefore, B + C = E[X].
B = E[X] - C = E[X] - E[(X - 1000)+].

Therefore, 1000 - E[(1000 - X)+] = E[X] - E[(X - 1000)+].


Therefore, E[(X - 1000)+] - E[(1000 - X)+] = E[X] - 1000 = E[X - 1000].
In general, E[(X-d)+] - E[(d-X)+] = E[X] - d = E[X - d].

Payments Subject to a Minimum:

Max[X, 1000] = A + B + C = (A + B) + C = 1000 + (E[X] - E[X ∧ 1000]).


Max[X, 1000] = A + B + C = A + (B + C) = (1000 - E[X ∧ 1000]) + E[X].

Payments Subject to both a Minimum and a Maximum:

Min[Max[X, 1000], 3000] = A + B + C = (A + B) + C = 1000 + (E[X ∧ 3000] - E[X ∧ 1000]).


Min[Max[X, 1000], 3000] = A + B + C = A + (B + C) = (1000 - E[X ∧ 1000]) + E[X ∧ 3000].

Inflation:

After 50% uniform inflation is applied to the original Pareto distribution, with α = 4 and
θ = 2400, the revised distribution is also a Pareto with α = 4 but with θ = (1.5)(2400) = 3600.

Here are the original Pareto (solid) and the Pareto after inflation (dashed):

The increase in the losses due to inflation corresponds to the area between the distribution curves.
The total area under the new curve is: (1.5)(800) = 3600 / (4-1) = 1200. The area under the old
curve is 800. The increase in losses is the difference = 1200 - 800 = (.5)(800) = 400. The increase
in losses is 50% from 800 to 1200.

The losses excess of 1000 are above the horizontal line at 1000. Prior to inflation the excess losses
are below the original Pareto (solid), Area E. After inflation the excess losses are below the revised
Pareto (dashed), Areas D + E:

Area D represents the increase in excess losses due to inflation, while Area A represents the
increase in limited losses due to inflation. Note that the excess losses have increased more quickly
(as a percent) than the total losses, while the losses limited to 1000 have increased less quickly (as
a percent) than the total losses.

The loss excess of 1000 for a Pareto with α = 4 and θ = 3600 is: 1200 - E[X ∧ 1000] =
1200 - 624.80 = 575.20, Areas D + E above. The loss excess of 1000 for a Pareto with α = 4 and
θ = 2400 is: 800 - E[X ∧ 1000] = 800 - 518.62 = 281.38, Area E above. Thus under uniform inflation
of 50%, in this case the losses excess of 1000 have increased by 104.4%, from 281.38 to 575.20.
The increase in excess losses is Area D above, 575.20 - 281.38 = 293.82.

The loss limited to 1000 for a Pareto with α = 4 and θ = 3600 is: E[X ∧ 1000] = 624.80. The loss
limited to 1000 for a Pareto with α = 4 and θ = 2400 is: E[X ∧ 1000] = 518.62. Thus under uniform
inflation of 50%, in this case the losses limited to 1000 have increased by only 20.5%, from 518.62
to 624.80. The increase in limited losses is Area A above,
624.80 - 518.62 = 106.18.

The total losses increase from 800 to 1200; Area A + Area D = 293.82 + 106.18 = 400.
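
A few lines of Python reproduce these numbers directly from the two Paretos (α = 4, with θ = 2400 before inflation and θ = 3600 after); the helper name is illustrative.

alpha = 4.0

def lim_ev(theta, d):           # limited expected value E[X ∧ d] for a Pareto(alpha, theta)
    return (theta / (alpha - 1.0)) * (1.0 - (theta / (theta + d)) ** (alpha - 1.0))

for theta in (2400.0, 3600.0):
    mean    = theta / (alpha - 1.0)
    limited = lim_ev(theta, 1000.0)
    excess  = mean - limited
    print(theta, limited, excess)
# theta = 2400: limited about 518.6, excess about 281.4
# theta = 3600: limited about 624.8, excess about 575.2
# Excess losses rise about 104%, limited losses only about 20%.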

Another version of this same Lee Diagram, showing the numerical areas:

% Increase in Limited Losses is: 106/519 = 20% < 50%.


% Increase in Excess Losses is: 294/281 = 105% > 50%.
% Increase in Total Losses is: (106 + 294)/ (519 + 281) = 400/800 = 50%.

In the earlier year, the losses limited to 2000 are below the horizontal line at 2000, and below the
solid Pareto, Area A in the Lee Diagram below.
In the later year, the losses limited to 3000 are below the horizontal line at 3000, and below the
dotted Pareto, Areas A + B + C in the Lee Diagram below.
Every loss in combined Areas A + B + C is exactly 1.5 times the height of a corresponding loss in
Area A.

This shows that E_later year[X ∧ 3000] = 1.5 E_earlier year[X ∧ 3000/1.5] = 1.5 E_earlier year[X ∧ 2000].
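
A two-line check of this relation, using the earlier-year Pareto (α = 4, θ = 2400) and the later-year Pareto (θ = 3600):

def lim_ev(theta, d, alpha=4.0):     # limited expected value for a Pareto(alpha, theta)
    return (theta / (alpha - 1.0)) * (1.0 - (theta / (theta + d)) ** (alpha - 1.0))

print(lim_ev(3600.0, 3000.0))        # later year E[X ∧ 3000], about 1005.3
print(1.5 * lim_ev(2400.0, 2000.0))  # 1.5 times earlier year E[X ∧ 2000], also about 1005.3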

Call Options:305

The expected payoff on a European Call on a stock is equal to E[(ST - K)+], where ST is the future
price of the stock at expiration of the option, time T.

Let F(x) be the distribution of the future stock price at time T.306

E[(X - K)+] is the expected losses excess of K, and corresponds to the area above the horizontal
line at height K and also below the curve graphing F(x) in the following Lee Diagram:

As K increases, the area above the horizontal line at height K decreases; in other words, the value of
the call decreases as K increases.

305
Not on the syllabus of this exam. See “Mahlerʼs Guide to Financial Economics.”
306
A common model is that the future prices of a stock follow a LogNormal Distribution.

For an increase in K of ΔK, the value of the call decreases by Area A in the following Lee Diagram:
[Lee Diagram: stock price on the vertical axis with K and K+ΔK marked; probability on the horizontal axis (0 to 1); Area A lies between the horizontal lines at K and K+ΔK and below the curve.]

The absolute change in the value of the call, Area A, is smaller than a rectangle of height ΔK and
width 1 - F(K). Thus Area A is smaller than ΔK {1 - F(K)} ≤ ΔK. Thus a change of ΔK in the strike
price results in an absolute change in the value of the call option smaller than ΔK.
The following Lee Diagram shows the effect of raising the strike price by fixed amounts.

The successive absolute changes in the value of the call are represented by Areas A, B, C, and D.
We see that the absolute changes in the value of the call get smaller as the strike price increases.

Put Options:307

The expected payoff of a European put is E[(K - ST)+], where ST is the future price of the stock at
expiration of the option, time T.

Let F(x) be the distribution of the future stock price at time T.

Then this expected payoff corresponds to Area P below the horizontal line at height K and also
above the curve graphing F(x) in the following Lee Diagram:

As K increases, the area below the horizontal line at height K increases; in other words, the value of
the put increases as K increases.

307
Not on the syllabus of this exam. See “Mahlerʼs Guide to Financial Economics.”

For an increase in K of ΔK, the value of the put increases by Area A in the following Lee Diagram:
[Lee Diagram: stock price on the vertical axis with K and K+ΔK marked; probability on the horizontal axis (0 to 1); Area A lies between the horizontal lines at K and K+ΔK and above the curve.]

The change in the value of the put, Area A, is smaller than a rectangle of height ΔK and width F(K).
Thus Area A is smaller than ΔK F(K) ≤ ΔK. Thus a change of ΔK in the strike price results in a change
in the value of the put option smaller than ΔK.
The following Lee Diagram shows the effect of raising the strike price by fixed amounts.

The successive changes in the value of the put are represented by Areas A, B, C, and D.
We see that the changes in the value of the put get larger as the strike price increases.

Tail Value at Risk (TVaR):308

The Tail Value at Risk of a loss distribution is defined as: TVaRp ≡ E[X | X > πp ],

where the percentile πp is such that F(πp ) = p.

Exercise: For a Pareto Distribution with α = 4 and θ = 2400, determine π0.90.

[Solution: 0.90 = 1 - {2400/(2400 + x)}^4. ⇒ x = 1868.]

TVaR0.90 is the average size of those losses of size greater than π0.90 = 1868.
The denominator of TVaR0.90 is: 1 - 0.90 = 0.10.
The numerator of TVaR0.90 is Area A + Area B in the following Lee Diagram:

Therefore, TVaR0.90 is the average height of Areas A + B.


Area A has height π0.90 = 1868.

Area B is the expected losses excess of π0.90 = 1868.

The average height of Area B is the mean excess loss, e(1868) = e(π0.90).

Therefore, TVaR0.90 = π0.90 + e(π0.90). In general, TVaRp = πp + e(πp ).


308
See “Mahlerʼs Guide to Risk Measures.”
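
A short sketch computes π0.90 and TVaR0.90 for the Pareto with α = 4 and θ = 2400 from the exercise above, using the standard Pareto mean excess loss e(d) = (θ + d)/(α - 1).

alpha, theta, p = 4.0, 2400.0, 0.90

pi_p = theta * ((1.0 - p) ** (-1.0 / alpha) - 1.0)   # solve F(x) = p
e_pi = (theta + pi_p) / (alpha - 1.0)                 # mean excess loss at pi_p
print(round(pi_p))                                    # about 1868
print(round(pi_p + e_pi))                             # TVaR at 90%, about 3290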

Problems:

Use the following information for the next 26 questions:


The size of loss distribution F(x), with corresponding survival function S(x) and density f(x), is shown
in the following diagram, with probability along the horizontal axis and size of loss along the vertical
axis. Express each of the stated quantities algebraically in terms of the six labeled areas in the
diagram: α, β, γ, δ, ε, η.

[Lee Diagram: probability on the horizontal axis (0, F(2), F(5), 1); size of loss on the vertical axis (0, 2, 5). Under the curve, α lies to the left of F(2), γ lies above height 2 between F(2) and F(5), and η lies above height 5; β and δ are the rectangles of height 2 between F(2) and F(5) and to the right of F(5) respectively, and ε is the rectangle between heights 2 and 5 to the right of F(5).]

37.1 (1 point) E[X].

37.2 (1 point) Losses from claims of size less than 2.

37.3 (1 point) Portion of total losses in the layer from 0 to 2.

37.4 (1 point) ∫_0^2 x dF(x) + 2{1 - F(2)}.

37.5 (1 point) Portion of total losses from claims of size more than 2.

37.6 (1 point) Portion of total losses in the layer from 0 to 5.

37.7 (1 point) E[X ∧ 5].

37.8 (1 point) Portion of total losses from claims of size less than 5.

37.9 (1 point) Portion of total losses in the layer from 2 to 5.

37.10 (1 point) R(2) = excess ratio at 2 = 1 - LER(2).


37.11 (1 point) ∫_5^∞ x dF(x).

37.12 (1 point) Portion of total losses in the layer from 2 to ∞.

37.13 (1 point) LER(5) = loss elimination ratio at 5.

37.14 (1 point) 2(F(5)-F(2)).

37.15 (1 point) Portion of total losses from claims of size between 2 and 5.


37.16 (1 point) ∫_5^∞ S(t) dt.

37.17 (1 point) LER(2) = loss elimination ratio at 2.




37.18 (1 point) ∫_5^∞ (t - 5) f(t) dt.

37.19 (1 point) 2S(5).

37.20 (1 point) R(5) = excess ratio at 5 = 1 - LER(5).

37.21 (1 point) e(5) = mean excess loss at 5.

37.22 (1 point) ∫_2^5 (t - 2) f(t) dt.

37.23 (1 point) Losses in the layer from 5 to ∞.

37.24 (1 point) ∫_2^5 (1 - F(t)) dt.

37.25 (1 point) 3S(5).

37.26 (1 point) e(2) = mean excess loss at 2.

37.27 (2 points) Using Leeʼs “The Mathematics of Excess of Loss Coverages and Retrospective
Rating -- A Graphical Approach,” show graphically why the limited expected value increases at a
decreasing rate as the limit is increased.
Label all axes and explain your reasoning in a brief paragraph.

37.28 (2 points) Losses follow an Exponential Distribution with mean 500.


Using Leeʼs “The Mathematics of Excess of Loss Coverages and Retrospective Rating --
A Graphical Approach,” draw a graph to show the expected losses from those losses of size 400
to 800. Label all axes.

Use the following information for the next five questions:


Prior to the effects of any maximum covered loss or deductible, losses follow a Weibull Distribution
with θ = 300 and τ = 1/2.
Using Leeʼs “The Mathematics of Excess of Loss Coverages and Retrospective Rating -- A
Graphical Approach,” draw a graph to show the expected payments. Label all axes.

37.29 (1 point) With no deductible and no maximum covered loss.

37.30 (1 point) With a 500 deductible and no maximum covered loss.

37.31 (1 point) With no deductible and a 1500 maximum covered loss.

37.32 (1 point) With a 500 deductible and a 1500 maximum covered loss.

37.33 (1 point) With a 500 franchise deductible and no maximum covered loss.

37.34 (2 points) You are given the following graph of the cumulative loss distribution:
[Graph: size of loss on the vertical axis (500 to 2500); probability on the horizontal axis (0.63 and 1 marked); Area A lies under the curve and below 1000, and Area B lies under the curve and above 1000.]
• Size of the area labeled A = 377.
• Size of the area labeled B = 139.
Calculate the loss elimination ratio at 1000.
A. Less than 0.6
B. At least 0.6, but less than 0.7
C. At least 0.7, but less than 0.8
D. At least 0.8, but less than 0.9
E. At least 0.9

37.35 (3 points) The following graph is of the cumulative loss distribution in 2001:

[Lee Diagram: probability on the horizontal axis (0 to 1); size of loss on the vertical axis with 667, 1000, 1500, 3333, 5000, and 7500 marked; among the labeled regions under the curve, P lies above 5000, T between 1000 and 1500, U between 667 and 1000, and V below 667.]

There is a total of 50% inflation between 2001 and 2008.


A policy in 2008 has a 1000 deductible and a 5000 maximum covered loss.
Which of the following represents the expected size of loss under this policy?
A. 1.5(Q + R + T)
B. 1.5(R + T + U)
C. (P + Q + R)/1.5
D. (Q + R + T)/1.5
E. None of A, B, C, or D.

37.36 (2 points) The size of loss distribution is shown in the following diagram, with probability
along the horizontal axis and size of loss along the vertical axis.
Which of the following represents the expected losses under a policy with a franchise deductible of
2 and a maximum covered loss of 5?
A. γ + ε B. α + β + γ C. δ + ε + η D. β + γ + δ + ε E. β + γ + δ + ε + η

[The same Lee Diagram as for questions 37.1-37.26: probability on the horizontal axis (0, F(2), F(5), 1); size of loss on the vertical axis (0, 2, 5); areas α, β, γ, δ, ε, η as labeled there.]

37.37 (2 points) The following graph shows the distribution function, F(x), of loss severities.

[Graph: F(x) on the horizontal axis (0 to 1); size of loss on the vertical axis with 4000, 5000, and 6250 marked; among the labeled regions, P lies above 6250, Q between 5000 and 6250, and R between 4000 and 5000.]

A policy has a 5000 maximum covered loss and an 80% coinsurance.


Which of the following represents the expected size of loss under this policy?
A. Q + R
B. R + T
C. Q + R + T
D. P + Q + R + T
E. None of A, B, C, or D.

37.38 (3 points) For a Pareto Distribution with α = 4 and θ = 3, draw a Lee Diagram,
showing the curtate expectation of life, e0 .

For the next nine questions, use the following graph of the cumulative loss distribution:
[Lee Diagram: size of loss on the vertical axis (500 to 2000); probability on the horizontal axis (0.323, 0.762, 1 marked); Area A lies under the curve to the left of 0.323, and Area B lies under the curve between heights 500 and 1000, to the left of 0.762.]
Size of the area labeled A = 107. Size of the area labeled B = 98. The mean size of loss is 750.
Calculate the following items:

37.39 (1 point) The average size of loss for those losses of size less than 500.

37.40 (1 point) The loss elimination ratio at 500.

37.41 (1 point) The average payment per loss with a deductible of 500 and
a maximum covered loss of 1000.

37.42 (1 point) The mean excess loss at 500.

37.43 (1 point) The average size of loss for those losses of size between 500 and 1000.

37.44 (1 point) The loss elimination ratio at 1000.

37.45 (1 point) The average payment per payment with a deductible of 500 and
a maximum covered loss of 1000.

37.46 (1 point) The mean excess loss at 1000.

37.47 (1 point) The average size of loss for those losses of size greater than 1000.

37.48 (3 points) You are given the following graph of cumulative distribution functions.
[Lee Diagram: size on the vertical axis with L/(1+r) marked; probability on the horizontal axis (0 to 1); the two distribution curves are shown, and Area P is labeled above the horizontal line at L/(1+r), below the thicker (later-year) curve.]
The thicker curve is the cumulative distribution function for the size of loss in a later year, F(x), with
corresponding Survival Function S(x) and density f(x).
There is total inflation of r between this later year and an earlier year.
The thinner curve is the cumulative distribution function for the size of loss in this earlier year.
Which of the following is an expression for Area P?
A. ∫_{L/(1+r)}^L S(x) dx - S(L)Lr.

B. ∫_{L/(1+r)}^L x f(x) dx - L{S(L/(1+r)) - S(L)}.

C. (1+r) ∫_{L/(1+r)}^L S(x) dx - S(L)Lr/(1+r).

D. ∫_{L/(1+r)}^L x f(x) dx - L{F(L) - F(L/(1+r))}/(1+r).

E. None of the above.



37.49 (1 point) Let a size of loss distribution be given by F(x) with density f(x).
Let S(x) = 1 - F(x).
For the layer of losses from d to u, give two different forms each involving integrals.
Briefly describe how these two forms relate to the graphical Lee diagrams.

37.50 (1 point) Draw a graph of a loss distribution F(x) in the manner described by Lee.
What geometrical quantity corresponds to P(X ≤ L)?

37.51 (3.5 points) For Workers Compensation Insurance in Massachusetts, let


R = the workerʼs weekly wage divided by the state average weekly wage.
You are given the following information:
R Percentage of Workers with Wages At Most R times the State Average Weekly Wage
0.25 2.2%
0.50 11.3%
0.75 35.0%
1.00 57.5%
1.25 74.0%
1.50 85.3%
1.75 92.9%
2.00 96.9%
2.50 99.3%
3.00 99.7%
(a) (2.5 points) Draw a Lee Diagram. Label the axes.
(b) (0.5 point) Label the area corresponding to the percentage of wages earned by those making
at most 150% of the state average weekly wage.
(c) (0.5 point) Assume injured workers are paid a benefit equal to their average weekly wage,
subject to a maximum benefit of the state average weekly wage.
Label the area corresponding to the average benefit paid.

37.52 (8 points) Losses on a policy have the following distribution:


• 50% probability of a loss between $0 and $10,000
• 30% probability of a loss between $10,000 and $25,000
• 20% probability of a loss between $25,000 and $100,000
Losses are uniformly distributed within each range.
Assume a 40% trend is applied uniformly to all losses.
a. (3 points) Draw a Lee diagram depicting the cumulative loss distribution described above
before and after the 40% trend. Label all relevant features of the diagram.
b. (5 points) Use Lee diagrams to calculate the amounts in the layer from 20,000 to 70,000
both prior and subsequent to trend, and thus to calculate implied trend for this layer.

37.53 (4 points) As a result of benefit reforms, claims in a given line of insurance have been sharply
reduced in size. The curves in the diagram below represent the severity distribution before and after
reform. An insurance company writes coverage in this line of insurance, excess of a self-insured
retention S.

[Diagram: size of loss on the vertical axis with the retention S marked; probability on the horizontal axis (0 to 1); the two curves are the severity distributions before (higher) and after (lower) reform, with areas A, B, C, D, E, J, K, L, and M labeled.]
Define small claims as those that were of size ≤ S prior to the benefit reform.
Define medium claims as those that were of size > S prior to the benefit reform,
and are of size ≤ S after the benefit reform.
Define large claims as those that are of size > S after the benefit reform.
Put each of the verbal descriptions in terms of the labeled areas and S.
1. Reduction due to benefit reform in the insuredʼs expected retained losses from small claims.
2. After benefit reform, contribution to the insuredʼs expected retained losses from small claims.
3. Reduction due to benefit reform in the insuredʼs expected retained losses from medium claims.
4. After benefit reform, contribution to the insuredʼs expected retained losses from medium claims.
5. Reduction due to benefit reform in the insuredʼs expected retained losses from large claims.
6. After benefit reform, contribution to the insuredʼs expected retained losses from large claims.
7. Reduction due to benefit reform in the insurerʼs expected losses from small claims.
8. After benefit reform, contribution to the insurerʼs expected losses from small claims.
9. Reduction due to benefit reform in the insurerʼs expected losses from medium claims.
10. After benefit reform, contribution to the insurerʼs expected losses from medium claims.
11. Reduction due to benefit reform in the insurerʼs expected losses from large claims.
12. After benefit reform, contribution to the insurerʼs expected losses from large claims.
13. Prior to benefit reform, the insurerʼs losses eliminated by the retention S. Use Area D.

37.54 (CAS9, 11/92, Q.8) (1 point) According to Lee in "The Mathematics of Excess of Loss
Coverages and Retrospective Rating - A Graphical Approach," which of the following statements
are true?
Assume that claims of all sizes inflate by a constant factor a (x' = ax). The cumulative distribution
function is F(x) before inflation and F'(x') after inflation. R is the retention.

[Lee Diagram: loss size on the vertical axis with R and R/a marked; cumulative frequency on the horizontal axis (0 to 1); the curves F(x) (before inflation) and F'(x') (after inflation) are shown, with areas A, B, C, D, E, G, and H labeled.]
1. (G + H) / G > a.
2. (B + C + D + E) / (D + C) > a.
3. (B + C + D + E) = a C.
A. 1 only B. 2 only C. 3 only D. 1 and 2 E. 1 and 3

37.55 (CAS9, 11/94, Q.40) (2 points) Answer the following based on "The Mathematics of
Excess of Loss Coverage and Retrospective Rating - A Graphical Approach" by Lee.
For a random variable, X, that can have a value in the interval (0, b], the limited expected value
function of X with limit equal to t is given by the function:
E[X ∧ t] = t - (0.5)(t² / b); for 0 < t ≤ b.
Using this limited expected value function, calculate the probability density function f(x) for x.
Show all of your work.

37.56 (CAS9, 11/94, Q.41) (3 points) Answer the questions below using graphs in the style of
Lee in his paper "The Mathematics of Excess of Loss Coverage and Retrospective Rating - A
Graphical Approach."
A company writes two lines of business, A and B. Each line has identical loss characteristics, except
that their severity distributions (which have the same mean of $10,000 and the same approximately
normal form) have different standard deviations. The standard deviation of A is large, although there
is insignificant probability of claims near zero. The standard deviation of B is small. Assume that
policy limits do not apply to coverage under A or B.
a. (1.5 points) For a deductible of $8,000, how would you expect the loss elimination ratios (LERs)
for the two lines to compare?
b. (1.5 points) Suppose that the annual claim cost trend factor is α. Assume positive trend, with
trend factor α > 1.000, and assume that α is uniform by size of loss. Compare the expected
effect of this trend on the LERs for these two lines.

37.57 (CAS9, 11/96, Q.39) (2 points) Using Lee's "The Mathematics of Excess of Loss
Coverages and Retrospective Rating - A Graphical Approach," draw a graph and use letters to
show the impact of inflation on each of the following types of loss.
Assume that claims of all sizes inflate by a constant factor of 1 + r.
1. total losses
2. basic limit losses
3. excess limit losses
For each type of loss, describe the impact as the ratio of losses after inflation to losses before
inflation, and compare this impact to the overall inflation rate of 1 + r.

37.58 (CAS9, 11/99, Q.34) (2 points) Using Lee's "The Mathematics of Excess of Loss
Coverages and Retrospective Rating - A Graphical Approach," answer the following:
a. (0.5 point) If the total limits inflation rate is 6%, describe why the inflation rate for the basic limits
coverage is lower than 6%.
b. (1 point) Use Lee to graphically justify your answer.
c. (0.5 point) What are the two major reasons why the inflation rate in the excess layer is greater
than the total limits inflation rate?

37.59 (CAS9, 11/02, Q.43) (3 points) a. (1.5 points) Using Lee's "The Mathematics of Excess of
Loss Coverages and Retrospective Rating - A Graphical Approach," draw a graph to show what
the expected losses would be for an excess of loss contract, covering losses in excess of retention
(R), subject to a maximum limit (L). Include all appropriate labels on the graph.
b. (1.5 points) Assuming retention (R) but no maximum limit (L), describe how inflation will affect the
expected losses for the excess cover relative to unlimited ground up losses. Explain your answer
graphically or in words.

37.60 (5, 5/03, Q.14) (1 point)



Given E[x] = ∫_0^∞ x f(x) dx = $152,500
and the following graph of the cumulative loss distribution, F(x), as a function of the size of loss, x,
calculate the excess ratio at $100,000.
• Size of the area labeled Y = $12,500

[Graph: loss size x on the vertical axis with $100,000 marked; F(x) = cumulative claim frequency on the horizontal axis with 0.20 and 1.00 marked; Area Y lies under the curve to the left of 0.20.]

A. Less than 0.3


B. At least 0.3, but less than 0.5
C. At least 0.5, but less than 0.7
D. At least 0.7, but less than 0.9
E. At least 0.9

37.61 (CAS3, 11/03, Q.20) (2.5 points) Let X be the size-of-loss random variable with
cumulative distribution function F(x) as shown below:

Which expression(s) below equal(s) the expected loss in the shaded region?

I. ∫_K^∞ x dF(x)

II. E(x) - ∫_0^K x dF(x) - K[1 - F(K)]

III. ∫_K^∞ [1 - F(x)] dx
A. I only
B. II only
C. III only
D. I and III only
E. II and III only

37.62 (CAS3, 11/03, Q.23) (2.5 points)


F(x) is the cumulative distribution function for the size-of-loss variable, X.
P, Q, R, S, T, and U represent the areas of the respective regions.
What is the expected value of the insurance payment on a policy with a deductible of "DED" and a
limit of "LIM"? (For clarity, that is a policy that pays its first dollar of loss for a loss of
DED + 1 and its last dollar of loss for a loss of LIM.)

A. Q B. Q+R C. Q+T D. Q+R+T+U E. S+T+U



37.63 (CAS3, 5/04, Q.33) (2.5 points)


F(x) is the cumulative distribution function for the size-of-loss variable, X.
P, Q, R, S, T, and U represent the areas of the respective regions.
What is the expected value of the savings to the insurance company of implementing a franchise
deductible of “DED" and a limit of “LIM" to a policy that previously had no deductible and no limit?
(For clarity, that is a policy that pays its first dollar of loss for a loss of DED + 1 and its last dollar of
loss for a loss of LIM.)

A. S B. S+P C. S+Q+P D. S+P+R+U E. S+T+U+P



37.64 (CAS3, 11/04, Q.30) (2.5 points) Let X be a random variable representing an amount of
loss. Define the cumulative distribution function F(x) as F(x) = Pr(X ≤ x).

Determine which of the following formulas represents the shaded area.


A. ∫_a^b x dF(x) + a - b + aF(b) - bF(a)    B. ∫_a^b x dF(x) + a - b + aF(a) - bF(b)
C. ∫_a^b x dF(x) - a + b + aF(b) - bF(a)    D. ∫_a^b x dF(x) - a + b + aF(a) - bF(b)
E. ∫_a^b x dF(x) - a + b - aF(a) + bF(b)

37.65 (CAS3, 5/06, Q.28) (2.5 points)


The following graph shows the distribution function, F(x), of loss severities in 2005.

[Lee Diagram: F(x) on the horizontal axis (0 to 1); size of loss on the vertical axis with D/1.1, D, and 1.1D marked; Area P lies above 1.1D, Areas Q and R between D and 1.1D, and Areas S, T, and U between D/1.1 and D.]

Loss severities are expected to increase 10% in 2006 due to inflation.


A deductible, D, applies to each claim in 2005 and 2006.
Which of the following represents the expected size of loss in 2006?
A. P
B. 1.1P
C. 1.1(P+Q+R)
D. P+Q+R+S+T+U
E. 1.1(P+Q+R+S+T+U)

37.66 (CAS8, 11/11, Q.11) (3 points) Losses follow a uniform distribution between $0 and $100.
Assume a 10% trend is applied uniformly to all losses.
Use a Lee diagram to calculate the implied trend for the layer from $25 to $75 ($50 excess of $25.)
Label all relevant features of the diagram.

37.67 (CAS8, 11/12, Q.22) (5 points) The current deductible pricing for an auto insurer is based
on the following claim distribution:
Size of Loss Number of Claims
$100 21
$250 50
$500 42
$1,000 37
$5,000 22

An actuary wants to review the effect of loss trend on the insurer's loss elimination ratios.
a. (1.5 point) Calculate the loss elimination ratio for a straight $500 deductible assuming no
trend adjustment.
b. (2.5 points) Assuming no frequency trend, calculate the percentage change in
the loss elimination ratio for a straight $500 deductible assuming
a ground-up loss severity trend of 10%.
c. (1 point) Explain why the loss cost for a given straight deductible policy can increase
by more than the ground-up severity trend.

37.68 (CAS8, 11/14, Q.6) (9 points) Losses on a policy have the following distribution:
• 60% probability of a loss between $0 and $250,000
• 30% probability of a loss between $250,000 and $500,000
• 10% probability of a loss between $500,000 and $1 million
Losses are uniformly distributed within each range.
Assume a 20% trend is applied uniformly to all losses.
a. (4 points) Draw a diagram depicting the cumulative loss distribution described above before
and after the 20% trend. Label all relevant features of the diagram.
b. (5 points) Calculate the implied trend for the layer $500,000 excess of $500,000.
(The layer from $500,000 to $1 million.)

Solutions to Problems:

37.1. α+β+γ+δ+ε+η, the mean is the area under the distribution curve.

37.2. α, the result of summing vertical strips under the curve from zero to F(2).

37.3. (α+β+δ) / (α+β+γ+δ+ε+η) = E[X ∧ 2] / E[X].

37.4. E[X ∧ 2] is: α+β+δ, the area under the curve and the horizontal line at 2.

37.5. (β+γ+δ+ε+η) / ( α+β+γ+δ+ε+η) = 1 - α / ( α+β+γ+δ+ε+η).

37.6. (α+β+γ+δ+ε) / (α+β+γ+δ+ε+η) = 1 - η / (α+β+γ+δ+ε+η) = E[X ∧ 5] / E[X].

37.7. α+β+δ+γ+ε, the area under the curve and the horizontal line at 5.

37.8. (α+β+γ) / (α+β+γ+δ+ε+η), the numerator is the result of summing vertical strips under the
curve from zero to F(5), while the denominator is the total area under the curve.

37.9. (γ+ε) / (α+β+γ+δ+ε+η) = (E[X ∧ 5] - E[X ∧ 2]) / E[X].

37.10. (γ+ε+η) / (α+β+γ+δ+ε+η) = (E[X] - E[X ∧ 2]) / E[X].

37.11. Losses from claims of size more than 5: δ+ε+η.

37.12. (γ+ε+η) / (α+β+γ+δ+ε+η) = (E[X] - E[X ∧ 2]) / E[X].

37.13. (α+β+γ+δ+ε) / (α+β+γ+δ+ε+η) = 1 - η / (α+β+γ+δ+ε+η) = E[X ∧ 5] / E[X].

37.14. β, a rectangle of height 2 and width F(5) - F(2).

37.15. (β+γ) / ( α+β+γ+δ+ε+η), the numerator is the result of summing vertical strips under the curve
from F(2) to F(5), while the denominator is the total area under the curve.

37.16. η = E[X] - E[X ∧ 5], the sum of horizontal strips of length S(t) = 1-F(t) between the horizontal
lines at 5 and ∞.

37.17. (α+β+δ) / (α+β+γ+δ+ε+η) = E[X ∧ 2] / E[X].

37.18. η, the sum of vertical strips of height t-5 between the vertical lines at F(5) and 1.

37.19. δ, a rectangle of height 2 and width S(5).

37.20. η / ( α+β+γ+δ+ε+η) = (E[X] - E[X ∧ 5]) / E[X].

37.21. e(5) = η /S(5). But, δ = 2S(5) and ε = 3S(5). Thus e(5) = 3η/ε or 2η/δ.

37.22. γ, the sum of vertical strips of height t-2 between the vertical lines at F(2) and F(5).

37.23. η = E[(X - 5)+] = E[X] - E[X ∧ 5].

37.24. γ+ε = E[X ∧ 5] - E[X ∧ 2], the sum of horizontal strips of length 1-F(t) between the horizontal
lines at 2 and 5.

37.25. ε, a rectangle of height 5 -2 and width 1-F(5) = S(5).

37.26. e(2) = losses excess of 2 / S(2) = (γ+ε+η) /S(2).


But, β+δ = 2S(2). ⇒ e(2) = 2(γ+ε+η) / (β+δ).

37.27. In the Lee diagram below, E[X ∧ 1000] = Area A.

E[X ∧ 2000] = Area A + Area B. Therefore, Area B = E[X ∧ 2000] - E[X ∧ 1000].
Area B is the increase in the limited expected value due to increasing the limit from 1000 to 2000.
Similarly, Area C is the increase in the limited expected value due to increasing the limit by another
1000. Area C < Area B, and therefore the increase in the limited expected value is less. In general
the areas of a given height get smaller as one moves up the diagram, as the curve moves closer to
the righthand asymptote. Therefore, the rate of increase of the limited expected value decreases.
Comment: The diagram was based on a Pareto Distribution with α = 4 and θ = 2400.

37.28. If y = size of loss and x = probability, then for an Exponential with θ = 500,
x = 1 - exp[-y/500]. Therefore, y = -500 ln[1 - x].
loss of size 400 ⇔ probability = 1 - e^-0.8 = 0.551.

loss of size 800 ⇔ probability = 1 - e^-1.6 = 0.798.


Losses of size 400 to 800 correspond to the area below the curve and between vertical lines at
0.551 and 0.798:

[Lee Diagram: size of loss on the vertical axis (400 and 800 marked); probability on the horizontal axis (0 to 1); the region labeled "Losses of size 400 to 800" lies below the curve between the vertical lines at 0.551 and 0.798.]

37.29. If y = size of loss and x = probability, then for a Weibull with θ = 300 and τ = 1/2,
x = 1 - exp[-(y/300)^0.5]. Therefore, y = 300 (ln[1 - x])^2. Lee Diagram:

Comment: One has to stop graphing at some size of loss, unless one has infinite graph paper!
In this case, I only graphed up to 2500.

37.30. The payments with a 500 deductible and no maximum covered loss are represented by
the area above the line at height 500 and to the right of the curve:

37.31. The payments with no deductible and a 1500 maximum covered loss are represented by
the area below the line at height 1500, and to the right of the curve:

37.32. The payments with a 500 deductible and a 1500 maximum covered loss are represented
by the area above the line at height 500, below the line at height 1500, and to the right of the curve:

37.33. Under a 500 franchise deductible, nothing is paid on a loss of size 500 or less, and the
whole loss is paid for a loss of size greater than 500. The payments with a 500 franchise deductible
and no maximum covered loss are the losses of size greater than 500, represented by the area to
the right of the vertical line at F(500) and below the curve:

37.34. D. Area C is a rectangle with height 1000 and width (1 - 0.63) = 0.37, with area 370.
[Lee Diagram: Area A lies under the curve to the left of 0.63, Area C is the rectangle of height 1000 and width 0.37 to the right of 0.63, and Area B lies under the curve above 1000.]
Expected losses limited to 1000 = Area A + Area C = 377 + 370 = 747.
E[X] = Area A + Area B + Area C = 377 + 139 + 370 = 886.
Loss elimination ratio at 1000 = E[X ∧ 1000]/E[X] = 747/886 = 84.3%.
Comment: Similar to 5, 5/03, Q.14.
The Lee Diagram was based on a Weibull Distribution with θ = 1000 and τ = 2.
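
The 84.3% can also be checked directly from the Weibull with θ = 1000 and τ = 2 mentioned above, by integrating the survival function numerically rather than reading areas off the diagram (a sketch using a crude midpoint rule; helper names are mine).

import math

def S(x, theta=1000.0, tau=2.0):          # Weibull survival function
    return math.exp(-((x / theta) ** tau))

def integral_S(a, b, n=100000):           # midpoint-rule integral of S
    h = (b - a) / n
    return sum(S(a + (i + 0.5) * h) for i in range(n)) * h

lim_1000 = integral_S(0.0, 1000.0)        # E[X ∧ 1000], about 747
mean     = integral_S(0.0, 10000.0)       # E[X], about 886
print(lim_1000 / mean)                    # LER(1000), about 0.843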

37.35. B. Deflate 5000 from 2008 to 2001, where it is equivalent to: 5000/1.5 = 3333.
Deflate 1000 from 2008 to 2001, where it is equivalent to: 1000/1.5 = 667. Then the average
expected loss in 2001 is the area between horizontal lines at 667 and 3333, and under the curve:
R+T+U. In order to get the expected size of loss in 2008, reinflate back up to the 2008 level, by
multiplying by 1.5: 1.5(R + T + U).
Comment: Similar to CAS3, 5/06, Q.28.

37.36. D. A policy with a franchise deductible of 2 pays the full amount of all losses of size greater
than 2. This is the area under the curve and to the right of the vertical line at 2:
β + γ + δ + ε + η. However, there is also a maximum covered loss of 5, which means the policy does
not pay for the portion of any loss greater than 5, which eliminates area η, above the horizontal line at
5. Therefore the expected payments are: β + γ + δ + ε.
Comment: Similar to CAS3, 5/04, Q.33.

37.37. E. Prior to the effect of the coinsurance, the expected size of loss is below the line at 5000
and below the curve: R + T. We multiply by 80% before paying under this policy.
The expected size of loss is: 0.8(R + T).

37.38. The curtate expectation of life, e0 , is the sum of a series of rectangles, each with height 1,
with areas: S(1) = (3/4)^4, S(2) = (3/5)^4, S(3) = (3/6)^4, etc.
The first six of these rectangles are shown below:
[Lee Diagram: size of loss on the vertical axis, probability on the horizontal axis (0 to 1); the six unit-width rectangles sit under the Pareto curve.]
Comment: e0 < e(0) = E[X] = area under the curve.
The curtate expectation of life is discussed in Actuarial Mathematics.

37.39. We are given Area A = 107 and Area B is 98. We can get the areas of three rectangles.
One rectangle has width: .762 - .323 = .439, height 500, and area: (.439)(500) = 219.5.
Two rectangles have width: 1 - .762 = .238, height 500, and area: (.238)(500) = 119.
The total area under the curve is equal to the mean, given as 750.
Therefore, the area under the curve and above the horizontal line at 1000 is:
750 - (107 + 219.5 + 98 + 119 + 119) = 87.5.
[Lee Diagram with the areas filled in: 87.5 under the curve above 1000; in the band from 500 to 1000, 98 under the curve to the left of 0.762 and 119 in the rectangle to the right of 0.762; in the band from 0 to 500, 107 to the left of 0.323, 219.5 in the rectangle from 0.323 to 0.762, and 119 in the rectangle to the right of 0.762.]
The average size of loss for those losses of size less than 500 is:
(dollars from losses of size less than 500)/F(500) = 107/.323 = 331.
Comment: The Lee Diagram was based on a Gamma Distribution with α = 3 and θ = 250.

37.40. LER(500) = E[X ∧ 500]/E[X] = (107 + 219.5 + 119)/750 = 59.4%.

37.41. The average payment per loss with a deductible of 500 and maximum covered loss of
1000 is: layer from 500 to 1000 ⇔

the area under the curve and between the horizontal lines at 500 and 1000 ⇔
98 + 119 = 217.

37.42. e(500) = (losses excess of 500)/S(500) = (98 + 119 + 87.5) / (1 - 0.323) = 450.

37.43. The average size of loss for those losses of size between 500 and 1000 is:
(dollars from losses of size between 500 and 1000) / {F(1000) - F(500)} =
(219.5 + 98) / (0.762 - 0.323) = 723.

37.44. The loss elimination ratio at 1000 = E[X ∧ 1000] / E[X] =


(107 + 219.5 + 119 + 98 + 119) / 750 = 88.3%.
Alternately, excess ratio at 1000 is: 87.5/750 = 11.7%. ⇒ LER(1000) = 1 - 11.7% = 88.3%.

37.45. The average payment per (non-zero) payment with a deductible of 500 and maximum
covered loss of 1000 is: (average payment per loss)/S(500) = 217 / (1 - 0.323) = 321.

37.46. e(1000) = (losses excess of 1000) / S(1000) = 87.5 / (1 - 0.762) = 368.

37.47. The average size of loss for those losses of size greater than 1000 is:
(dollars from losses of size > 1000)/S(1000) = (87.5 + 119 + 119) / (1 - 0.762) = 1368.
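
For those who want to check several of these answers without the diagram, here is a sketch using the Gamma Distribution with α = 3 and θ = 250 mentioned in the comment to 37.39; for integer α = 3 the survival function has the closed form used below, and the helper names are mine.

import math

theta = 250.0

def S(x):                                  # Gamma(alpha = 3, theta) survival function
    u = x / theta
    return math.exp(-u) * (1.0 + u + 0.5 * u * u)

def integral_S(a, b, n=100000):            # midpoint-rule integral of S
    h = (b - a) / n
    return sum(S(a + (i + 0.5) * h) for i in range(n)) * h

mean     = 3.0 * theta                     # E[X] = 750
lim_500  = integral_S(0.0, 500.0)          # E[X ∧ 500]
lim_1000 = integral_S(0.0, 1000.0)         # E[X ∧ 1000]

print((lim_500 - 500 * S(500.0)) / (1.0 - S(500.0)))     # 37.39: about 331
print(lim_500 / mean)                                    # 37.40: about 0.594
print(lim_1000 - lim_500)                                # 37.41: about 217
print((mean - lim_500) / S(500.0))                       # 37.42: about 450
print(lim_1000 / mean)                                   # 37.44: about 0.884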

37.48. D. The rectangle below Area P plus Area P, represent those losses of size between
L/(1+r) and L.
[Lee Diagram: size on the vertical axis with L/(1+r) marked; probability on the horizontal axis (0 to 1); Area P and the rectangle below it are shown.]

⇒ Area P + Rectangle = ∫_{L/(1+r)}^L x f(x) dx.

The rectangle has height: L/(1+r), and width: F(L) - F(L/(1+r)).


⇒ Area P = ∫_{L/(1+r)}^L x f(x) dx - {F(L) - F(L/(1+r))} L/(1+r).

Comment: Area P can be written in other ways as shown below:

[The same Lee Diagram, now with the rectangle to the left of Area P relevant.]

Area P plus the rectangle to the left of Area P, represents the layer of loss from L/(1+r) to L.
⇒ Area P + Rectangle = ∫_{L/(1+r)}^L S(x) dx.

The rectangle has height: L - L/(1+r), and width: S(L).


⇒ Area P = ∫_{L/(1+r)}^L S(x) dx - S(L){L - L/(1+r)} = ∫_{L/(1+r)}^L S(x) dx - S(L) L r/(1+r).

37.49. The layer can be gotten by adding horizontal strips: ∫_d^u S(x) dx.
Also, the layer can be gotten by adding vertical strips: ∫_d^u (x - d) f(x) dx + (u - d)S(u).
Comment: In the second version, the first term is the contribution of medium losses, while the
second term is the contribution of large losses.

37.50. The vertical axis is size of loss. Where the horizontal line at height L hits the curve is F(L).
Dropping a vertical line from that point, the place where it hits the horizontal axis is at distance F(L) from the origin; this distance corresponds to P(X ≤ L).

[Lee Diagram: size of loss on the vertical axis; probability on the horizontal axis, with F(L) and 1 marked.]

37.51. (a) As an approximation, we connect by straight lines the points:


(0, 0), (0.022, 0.25), (0.113, 0.5), (0.350, 0.75), (0.575, 1), etc.

(b) 85.3% of workers earn at most 150% of the state average weekly wage.
The area corresponding to the percentage of wages earned by those making at most 150% of the
state average weekly wage is below the curve and to the left of the vertical line at 0.853, Area A.
(Area A is analogous to the dollars of loss from small losses.)

(c) The benefits are capped at the State Average Weekly Wage, corresponding to R = 1.
Therefore, the area corresponding to the average benefit paid is below the curve and also below a
horizontal line at 1, Area B.
(Area B is analogous to the percentage of loss dollars in the layer from 0 to 1.)

Comment: Since R is with respect to the State Average Weekly Wage, the area under the curve
should be 1.

37.52. (a) Subsequent to trend, we have three uniform distributions:


from 0 to 14,000; from 14,000 to 35,000; and from 35,000 to 140,000.
A Lee Diagram, with size in thousands:
[Lee Diagram, size in thousands: probability on the horizontal axis (0.5, 0.8, 1); the Before Trend curve passes through (0.5, 10), (0.8, 25), and (1, 100), and the After Trend curve through (0.5, 14), (0.8, 35), and (1, 140).]

(b) Prior to trend the distribution function at 20 is: (1/3)(0.5) + (2/3)(0.8) = 0.7.
Prior to trend the distribution function at 70 is: (30/75)(0.8) + (45/75)(1) = 0.92.
Prior to trend, the layer from 20,000 to 70,000 is the area below the distribution and
between horizontal lines at heights 20,000 and 70,000.
This is Areas A plus B in the following Lee Diagram.

[Lee Diagram, prior to trend, size in thousands: the layer from 20 to 70 is the area below the curve between the horizontal lines at 20 and 70; Area A is the part between 25 and 70 and Area B the part between 20 and 25; the curve passes through (0.7, 20), (0.8, 25), and (0.92, 70).]

Area A is a trapezoid, with area: (70 - 25)(0.2 + 0.08)/2 = 6.3 (thousand).


Area B is a trapezoid, with area: (25 - 20)(0.3 + 0.2)/2 = 1.25 (thousand).
Prior to trend, the losses in the layer are: 6.3 + 1.25 = 7.55 (thousand).

After trend the distribution function at 20 is: (15/21)(0.5) + (6/21)(0.8) = 0.5857.


After trend the distribution function at 70 is: (2/3)(0.8) + (1/3)(1) = 0.8667.
After trend the Lee Diagram is as follows, with the excess layer being the sum of Areas A and B:
[Lee Diagram, after trend, size in thousands: Area A is the part of the layer between heights 35 and 70 and Area B the part between 20 and 35, both below the curve; the curve passes through (0.5857, 20), (0.8, 35), and (0.8667, 70).]

Area A is a trapezoid with area: (70 - 35)(0.2 + 0.1333)/2 = 5.833 (thousand).


Area B is a trapezoid with area: (35 - 20)(0.4143 + 0.2)/2 = 4.607 (thousand).
Thus after trend the excess layer is: 5.833 + 4.607 = 10.440 (thousand).
The implied trend for the layer $50,000 excess of $20,000 is: 10.440/7.55 - 1 = 38.3%.
Comment: Similar to 8, 11/14, Q. 6.
Prior to trend:
E[X ∧ 70,000] = (0.5)(5K) + (0.3)(17.5K) + (0.2)(45/75)(47.5K) + (0.2)(30/75)(70K) = 19.05K.
E[X ∧ 20,000] = (0.5)(5K) + (0.3)(2/3)(15K) + (0.3)(1/3)(20K) + (0.2)(20K) = 11.5K.
E[X ∧ 70,000] - E[X ∧ 20,000] = 19.05K - 11.5K = 7.55K = 7550.
Subsequent to trend:
E[X ∧ 70,000] = (0.5)(7K) + (0.3)(24.5K) + (0.2)(1/3)(52.5K) + (0.2)(2/3)(70K) = 23,683.
E[X ∧ 20K] = (0.5)(7K) + (0.3)(6/21)(17K) + (0.3)(15/21)(20K) + (0.2)(20K) = 13,243.
E[X ∧ 70,000] - E[X ∧ 20,000] = 23,683 - 13,243 = 10,440.
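
The whole calculation can also be done numerically; the sketch below integrates the survival function of the piecewise-uniform severity over the layer, before and after the 40% trend (helper names are mine).

def make_cdf(breaks, probs):
    """Piecewise-uniform CDF with the given upper breakpoints and interval probabilities."""
    def F(x):
        total, lo = 0.0, 0.0
        for hi, p in zip(breaks, probs):
            if x >= hi:
                total += p
            elif x > lo:
                total += p * (x - lo) / (hi - lo)
            lo = hi
        return min(total, 1.0)
    return F

def layer(F, d, u, n=200000):              # integral of S(x) = 1 - F(x) from d to u
    h = (u - d) / n
    return sum(1.0 - F(d + (i + 0.5) * h) for i in range(n)) * h

before = make_cdf([10000, 25000, 100000], [0.5, 0.3, 0.2])
after  = make_cdf([14000, 35000, 140000], [0.5, 0.3, 0.2])   # all losses trended 40%

b = layer(before, 20000, 70000)
a = layer(after, 20000, 70000)
print(round(b), round(a), round(a / b - 1.0, 3))   # about 7550, 10440, and 0.383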

37.53. The higher curve is prior to benefit reform, while the lower curve is after benefit reform.
(Unlike inflation, here the loss sizes went down.)
The losses prior to benefit reform are below the higher curve.
The losses after benefit reform are below the lower curve.
The reduction in losses due to benefit reform are between the curves.
Retained losses are below the horizontal line at S.
Insurerʼs losses are above the horizontal line at S.
1. E.
2. A.
3. J.
4. B.
5. Zero. The contribution from large claims to the insuredʼs retained losses is the same before and
after benefit reform, C.
6. C.
7. Zero. There is no contribution from small claims to the insurerʼs losses both before and after
benefit reform.
8. Zero.
9. K.
10. Zero.
11. L
12. M.
13. Prior to benefit reform, the losses eliminated by the retention S are: A + B + C + E + J.
However, S = D + A + B + C + E + J.
Therefore, prior to benefit reform, the losses eliminated = S - Area D.
Comment: After benefit reform, the losses eliminated by the retention S are:
A + B + C = S - Area D - Area E - Area J.

37.54. E. 1. True. (G + H) / G is the ratio of excess losses after inflation to excess losses prior to
inflation. Excess losses increase at a rate greater than the overall rate of inflation.
2. False. (B + C + D + E) / (D + C) is the ratio of retained losses after inflation to retained losses
prior to inflation. Retained losses increase at a rate less than the overall rate of inflation.
3. True. B + C + D + E is the retained losses after inflation.
Area C is what the retained losses would be prior to inflation at a retention of R/a.
One way to get the retained losses after inflation is to take the losses below the deflated retention
(limit) and multiply by the inflation factor, in other words to take: a C.
(This is the idea behind the formula for the average payment per loss.)

37.55. If S(x) = 1 - F(x), the survival function, then the limited expected value is:
E[X ∧ t] = ∫_0^t S(x) dx.
Therefore, S(t) is the derivative with respect to t of the limited expected value.
Thus, S(t) = 1 - t/b.
f(t) = -Sʼ(t) = 1/b.
Comment: A uniform distribution.

37.56. a. The large standard deviation case has more probability in both tails.
The LER is in each case: C/(C + D).
[Two Lee Diagrams, for the large and small standard deviation lines: size on the vertical axis with 8000 marked, probability on the horizontal axis (0 to 1); in each, Area C lies below both the curve and the line at 8000, and Area D lies above the line at 8000 and below the curve.]
Since the distributions have the same mean, the denominators of the two LERs are equal.
For the larger standard deviation, Area C is smaller.
For the larger standard deviation, the LER is smaller.

b. Under uniform inflation, excess losses increase faster than limited losses; the LER declines.
In each case, the increase due to inflation in the losses eliminated is area G.
[Two Lee Diagrams, for the large and small standard deviation lines, each showing the severity curve before and after inflation: Area G, between the two curves and below the line at 8000, is the increase in the losses eliminated; Areas D and H lie above the line at 8000.]

Since the distributions have the same mean, and the same inflation factor is applied to both, we can
concentrate on the changes in the losses eliminated.
For the smaller standard deviation, G is smaller, than for the larger standard deviation.
With a smaller standard deviation, the LER declines more due to inflation.

Comment: For the intuition on theoretical problems, it often helps to substitute simple numbers.
We are told that both lines have means of $10,000, but they have different standard deviations.
Let's make up some simple numbers that fit this scenario:
The line with the large standard deviation also has three claims: $5,000, $10,000, and $15,000.
The line with the small standard deviation has three claims: $9,000, $10,000, and $11,000.
With a deductible of $8,000, the LER for the large standard deviation is:
($5,000 + $8,000 + $8,000) / $30,000 = 70%.
With a deductible of $8,000, the LER for the small standard deviation is:
($8,000 + $8,000 + $8,000) / $30,000 = 80%.
With a larger standard deviation, the LER is less.
For Part (b) of the examination question, we continue with our numerical example. Suppose that
inflation is 10%.
For the large standard deviation with claims of $5,000, $10,000, $15,000, the losses inflate by
10%, and the amount eliminated increases. Instead of $5,000 + $8,000 + $8,000, the amount
eliminated becomes $5,500 + $8,000 + $8,000.
For the large standard deviation, the LER is now: 21,500/(5500 + 11,000 + 16,500) = 65.2%.
For the small standard deviation with claims of $9,000, $10,000, $11,000, the losses inflate by
10%, but the amount eliminated doesn't change; it is still $8,000 for each claim.
For the small standard deviation, the LER is now: 24,000/(9900 + 11,000 + 12,100) = 72.7%.
Both LERs decrease due to inflation.
For the large standard deviation, the LER went from 70% to 65.2%.
For the small standard deviation, the LER went from 80% to 72.7%.
With a smaller standard deviation, the LER is more affected by inflation.
In my graphs, the small standard deviation line is a Normal Distribution with µ = 10,000 and
σ = 2000, while the large standard deviation line has µ = 10,000 and σ = 3000.
I took the inflation factor α = 1.2.

37.57. The basic limits losses are below the horizontal line at height equal to the limit.
Prior to inflation the basic limits losses are below the original distribution (thick), Area B.
After inflation the basic limits losses are below the inflated distribution (thinner), Areas A + B.
The excess limits losses are above the horizontal line.
Prior to inflation the excess limits losses are below the original distribution (thick), Area D.
After inflation the excess losses are below the inflated distribution (thinner), Areas C + D:

[Lee Diagram: size of loss on the vertical axis with the basic limit marked; probability on the horizontal axis; Areas A and B lie below the limit line (A between the two curves, B below the original curve), and Areas C and D lie above it (C between the two curves, D below the original curve).]

Area C represents the increase in excess losses due to inflation, while Area A represents the
increase in limited losses due to inflation.
(A+B)/B = 1 + A/B = ratio of basic limit losses after inflation and before inflation < 1+ r.
(C+ D)/D = 1 + C/D = ratio of excess limit losses after inflation and before inflation > 1 + r.
(A + B + C + D) / (B + D) = 1 + (A + C) / (B + D)
= ratio of total losses after inflation and before inflation = 1 + r.

37.58. a. Some losses will hit the basic limit before they get the total increase from inflation, while
some were already at the basic limit so inflation wonʼt increase them.
For example, if the basic limit is $100,000, then a loss of $125,000 will still contribute $100,000 to
the basic limit after inflation. A loss of $98,000 will increase to $103,880 after inflation, and would
then contribute $100,000 to the basic limit, an increase of only 100/98 - 1 = 2.04% in that
contribution, less than 6%.
b. Let L be the basic limit. The solid curve refers to the losses prior to inflation, while the dashed
curve refers to the losses after inflation:

The expected excess losses prior to inflation are Area D.


The increase in the expected excess losses due to inflation is Area C.
The expected basic limit losses prior to inflation are Area B.
The increase in the expected basic limit losses due to inflation is Area A.
Area C is larger compared to Area D, than is Area A compared to Area B.
Therefore, the basic limit losses increase slower due to inflation than do the excess losses.
Since the unlimited ground up losses are the sum of the excess and basic limit losses,
the basic limit losses increase slower due to inflation than the unlimited ground up losses.

Looked at from a somewhat different point of view, those losses in Area M will have their
contributions to the basic limit increase at a rate less than 6%, while those losses in Area N will not
have their contributions to the basic limit increase at all due to inflation.

c. All losses that were already in the excess layer receive the full increase from inflation. Some
losses that did not contribute to the excess layer will, after inflation, contribute something to the
excess layer.
Comment: In order to allow one to see what is going on, the Lee Diagrams in my solution are based
on much more than 6% inflation.

37.59. a. The expected losses for an excess of loss contract, covering losses in excess of retention
(R), subject to a maximum limit (L) is represented by Area A, the area above the horizontal line at
R, below the horizontal line at L, and below the curve for the distribution of sizes of loss.

b. The solid curve is the loss distribution prior to inflation, while the dashed curve is the loss
distribution after inflation.

The expected excess losses prior to inflation are Area D.


The increase in the expected excess losses due to inflation is Area C.
The expected retained losses prior to inflation are Area B.
The increase in the expected retained losses due to inflation is Area A.
Area C is larger compared to Area D, than is Area A compared to Area B.
Therefore, the excess losses increase faster due to inflation than do the retained losses.
Since the unlimited ground up losses are the sum of the excess and retained losses,
the excess losses increase faster due to inflation than the unlimited ground up losses.
Comment: Excess losses increase faster than the overall rate of inflation.
Limited losses increase slower than the overall rate of inflation.

37.60. B. Area Y + Area A + Area B = E[x] = $152,500.


Area A is a rectangle with height $100,000 and width .8, with area $80,000.
Area Y is given as $12,500.
Expected losses excess of $100,000 = Area B = $152,500 - $12,500 - $80,000 = $60,000.
Excess ratio at $100,000 = (Expected losses excess of $100,000)/E[X] =
$60,000 / $152,500 = 0.393.

[Graph: loss size x on the vertical axis with $100,000 marked; F(x) = cumulative claim frequency on the horizontal axis (0.20 and 1.00 marked); Area Y lies under the curve to the left of 0.20, Area A is the rectangle of height $100,000 from 0.20 to 1.00, and Area B lies under the curve above $100,000.]

Comment: Loss Elimination Ratio at $100,000 is: 1 - 0.393 = 0.607.


Not one of the usual size of loss distributions encountered in casualty actuarial work.

37.61. E. The shaded area represents the losses excess of K = E[(X-K)+] = E[X] - E[X ∧ K] =
E[X] - {∫_0^K x f(x) dx + K S(K)} = ∫_K^∞ x f(x) dx - K S(K) = ∫_K^∞ (x - K) f(x) dx = ∫_K^∞ S(x) dx.
Since S(x) = 1 - F(x) and f(x) dx = dF(x), statements II and III are true.
Statement I is false; it would be true if the integrand were x-K rather than x.

37.62. B. The layer of loss from DED to LIM is the area under the curve and between the
horizontal lines at DED and LIM: Q + R.

37.63. B. Under a franchise deductible one does not pay any losses of size less than DED, but
pays the whole of any loss of size greater than DED. Due to the franchise deductible, one saves S,
the area corresponding to the losses of size less than or equal to DED. Due to the limit, one pays at
most LIM for any loss, so one saves P, the area representing the expected amount excess of LIM.
The expected savings are: S + P.
Comment: Similar to CAS3, 11/03, Q.23, except here there is a franchise deductible rather than an
ordinary deductible. The effect of the franchise deducible is to left truncate the data at DED, which
removes area S.

37.64. D. Label some of the areas in the Lee Diagram as follows:

D = (b-a)S(b).
E = a{F(b) - F(a)}
C + E = losses on losses of size between a and b = ∫_a^b x dF(x).
Shaded Area = C + D = (C + E) + D - E =
∫_a^b x dF(x) + (b-a)S(b) - a{F(b) - F(a)} = ∫_a^b x dF(x) - a + b + aF(a) - bF(b).


Alternately, the shaded area is the layer from a to b:
E[X ∧ b] - E[X ∧ a] = ∫_0^b x dF(x) + bS(b) - ∫_0^a x dF(x) - aS(a) = ∫_a^b x dF(x) - a + b + aF(a) - bF(b).

37.65. E. Deflate D from 2006 to 2005, where it is equivalent to a deductible of: D/1.1.
Then the average expected loss in 2005 is the area above the deductible, D/1.1, and under the
curve: P+Q+R+S+T+U. In order to get the expected size of loss in 2006, reinflate back up to the
2006 level, by multiplying by 1.1: 1.1(P+Q+R+S+T+U).

37.66. Prior to inflation, the losses are uniform from 0 to 100.


On the following Lee Diagram, this is represented by the straight line from (0, 0) to (1, 100).
After inflation the losses are uniform from 0 to 110.
On the following Lee Diagram, this is represented by the straight line from (0, 0) to (1, 110).
[Lee Diagram: the Prior line runs from (0, 0) to (1, 100) and the After line from (0, 0) to (1, 110); horizontal lines at 25 and 75 mark the layer; Area A is the layer below the Prior line and Area B is the additional layer amount between the two lines.]
Area A = losses in the layer prior to inflation = (height)(average width) =
(75 - 25) {(1 - 0.25) + (1 - 0.75)}/2 = 25.
Area A + Area B = losses in the layer after inflation = (height)(average width) =
(75 - 25) {(1 - 25/110) + (1 - 75/110)}/2 = 27.27.

⇒ Area B = 2.27. ⇒ Inflation for the layer = B / A = 2.27/25 = 9.1%.


Alternately, one can use limited expected values. Geometrically a limited expected value is the area
below the curve and also below a horizontal line at the limit.
E[X ∧ 75] = ∫_0^75 (x/100) dx + (0.25)(75) = 46.875.
E[X ∧ 25] = ∫_0^25 (x/100) dx + (0.75)(25) = 21.875.

E[X ∧ 75/1.1] = ∫_0^{75/1.1} (x/100) dx + (1 - 0.75/1.1)(75/1.1) = 44.938.
E[X ∧ 25/1.1] = ∫_0^{25/1.1} (x/100) dx + (1 - 0.25/1.1)(25/1.1) = 20.145.

The inflation factor for the layer is:


(1.1)(E[X ∧ 75/1.1] - E[X ∧ 25/1.1]) / (E[X ∧ 75] - E[X ∧ 25]) = (1.1)(44.938 - 20.145) / (46.875 - 21.875) = 1.091.
The implied trend for the layer $50 excess of $25 is 9.1%.
Comment: Area A is a trapezoid. Area A plus Area B is another trapezoid.
A layer like this can inflate either slower or faster than the overall rate of inflation.
Losses in layer prior to inflation: ∫_25^75 (x - 25)/100 dx + (1 - 75/100)(75 - 25) = 25.
Losses in layer after inflation: ∫_25^75 (x - 25)/110 dx + (1 - 75/110)(75 - 25) = 27.27.
The alternate solution takes the ratio of average payments per loss, after inflation and prior to
inflation.
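
A quick check of the 9.1%, using the limited expected value E[X ∧ t] = t - 0.5 t²/b for a uniform distribution on (0, b] (the form given in problem 37.55); the helper name is mine.

def lim_ev(t, b):                 # E[X ∧ t] for a uniform distribution on (0, b]
    t = min(t, b)
    return t - 0.5 * t * t / b

before = lim_ev(75, 100) - lim_ev(25, 100)    # 25
after  = lim_ev(75, 110) - lim_ev(25, 110)    # about 27.27
print(after / before - 1.0)                   # about 0.091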

37.67. a) Losses eliminated: (21)(100) + (50)(250) + (42 + 37 + 22)(500) = 65,100.


Total losses: (21)(100) + (50)(250) + (42)(500) + (37)(1000) + (22)(5000)= 182,600.
loss elimination ratio: 65,100 / 182,600 = 35.65%.
Alternately, the excess losses are: (37)(1000 - 500) + (22)(5000 - 500) = 117,500.
Loss elimination ratio: 1 - 117,500 / 182,600 = 35.65%.
b) The total losses increase by 10%: (1.1)(182,600) = 200,860.
Losses eliminated: (21)(110) + (50)(275) + (42 + 37 + 22)(500) = 66,560.
After inflation, loss elimination ratio: 66,560/200,860 = 33.14%.
Percentage change in LER is: 33.14% / 35.65% - 1 = -7.0%.
c) The total losses increase at the overall rate of inflation. The loss elimination ratio declines under
uniform inflation with a fixed limit (as occurred in this example.) Therefore the loss cost for a given
straight deductible policy increases by more than the ground-up severity trend.
The reason why the loss elimination ratio declines, is because some large claims have already had
the whole deductible amount eliminated prior to inflation and no more will be eliminated after inflation.
Alternately, under uniform inflation, some claims for which the insurer paid nothing prior to inflation it
will pay something for after inflation. In this example, a claim for 500 prior to inflation will pierce the
covered layer after inflation. Also for large claims the payments by the insurer will increase faster than
the overall rate of inflation. For example, for a $1000 claim prior to inflation the insurer paid $500, but
after inflation will pay 1100 - 500 = 600, an increase of 20%. Thus in combination the loss cost for a
given straight deductible policy will increase by more than the ground-up severity trend. Losses
excess of a fixed limit increase faster than the overall rate of inflation!
Alternately, look at the following Lee diagram:
[Lee Diagram: size on the vertical axis with the deductible marked; probability on the horizontal axis; the Before Inflation and After Inflation curves are shown; Areas A and B lie below the deductible line (A between the curves, B below the original curve), and Areas C and D lie above it (C between the curves, D below the original curve).]
C / D is the increase in the losses paid by the insurer due to inflation.
(C + A) / (D + B) is the increase in the total losses due to inflation. C / D > (C + A) / (D + B).
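
Parts (a) and (b) can also be reproduced with a few lines of Python (the function name is illustrative):

sizes  = [100, 250, 500, 1000, 5000]
counts = [21, 50, 42, 37, 22]

def ler(trend):                  # loss elimination ratio for a straight 500 deductible
    eliminated = sum(n * min(x * trend, 500) for x, n in zip(sizes, counts))
    total      = sum(n * x * trend for x, n in zip(sizes, counts))
    return eliminated / total

print(ler(1.0))                       # about 0.3565
print(ler(1.1))                       # about 0.3314
print(ler(1.1) / ler(1.0) - 1.0)      # about -0.070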

37.68. (a) Subsequent to trend, we have three uniform distributions:


from 0 to 300,000; from 300,000 to 600,000; and from 600,000 to 1,200,000.
Here is a Lee Diagram, with size in thousands:
[Lee Diagram, size in thousands: probability on the horizontal axis (0.6, 0.9, 1); the Before Trend curve passes through (0.6, 250), (0.9, 500), and (1, 1000), and the After Trend curve through (0.6, 300), (0.9, 600), and (1, 1200).]

(b) Prior to trend, the layer from 500,000 to 1,000,000 is the area below the distribution and
between horizontal lines at heights 500,000 and 1,000,000.
This is Area A in the following Lee Diagram.

[Lee Diagram, prior to trend, size in thousands: Area A is the region below the Before Trend curve and between the horizontal lines at 500 and 1000.]

Area A is a triangle, with base 0.1 and height 500, and thus area: (0.1)(500)/2 = 25 (thousand).

After trend the distribution function at 500 is: (1/3)(0.6) + (2/3)(0.9) = 0.8.
After trend the distribution function at 1000 is: (1/3)(0.9) + (2/3)(1) = 0.96667.
After trend the Lee Diagram is as follows, with the excess layer being the sum of Areas B and C:

[Lee Diagram after trend, size in thousands: probability marks at 0.6, 0.8, 0.9, and 1; the after-trend curve passes through (0.8, 500), (0.9, 600), and (0.96667, 1000). Area B is the part of the excess layer between heights 500 and 600; Area C is the part between heights 600 and 1000.]

Area B is a trapezoid with height 100 and widths 0.1 and 0.2, and thus area:
(100)(0.1 + 0.2)/2 = 15 (thousand).
Area C is a trapezoid with height 400 and widths 0.1 and 0.03333, and thus area:
(400)(0.1 + 0.03333)/2 = 26.667 (thousand).
Thus after trend the excess layer is: Area B + Area C = 15 + 26.667 = 41.667 (thousand).
The implied trend for the layer $500,000 excess of $500,000 is: 41.667/25 - 1 = 66.7%.

Alternately, both prior and posterior to trend, the first interval contributes nothing to the excess layer.
Prior to trend, the second interval contributes nothing to the excess layer.
After trend, the second interval is uniform from 300,000 to 600,000.
After trend, the second interval contributes to the excess layer:
(1/300,000) ∫[500,000 to 600,000] (x - 500,000) dx = (1/300,000)(100,000²/2) = 16,667.

Prior to trend, the third interval contributes to the excess layer:


(1/500,000) ∫[500,000 to 1,000,000] (x - 500,000) dx = (1/500,000)(500,000²/2) = 250,000.

After trend, the third interval is uniform from 600,000 to 1,200,000,


and contributes to the excess layer:
(1/600,000) ∫[600,000 to 1,000,000] (x - 500,000) dx + (200,000/600,000)(500,000) =
(1/600,000)(500,000²/2 - 100,000²/2) + 166,667 = 366,667.


Thus prior to trend the expected losses in the excess layer are:
(60%)(0) + (30%)(0) + (10%)(250,000) = $25,000.
After trend, the expected losses in the excess layer are:
(60%)(0) + (30%)(16,667) + (10%)(366,667) = $41,667.
The implied trend for the layer $500,000 excess of $500,000 is: 41,667/25,000 - 1 = 66.7%.
Alternately, prior to trend: E[X ∧ 1 million] = (0.6)(125K) + (0.3)(375K) + (0.1)(750K) = 262.5K.
E[X ∧ 500K] = (0.6)(125K) + (0.3)(375K) + (0.1)(500K) = 237.5K.
E[X ∧ 1 million] - E[X ∧ 500K] = 262.5K - 237.5K = 25,000.
Subsequent to trend, we have three uniform distributions:
from 0 to 300,000; from 300,000 to 600,000; and from 600,000 to 1,200,000.
Break the last interval into 600K to 1000K and 1000K to 1200K.
E[X ∧ 1 million] = (0.6)(150K) + (0.3)(450K) + (0.1)(2/3)(800K) + (0.1)(1/3)(1000K) = 311,667.
Break the middle interval into 300K to 500K and 500K to 600K.
E[X ∧ 500K] = (0.6)(150K) + (0.3)(2/3)(400K) + (0.3)(1/3)(500K) + (0.1)(500K) = 270,000.
E[X ∧ 1 million] - E[X ∧ 500K] = 311,667 - 270,000 = 41,667.
The implied trend for the layer $500,000 excess of $500,000 is: 41,667/25,000 - 1 = 66.7%.
Comment: In part (a) it would have been helpful if the question specified whether or not they
wanted a Lee Diagram.
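
As a numerical check (not part of the original solution), the following Python sketch reproduces the 66.7% layer trend directly from limited expected values. It assumes the before-trend sizes, in thousands, are a mixture of uniforms on (0, 250), (250, 500), and (500, 1000) with weights 60%, 30%, and 10%, and a trend factor of 1.2, consistent with the solution above:

```python
# Sizes in thousands; before-trend mixture of uniforms with a 1.2 trend factor.
def lev_uniform(d, a, b):
    # E[X ^ d] for a uniform distribution on (a, b)
    if d <= a:
        return d
    if d >= b:
        return (a + b) / 2.0
    return (d * d - a * a) / (2.0 * (b - a)) + d * (b - d) / (b - a)

pieces = [(0.6, 0.0, 250.0), (0.3, 250.0, 500.0), (0.1, 500.0, 1000.0)]

def layer(trend):
    lo, hi = 500.0, 1000.0
    return sum(w * (lev_uniform(hi, a * trend, b * trend) - lev_uniform(lo, a * trend, b * trend))
               for w, a, b in pieces)

print(layer(1.0), layer(1.2))          # 25 and about 41.667 (thousand)
print(layer(1.2) / layer(1.0) - 1.0)   # about 0.667, i.e. a 66.7% trend in the layer
```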

Section 38, N-Point Mixtures of Models

Mixing models is a technique that provides a greater variety of loss distributions.


Such mixed distributions are referred to by Loss Models as n-point or two-point mixtures.309

2-point mixtures:

For example, let A be a Pareto Distribution with parameters α = 2.5 and θ = 10,
while B is a LogNormal Distribution with parameters µ = 0.5 and σ = 0.8.
Let p = 0.10, the weight for the Pareto Distribution.

If we let G(x) = pA(x) + (1-p)B(x) = 0.1 A(x) + 0.9 B(x),


then G is a Distribution Function since G(0) = 0 and G(∞) = 1.
G is a mixed “Pareto-LogNormal” Distribution.

Hereʼs the individual distribution functions, as well as that of this mixed distribution:
Limit     Pareto F(x)     LogNormal F(x)     Mixed Pareto-LogNormal F(x)
0.5 0.1148 0.0679 0.0726
1 0.2120 0.2660 0.2606
2.5 0.4276 0.6986 0.6715
5 0.6371 0.9172 0.8892
10 0.8232 0.9879 0.9714
25 0.9564 0.9997 0.9954
50 0.9887 1.0000 0.9989
100 0.9975 1.0000 0.9998

For example, 0.9954 = (0.1)(0.9564) + (0.9)(0.9997). For the mixed distribution, the chance of a
claim greater than 25 is: 0.46% = (0.1)(4.36%) + (0.9)(0.03%). The Distribution Function and the
Survival Function of the mixture are mixtures of the individual Distribution and Survival Functions.
We also see from the above table that, for example, the 89th percentile of this mixed Pareto-
LogNormal Distribution is a little more than 5, since F(5) = 0.8892.
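
Here is a short Python sketch (the helper names are mine, not from any library) that reproduces the table above by mixing the two distribution functions:

```python
from math import erf, log, sqrt

def pareto_cdf(x, alpha=2.5, theta=10.0):
    # F(x) = 1 - (theta/(theta + x))^alpha
    return 1.0 - (theta / (theta + x)) ** alpha

def lognormal_cdf(x, mu=0.5, sigma=0.8):
    # F(x) = Phi[(ln x - mu)/sigma], with Phi(z) = (1 + erf(z/sqrt(2)))/2
    z = (log(x) - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mixture_cdf(x, p=0.1):
    # The distribution function of the mixture is the mixture of the distribution functions.
    return p * pareto_cdf(x) + (1.0 - p) * lognormal_cdf(x)

for limit in [0.5, 1, 2.5, 5, 10, 25, 50, 100]:
    print(limit, round(pareto_cdf(limit), 4), round(lognormal_cdf(limit), 4),
          round(mixture_cdf(limit), 4))
```

For example, the line for a limit of 25 reproduces 0.9564, 0.9997, and 0.9954.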

In general, one can take a weighted average of any two Distribution Functions:
G(x) = p A(x) + (1-p) B(x). Such a Distribution Function G, called a 2-point mixture of models,
will generally have properties that are a mixture of those of A and B.

309
Loss Models, Section 4.2.3.

One can create a very large number of possible combinations by choosing various types of
distributions for A and B. The mixture will have a number of parameters equal to the sum of the
number of parameters of the two distributions A and B, plus one more for p, the weighting
parameter. The mixed Pareto-LogNormal Distribution discussed above has 2 + 2 + 1 = 5
parameters: α, θ, µ, σ, and p.

Densities:

The density of the mixture is the derivative of its Distribution Function.


Therefore, the density of the mixture is the mixture of the densities.

Exercise: What is the density at 5 of this mixed Pareto-LogNormal Distribution?


[Solution: The density of the Pareto is: f(x) = α θ^α / (θ + x)^(α+1) = (2.5)(10^2.5)(10 + x)^(-3.5).
f(5) = 790.57 / 15^3.5 = 0.06048.
The density of the LogNormal is: f(x) = exp[-(ln(x) - µ)² / (2σ²)] / {x σ √(2π)} = exp[-(ln(x) - 0.5)² / (2 (0.8²))] / {x (0.8) √(2π)}.
f(5) = exp[-0.5 ({ln(5) - 0.5}/0.8)²] / {(5)(0.8)√(2π)} = 0.03813.
The density for the mixed distribution is: (0.1)(0.06048) + (0.9)(0.03813) = 0.0404.]

Moments:

Moments of the mixed distribution are the weighted average of the moments of the individual
distributions: EG[Xn ] = p EA[Xn ] + (1-p) EB[Xn ].
E G[X] = p EA [X] + (1-p) EB [X].

For example, the mean of the above mixed Pareto-LogNormal Distribution is:
(0.1)( mean of the Pareto) + (0.9)( the mean of the LogNormal).

Exercise: What are the first and second moments of the Pareto Distribution with parameters
α = 2.5 and θ = 10?
[Solution: For the Pareto, the mean is: θ/(α-1) = 10/1.5 = 6.667,
while the second moment is: 2θ² / {(α−1)(α−2)} = 200 / {(1.5)(0.5)} = 266.67.]

Exercise: What are the first and second moments of the LogNormal Distribution with parameters
µ = 0.5 and σ = 0.8?

[Solution: For the LogNormal, the mean is: exp[µ + 0.5σ²] = e^0.82 = 2.270,
while the second moment is: exp[2µ + 2σ²] = e^2.28 = 9.777.]

Thus the mean of the mixed Pareto-LogNormal is: (0.1)(6.667) + (0.9)( 2.27) = 2.71.

We also note that of the total loss dollars represented by the mixed distribution,
(0.1)(6.667) = 0.667 come from the underlying Pareto, while (0.9)( 2.27) = 2.04 come from the
underlying LogNormal. Thus 0.667 / 2.71 = 25% of the total losses come from the underlying
Pareto, while the remaining 2.04 / 2.71 = 75% come from the underlying LogNormal.

In general, p EA[X] / {p EA[X] + (1-p) EB[X]} represents the portion of the total losses for the mixed
distribution that come from the first of the individual distributions.

For a 2 point mixture of A and B with weights p and 1-p:


E[X] = E[X | A]Prob[A] + E[X | B]Prob[B] = p(mean of A) + (1-p)(mean B).
E[X2 ] = E[X2 | A]Prob[A] + E[X2 | B]Prob[B] = p(2nd moment of A) + (1-p)(2nd moment of B).

E G[X2 ] = p EA [X2 ] + (1-p) EB [X2 ].

The second moment of this mixed distribution is the weighted average of the second moments of
the two individual distributions: (0.1)(266.67) + (0.9) (9.777) = 35.47.

The moment of the mixture is the mixture of the moments.

Thus the variance of this mixed distribution is: 35.47 - 2.71² = 28.13.

First one gets the moments of the mixture, and then one gets the variance of the mixture.
One does not weight together the individual variances.
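
A short Python sketch of the same calculation, emphasizing that one mixes the moments first and only then computes the variance (weighting the component variances instead would give the wrong answer):

```python
from math import exp

p = 0.1                      # weight to the Pareto
alpha, theta = 2.5, 10.0     # Pareto parameters
mu, sigma = 0.5, 0.8         # LogNormal parameters

# Moments of the components
pareto_m1 = theta / (alpha - 1.0)                              # 6.667
pareto_m2 = 2.0 * theta**2 / ((alpha - 1.0) * (alpha - 2.0))   # 266.67
logn_m1 = exp(mu + 0.5 * sigma**2)                             # 2.270
logn_m2 = exp(2.0 * mu + 2.0 * sigma**2)                       # 9.777

# Mix the moments, then take the variance of the mixture
mix_m1 = p * pareto_m1 + (1 - p) * logn_m1                     # 2.71
mix_m2 = p * pareto_m2 + (1 - p) * logn_m2                     # 35.47
mix_var = mix_m2 - mix_m1**2                                   # about 28.1

# Weighting the component variances would be wrong:
wrong = p * (pareto_m2 - pareto_m1**2) + (1 - p) * (logn_m2 - logn_m1**2)
print(mix_var, wrong)                                          # about 28.1 versus about 26.4
```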

Coefficient of Variation:

One can now get the coefficient of variation of this mixture from its mean and variance.
The Coefficient of Variation of this mixture is: √28.13 / 2.71 = 1.96. The C.V. of the mixed
distribution is between the C.V. of the Pareto at 2.24 and that of the LogNormal at 0.95. The mixed
distribution has a heavier tail than the LogNormal and a lighter tail than the Pareto.

Limited Moments:

Limited Moments of the mixed distribution are the weighted average of the limited moments of the
individual distributions: EG[(X ∧ x)n ] = p EA[(X ∧ x)n ] + (1-p) EB[(X ∧ x)n ].

E G[X ∧ x] = p EA [X ∧ x] + (1-p) EB [X ∧ x].

For example, the limited expected value of the mixed Pareto-LogNormal Distribution is:
(0.1)(LEV of the Pareto) + (0.9)(LEV of the LogNormal).

Exercise: What is E[X ∧ 4] for the Pareto Distribution with parameters α = 2.5 and θ =10?
[Solution: For the Pareto, E[X ∧ x] = {θ/(α−1)} {1 - (θ/(θ+x))^(α−1)}.
E[X ∧ 4] = (10/1.5) {1 - (10/14)^1.5} = 2.642.]

Exercise: Compute E[X ∧ 4] for the LogNormal with parameters µ = 0.5 and σ = 0.8.
[Solution: E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 4] = e^0.82 Φ[0.3079] + 4 {1 - Φ[1.1079]} = (2.2705)(0.6209) + (4){1 - 0.8660} = 1.946.]

Thus for the mixed Pareto-LogNormal, E[X ∧ 4] = (0.1)(2.642) + (0.9)(1.946) = 2.016.
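
The same limited expected value can be checked with a short Python sketch (Φ implemented via the error function; the helper names are mine):

```python
from math import erf, exp, log, sqrt

def Phi(z):
    # Standard Normal distribution function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pareto_lev(x, alpha=2.5, theta=10.0):
    # E[X ∧ x] = theta/(alpha-1) {1 - (theta/(theta+x))^(alpha-1)}
    return theta / (alpha - 1.0) * (1.0 - (theta / (theta + x)) ** (alpha - 1.0))

def lognormal_lev(x, mu=0.5, sigma=0.8):
    # E[X ∧ x] = exp(mu + sigma^2/2) Phi[(ln x - mu - sigma^2)/sigma] + x {1 - Phi[(ln x - mu)/sigma]}
    return (exp(mu + 0.5 * sigma**2) * Phi((log(x) - mu - sigma**2) / sigma)
            + x * (1.0 - Phi((log(x) - mu) / sigma)))

p = 0.1
print(p * pareto_lev(4) + (1 - p) * lognormal_lev(4))   # about 2.016
```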

Quantities that are Mixtures:

Thus there are a number of quantities which are mixtures:

The Distribution Function of the mixture is the mixture of the Distribution Functions.

The Survival Function of the mixture is the mixture of the Survival Functions.

The density of the mixture is the mixture of the densities.

The moments of the mixture are the mixture of the moments.

The limited moments of the mixture are the mixture of the limited moments.

E[(X -d)+] = E[X] - E[X ∧ d] is also a mixture.



As discussed already the variance of the mixture is not the mixture of the variances.
Rather, first one gets the moments of the mixture, and then one gets the variance.

The coefficient of variation of the mixture is not the mixture of the coefficients of variation.
Rather, first one computes the moments of the mixture.

Similarly, the skewness of the mixture is not the mixture of the skewnesses.
Rather, first one computes the moments of the mixture. Then one gets the third central moment of
the mixture and divides it by the standard deviation of the mixture cubed.

A number of other quantities of interest, such as the hazard rate, mean residual life, Loss Elimination
Ratio, Excess Ratio, and percent of losses in a layer, have to be computed from their components
for the mixture, as will be discussed.

Hazard Rates:

h(x) = f(x)/S(x). Therefore, in order to get the hazard rate for a mixture, one computes the density
and the survival function for that mixture.

As computed previously, for this mixture f(5) = (0.1)(0.06048) + (0.9)(0.03813) = 0.0404.


As computed previously, for this mixture F(5) = (0.1)(0.6371) + (0.9)(0.9172) = 0.8892.
S(5) = 1 - 0.8892 = 0.1108.
For this mixture, h(5) = 0.0404/0.1108 = 0.3646.

For the Pareto, h(5) = 0.06048 / (1 - 0.6371) = 0.1667.


For the LogNormal, h(5) = 0.03813 / (1 - 0.9172) = 0.4605.
(0.1)(0.1667) + (0.9)(0.4605) = 0.4311 ≠ 0.3646.

The hazard rate of the mixture is not equal to the mixture of the hazard rates.
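
A brief Python sketch of this check, using the component densities and survival functions computed earlier in this section:

```python
from math import erf, exp, log, sqrt, pi

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pareto_pdf(x, a=2.5, t=10.0):
    return a * t**a / (t + x) ** (a + 1.0)

def pareto_sf(x, a=2.5, t=10.0):
    return (t / (t + x)) ** a

def logn_pdf(x, mu=0.5, s=0.8):
    return exp(-((log(x) - mu) ** 2) / (2.0 * s**2)) / (x * s * sqrt(2.0 * pi))

def logn_sf(x, mu=0.5, s=0.8):
    return 1.0 - Phi((log(x) - mu) / s)

p, x = 0.1, 5.0
f_mix = p * pareto_pdf(x) + (1 - p) * logn_pdf(x)   # about 0.0404
S_mix = p * pareto_sf(x) + (1 - p) * logn_sf(x)     # about 0.1108
print(f_mix / S_mix)                                # about 0.365, the hazard rate of the mixture
print(p * pareto_pdf(x) / pareto_sf(x)
      + (1 - p) * logn_pdf(x) / logn_sf(x))         # about 0.431, NOT the hazard rate of the mixture
```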

Excess Ratios:

Here is how one calculates the Excess Ratio for this mixed distribution at 10, RG(10).

The numerator is the loss dollars excess of 10. For the Pareto this is the excess ratio of the Pareto at
10 times the mean for the Pareto: RA(10)EA[X]; for the LogNormal it is: RB(10)EB[X].
Thus the numerator of RG(10) is: pEA[X]RA(10) + (1 - p)EB[X]RB(10).

Exercise: What is the Excess Ratio at 10 of the Pareto Distribution with α = 2.5 and θ = 10?
[Solution: The excess ratio for the Pareto is: R(x) = (θ/(θ + x))^(α−1).
RA(10) = {10/(10 + 10)}^1.5 = 0.3536.]

Exercise: What is the Excess Ratio at 10 of the LogNormal Distribution with parameters µ = 0.5 and
σ = 0.8?
[Solution: The excess ratio for the LogNormal is:
R(x) = 1 - Φ[(ln(x) − µ − σ²)/σ] - x {1 - Φ[(ln(x) − µ)/σ]} / exp[µ + σ²/2].
RB(10) = 1 - Φ[(ln 10 - 0.5 - 0.8²)/0.8] - 10 {1 - Φ[(ln 10 - 0.5)/0.8]} / exp(0.5 + 0.8²/2) = 0.0197.]

Thus for the mixed distribution the excess ratio at 10 is:


RG(10) = {pEA[X]RA(10) + (1 - p)EB[X]RB(10)} / EG[X] =
{pEA[X]RA(10) + (1 - p)EB[X]RB(10)} / {p EA[X] + (1 - p) EB[X]} =
{(0.1)(6.667)(0.3536) + (0.9)(2.27)(0.0198)} / 2.71= 10.2%

At each limit, the Excess Ratio for the mixed distribution is a weighted average of the individual
excess ratios, with weights: pEA[X] = (0.1)(6.667), and (1-p)EB[X] = (0.9)(2.27).
Here are the Excess Ratios computed at different limits:
Limit     Pareto Excess Ratio     LogNormal Excess Ratio     Mixed Pareto-LogNormal Excess Ratio
1 86.68% 59.96% 66.53%
2.5 71.55% 27.83% 38.59%
5 54.43% 9.64% 20.66%
10 35.36% 1.97% 10.18%
25 15.27% 0.10% 3.83%
50 6.80% 0.00% 1.68%

We note that the excess ratio for the LogNormal declines much more quickly than that of the Pareto.
The excess ratio for the mixed distribution is somewhere in between.
In general, the Excess Ratio for the mixed distribution is a weighted average of the individual excess
ratios, with weights pEA[X] and (1 - p)EB[X]:
RG(x) = {p EA[X] RA(x) + (1 - p) EB[X] RB(x)} / EG[X] = {p EA[X] RA(x) + (1 - p) EB[X] RB(x)} / {p EA[X] + (1 - p) EB[X]}.
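
A minimal Python sketch of this weighted average at a limit of 10, using the component means and excess ratios from the exercises above:

```python
p = 0.1
mean_pareto, mean_logn = 6.667, 2.270
R_pareto, R_logn = 0.3536, 0.0197      # excess ratios at 10, from the exercises above

numerator = p * mean_pareto * R_pareto + (1 - p) * mean_logn * R_logn
denominator = p * mean_pareto + (1 - p) * mean_logn
print(numerator / denominator)          # about 0.102, i.e. 10.2%
```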

Layers of Losses:

One can compute the percent of total dollars in a layer by taking the difference of the excess ratios.
For example, for the Pareto the layer from 5 to 10 represents: 54.43% - 35.36% = 19.07% of the
Paretoʼs total dollars of loss. For the LogNormal the layer from 5 to 10 represents only:
9.64% - 1.97% = 7.67% of its total dollars of loss. Using the excess ratios computed for the mixed
distribution, for the mixed distribution the layer from 5 to 10 represents: 20.66% - 10.18% =
10.48% of its total dollars of loss.

One can divide the losses for a layer for the mixed distribution into those from the Pareto and those
from the LogNormal.
The contribution from the Pareto to the mixed distribution for the layer from 5 to 10 is:
(0.1)(6.667)(54.43% - 35.36%) = 0.1271.
The contribution from the LogNormal for the mixed distribution to the layer from 5 to 10 is:
(0.9)(2.270)(9.64% - 1.97%) = 0.1566.
The mixed distribution has losses in the layer from 5 to 10 of:
(2.71)(20.66% - 10.18%) = 0.284 = 0.1271 + 0.1566.
Thus for the layer from 5 to 10, about 45% of the losses for the mixed distribution come from the
Pareto while the remaining 55% come from the LogNormal.310

One can perform similar calculations for other layers:


Bottom of Layer     Top of Layer     Losses from Pareto     Losses from LogNormal     Portion from Pareto     Portion from LogNormal
0 1 0.0888 0.8180 9.8% 90.2%
1 2.5 0.1008 0.6564 13.3% 86.7%
2.5 5 0.1141 0.3716 23.5% 76.5%
5 10 0.1272 0.1567 44.8% 55.2%
10 25 0.1339 0.0382 77.8% 22.2%
25 50 0.0565 0.0020 96.7% 3.3%
50 ∞ 0.0454 0.0001 99.8% 0.2%

We note that for this mixed distribution the losses for lower layers come mainly from the
LogNormal Distribution, while those for the higher layers come mainly from the Pareto
Distribution. In that sense the LogNormal is modeling the behavior of the smaller claims, while
the Pareto is modeling the behavior of the larger claims. This is typical of a mixture of models;
a lighter-tailed distribution mostly models the behavior of the smaller losses, while a
heavier-tailed distribution mostly models the behavior of the larger losses.
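
The split of the layer from 5 to 10 shown in the table can be reproduced with a short Python sketch (excess ratios taken from the table of excess ratios above):

```python
p = 0.1
mean_pareto, mean_logn = 6.667, 2.270
# Excess ratios at 5 and at 10 for each component, from the table above
R_p5, R_p10 = 0.5443, 0.3536
R_l5, R_l10 = 0.0964, 0.0197

from_pareto = p * mean_pareto * (R_p5 - R_p10)        # about 0.127
from_logn = (1 - p) * mean_logn * (R_l5 - R_l10)      # about 0.157
total = from_pareto + from_logn                       # about 0.284
print(from_pareto / total, from_logn / total)         # roughly 45% and 55%
```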

310
0.127 / 0.284 = 45%, and 0.157 / 0.284 = 55%.

Fitting to Data:

One can fit size of loss data to mixed models using the same techniques as for other size of loss
distributions. However, due to the generally larger number of parameters there are often practical
difficulties. For example, if one attempted to use maximum likelihood to fit a mixed
Pareto-LogNormal Distribution to data, one would be fitting 5 parameters. Thus numerical algorithms
would be searching through 5-dimensional space and might take a long time. Thus it is important in
practical applications to have a good starting point.

Sometimes one fits each of the individual distributions to the given data and uses these results
to help pick a starting value. Sometimes, keeping one or more of the parameters fixed while
fitting the others will help to determine a good starting place. Often one can use a prior yearʼs
result or a result from a similar data set to help choose a starting value. Sometimes one just
has to try a few different starting values before finding one that seems to work.

Sometimes using numerical methods, one has problems getting p, the weight parameter, to
stay between 0 and 1 as desired.
One could in such cases reparameterize by letting p = e^b / (e^b + 1).
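
Purely as an illustration (this is my sketch, not a method prescribed by the text), here is how such a maximum likelihood fit of a two-point mixture of Exponentials might be set up in Python with the logistic reparameterization p = e^b / (e^b + 1). The data shown are made up, and scipy is assumed to be available:

```python
import numpy as np
from scipy.optimize import minimize

losses = np.array([120., 350., 800., 2300., 5100., 900., 150., 4200.])  # illustrative data only

def negloglik(params):
    b, log_theta1, log_theta2 = params
    p = np.exp(b) / (np.exp(b) + 1.0)                   # weight kept strictly between 0 and 1
    t1, t2 = np.exp(log_theta1), np.exp(log_theta2)     # Exponential means kept positive
    dens = p * np.exp(-losses / t1) / t1 + (1.0 - p) * np.exp(-losses / t2) / t2
    return -np.sum(np.log(dens))

start = np.array([0.0, np.log(500.0), np.log(3000.0)])  # a rough starting point
fit = minimize(negloglik, start, method="Nelder-Mead")
b, lt1, lt2 = fit.x
print(np.exp(b) / (np.exp(b) + 1.0), np.exp(lt1), np.exp(lt2))
```

As discussed above, the choice of the starting point matters; fitting single Exponentials to the data first is one way to pick it.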

n-Point Mixtures:

In general, one can weight together any number of distributions, rather than just two.311
While I have illustrated two-point mixtures, there is no reason why one could not use
three-point, four-point, etc. mixtures. The quantities of interest can be calculated in a manner parallel
to that used here for the two-point distributions. Also, besides those situations where all the
distributions are of different types, some or all of the individual distributions can be of the same type.

For example, the Distribution:


(0.2)(1 - e^(-x/10)) + (0.5)(1 - e^(-x/25)) + (0.3)(1 - e^(-x/100)) = 1 - 0.2e^(-x/10) - 0.5e^(-x/25) - 0.3e^(-x/100),
is a three-point mixture of Exponential Distributions, with means of 10, 25, and 100 respectively.

An n-point mixture of Exponential Distributions would have 2n -1 parameters,


n means of Exponential Distributions and n-1 weighting parameters.
For example a three-point mixture of Exponential Distributions has 3 means and 2 weights for a total
of 5 parameters.
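
A quick Python sketch evaluating this three-point mixture of Exponentials, whose survival function and mean are used again in the problems below:

```python
from math import exp

weights = [0.2, 0.5, 0.3]
means = [10.0, 25.0, 100.0]

def survival(x):
    # S(x) = 0.2 e^(-x/10) + 0.5 e^(-x/25) + 0.3 e^(-x/100)
    return sum(w * exp(-x / t) for w, t in zip(weights, means))

mean = sum(w * t for w, t in zip(weights, means))   # 44.5
print(survival(15), mean)                           # about 0.577 and 44.5
```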

311
Of course one must be careful of introducing too many parameters. The “principle of parsimony” applies; one
should use the minimum number of parameters necessary. One can always improve the fit to a given data set by
adding parameters, but the resulting model is often less useful.

Variable Mixtures:

Variable-Component Mixture Distribution ⇔

weighted average of unknown # of distributions ⇔ F(x) = Σ wi Fi(x), with Σ wi =1.312

Variable Mixture ⇔

weighted average of unknown # of distributions of the same family but differing parameters ⇔
F(x) = Σ wi Fi(x), with each Fi of the same family and Σ wi = 1.

For example a variable mixture of Exponentials would have:

F(x) = Σ wi (1 - exp[-x/θi]) = 1 - Σ wi exp[-x/θi], with Σ wi =1.

The key difference from an n-point mixture of Exponentials is that in the variable mixture, the number
of Exponentials weighted together is unknown and is a parameter to be determined.313

Variable-Component Mixture Distributions, and their special cases, Variable Mixtures, are called
semi-parametric, since they share some of the properties of both parametric and nonparametric
distributions.314

312
See Definition 4.6 in Loss Models.
313
For example, one can fit a variable mixture of Exponentials via maximum likelihood. See “Modeling Losses with
the Mixed Exponential Distribution”, by Clive L. Keatinge, PCAS 1999.
314
The Exponential and LogNormal are examples of parametric distributions. The empirical distribution function is an
example of a nonparametric distribution.

Summary:

The mixture of models can be useful when more flexibility is desired to fit size of loss data.315 It has
been useful for a number of practical applications.316 One can perform either n-point mixtures or
continuous mixing.317 One can use the same techniques to mix together frequency models.

Sometimes the mixture of models is just a mathematical device with no physical significance.
However, sometimes the mixture directly models a feature of the real world.

For example, sometimes a mixture is useful when a population is divided into two
sub-populations such as smoker and nonsmoker.

Mixtures can also be useful when the data results from different perils.
For example, for Homeowners Insurance it might be useful to fit Theft, Wind, Fire, and Liability
losses each to a separate size of loss distribution. This is an example where one might weight
together 4 different distributions.

Mixtures will come up again in Buhlmann Credibility.318

315
See for example “Methods of Fitting Distributions to Insurance Loss Data”, by Charles C. Hewitt and
Benjamin Lefkowitz, PCAS 1979.
316
For example, Insurance Services Office used mixed Pareto-Pareto models to calculate Increased Limits Factors.
ISO has switched to an n-point mixture of Exponential Distributions. See “Modeling Losses with the Mixed
Exponential Distribution”, PCAS 1999, by Clive Keatinge. The Massachusetts Workersʼ Compensation Rating
Bureau has used mixed Pareto-Exponential models to calculate Excess Loss Factors.
See “Workersʼ Compensation Excess Ratios, an Alternative Method,” PCAS 1998, by Howard C. Mahler.
317
To be discussed in the next section.
318
See “Mahlerʼs Guide to Buhlmann Credibility and Bayesian Analysis.”

Problems:

Use the following information for the next 22 questions:


• V follows a Pareto Distribution, with parameters α = 4, θ = 10.

• W follows an Exponential Distribution: F(w) = 1 - e^(-w/0.8).

• Y is a two-point mixture of V and W, with 5% weight to the Pareto Distribution


and 95% weight to the Exponential Distribution.

38.1 (1 point) For V, what is the chance of a claim greater than 2?


A. less than 49%
B. at least 49% but less than 50%
C. at least 50% but less than 51%
D. at least 51% but less than 52%
E. at least 52%

38.2 (1 point) What is the mean of V?


A. less than 3.1
B. at least 3.1 but less than 3.2
C. at least 3.2 but less than 3.3
D. at least 3.3 but less than 3.4
E. at least 3.4

38.3 (1 point) What is the second moment of V?


A. less than 34
B. at least 34 but less than 35
C. at least 35 but less than 36
D. at least 36 but less than 37
E. at least 37

38.4 (1 point) What is the coefficient of variation of V?


A. less than 1.3
B. at least 1.3 but less than 1.4
C. at least 1.4 but less than 1.5
D. at least 1.5 but less than 1.6
E. at least 1.6

38.5 (1 point) What is the third moment of V?


A. less than 1000
B. at least 1000 but less than 1001
C. at least 1001 but less than 1002
D. at least 1002 but less than 1003
E. at least 1003

38.6 (2 points) What is the skewness of V?


A. less than 7.0
B. at least 7.0 but less than 7.1
C. at least 7.1 but less than 7.2
D. at least 7.2 but less than 7.3
E. at least 7.3

38.7 (2 points) What is the excess ratio at 5 of V?


(The excess ratio is one minus the loss elimination ratio.)
A. less than 28%
B. at least 28% but less than 29%
C. at least 29% but less than 30%
D. at least 30% but less than 31%
E. at least 31%

38.8 (1 point) For W, what is the chance of a claim greater than 2?


A. less than 8%
B. at least 8% but less than 9%
C. at least 9% but less than 10%
D. at least 10% but less than 11%
E. at least 11%

38.9 (1 point) What is the mean of W?


A. less than 0.5
B. at least 0.5 but less than 0.6
C. at least 0.6 but less than 0.7
D. at least 0.7 but less than 0.8
E. at least 0.8

38.10 (1 point) What is the second moment of W?


A. less than 1.3
B. at least 1.3 but less than 1.4
C. at least 1.4 but less than 1.5
D. at least 1.5 but less than 1.6
E. at least 1.6

38.11 (1 point) What is the coefficient of variation of W?


A. less than 0.7
B. at least 0.7 but less than 0.8
C. at least 0.8 but less than 0.9
D. at least 0.9 but less than 1.00
E. at least 1.0

38.12 (1 point) What is the third moment of W?


A. less than 2.7
B. at least 2.7 but less than 2.8
C. at least 2.8 but less than 2.9
D. at least 2.9 but less than 3.0
E. at least 3.0

38.13 (1 point) What is the skewness of W?


A. less than 1.8
B. at least 1.8 but less than 1.9
C. at least 1.9 but less than 2.0
D. at least 2.0 but less than 2.1
E. at least 2.1

38.14 (1 point) What is the excess ratio at 5 of W?


(The excess ratio is one minus the loss elimination ratio.)
A. less than 0.16%
B. at least 0.16% but less than 0.18%
C. at least 0.18% but less than 0.20%
D. at least 0.20% but less than 0.22%
E. at least 0.22%

38.15 (1 point) For Y, what is the chance of a claim greater than 2?


A. less than 8%
B. at least 8% but less than 9%
C. at least 9% but less than 10%
D. at least 10% but less than 11%
E. at least 11%

38.16 (1 point) What is the mean of Y?


A. less than 0.9
B. at least 0.9 but less than 1.0
C. at least 1.0 but less than 1.1
D. at least 1.1 but less than 1.2
E. at least 1.2

38.17 (1 point) What is the second moment of Y?


A. less than 2.8
B. at least 2.8 but less than 2.9
C. at least 2.9 but less than 3.0
D. at least 3.0 but less than 3.1
E. at least 3.1

38.18 (1 point) What is the coefficient of variation of Y?


A. less than 1.4
B. at least 1.4 but less than 1.5
C. at least 1.5 but less than 1.6
D. at least 1.6 but less than 1.7
E. at least 1.7

38.19 (1 point) What is the third moment of Y?


A. 50 B. 51 C. 52 D. 53 E. 54

38.20 (2 points) What is the skewness of Y?


A. less than 14
B. at least 14 but less than 15
C. at least 15 but less than 16
D. at least 16 but less than 17
E. at least 17

38.21 (2 points) What is the excess ratio at 5 of Y?


(The excess ratio is one minus the loss elimination ratio.)
A. less than 4%
B. at least 4% but less than 5%
C. at least 5% but less than 6%
D. at least 6% but less than 7%
E. at least 7%

38.22 (4 points) What is the mean excess loss at 2 of Y?


A. 0.8 B. 1.0 C. 1.2 D. 1.4 E. 1.6

Use the following information for the next three questions:


The random variable X has the density function:
f(x) = 0.4 exp(-x/1128)/1128 + 0.6 exp(-x/5915)/5915, 0 < x < ∞.

38.23 (2 points) Determine the variance of X.


(A) 21 million (B) 23 million (C) 25 million (D) 27 million (E) 29 million

38.24 (2 points) Determine E[X ∧ 2000].


(A) 1250 (B) 1300 (C) 1350 (D) 1400 (E) 1450

38.25 (1 point) Determine E[(X - 2000)+].


(A) 2600 (B) 2650 (C) 2700 (D) 2750 (E) 2800

38.26 (2 points) You are given the following:


• The random variable X has a distribution that is a mixture of a Pareto distribution with parameters
θ = 1000 and α = 1, and another Pareto distribution, but with parameters θ = 100 and α = 1.
• The first Pareto is given a weight of 0.3 and the second Pareto a weight of 0.7.
Determine the 20th percentile of X.
A. Less than 15
B. At least 15, but less than 25
C. At least 25, but less than 35
D. At least 35, but less than 45
E. At least 45

Use the following information for the next two questions:


Medical losses are Poisson with λ = 2.
The size of medical losses are uniform from 0 to 2000.
Dental losses are Poisson with λ = 1.
The size of dental losses is uniform from 0 to 500.
A policy, with an ordinary deductible of 200, covers both medical and dental losses.

38.27 (2 points) Determine the average payment per loss for this policy.
(A) 570 (B) 580 (C) 590 (D) 600 (E) 610

38.28 (1 point) Determine the average payment per payment for this policy.
(A) 700 (B) 710 (C) 720 (D) 730 (E) 740

Use the following information for the next 14 questions:


F(x) = (0.2)(1 - e^(-x/10)) + (0.5)(1 - e^(-x/25)) + (0.3)(1 - e^(-x/100)).

38.29 (1 point) What is the probability that x is more than 15?


A. 56% B. 58% C. 60% D. 62% E. 64%

38.30 (1 point) What is the mean?


A. Less than 20
B. At least 20, but less than 30
C. At least 30, but less than 40
D. At least 40, but less than 50
E. At least 50

38.31 (3 points) What is the median?


A. Less than 19
B. At least 19, but less than 20
C. At least 20, but less than 21
D. At least 21, but less than 22
E. At least 22

38.32 (1 point) What is the mode?


(A) 0 (B) 10 (C) 25 (D) 100 (E) None of A, B, C, D

38.33 (1 point) What is the second moment?


A. Less than 3000
B. At least 3000, but less than 4000
C. At least 4000, but less than 5000
D. At least 5000, but less than 6000
E. At least 6000

38.34 (1 point) What is the coefficient of variation?


(A) 0.7 (B) 0.9 (C) 1.1 (D) 1.3 (E) 1.5

38.35 (2 points) What is the third moment?


A. Less than 1.0 million
B. At least 1.0 million, but less than 1.3 million
C. At least 1.3 million, but less than 1.6 million
D. At least 1.6 million, but less than 1.9 million
E. At least 1.9 million

38.36 (2 points) What is the skewness?


A. Less than 2.0
B. At least 2.0, but less than 2.5
C. At least 2.5, but less than 3.0
D. At least 3.0, but less than 3.5
E. At least 3.5

38.37 (2 points) What is the hazard rate at 50, h(50)?


A. Less than 0.020
B. At least 0.020, but less than 0.025
C. At least 0.025, but less than 0.030
D. At least 0.030, but less than 0.035
E. At least 0.035

38.38 (2 points) What is the Limited Expected Value at 20, E[X ∧ 20]?
A. Less than 11
B. At least 11, but less than 13
C. At least 13, but less than 15
D. At least 15, but less than 17
E. At least 17

38.39 (2 points) What is the Loss Elimination Ratio at 15?


A. Less than 26%
B. At least 26%, but less than 29%
C. At least 29%, but less than 32%
D. At least 32%, but less than 35%
E. At least 35%

38.40 (2 points) What is the Excess Ratio at 75?


A. Less than 26%
B. At least 26%, but less than 29%
C. At least 29%, but less than 32%
D. At least 32%, but less than 35%
E. At least 35%

38.41 (4 points) What is the Limited Second Moment at 30, E[(X ∧ 30)2 ]?
Hint: Use Theorem A.1 in Appendix A of Loss Models:
Γ(n; x) = 1 - Σ (from j = 0 to n-1) x^j e^(-x) / j!, for n a positive integer.

A. Less than 430


B. At least 440, but less than 450
C. At least 450, but less than 460
D. At least 460, but less than 470
E. At least 470

38.42 (3 points) What is the Mean Excess Loss at 50?


A. Less than 78
B. At least 78, but less than 82
C. At least 82, but less than 86
D. At least 86, but less than 90
E. At least 90

38.43 (3 points) With the aid of a computer, graph a two-point mixture of a Gamma Distribution
with α = 4 and θ = 3 and a Gamma Distribution with α = 2 and θ = 10, with 60% weight to the first
distribution and 40% weight to the second distribution.

38.44 (2 points) 40% of lives follow DeMoivreʼs Law with ω = 80.


The other 60% of lives follow DeMoivreʼs law with ω = 100.
A life is picked at random.
If the life has survives to at least age 70, what is its expected age at death?
A. 80 B. 81 C. 82 D. 83 E. 84

38.45 (2 points) You are the consulting actuary to a group of venture capitalists financing a search for
pirate gold. Itʼs a risky undertaking: with probability 0.80, no treasure will be found, and thus the
outcome is 0. The rewards are high: with probability 0.20 treasure will be found.
The outcome, if treasure is found, is uniformly distributed on [1000, 5000].
Calculate the variance of the distribution of outcomes.
(A) 1.3 million (B) 1.4 million (C) 1.5 million (D) 1.6 million (E) 1.7 million

38.46 (3 points) With the aid of a computer, graph a two-point mixture of a Gamma Distribution with
α = 4 and θ = 3 and a Gamma Distribution with α = 6 and θ = 10, with 30% weight to the first
distribution and 70% weight to the second distribution.

38.47 ( 3 points) The Maytag repairman repairs washers and dryers.

The time for him to repair a washer is exponentially distributed with mean 100 minutes.
The time for him to repair a dryer is exponentially distributed with mean 200 minutes.
The probability that a repair will be of a washer is 70% and that it will be of a dryer is 30%.
If a repair takes him longer than 50 minutes, what is its expected length?
A. 180 B. 185 C. 190 D. 195 E. 200

38.48 (3 points) In 2002 losses follow the following density:


f(x) = 0.7 exp(-x/1000)/1000 + 0.3 exp(-x/5000)/5000, 0 < x < ∞.
Losses uniformly increase by 8% between 2002 and 2004.
In 2004 a policy has a 3000 maximum covered loss.
In 2004 what is the average payment per loss?
A. 1320 B. 1340 C. 1360 D. 1380 E. 1400

38.49 (4 points) You are given the following information for Homeowners Insurance:
• 10% of losses are due to Wind.
• 30% of losses are due to Fire.
• 20% of losses are due to Liability.
• 40% of losses are due to All Other Perils.
• Losses due to Wind follow a LogNormal distribution with µ = 10 and σ = 0.7.
• Losses due to Fire follow a Gamma distribution with α = 2 and θ = 10,000.
• Losses due to Liability follow a Pareto distribution with α = 5 and θ = 200,000.
• Losses due to All Other Perils follow an Exponential distribution with θ = 5,000.
Determine the standard deviation of the size of loss for Homeowners Insurance.
A. 15,000 B. 20,000 C. 25,000 D. 30,000 E. 35,000

Use the following information for the next 3 questions:


Risk Type Number of Risks Size of Loss Distribution
I 600 Single Parameter Pareto, θ = 10, α = 4
II 400 Single Parameter Pareto, θ = 10, α = 3

38.50 (2 points) You independently simulate a single loss for each risk.
Let S be the sum of these 1000 amounts. You repeat this process many times.
What is the variance of S?
A. Less than 42,000
B. At least 42,000, but less than 43,000
C. At least 43,000, but less than 44,000
D. At least 44,000, but less than 45,000
E. At least 45,000

38.51 (2 points) A risk is selected at random from one of the 1000 risks.
You simulate a single loss for this risk.
This risk is replaced, and a new risk is selected at random from one of the 1000 risks.
You simulate a single loss for this new risk.
You repeat this process many times, each time picking a new risk at random.
What is the variance of the outcomes?
A. 42 B. 43 C. 44 D. 45 E. 46

38.52 (2 points) A risk is selected at random from one of the 1000 risks.
You simulate a single loss for this risk. You then simulate another loss for this same risk.
You repeat this process many times. What is the expected variance of the outcomes?
A. 42 B. 43 C. 44 D. 45 E. 46

38.53 (3 points) Y is a two point mixture.


With probability 0.3, Y is exponentially distributed with mean 0.5.
With probability 0.7, Y is uniformly distributed on [-3, 3].
Determine E[Y], Var[Y], and Prob[Y ≤ 1].

Use the following information for the next two questions:


• Bob is an overworked underwriter.
• Applications arrive at his desk.
• Each application has a 1/3 chance of being a “bad” risk and a 2/3 chance of being a “good” risk.
• Since Bob is overworked, each time he gets an application he flips a fair coin.
• If it comes up heads, he accepts the application without looking at it.
• If the coin comes up tails, he accepts the application if and only if it is a “good” risk.
• The expected profit on a “good” risk is 300 with variance 10,000.
• The expected profit on a “bad” risk is -100 with variance 90,000.

38.54 (2 points) Calculate the variance of the profit per applicant.


A. 50,000 B. 51,000 C. 52,000 D. 53,000 E. 54,000

38.55 (2 points) Calculate the variance of the profit per applicant that Bob accepts.
A. Less than 50,000
B. At least 50,000, but less than 51,000
C. At least 51,000, but less than 52,000
D. At least 52,000, but less than 53,000
E. At least 53,000

38.56 (4 points) On an exam, the grades of Good students are distributed via a Beta Distribution
with a = 6, b = 2 and θ = 100.
On this exam, the grades of Bad students are distributed via a Beta Distribution with
a = 3, b = 2 and θ = 100.
3/4 of students are good, while 1/4 of students are bad.
A grade of 65 or more passes.
What is the expected grade of a student who fails this exam?
A. Less than 50
B. At least 50, but less than 51
C. At least 51, but less than 52
D. At least 52, but less than 53
E. At least 53

Use the following information for the next 5 questions:


X follows a two-point mixture of LogNormal Distributions.
The first LogNormal is given weight 65%, and has parameters µ = 8 and σ = 0.5.
The second LogNormal is given weight 35%, and has parameters µ = 9 and σ = 0.3.

38.57 (1 point) Determine E[X].


A. Less than 4400
B. At least 4400, but less than 4600
C. At least 4600, but less than 4800
D. At least 4800, but less than 5000
E. At least 5000

38.58 (1 point) Determine E[X2 ].


A. Less than 35 million
B. At least 35 million, but less than 40 million
C. At least 40 million, but less than 45 million
D. At least 45 million, but less than 50 million
E. At least 50 million

38.59 (1 point) Determine 1 / E[1/X].


A. Less than 3200
B. At least 3200, but less than 3400
C. At least 3400, but less than 3600
D. At least 3600, but less than 3800
E. At least 3800

38.60 (3 points) Determine E[X ∧ 6000].


A. 3400 B. 3600 C. 3800 D. 4000 E. 4200

38.61 (3 points) Determine Var[X ∧ 6000].


A. 2.5 million B. 2.7 million C. 2.9 million D. 3.1 million E. 3.3 million

Use the following information for the next four questions:


R is the annual return on a stock.
At random, half of the time R is a random draw from a Normal Distribution with µ = 8% and σ = 20%.
The other half of the time, R is a random draw from a Normal Distribution with µ = 11% and σ = 30%.

Hint: The third moment of a Normal Distribution is: µ³ + 3µσ².
The fourth moment of a Normal Distribution is: µ⁴ + 6µ²σ² + 3σ⁴.

38.62 (1 point) What is the mean of R?


A. 8.5% B. 9% C. 9.5% D. 10.0% E. 10.5%

38.63 (2 points) What is the standard deviation of R?


A. Less than 25%
B. At least 25%, but less than 27%
C. At least 27%, but less than 29%
D. At least 29%, but less than 31%
E. 31% or more

38.64 (3 points) What is the skewness of R?


A. Less than -0.10
B. At least -0.10, but less than -0.05
C. At least -0.05, but less than 0.05
D. At least 0.05, but less than 0.10
E. 0.10 or more

38.65 (4 points) What is the kurtosis of R?


A. 3.0 B. 3.2 C. 3.4 D. 3.6 E. 3.8

38.66 (5 points) For a mixture of two distributions with the same coefficient of variation, compare the
coefficient of variation of the mixture with that of the components.

38.67 (12 points) The size of loss distribution is given by a mixture of Exponentials:
Mean Weight
2763 0.824796
24,548 0.159065
275,654 0.014444
1,917,469 0.001624
10,000,000 0.000071

The excess ratio is one minus the loss elimination ratio.


Determine the excess ratios at: 100,000, 1 million, and 10 million.

Use the following information for the next 4 questions:


• X is a mixture of three uniform distributions.
• 50% weight to a uniform distribution from 1 to 4.
• 30% weight to a uniform distribution from 2 to 5.
• 20% weight to a uniform distribution from 3 to 6.

38.68 (1 point) What is the mean of this mixture?


A. 3.0 B. 3.2 C. 3.4 D. 3.6 E. 3.8

38.69 (2 points) What is the variance of this mixture?


A. 1.4 B. 1.6 C. 1.8 D. 2.0 E. 2.2

38.70 (2 points) What is the median of this mixture?


A. 3.0 B. 3.2 C. 3.4 D. 3.6 E. 3.8

38.71 (2 points) What is the 90th percentile of this mixture?


A. 4.2 B. 4.4 C. 4.6 D. 4.8 E. 5.0

38.72 (3 points) X is a mixture of two Exponential Distributions with different means µ and θ.
The weight to the first Exponential Distributions is p, 0 < p < 1.
Show that the hazard rate of this mixture decreases with size.

38.73 (2, 5/85, Q.15) (1.5 points) If X is a random variable with density function
f(x) = 1.4e^(-2x) + 0.9e^(-3x) for x ≥ 0. Determine E(X).
A. 9/20 B. 5/6 C. 1 D. 230/126 E. 23/10

38.74 (160, 11/86, Q.1) (2.1 points) Three populations have constant forces of mortality 0.01,
0.02, and 0.04, respectively. For a group of newborns, one-third from each population, determine
the complete expectation of future lifetime at age 50.
(A) 8.3 (8) 24.3 (C) 50.0 (D) 58.3 (E) 74.3

38.75 (160, 11/86, Q.4) (2.1 points) For a certain population, you are given:
(i) At any point in time, equal numbers of males and females are born.
(ii) The mean and variance of the lifetime distribution for males at birth are 60 and 200, respectively.
(iii) The mean and variance of the lifetime distribution for females at birth are 80 and 300,
respectively.
Determine the variance of the lifetime distribution for the population.
(A) 150 (B) 200 (C) 250 (D) 300 (E) 350

38.76 (4B, 5/93, Q.20) (1 point) Which of the following statements are true?
1. With an n-point mixture of models, the large number of parameters that need
to be estimated may be a problem.
2. Starting with several distributions, the two-point mixture of models leads to
many more pairs of distributions.
3. A potential computational problem with the mixture of models is that estimation
of p in the equation F(x) = pF1 (x) + (1-p)F2 (x) via iterative numerical techniques,
may lead to a value of p outside the interval from 0 to 1.
A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

38.77 (4B, 11/97, Q.3) (2 points) You are given the following:
• The random variable X has a distribution that is a mixture of a Burr distribution,
F(x) = 1 - {1 / (1 + (x/θ)^γ)}^α, with parameters θ = 1000, α = 1 and γ = 2,

and a Pareto distribution, with parameters θ = 1,000 and α = 1.


• Each of the two distributions in the mixture has equal weight.
Determine the median of X.
A. Less than 5
B. At least 5, but less than 50
C. At least 50, but less than 500
D. At least 500, but less than 5,000
E. At least 5,000

38.78 (4B, 5/98, Q.16) (2 points) You are given the following:
• X1 is a mixture of a random variable with a uniform distribution on [0,2]
and a random variable with a uniform distribution on [1, 3].
(Each distribution in the mixture has positive weight.)
• X2 is the sum of a random variable with a uniform distribution on [0,2]
and a random variable with a uniform distribution on [1, 3].
• X3 is a random variable that has a normal distribution that is right censored at 1.
Match X1 , X2 , and X3 with the following descriptions:
1. Continuous distribution function and continuous density function
2. Continuous distribution function and discontinuous density function
3. Discontinuous distribution function
A. X1 :1, X2 :2, X3 :3 B. X1 :1, X2 :3, X3 :2 C. X1 :2, X2 :1, X3 :3
D. X1 :2, X2 :3, X3 :1 E. X1 :3, X2 :1, X3 :2

38.79 (4B, 11/98 Q.8) (2 points) You are given the following:
• A portfolio consists of 75 liability risks and 25 property risks.
• The risks have identical claim count distributions.
• Loss sizes for liability risks follow a Pareto distribution, with parameters θ = 300 and α = 4.
• Loss sizes for property risks follow a Pareto distribution, with parameters
θ = 1,000 and α = 3.
Determine the variance of the claim size distribution for this portfolio for a single claim.
A. Less than 150,000
B. At least 150,000, but less than 225,000
C. At least 225,000, but less than 300,000
D. At least 300,000, but less than 375,000
E. At least 375,000

38.80 (Course 151 Sample Exam #2, Q.5) (0.8 points)


You are given S = S1 + S2 , where S1 and S2 are independent and have compound Poisson
distributions with the following characteristics:
(i) λ1 = 2 and λ2 = 3
(ii) x     p1(x)     p2(x)
1 0.6 0.1
2 0.4 0.3
3 0.0 0.5
4 0.0 0.1
Determine the variance of individual claim amounts for S.
(A) 0.83 (B) 0.87 (C) 0.91 (D) 0.95 (E) 0.99

38.81 (Course 1 Sample Exam. Q.24) (1.9 points) An automobile insurance company divides
its policyholders into two groups: good drivers and bad drivers.
For the good drivers, the amount of an average claim is 1400, with a variance of 40,000.
For the bad drivers, the amount of an average claim is 2000, with a variance of 250,000.
Sixty percent of the policyholders are classified as good drivers.
Calculate the variance of the amount of a claim for a policyholder.
A. 124,000 B. 145,000 C. 166,000 D. 210,400 E. 235,000

38.82 (Course 3 Sample Exam, Q.10) An insurance company is negotiating to settle a liability
claim. If a settlement is not reached, the claim will be decided in the courts 3 years from now.
You are given:
• There is a 50% probability that the courts will require the insurance company to make
a payment. The amount of the payment, if there is one, has a lognormal distribution
with mean 10 and standard deviation 20.
• In either case, if the claim is not settled now, the insurance company will have to pay
5 in legal expenses, which will be paid when the claim is decided, 3 years from now.
• The most that the insurance company is willing to pay to settle the claim is the
expected present value of the claim and legal expenses plus 0.02 times the variance
of the present value.
• Present values are calculated using i = 0.04.
Calculate the insurance company's maximum settlement value for this claim.
A. 8.89 B. 9.93 C. 12.45 D. 12.89 E. 13.53

38.83 (IOA 101, 9/00, Q.8) (4.5 points) Claims on a certain class of policy are classified as being
of two types, I and II.
Past experience has shown that:
25% of claims are of type I and 75% are of type II;
Type I claim amounts have mean 500 and standard deviation 100;
Type II claim amounts have mean 300 and standard deviation 70.
Calculate the mean and the standard deviation of the claim amounts on this class of policy.

38.84 (1, 5/01, Q.17) (1.9 points) An auto insurance company insures an automobile worth
15,000 for one year under a policy with a 1,000 deductible. During the policy year there is a 0.04
chance of partial damage to the car and a 0.02 chance of a total loss of the car.
If there is partial damage to the car, the amount X of damage (in thousands) follows a distribution with
density function f(x) = 0.5003 e^(-x/2), 0 < x < 15.
What is the expected claim payment?
(A) 320 (B) 328 (C) 352 (D) 380 (E) 540

38.85 (3, 11/01, Q.28 & 2009 Sample Q.100) (2.5 points) The unlimited severity distribution for
claim amounts under an auto liability insurance policy is given by the cumulative distribution:
F(x) = 1 - 0.8e^(-0.02x) - 0.2e^(-0.001x), x ≥ 0.
The insurance policy pays amounts up to a limit of 1000 per claim.
Calculate the expected payment under this policy for one claim.
(A) 57 (B) 108 (C) 166 (D) 205 (E) 240

38.86 (4, 11/02, Q.13) (2.5 points) Losses come from an equally weighted mixture of an
exponential distribution with mean m1 , and an exponential distribution with mean m2 .
Determine the least upper bound for the coefficient of variation of this distribution.
(A) 1 (B) √2 (C) √3 (D) 2 (E) √5

38.87 (SOA3, 11/03, Q.18) (2.5 points) A population has 30% who are smokers with a constant
force of mortality 0.2 and 70% who are non-smokers with a constant force of mortality 0.1.
Calculate the 75th percentile of the distribution of the future lifetime of an individual selected at
random from this population.
(A) 10.7 (B) 11.0 (C) 11.2 (D) 11.6 (E) 11.8

38.88 (CAS3, 11/04, Q.28) (2.5 points) A large retailer of personal computers issues a Warranty
contract with each computer that it sells. The warranty covers any cost to repair or replace a defective
computer within the first 30 days of purchase. 40% of all claims are easily resolved with minor
technical help and do not involve any cost to replace or repair.
If a claim involves some cost to replace or repair, the claim size is distributed as a Weibull with
parameters τ = 1/2 and θ = 30.
Which of the following statements are true?
1. The expected cost of a claim is $60.
2. The survival function at $60 is 0.243.
3. The hazard rate at $60 is 0.012.
A. 1 only. B. 2 only. C. 3 only. D. 1 and 2 only. E. 2 and 3 only.

38.89 (CAS3, 11/04, Q.29) (2.5 points) High-Roller Insurance Company insures the cost of
injuries to the employees of ACME Dynamite Manufacturing, Inc.
• 30% of injuries are "Fatal" and the rest are "Permanent Total" (PT).
• There are no other injury types.
• Fatal injuries follow a log-logistic distribution with θ = 400 and γ = 2.
• PT injuries follow a log-logistic distribution with θ = 600 and γ = 2.
• There is a $750 deductible per injury.
Calculate the probability that an injury will result in a claim to High-Roller.
A. Less than 30%
B. At least 30%, but less than 35%
C. At least 35%, but less than 40%
D. At least 40%, but less than 45%
E. 45% or more

38.90 (SOA M, 5/05, Q.34 & 2009 Sample Q.169) (2.5 points)
The distribution of a loss, X, is a two-point mixture:
(i) With probability 0.8, X has a two-parameter Pareto distribution with α = 2 and θ = 100.
(ii) With probability 0.2, X has a two-parameter Pareto distribution with α = 4 and θ = 3000.
Calculate Pr(X ≤ 200).
(A) 0.76 (B) 0.79 (C) 0.82 (D) 0.85 (E) 0.88

38.91 (CAS3, 11/05, Q.32) (2.5 points) For a certain insurance company, 60% of claims have a
normal distribution with mean 5,000 and variance 1,000,000.
The remaining 40% have a normal distribution with mean 4,000 and variance 1,000,000.
Calculate the probability that a randomly selected claim exceeds 6,000.
A Less than 0.10
B. At least 0.10, but less than 0.15
C. At least 0.15, but less than 0.20
D. At least 0.20, but less than 0.25
E. At least 0.25

38.92 (SOA M, 11/05, Q.32) (2.5 points) For a group of lives aged 30, containing an equal
number of smokers and non-smokers, you are given:
(i) For non-smokers, µn (x) = 0.08, x ≥ 30.
(ii) For smokers, µs(x) = 0.16, x ≥ 30.
Calculate q80 for a life randomly selected from those surviving to age 80.
(A) 0.078 (B) 0.086 (C) 0.095 (D) 0.104 (E) 0.112

38.93 (CAS3, 11/06, Q.20) (2.5 points)


An insurance company sells hospitalization reimbursement insurance. You are given:
• Benefit payment for a standard hospital stay follows a lognormal distribution with µ = 7 and σ = 2.
• Benefit payment for a hospital stay due to an accident is twice as much as a standard benefit.
• 25% of all hospitalizations are for accidental causes.
Calculate the probability that a benefit payment will exceed $15,000.
A. Less than 0.12
B. At least 0.12, but less than 0.14
C. At least 0.14, but less than 0.16
D. At least 0.16, but less than 0.18
E. At least 0.18

38.94 (IOA, CT8, 4/10, Q.9) (7.5 points) An asset is worth 100 at the start of the year and is
funded by a senior loan and a junior loan of 50 each.
The loans are due to be repaid at the end of the year;
the senior one with annual interest at 6% and the junior one with annual interest at 8%.
Interest is paid on the loans only if the asset sustains no losses.
Any losses of up to 50 sustained by the asset reduce the amount returned to the investor in the
junior loan by the amount of the loss. Any losses of more than 50 mean that the investor in the junior
loan gets 0 and the amount returned to the investor in the senior loan is reduced by the excess of
the loss over 50.
The probability that the asset sustains a loss is 0.25. The size of a loss, L, if there is one, follows a
uniform distribution between 0 and 100.
(i) (6 points)
(a) Calculate the variance of the distribution of amounts paid back to the investors in the junior loan.
(b) Calculate the variance of the distribution of amounts paid back to the investors in the senior loan.
(ii) (1.5 points) Calculate the probabilities for the investors in the junior and senior loans, that they get
paid back less than the original amounts of their loans.

38.95 (IOA CT8, 9/10, Q.1) (5.25 points) An investor holds an asset that produces a random rate
of return, R, over the course of a year.
The distribution of this rate of return is a mixture of Normal distributions:
R has a Normal distribution with a mean of 0% and standard deviation of 10% with probability 0.8
and a Normal distribution with a mean of 30% and a standard deviation of 10% with a probability of
0.2.
S is the normally distributed random rate of return on another asset that has the same mean and
variance as R.
(i) (2.25 points) Calculate the mean and variance of R.
(ii) (3 points) Calculate the following probabilities for R and for S:
(a) probability of a rate of return less than 0%.
(b) probability of a rate of return less than -10%.

Solutions to Problems:

38.1. A. The chance of a claim greater than 2 is: 1 - F(2) = (θ/(θ+2))^α = (10/12)^4 = 0.4823.

38.2. D. The mean of a Pareto is: θ/(α-1) = 10/3 = 3.333.

38.3. A. The second moment of a Pareto is: 2θ² / {(α-1)(α-2)} = 200/6 = 33.333.

38.4. C. The variance is: 33.333 - 3.333² = 22.22. Thus the CV = √22.22 / 3.333 = 1.414.
Comment: For the Pareto, the CV is: √[α / (α - 2)] = √(4/2) = √2 = 1.414.

38.5. B. The third moment of a Pareto is: 6θ³ / {(α-1)(α-2)(α-3)} = 6000/6 = 1000.

38.6. B. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³ =
{1000 - (3)(3.333)(33.333) + (2)(3.333)³} / (22.22)^1.5 = 7.07.
Comment: For the Pareto, Skewness = 2 {(α+1)/(α-3)} √[(α - 2)/α] = (2)(5/1) √(2/4) = 7.07.

38.7. C. The excess ratio for the Pareto is: {θ/(θ+x)}^(α−1) = (10/15)^3 = 0.2963.

38.8. B. The chance of a claim greater than 2 is: 1 - F(2) = e^(-2/δ) = e^(-2/0.8) = 0.0821.

38.9. E. The mean of the Exponential Distribution is: δ = 0.8.

38.10. A. The second moment of the Exponential is: 2δ² = 1.28.

38.11. E. The variance = 1.28 - 0.82 = 0.64. Thus the standard deviation = 0.8.
The CV = standard deviation divided by the mean = 0.8 / 0.8 = 1.

38.12. E. The third moment of the exponential is: 6δ³ = 3.072.



38.13. D. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³ =
{3.072 - (3)(0.8)(1.28) + (2)(0.8)³} / (0.64)^1.5 = 2.


Comment: The C.V. of the exponential distribution is always 1, while the skewness is always 2.

38.14. C. The excess ratio for the Exponential is: e^(−x/δ) = e^(-5/0.8) = 0.00193.

38.15. D. The chance of a claim greater than 2 is: (0.05)(0.4823) + (0.95)(0.0821) = 0.102.

38.16. B. The mean is a weighted average of the individual means:


(0.05)(3.333) + (0.95)(0.8) = 0.9267.

38.17. B. The second moment is a weighted average of the individual second moments:
(0.05)(33.333 ) + (0.95)(1.28) = 2.883.

38.18. C. The variance is: 2.883 - 0.927² = 2.024. Thus the CV = √2.024 / 0.927 = 1.535.

38.19. D. The third moment is a weighted average of the individual third moments:
(0.05)(1000) + (0.95)(3.072) = 52.918.

38.20. D. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³ =
{52.918 - (3)(0.927)(2.883) + (2)(0.927)³} / (2.024)^1.5 = 16.15.

38.21. C. The excess ratio for the mixed distribution is the weighted average of the individual
excess ratios, using as the weights the means times p or 1-p:
{(0.05)(3.333)(0.2963) + (0.95)(0.8)(0.00193)} / {(0.05)(3.333) + (0.95)(0.8)} =
0.05085 / 0.9267 = 0.0549.
Comment: Almost certainly beyond what will be asked on the exam.

38.22. E. For the Pareto, S(2) = {θ/(θ+2)}^α = (10/12)^4 = 0.4823.
For the Exponential, S(2) = exp[-2/θ] = exp[-2/0.8] = 0.0821.
For the mixture, S(2) = (5%)(0.4823) + (95%)(0.0821) = 0.1021.
For the Pareto, the expected losses excess of 2 are:
E[X] - E[X ∧ 2] = θ/(α-1) - {θ/(α-1)} {1 - (θ/(θ+2))^(α−1)} = (10/3)(10/12)^3 = 1.9290.
For the Exponential, the expected losses excess of 2 are:
E[X] - E[X ∧ 2] = θ - θ{1 - exp[-2/θ]} = θ exp[-2/θ] = 0.8 exp[-2/.8] = 0.0657.
For the mixture, the expected losses excess of 2 are: (5%)(1.9290) + (95%)(0.0657) = 0.1589.
For the mixture, e(2) = 0.1589/0.1021 = 1.556.
Comment: For the Exponential e(2) = θ = 0.8.
For the Pareto, e(2) = (2 + θ) / (α - 1) = (2 + 10) / (4 - 1) = 4.
The mean excess loss of the mixture is not equal to the mixture of the mean excess losses:
(5%)(4) + (95%)(0.8) = 0.96 ≠ 1.556.
Here is a graph of the mean excess loss of the mixture:
[Graph: the mean excess loss e(x) of the mixture, plotted for x from 0 to 10.]
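As a quick numerical check of e(2) for this mixture (an illustrative sketch in Python with SciPy; the code is my addition and not part of the original guide or exam):

# Check of 38.22: e(2) for the 5%/95% Pareto-Exponential mixture.
import math
from scipy import integrate

def S(x):
    # survival function of the mixture: 5% Pareto(alpha = 4, theta = 10), 95% Exponential(theta = 0.8)
    return 0.05 * (10.0 / (10.0 + x))**4 + 0.95 * math.exp(-x / 0.8)

excess, _ = integrate.quad(S, 2, math.inf)   # expected losses excess of 2, about 0.159
print(excess / S(2))                         # e(2), about 1.556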

38.23. D. The mean of each Exponential is: θ.


The second moment of each Exponential is: 2θ².
The mean and second moment of the mixed distribution are the weighted average of those of the
individual distributions. Therefore, the mixed distribution has mean:
0.4θ1 + 0.6θ2 = (0.4)(1128) + (0.6)(5915) = 4000, and
second moment: 2(0.4θ1² + 0.6θ2²) = 2{(0.4)(1128²) + (0.6)(5915²)} = 43,002,577.
Variance = 43,002,577 - 4000² = 27.0 million.



38.24. D. For the Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)).


The given distribution is a 40%-60% mixture of two Exponentials, with means 1128 and 5915.
Therefore, the limited expected value at 2000 is a weighted average of the LEVs for the individual
Exponentials.
E[X ∧ 2000] = (0.4){(1128)(1 - e^(-2000/1128))} + (0.6){(5915)(1 - e^(-2000/5915))} = 1393.
Comment: Similar to 3, 11/01, Q.28.

38.25. A. E[(X - 2000)+] = E[X] - E[X ∧ 2000] = 4000 - 1393 = 2607.
Alternately, for each Exponential, E[(X - 2000)+] = ∫_2000^∞ S(x) dx = ∫_2000^∞ e^(-x/θ) dx = θ e^(-2000/θ).
For θ = 1128, E[(X - 2000)+] = 1128 e^(-2000/1128) = 191.6.
For θ = 5915, E[(X - 2000)+] = 5915 e^(-2000/5915) = 4218.0.
For the mixture, E[(X - 2000)+] = (0.4)(191.6) + (0.6)(4218.0) = 2607.

38.26. D. Let F be the mixed distribution; then:
F(x) = (0.3){1 - 1000/(1000+x)} + (0.7){1 - 100/(100+x)}.
The 20th percentile is that x such that F(x) = 0.2.
Thus the 20th percentile of the mixed distribution is the value of x such that:
0.2 = (0.3){1 - 1000/(1000+x)} + (0.7){1 - 100/(100+x)}.
Thus 0.8 = 300/(1000+x) + 70/(100+x). Thus 0.8x² + 510x - 20,000 = 0.
Thus x = {-510 + √(510² + (4)(0.8)(20,000))} / {(2)(0.8)} = 37.1.
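A quick numerical check of this percentile (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check of 38.26: 20th percentile of the two-point Pareto mixture.
from scipy.optimize import brentq

def F(x):
    return 0.3 * (1 - 1000 / (1000 + x)) + 0.7 * (1 - 100 / (100 + x))

print(brentq(lambda x: F(x) - 0.2, 0, 1000))   # about 37.1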

38.27. A. E[X] = (2/3)(1000) + (1/3)(250) = 750.
E[X ∧ 200] = (2/3){(0.1)(100) + (0.9)(200)} + (1/3){(0.4)(100) + (0.6)(200)} =
(2/3)(190) + (1/3)(160) = 180. E[(X - 200)+] = E[X] - E[X ∧ 200] = 750 - 180 = 570.
Alternately, for each uniform from 0 to b, E[(X - 200)+] =
∫_200^b S(x) dx = ∫_200^b (1 - x/b) dx = (b - 200) - (b/2 - 20,000/b) = b/2 + 20,000/b - 200.
For b = 2000, E[(X - 200)+] = 810. For b = 500, E[(X - 200)+] = 90.
For the mixture, E[(X - 200)+] = (2/3)(810) + (1/3)(90) = 570.
Comment: Mathematically the same as a mixture of two uniform distributions, with weight 2/(2 + 1) =
2/3 to the first uniform distribution.

38.28. B. For the mixture, S(200) = (2/3)(.9) + (1/3)(.6) = 0.8.


Average payment per payment = E[(X - 200)+] / S(200) = 570/0.8 = 712.5.
Alternately, nonzero payments for medical have mean frequency of: (0.9)(2) = 1.8.
Nonzero payments for medical are uniform from 0 to 1800 with mean 900.
Nonzero payments for dental have mean frequency of: (0.6)(1) = .6.
Nonzero payments for dental are uniform from 0 to 300 with mean 150.
{(1.8)(900) + (0.6)(150)} / (1.8 + 0.6) = 712.5.

38.29. B. S(x) = 1 - F(x) = 0.2 e^(-x/10) + 0.5 e^(-x/25) + 0.3 e^(-x/100).
S(15) = 0.2 e^(-15/10) + 0.5 e^(-15/25) + 0.3 e^(-15/100) = 57.7%.

38.30. D. The mean of the mixed distribution is a weighted average of the mean of each
Exponential Distribution: (0.2)(10) + (0.5)(25) + (0.3)(100) = 44.5.
Comment: This is a 3-point mixture of Exponential Distributions, with means of 10, 25, and 100
respectively.

38.31. B. Set F(x) = 1 - {0.2e^(-x/10) + 0.5e^(-x/25) + 0.3e^(-x/100)}.


One can calculate the distribution function at the endpoints of the intervals and determine that the
median is between 19 and 20. (Solving numerically, median = 19.81.)
x 1 - Exp(-x/10) 1- Exp(-x/25) 1- Exp(-x/100) Mixed Distribution
19 0.850 0.532 0.173 0.488
20 0.865 0.551 0.181 0.503
21 0.878 0.568 0.189 0.516
22 0.889 0.585 0.197 0.530
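One can also solve for the median numerically (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check of 38.31: median of the mixed Exponential (weights 0.2, 0.5, 0.3; means 10, 25, 100).
import math
from scipy.optimize import brentq

def S(x):
    return 0.2 * math.exp(-x / 10) + 0.5 * math.exp(-x / 25) + 0.3 * math.exp(-x / 100)

print(brentq(lambda x: (1 - S(x)) - 0.5, 0, 200))   # about 19.81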

38.32. A. The mode of every Exponential Distribution is 0, thus so is that of the Mixed
Exponential.
Comment: If the individual distributions of a mixture have different modes, then in general it would be
difficult to calculate the mode algebraically. One could do so by graphing the mixed density and
seeing where it reaches a maximum.

38.33. E. Each Exponential Distribution has a second moment of 2θ².
The second moment of the mixture is a weighted average of the individual second moments:
(0.2)(2)(10²) + (0.5)(2)(25²) + (0.3)(2)(100²) = 6665.

38.34. E. Variance = 6665 - 44.5² = 4684.75. CV = √4684.75 / 44.5 = 1.54.


Comment: Note that while the CV of every Exponential is 1, the CV of a mixed exponential is
always greater than one.

38.35. D. Each Exponential Distribution has a third moment of 6θ³.
The third moment of the mixture is a weighted average of the individual third moments:
(0.2)(6)(10³) + (0.5)(6)(25³) + (0.3)(6)(100³) = 1,848,075.

38.36. E. Skewness = {1,848,075 - (3)(44.50)(6665) + (2)(44.5³)} / 4684.75^1.5 = 3.54.


Comment: Note that while the skewness of every Exponential is 2, the skewness of a mixed
exponential is always greater than 2.

38.37. A. f(x) = (0.2)(e^(-x/10)/10) + (0.5)(e^(-x/25)/25) + (0.3)(e^(-x/100)/100). f(50) = 0.00466.
S(x) = 0.2e^(-x/10) + 0.5e^(-x/25) + 0.3e^(-x/100). S(50) = 0.25097.
h(50) = f(50)/S(50) = 0.00466 / 0.25097 = 0.0186.
Comment: Note that while the hazard rate of each Exponential Distribution is independent of x, that
is not true for the Mixed Exponential Distribution.

38.38. C. For each individual Exponential, the Limited Expected Value is: θ(1 - e^(-x/θ)).
The Limited Expected Value of the mixture is a weighted average of the individual Limited
Expected Values: (0.2)(10)(1 - e^(-20/10)) + (0.5)(25)(1 - e^(-20/25)) + (0.3)(100)(1 - e^(-20/100)) = 14.05.

38.39. A. E[X ∧ 15] = (0.2)(10)(1 - e^(-15/10)) + (0.5)(25)(1 - e^(-15/25)) + (0.3)(100)(1 - e^(-15/100)) =
11.37. E[X] = 44.5. LER(15) = E[X ∧ 15] / E[X] = 11.37 / 44.5 = 25.6%.

38.40. D. E[X ∧ 75] = (0.2)(10)(1 - e^(-75/10)) + (0.5)(25)(1 - e^(-75/25)) + (0.3)(100)(1 - e^(-75/100)) =
29.71. E[X] = 44.5. R(75) = 1 - E[X ∧ 75]/E[X] = 1 - 29.71/44.5 = 33.2%.

38.41. D. For each individual Exponential, the Limited Second Moment is:
2θ²Γ(3; x/θ) + x² e^(-x/θ). Using Theorem A.1 in Appendix A of Loss Models,
Γ(3; x/θ) = 1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2.
Thus E[(X ∧ x)²] = 2θ²Γ(3; x/θ) + x² e^(-x/θ) = 2θ² - 2θ²e^(-x/θ) - 2xθe^(-x/θ) = 2θ{θ - (θ+x)e^(-x/θ)}.
For θ = 10, E[(X ∧ 30)²] = 20{10 - 40e^(-3)} = 160.17.
For θ = 25, E[(X ∧ 30)²] = 50{25 - 55e^(-1.2)} = 421.72.
For θ = 100, E[(X ∧ 30)²] = 200{100 - 130e^(-0.3)} = 738.72.
The Limited Second Moment of the mixture is a weighted average of the individual Limited
Second Moments: (0.2)(160.17) + (0.5)(421.72) + (0.3)(738.72) = 464.51.
Comment: Difficult. One can compute the integral in the limited second moment of the Exponential
Distribution by repeated use of integration by parts.

38.42. B. E[X ∧ 50] = (0.2)(10)(1 - e^(-50/10)) + (0.5)(25)(1 - e^(-50/25)) + (0.3)(100)(1 - e^(-50/100)) =
24.60. E[X] = 44.5. S(x) = 0.2e^(-x/10) + 0.5e^(-x/25) + 0.3e^(-x/100). S(50) = 0.25097.
e(50) = (E[X] - E[X ∧ 50]) / S(50) = (44.5 - 24.60)/0.25097 = 79.3.
Comment: Note that while for each Exponential Distribution e(x) is independent of x, that is not true
for the Mixed Exponential Distribution. For the Mixed Distribution, the Mean Excess Loss increases
with x, towards the largest mean of the individual Exponentials.
For example, in this case e(200) = 99.7.
The tail behavior of the mixed exponential is that of the individual exponential with the largest mean.
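A short numerical check of this behavior (an illustrative Python sketch, my addition, not part of the guide):

# Check of 38.42: mean excess loss e(x) for the mixed Exponential.
import math

weights, thetas = (0.2, 0.5, 0.3), (10, 25, 100)

def e(x):
    # for each Exponential: E[X] - E[X ^ x] = theta * exp(-x/theta), and S(x) = exp(-x/theta)
    excess = sum(w * t * math.exp(-x / t) for w, t in zip(weights, thetas))
    surv = sum(w * math.exp(-x / t) for w, t in zip(weights, thetas))
    return excess / surv

print(e(50), e(200))   # about 79.3 and 99.7; e(x) increases toward the largest mean, 100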

38.43. In this case, the mixed distribution is unimodal.

[Graph: the density of the mixture over 0 ≤ x ≤ 50, declining from about 0.05 at its single mode at x = 0.]

38.44. D. Prob[ω = 80 | surviving to at least 70] =
Prob[surviving to at least 70 | ω = 80] Prob[ω = 80] / Prob[surviving to at least 70] =
(10/80)(0.4) / {(10/80)(0.4) + (30/100)(0.6)} = 21.7%.
Prob[ω = 100 | surviving to at least 70] = (30/100)(0.6) / {(10/80)(0.4) + (30/100)(0.6)} = 78.3%.
Therefore, the expected age at death is: (21.7%)(70 + 80)/2 + (78.3%)(70 + 100)/2 = 82.8.
Comment: DeMoivreʼs Law is uniform from 0 to ω.

38.45. E. This can be thought of as a two-point mixture between a severity that is always zero and
a uniform distribution on [1000, 5000].
Mean = (80%)(0) + (20%)(3000) = 600.
2nd moment of the uniform on [1000, 5000] is: (5000³ - 1000³)/{(3)(5000 - 1000)} = 10,333,333.
Second moment of the mixture = (80%)(0) + (20%)(10,333,333) = 2,066,667.
Variance of the mixture = 2,066,667 - 600² = 1,706,667.
Alternately, this can be thought of as a Bernoulli Frequency with q = 0.2 and a uniform severity.
Variance of Aggregate = (mean freq.)(var. of severity) + (mean severity)²(variance of freq.) =
(0.2)(4000²/12) + (3000²)(0.2)(0.8) = 1,706,667.
Comment: This can also be thought of as a two component splice between a point mass at 0 of
80% and a uniform distribution with weight 20%.

38.46. f(x) = θ^(−α) x^(α−1) e^(−x/θ) / Γ(α). With α = 4 and θ = 3, f(x) = x³ e^(-x/3) / 486:
[Graph: this Gamma density over 0 ≤ x ≤ 40, peaking near x = 9.]

With α = 6 and θ = 10, f(x) = x^5 e^(-x/10) / 120,000,000:
[Graph: this Gamma density over 0 ≤ x ≤ 140, peaking near x = 50.]
With 30% weight to the first distribution and 70% weight to the second distribution:
[Graph: the density of the 30%/70% mixture of these two Gammas over 0 ≤ x ≤ 140, showing two modes, one near x = 9 and one near x = 50.]
Comment: In this case, the mixed distribution is bimodal.
Two-point mixtures of distributions each unimodal, can be either unimodal or bimodal.
In this example, with 3% weight to the first Gamma and 97% weight to the second Gamma, the
mixture would have been unimodal.

38.47. B. The distribution of X is a two-point mixture:


with probability 0.7, X has an Exponential distribution with θ = 100,
and with probability 0.3, X has an Exponential distribution with θ = 200.
E[X | X > 50] = ∫_50^∞ {0.7 f1(x) + 0.3 f2(x)} x dx / ∫_50^∞ {0.7 f1(x) + 0.3 f2(x)} dx
= {0.7 ∫_50^∞ x f1(x) dx + 0.3 ∫_50^∞ x f2(x) dx} / {0.7 ∫_50^∞ f1(x) dx + 0.3 ∫_50^∞ f2(x) dx}
= {(0.7) S1(50) (50 + e1(50)) + (0.3) S2(50) (50 + e2(50))} / {(0.7) S1(50) + (0.3) S2(50)}
= {(0.7) e^(-50/100) (50 + 100) + (0.3) e^(-50/200) (50 + 200)} / {(0.7) e^(-50/100) + (0.3) e^(-50/200)}
= 122.096/0.6582 = 185.5.
Alternately, for the mixture, S(50) = (0.7) e^(-50/100) + (0.3) e^(-50/200) = 0.6582.
E[X] = (0.7)(100) + (0.3)(200) = 130.
E[X ∧ 50] = (0.7){(100)(1 - e^(-50/100))} + (0.3){(200)(1 - e^(-50/200))} = 40.815.
Average size of those repairs longer than 50 is:
{E[X] - (E[X ∧ 50] - 50 S(50))}/S(50) = {130 - (40.815 - (50)(0.6582))} / 0.6582 = 185.5.
Comment: e(x) = ∫_x^∞ f(t)(t - x) dt / S(x). ⇒ e(x) S(x) = ∫_x^∞ t f(t) dt - x S(x).
⇒ ∫_x^∞ t f(t) dt = S(x) {x + e(x)}.
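A quick numerical check of this conditional mean (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check of 38.47: E[X | X > 50] for the 70%/30% mixture of Exponentials with means 100 and 200.
import math
from scipy import integrate

def f(x):
    return 0.7 * math.exp(-x / 100) / 100 + 0.3 * math.exp(-x / 200) / 200

num, _ = integrate.quad(lambda x: x * f(x), 50, math.inf)
den, _ = integrate.quad(f, 50, math.inf)
print(num / den)   # about 185.5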

38.48. E. In 2002 the losses are a 70%-30% mixture of two Exponentials with means 1000 and
5000. In 2004 the losses are a 70%-30% mixture of two Exponentials with means 1080 and 5400.
For the Exponential, E[X ∧ x] = θ(1 - e^(-x/θ)).
For θ = 1080, E[X ∧ 3000] = (1080)(1 - e^(-3000/1080)) = 1012.8.
For θ = 5400, E[X ∧ 3000] = (5400)(1 - e^(-3000/5400)) = 2301.7.
In 2004 the average payment per loss is: (0.7)(1012.8) + (0.3)(2301.7) = 1399.5.

38.49. E. The LogNormal has mean: exp[10 + 0.7²/2] = 28,141,
and second moment: exp[(2)(10) + (2)(0.7²)] = 1,292,701,433.
The Gamma has mean: (2)(10,000) = 20,000,
and second moment: (2)(2+1)(10,000²) = 600,000,000.
The Pareto has mean: 200,000/(5 - 1) = 50,000,
and second moment: (2)(200,000²)/{(5 - 1)(5 - 2)} = 6,666,666,667.
The Exponential has mean: 5000, and second moment: (2)(5000²) = 50,000,000.
The mixed distribution has mean:
(0.1)(28,141) + (0.3)(20,000) + (0.2)(50,000) + (0.4)(5000) = 20,814.
The mixed distribution has second moment:
(0.1)(1,292,701,433) + (0.3)(600,000,000) + (0.2)(6,666,666,667) + (0.4)(50,000,000)
= 1662.60 million.
Standard deviation of the mixed distribution: √(1662.60 million - 20,814²) = 35,062.
Comment: Not intended to be a realistic model of Homeowners Insurance. For example, the size
of loss distribution would depend on the value of the insured home. For wind and fire, there
would be point masses of probability at the value of the insured home. The mix of losses by
peril would depend on the location of the insured home.

38.50. C. For Type I, the Single Parameter Pareto has mean αθ/(α - 1) = (4)(10)/3 = 13.3333,
2nd moment αθ²/(α - 2) = (4)(100)/2 = 200, and variance 200 - 13.3333² = 22.222.
For Type II, the Single Parameter Pareto has mean αθ/(α - 1) = (3)(10)/2 = 15,
second moment αθ²/(α − 2) = (3)(100)/1 = 300, and variance 300 - 15² = 75.
Sum of 600 risks of Type I and 400 risks of Type II has variance:
(600)(22.222) + (400)(75) = 43,333.
Comment: We know exactly how many of each type we have, rather than picking a certain number
of risks at random.

38.51. C. This is a 60%-40% mixture of the two Single Parameter Pareto Distributions.
The mixture has mean: (0.6)(13.3333) + (0.4)(15) = 14.
The mixture has second moment: (0.6)(200) + (0.4)(300) = 240.
The mixture has variance: 240 - 14² = 44.

38.52. B. If the risk is of Type I, then the variance of outcomes is 22.222.


If the risk is of Type II, then the variance of outcomes is 75.
The expected value of this variance is: (0.6)(22.222) + (0.4)(75) = 43.333.
Comment: This is the Expected Value of the Process Variance.
The Variance of the Hypothetical Means is: (0.6)(13.3333 - 14)² + (0.4)(15 - 14)² = 0.667.
Expected Value of the Process Variance + Variance of the Hypothetical Means =
43.333 + 0.667 = 44 = Total Variance. Note that the variance of the sum of one loss from each of
the 1000 risks is: (1000)(43.333) = 43,333, the solution to a previous question.

38.53. E[Y] = (0.3)(0.5) + (0.7)(0) = 0.15.
Second moment of the uniform = variance of the uniform + (mean of the uniform)² = 6²/12 + 0² = 3.
E[Y²] = (0.3)(2)(0.5²) + (0.7)(3) = 2.25. ⇒ Var[Y] = 2.25 - 0.15² = 2.2275.
Prob[Y ≤ 1] = (0.3)(1 - exp[-1/0.5]) + (0.7)(4/6) = 72.61%.
Comment: Setup taken from 3, 11/02, Q.10 (2009 Sample Q.81), which is about Simulation.

38.54. A. Profit per applicant is a mixed distribution, with 50% weight to heads and 50% weight to
tails. These are each in turn mixed distributions.
Heads is 2/3 weight to good and 1/3 weight to bad,
with mean: (2/3)(300) + (1/3)(-100) = 166.67,
and with second moment: (2/3)(10,000 + 300²) + (1/3)(90,000 + 100²) = 100,000.
Tails is 2/3 weight to good and 1/3 weight to zero,
with mean: (2/3)(300) + (1/3)(0) = 200,
and with second moment: (2/3)(10,000 + 300²) + (1/3)(0²) = 66,667.
The overall mean profit is: (50%)(166.67) + (50%)(200) = 183.33.
The overall second moment of profit is: (50%)(100,000) + (50%)(66,667) = 83,333.
The variance of the profit per applicant is: 83,333 - 183.33² = 49,722.
Comment: Information taken from 3, 11/02, Q.15.

38.55. C. Of the original applicants: (50%)(2/3) = 1/3 are heads and good, (50%)(1/3) = 1/6 are
heads and bad, (50%)(2/3) = 1/3 are tails and good, and (50%)(1/3) = 1/6 are tails and bad. Bob
accepts the first three types, 5/6 of the total. Thus the profit per accepted applicant is a mixed
distribution, with (1/3 + 1/3)/(5/6) = 80% weight to good and (1/6)/(5/6) = 20% weight to bad.
The mean profit is: (80%)(300) + (20%)(-100) = 220.
The second moment of profit is: (80%)(10,000 + 300²) + (20%)(90,000 + 100²) = 100,000.
The variance of the profit per accepted applicant is: 100,000 - 220² = 51,600.

38.56. A. For good students, f(x) = {(6 + 2 - 1)!/[(6-1)!(2-1)!]} (x/100)^6 (1 - x/100)^(2-1)/x
= 42 x^5 (1 - x/100)/10^12, 0 ≤ x ≤ 100.
F(65) = (42/10^12) ∫_0^65 (x^5 - x^6/100) dx = 0.2338.
∫_0^65 x f(x) dx = (42/10^12) ∫_0^65 (x^6 - x^7/100) dx = 12.685.
Average grade for those good students who fail: 12.685/0.2338 = 54.26.
For bad students, f(x) = {(3 + 2 - 1)!/[(3-1)!(2-1)!]} (x/100)^3 (1 - x/100)^(2-1)/x
= 12 x² (1 - x/100)/10^6, 0 ≤ x ≤ 100.
F(65) = (12/10^6) ∫_0^65 (x² - x³/100) dx = 0.5630. ∫_0^65 x f(x) dx = (12/10^6) ∫_0^65 (x³ - x⁴/100) dx = 25.705.

Average grade for those bad students who fail: 25.705/0.5630 = 45.66.
Prob[Good | failed] = Prob[fail | Good] Prob[Good] / Prob[fail] =
(0.2338)(0.75) / {(0.2338)(0.75) + (0.5630)(0.25)} = 55.47%.
Expected grade of a student who fails: (0.5547)(54.26) + (1 - 0.5547)(45.66) = 50.4.
Comment: The distribution of grades is given as continuous, so we integrate from 0 to 65.

38.57. E. E[X] = (0.65)exp[8 + 0.5²/2] + (0.35)exp[9 + 0.3²/2] = (0.65)(3378) + (0.35)(8476) =
5162.

38.58. B. E[X²] = (0.65)exp[(2)(8) + (2)(0.5²)] + (0.35)exp[(2)(9) + (2)(0.3²)] =
(0.65)(14,650,719) + (0.35)(78,609,255) = 37,036,207.

38.59. C. For the LogNormal, E[X^(-1)] = exp[-µ + σ²/2].
E[X^(-1)] = (0.65)exp[-8 + 0.5²/2] + (0.35)exp[-9 + 0.3²/2] =
(0.65)(0.0003801) + (0.35)(0.00012909) =
0.00029225. 1/E[1/X] = 1/0.00029225 = 3422.

38.60. E. E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}.
For µ = 8 and σ = 0.5,
E[X ∧ 6000] = 3378 Φ[(ln(6000) - 8 - 0.5²)/0.5] + 6000 (1 - Φ[(ln(6000) - 8)/0.5]) =
3378 Φ[0.90] + 6000 (1 - Φ[1.40]) = (3378)(0.8159) + (6000)(1 - 0.9192) = 3241.
For µ = 9 and σ = 0.3,
E[X ∧ 6000] = 8476 Φ[(ln(6000) - 9 - 0.3²)/0.3] + 6000 (1 - Φ[(ln(6000) - 9)/0.3]) =
8476 Φ[-1.30] + 6000 (1 - Φ[-1.00]) = (8476)(0.0968) + (6000)(0.8413) = 5868.
For the mixture, E[X ∧ 6000] = (65%)(3241) + (35%)(5868) = 4160.

38.61. C. E[(X ∧ x)²] = exp[2µ + 2σ²] Φ[(ln(x) − µ − 2σ²)/σ] + x² {1 - Φ[(ln(x) − µ)/σ]}.
For µ = 8 and σ = 0.5,
E[(X ∧ 6000)²] = 14,650,719 Φ[{ln(6000) - 8 - (2)(0.5²)}/0.5] + 6000² (1 - Φ[(ln(6000) - 8)/0.5]) =
14,650,719 Φ[0.40] + 6000² (1 - Φ[1.40]) =
(14,650,719)(0.6554) + (36,000,000)(1 - 0.9192) = 12,510,881.
For µ = 9 and σ = 0.3,
E[(X ∧ 6000)²] = 78,609,255 Φ[{ln(6000) - 9 - (2)(0.3²)}/0.3] + 6000² (1 - Φ[(ln(6000) - 9)/0.3]) =
78,609,255 Φ[-1.60] + 6000² (1 - Φ[-1.00])
= (78,609,255)(0.0548) + (36,000,000)(0.8413) = 34,594,587.
For the mixture, E[(X ∧ 6000)²] = (65%)(12,510,881) + (35%)(34,594,587) = 20.240 million.
For the mixture, Var[X ∧ 6000] = 20.240 million - 4160² = 2.934 million.

38.62. C. & 38.63. B. & 38.64. D. & 38.65. C. Mean = (0.5)(8%) + (0.5)(11%) = 9.5%.
For each Normal, its second moment is equal to: σ² + µ².
Second Moment of R is: (0.5)(0.2² + 0.08²) + (0.5)(0.3² + 0.11²) = 0.07425.
Variance of R is: 0.07425 - 0.095² = 0.0652. √0.0652 = 0.255.
Third Moment of the first Normal is: 0.08³ + (3)(0.08)(0.2²) = 0.01011.
Third Moment of the second Normal is: 0.11³ + (3)(0.11)(0.3²) = 0.03103.
Third Moment of R is: (0.5)(0.01011) + (0.5)(0.03103) = 0.02057.
Third Central Moment of R is: 0.02057 - (3)(0.095)(0.07425) + (2)(0.095³) = 0.001124.
Skewness of R is: 0.001124/0.0652^1.5 = 0.0675.
Fourth Moment of the first Normal is: 0.08⁴ + (6)(0.08²)(0.2²) + (3)(0.2⁴) = 0.006377.
Fourth Moment of the second Normal is: 0.11⁴ + (6)(0.11²)(0.3²) + (3)(0.3⁴) = 0.030980.
Fourth Moment of R is: (0.5)(0.006377) + (0.5)(0.030980) = 0.01868.
Fourth Central Moment of R is: 0.01868 - (4)(0.095)(0.02057) + (6)(0.095²)(0.07425) - (3)(0.095⁴) =
0.01464. Kurtosis of R is: 0.01464/0.0652² = 3.44.
Comment: Note that each Normal has a kurtosis of 3, yet the mixture has a kurtosis greater than 3.
Mixtures tend to have heavier tails.
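These moments can be confirmed numerically (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check of 38.62-38.65: moments of the 50%/50% mixture of Normal(8%, 0.2^2) and Normal(11%, 0.3^2).
import numpy as np
from scipy import integrate
from scipy.stats import norm

def f(x):
    return 0.5 * norm.pdf(x, 0.08, 0.2) + 0.5 * norm.pdf(x, 0.11, 0.3)

mean = integrate.quad(lambda x: x * f(x), -np.inf, np.inf)[0]
var = integrate.quad(lambda x: (x - mean)**2 * f(x), -np.inf, np.inf)[0]
m3 = integrate.quad(lambda x: (x - mean)**3 * f(x), -np.inf, np.inf)[0]
m4 = integrate.quad(lambda x: (x - mean)**4 * f(x), -np.inf, np.inf)[0]
print(mean, var, m3 / var**1.5, m4 / var**2)   # 0.095, 0.0652, skewness about 0.068, kurtosis about 3.44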

38.66. Let the CV of each component be c. Let µ1/µ2 = r.
Then the mean of the mixture is: p µ1 + (1-p)µ2 = µ2{rp + 1 - p}.
σ1 = cµ1 = crµ2. σ2 = cµ2.
The second moment of the mixture is: p{σ1² + µ1²} + (1-p){σ2² + µ2²} =
p{c²r²µ2² + r²µ2²} + (1-p){c²µ2² + µ2²} = µ2²(1 + c²)(pr² + 1 - p).
For the mixture: 1 + CV² = E[X²]/E[X]² = µ2²(1 + c²)(pr² + 1 - p) / [µ2²{rp + 1 - p}²] = (1 + c²)(pr² + 1 - p)/{rp + 1 - p}².
The relationship of the CV of the mixture to that of each component, c, depends on the ratio:
(1 + CV²)/(1 + c²) = (pr² + 1 - p)/{rp + 1 - p}².
If this ratio is one, then CV = c. If this key ratio is greater than one, the CV > c.
If p = 0 or p = 1, then this key ratio is one. In this case we really do not have a mixture.
For r = 1, this key ratio is one. Thus if the two components have the same mean, then the CV of the
mixture is equal to the CV of each component.
For p fixed, 0 < p < 1, take the derivative with respect to r of this key ratio:
{2pr(rp + 1 - p)² - (pr² + 1 - p)(2)(rp + 1 - p)p} / {rp + 1 - p}⁴ = 2p{r(rp + 1 - p) - (pr² + 1 - p)} / {rp + 1 - p}³ =
2p(1-p)(r - 1) / {rp + 1 - p}³.
Since p > 0, 1 - p > 0, and the denominator of this derivative is positive, the sign of the derivative
depends on r - 1.
For r < 1 this derivative is negative, and for r > 1 this derivative is positive.
Thus for p fixed, the minimum of the key ratio occurs for r = 1.
Thus if the two components have the same mean, then the CV of the mixture is equal to
the CV of each component.
However, if the two components have different means, in other words r ≠ 1, then the CV
of the mixture is greater than the CV of each component.
Alternately, the variance of the mixture = EPV + VHM.
If the means of the two components are equal, then the VHM = 0, and the variance of the mixture =
EPV. If the means of the two components are not equal, then the VHM > 0, and the variance of the
mixture is greater than the EPV.
The EPV = pσ1² + (1 - p)σ2² = pc²r²µ2² + (1-p)c²µ2² = c²µ2²(pr² + 1 - p).

Thus if the means of the two components differ, in other words if r ≠ 1,
CV² of mixture = (Variance of mixture)/(mean of mixture)² > EPV/(mean of mixture)² =
c²µ2²(pr² + 1 - p) / [µ2²{rp + 1 - p}²].
Thus for r ≠ 1, (CV of mixture)²/c² > (pr² + 1 - p)/{rp + 1 - p}².

As before, we can show that this key ratio is greater than one when r ≠ 1.
Thus if the two components have different means, in other words r ≠ 1, then the CV of the
mixture is greater than the CV of each component.

38.67. The mean of the mixture is: (2763)(0.824796) + (24,548)(0.159065) +


(275,654)(0.014444) + (1,917,469)(0.001624) + (10,000,000)(0.000071) = 13,989.
For an Exponential with mean θ, the losses excess of x are:
∫_x^∞ S(t) dt = ∫_x^∞ e^(-t/θ) dt = θ e^(-x/θ).
For the mixture, the losses excess of x are a weighted average of the excess losses for each
Exponential.
The expected losses excess of 100,000 are: (2763 e^(-100,000/2763))(0.824796) +
(24,548 e^(-100,000/24,548))(0.159065) + (275,654 e^(-100,000/275,654))(0.014444) +
(1,917,469 e^(-100,000/1,917,469))(0.001624) + (10,000,000 e^(-100,000/10,000,000))(0.000071) = 6495.
The excess ratio at 100,000 is: 6495 / 13,989 = 46.4%.
The expected losses excess of 1 million are: (2763 e^(-1,000,000/2763))(0.824796) +
(24,548 e^(-1,000,000/24,548))(0.159065) + (275,654 e^(-1,000,000/275,654))(0.014444) +
(1,917,469 e^(-1,000,000/1,917,469))(0.001624) + (10,000,000 e^(-1,000,000/10,000,000))(0.000071) = 2597.
The excess ratio at 1 million is: 2597 / 13,989 = 18.6%.
The expected losses excess of 10 million are: (2763 e^(-10,000,000/2763))(0.824796) +
(24,548 e^(-10,000,000/24,548))(0.159065) + (275,654 e^(-10,000,000/275,654))(0.014444) +
(1,917,469 e^(-10,000,000/1,917,469))(0.001624) + (10,000,000 e^(-10,000,000/10,000,000))(0.000071) =
278.1.
The excess ratio at 10 million is: 278.1 / 13,989 = 1.99%.
Comment: Beyond what you will be asked on your exam.
Similar to the mixture of Exponentials used for Commercial Automobile by I.S.O.
Values taken from “Introduction to Increased Limit Factors” by Li Zhu,
presented at the 2011 CAS Ratemaking and Product Management Seminar.
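These excess ratios can be reproduced numerically (an illustrative Python sketch, my addition, not part of the guide):

# Check of 38.67: excess ratios for the five-point Exponential mixture.
import math

weights = (0.824796, 0.159065, 0.014444, 0.001624, 0.000071)
thetas = (2763, 24548, 275654, 1917469, 10000000)
mean = sum(w * t for w, t in zip(weights, thetas))   # about 13,989

def excess_ratio(x):
    # for an Exponential, the expected losses excess of x are theta * exp(-x/theta)
    excess = sum(w * t * math.exp(-x / t) for w, t in zip(weights, thetas))
    return excess / mean

for x in (100000, 1000000, 10000000):
    print(x, excess_ratio(x))   # about 0.464, 0.186, 0.0199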

38.68. B. & 38.69. A. Mean of the mixture is: (50%)(2.5) + (30%)(3.5) + (20%)(4.5) = 3.2.
The variance of the first uniform distribution is: (4 - 1)²/12 = 3/4.
Thus the second moment of the first uniform distribution is: 3/4 + 2.5² = 7.
The second moment of the second uniform distribution is: 3/4 + 3.5² = 13.
The second moment of the third uniform distribution is: 3/4 + 4.5² = 21.
Second moment of the mixture is: (50%)(7) + (30%)(13) + (20%)(21) = 11.6.
Thus the variance of the mixture is: 11.6 - 3.2² = 1.36.
Alternately, each of the process variances is: 3²/12 = 3/4.
Thus the expected value of the process variances is: 3/4 = 0.75.
The first moment of the hypothetical means is: (50%)(2.5) + (30%)(3.5) + (20%)(4.5) = 3.2.
The 2nd moment of the hypothetical means is: (50%)(2.5²) + (30%)(3.5²) + (20%)(4.5²) = 10.85.
Thus the variance of the hypothetical means is: 10.85 - 3.2² = 0.61.
The variance of the mixture is: EPV + VHM = 0.75 + 0.61 = 1.36.

38.70. B. & 38.71. D. F(1) = 0. F(2) = 0.5/3 = 1/6. F(3) = F(2) + 0.8/3 = 13/30.
F(4) = F(3) + 1/3 = 23/30. F(5) = F(4) + 0.5/3 = 28/30. F(6) = F(5) + 0.2/3 = 30/30 = 1.
Since F(4) = 23/30 > 1/2 > 13/30 = F(3), the median is between 3 and 4.
Since the density is constant on the interval from 3 to 4, we can linear interpolate.
The median is: 3 + (1/2 - 13/30) / (23/30 - 13/30) = 3.2.
Since F(5) = 28/30 > 0.9 > 23/30 = F(4), the 90th percentile is between 4 and 5.
Since the density is constant on the interval from 4 to 5, we can linear interpolate.
The 90th percentile is: 4 + (0.9 - 23/30) / (28/30 - 23/30) = 4.8.

38.72. h(x) = f(x)/S(x). Thus, h′(x) = {f′(x)S(x) - f(x)S′(x)} / S(x)² = {f′(x)S(x) + f(x)²} / S(x)².
Since the denominator is positive, the sign of h′(x) depends on the numerator.
Thus, we wish to show that: f′(x) S(x) + f(x)² < 0.
S(x) = pe^(-x/µ) + (1-p)e^(-x/θ). f(x) = pe^(-x/µ)/µ + (1-p)e^(-x/θ)/θ. f′(x) = -{pe^(-x/µ)/µ² + (1-p)e^(-x/θ)/θ²}.
f′(x) S(x) + f(x)² = -{pe^(-x/µ)/µ² + (1-p)e^(-x/θ)/θ²}{pe^(-x/µ) + (1-p)e^(-x/θ)} + {pe^(-x/µ)/µ + (1-p)e^(-x/θ)/θ}² =
p²e^(-2x/µ)/µ² + 2p(1-p)e^(-x/µ)e^(-x/θ)/(µθ) + (1-p)²e^(-2x/θ)/θ²
- {p²e^(-2x/µ)/µ² + p(1-p)e^(-x/µ)e^(-x/θ)/µ² + p(1-p)e^(-x/µ)e^(-x/θ)/θ² + (1-p)²e^(-2x/θ)/θ²} =
p(1-p)e^(-x/µ)e^(-x/θ) {2/(µθ) - 1/µ² - 1/θ²} = -p(1-p)e^(-x/µ)e^(-x/θ) {µ² - 2µθ + θ²}/(µθ)² =
-p(1-p)e^(-x/µ)e^(-x/θ) (µ - θ)²/(µθ)² < 0.


Comment: For a single Exponential Distribution, the hazard rate is constant; the mixture has a
decreasing hazard rate and thus a heavier righthand tail.

38.73. A. f(x) = (0.7)(2e^(-2x)) + (0.3)(3e^(-3x)), a 70%-30% mixture of two Exponentials with means
1/2 and 1/3. E[X] = (0.7)(1/2) + (0.3)(1/3) = 0.45 = 9/20.

38.74. E. A mixture of three Exponentials with means 100, 50, and 25, and equal weights.
For each Exponential, E[X] - E[X ∧ 50] = θe^(-50/θ), and S(50) = e^(-50/θ).
e(50) = (E[X] - E[X ∧ 50]) / S(50) = {(100e^(-0.5) + 50e^(-1) + 25e^(-2))/3} / {(e^(-0.5) + e^(-1) + e^(-2))/3} =
27.4768 / 0.369915 = 74.28.

38.75. E. Mean = (60 + 80)/2 = 70. Second Moment = {(200 + 60²) + (300 + 80²)}/2 = 5250.
Variance = 5250 - 70² = 350.
Alternately, Expected Value of the Process Variance = (200 + 300)/2 = 250.
Variance of the Hypothetical Means = {(60 - 70)² + (80 - 70)²}/2 = 100.
Total Variance = EPV + VHM = 250 + 100 = 350.

38.76. E. 1. True. 2. True. 3. True.



38.77. C. Let F be a mixed distribution: then F(x) = pA(x) + (1-p)B(x). The median is that x such
that F(x) = 0.5. Thus the median of the mixed distribution is the value of x such that:
0.5 = pA(x) + (1-p)B(x). In this case, p = 0.5, A is a Burr and B is a Pareto. Substituting into the
equation for the median: 0.5 = (0.5){1 - 1000/(1000+x²)} + (0.5){1 - 1000/(1000+x)}.
Thus 1 = 1000/(1000+x²) + 1000/(1000+x). Thus x³ = 1,000,000. Thus x = 100.
Comment: Check: for x = 100, (0.5){1 - 1000/(1000+100²)} + (0.5){1 - 1000/(1000+100)} =
(0.5)(1 - 1000/11,000) + (0.5)(1 - 1000/1100) = (0.5)(0.9091) + (0.5)(0.0909) =
0.4545 + 0.0455 = 0.5.
Note that the median of the Burr is 31.62, while the median of the Pareto is 1000. The (weighted)
average of the medians is: (0.5)(31.62) + (0.5)(1000) = 515.8, which is not equal to the median of
the mixed distribution.

38.78. C. The uniform distribution on [0, 2] has density function:
0 for x < 0; 1/2 for 0 < x < 2; 0 for 2 < x.
The uniform distribution on [1, 3] has density function:
0 for x < 1; 1/2 for 1 < x < 3; 0 for 3 < x.
X1 has a continuous distribution function and a discontinuous density function. For example, let the
weights be 1/3 and 2/3. Then the density function is:
1/6 for 0 < x < 1; 1/2 for 1 < x < 2; 1/3 for 2 < x < 3.
X2 has a continuous distribution function and a continuous density function.
It has a “triangle density function”: (x-1)/4 for 1 < x < 3, and (5-x)/4 for 3 < x < 5.
X3 has a discontinuous distribution function. At the censorship point of 1 the distribution jumps up
from Φ[(1-µ)/σ] to 1. Generally censorship leads to a jump discontinuity in the Distribution Function
at the censorship point, provided the survival function of the original distribution at the censorship
point is positive.
Comment: Convolution is discussed in “Mahlerʼs Guide to Aggregate Distributions.”

38.79. C. For mixed distributions the moments are weighted averages of the moments of the
individual distributions. For a Pareto the first moment is θ/(α-1), which in these cases are 100 and
500. For a Pareto the second moment is 2θ²/{(α-1)(α-2)}, which in these cases are 30,000 and
1,000,000.
Thus the first moment of the mixed distribution is: (0.75)(100) + (0.25)(500) = 200.
The second moment of the mixed distribution is: (0.75)(30,000) + (0.25)(1,000,000) = 272,500.
The variance of the mixed distribution is: 272,500 - 200² = 232,500.
Alternately, the variance of a Pareto is αθ²/{(α-1)²(α-2)}.
Thus the process variances are: Var[X | Liability] = 4(300²)/{(3²)(2)} = 20,000 and
Var[X | Property] = 3(1000²)/{(2²)(1)} = 750,000.
Thus the Expected Value of the Process Variance = (0.75)(20,000) + (0.25)(750,000) = 202,500.
Since the mean of a Pareto is θ/(α-1), the hypothetical means are:
E[X | Liability] = 300/3 = 100 and E[X | Property] = 1000/2 = 500.
The overall mean is: (0.75)(100) + (0.25)(500) = 200.
Thus the Variance of the Hypothetical Means is: (0.75)(100-200)² + (0.25)(500-200)² = 30,000.
Thus for the whole portfolio the Total Variance = EPV + VHM = 202,500 + 30,000 = 232,500.

Type of Risk   A Priori Chance of This Type of Risk   Hypothetical Mean   Second Moment   Process Variance   Square of Hypothetical Mean
Liability      0.750                                   100                 30,000          20,000             10,000
Property       0.250                                   500                 1,000,000       750,000            250,000
Overall                                                200                                  202,500            70,000

VHM = 70,000 - 200² = 30,000. Total Variance = 202,500 + 30,000 = 232,500.
Comment: The variance of a mixed distribution is not the weighted average of the individual
variances. The statement that the risks have identical claim count distributions allows one to weight
the two distributions with weights 75% and 25%. For example, if instead liability risks had twice the
mean claim frequency of property risks, then (2)(0.75)/{(2)(0.75) + (1)(0.25)} = 85.7% of the claims
would come from liability risks. Therefore, in that case one would instead weight the two distributions
together using weights of 85.7% and 14.3%, as follows.

Type of Risk   A Priori Chance of This Type of Risk   Relative Claim Frequency   Chance of a Claim from this Type of Risk   Hypothetical Mean   Process Variance   Square of Hypothetical Mean
Liability      0.750                                   2                          0.857                                      100                 20,000             10,000
Property       0.250                                   1                          0.143                                      500                 750,000            250,000
Overall                                                                                                                       157                 124,286            44,286

VHM = 44,286 - 157² = 19,637. Total Variance = 124,286 + 19,637 = 143,923.

38.80. A. This question asks for the variance of individual claim amounts.
For a claim picked at random, it has a 2/(2+3) = 40% chance of coming from the first severity
distribution and a 60% chance of coming from the second severity distribution.
This is mathematically the same as a two point mixture.
The first distribution has a mean of 1.4 and a second moment of 2.2, while the second distribution has a
mean of 2.6 and a second moment of 7.4. The mixed distribution has a weighted average of the
moments; it has a mean of: (2/5)(1.4) + (3/5)(2.6) = 2.12,
and a second moment of: (2/5)(2.2) + (3/5)(7.4) = 5.32.
The variance of the mixed severity distribution is: 5.32 - 2.12² = 0.83.
Alternately, one computes the combined severity distribution, by weighting the individual
distributions together, using their mean frequencies of 2 and 3 as weights.

x    p1(x)   p2(x)   combined severity   first moment   second moment
1    0.6     0.1     0.30                0.30           0.30
2    0.4     0.3     0.34                0.68           1.36
3    0       0.5     0.30                0.90           2.70
4    0       0.1     0.06                0.24           0.96
                                         2.12           5.32

The variance of the combined severity is: 5.32 - 2.12² = 0.83.
Comment: In “Mahlerʼs Guide to Aggregate Distributions”, the very similar Course 151 Sample
Exam #2, Q.4 asks instead for the variance of S. The variance of each compound Poisson is its
mean frequency times the second moment of its severity. Since the two compound Poissons are
independent, their variances add to get the variance of S.

38.81. D. Mean of the mixture is: (0.6)(1400) + (0.4)(2000) = 1640.
Second moment of the mixture is: (0.6)(40,000 + 1400²) + (0.4)(250,000 + 2000²) = 2,900,000.
Variance of the mixture is: 2,900,000 - 1640² = 210,400.
Alternately, Expected Value of the Process Variance = (0.6)(40,000) + (0.4)(250,000) = 124,000.
Variance of the Hypothetical Means = (0.6)(1400 - 1640)² + (0.4)(2000 - 1640)² = 86,400.
Total Variance = EPV + VHM = 124,000 + 86,400 = 210,400.

38.82. C. Since all payments take place 3 years from now and we used an interest rate of 4%,
present values are taken by dividing by 1.04³ = 1.125.
The mean claim payment is (50%)(0) + (50%)(10) = 5.
The insurer's mean payment plus legal expense is: 5 + 5 = 10,
with present value: 10/1.125 = 8.89.
The payment of 5 in claims expense is fixed, so that it does not affect the variance.
The second moment of the LogNormal is: variance + mean² = 20² + 10² = 500.
The claims payment is a 50%-50% mixture of zero and a LogNormal Distribution. Therefore its
second moment is a weighted average of the second moments: (0.5)(0) + (0.5)(500) = 250.
Thus the variance of the claim payments is: 250 - 5² = 225. The variance of the present value is:
225/1.125² = 177.78.
Therefore, the expected present value of the claim and legal expenses plus 0.02 times the variance
of the present value is: 8.89 + (0.02)(177.78) = 12.45.
Comment: Since the time until payment and the interest rate are both fixed, the present values are
easy to take. The present value is gotten by dividing by 1.125, so the variance of the present value
is divided by 1.125². There is no interest rate risk or timing risk in this simplified example. We do not
make any use of the fact that the amount of payment specifically follows a LogNormal Distribution.

38.83. Overall mean is: (25%)(500) + (75%)(300) = 350.
Second moment for Type I is: 100² + 500² = 260,000.
Second moment for Type II is: 70² + 300² = 94,900.
Overall second moment is: (25%)(260,000) + (75%)(94,900) = 136,175.
Overall variance is: 136,175 - 350² = 13,675.
Overall standard deviation is: √13,675 = 116.9.
Alternately, E[Var | Type] = (25%)(100²) + (75%)(70²) = 6175.
Var[Mean | Type] = (0.25)(500 - 350)² + (0.75)(300 - 350)² = 7500.
Overall variance is: E[Var | Type] + Var[Mean | Type] = 6175 + 7500 = 13,675.
Overall standard deviation is: √13,675 = 116.9.

38.84. B. Put into thousands, the density is: f(x) = 0.0005003 e^(-x/2000), 0 < x < 15,000.
Expected payment is: (0.02)(15,000 - 1000) + (0.04) ∫_1000^15,000 (x - 1000)(0.0005003) e^(-x/2000) dx =
280 + 0.000020012 ∫_1000^15,000 x e^(-x/2000) dx - 0.020012 ∫_1000^15,000 e^(-x/2000) dx =
280 + 0.000020012 {-2000x e^(-x/2000) - 2000² e^(-x/2000)}, evaluated from x = 1000 to x = 15,000,
- 0.020012 {-2000 e^(-x/2000)}, evaluated from x = 1000 to x = 15,000,
= 280 + 23.9 + 48.5 - 24.3 = 328.
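A quick numerical check of this expected payment (an illustrative Python/SciPy sketch, my addition, using only the density and limits shown above):

# Check of 38.84: the expected payment.
import math
from scipy import integrate

f = lambda x: 0.0005003 * math.exp(-x / 2000)
integral, _ = integrate.quad(lambda x: (x - 1000) * f(x), 1000, 15000)
print(0.02 * (15000 - 1000) + 0.04 * integral)   # about 328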

38.85. C. For the Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)).
The given distribution is an 80%-20% mixture of two Exponentials, with means 50 and 1000.
F(x) = 0.8(1 - e^(-x/50)) + 0.2(1 - e^(-x/1000)). Therefore, the limited expected value at 1000 is a
weighted average of the LEVs for the individual Exponentials.
E[X ∧ 1000] = (0.8){(50)(1 - e^(-1000/50))} + (0.2){(1000)(1 - e^(-1000/1000))} = 166.4.
Alternately, E[X ∧ 1000] =
∫_0^1000 S(x) dx = ∫_0^1000 {0.8e^(-0.02x) + 0.2e^(-0.001x)} dx = 40(1 - e^(-20)) + (200)(1 - e^(-1)) = 166.4.

38.86. C. E[X] = (m1 + m2)/2. The second moment of a mixture is the mixture of the second
moments: E[X²] = (2m1² + 2m2²)/2 = m1² + m2².
1 + CV² = E[X²]/E[X]² = 4(m1² + m2²)/(m1 + m2)² = 4{1 - 2m1m2/(m1 + m2)²} ≤ 4.
⇒ CV² ≤ 3. ⇒ CV ≤ √3.
Comment: The CV is largest when m1 and m2 are significantly different. If m1 = m2, then the CV is:
√[(4)(1 - 2m²/(2m)²) - 1] = 1; we would have a single Exponential Distribution with CV = 1.
If we let r = m2/m1, then CV² = 4(m1² + m2²)/(m1 + m2)² - 1 = 4(1 + r²)/(1 + r)² - 1.
This is maximized as either r → 0 or r → ∞, and CV² → 3, i.e., CV → √3.



38.87. D. The future lifetime of smokers is Exponential with mean: 1/0.2 = 5.
The future lifetime of non-smokers is Exponential: F(t) = 1 - e^(-0.1t).
The future lifetime for an individual selected at random is a mixed Exponential:
F(t) = (0.3)(1 - e^(-0.2t)) + (0.7)(1 - e^(-0.1t)) = 1 - 0.3e^(-0.2t) - 0.7e^(-0.1t).
Want 0.75 = F(t) = 1 - 0.3e^(-0.2t) - 0.7e^(-0.1t).
Let y = e^(-0.1t). ⇒ 0.3y² + 0.7y - 0.25 = 0. ⇒ y = 0.315. ⇒ t = 11.56.

Comment: Constant force of mortality λ ⇔ Exponential Distribution with mean 1/λ.

38.88. C. The mean of the Weibull is: 30 Γ[1 + 1/(1/2)] = 30 Γ(3) = (30)(2!) = 60.
Thus the expected cost of a claim is: (40%)(0) + (60%)(60) = 36. #1 is false.
The survival function at $60 is: (60%)(Survival Function at 60 for the Weibull) =
(60%)exp[-(60/30)^(1/2)] = 0.1459. #2 is false.
The density at 60 is: (40%)(0) + (60%){exp[-(60/30)^(1/2)] (1/2)(60/30)^(1/2)/60} = 0.001719.
h(60) = 0.001719/0.1459 = 0.0118. #3 is true.
Comment: For the Weibull, h(x) = τx^(τ−1)/θ^τ. h(60) = (1/2)(60^(-1/2))/30^(1/2) = 0.0118.
For the mixture, h(60) = {(60%)(f(60) of the Weibull)} / {(60%)(S(60) of the Weibull)} =
hazard rate at 60 for the Weibull. Since in this case one of the components of the mixture is zero, it
does not affect the hazard rate. For example, if one mixed a Weibull and a Pareto, they would each
affect the numerator and denominator of the hazard rate of the mixture.

38.89. B. S(750) = (30%){1/(1 + (750/400)²)} + (70%){1/(1 + (750/600)²)} = 0.3396.
Comment: The Survival Function of the mixture is the mixture of the Survival Functions.
For the loglogistic distribution, F(x) = (x/θ)^γ / {1 + (x/θ)^γ}. S(x) = 1/{1 + (x/θ)^γ}.

38.90. A. For the first Pareto, F(200) = 1 - {100/(100 + 200)}² = 0.8889.
For the second Pareto, F(200) = 1 - {3000/(3000 + 200)}⁴ = 0.2275.
Pr[X ≤ 200] = (0.8)(0.8889) + (0.2)(0.2275) = 0.757.

38.91. B. F(6000) = (0.6)Φ[(6000 - 5000)/√1,000,000] + (0.4)Φ[(6000 - 4000)/√1,000,000] =
(0.6)Φ[1] + (0.4)Φ[2] = (0.6)(0.8413) + (0.4)(0.9772) = 0.8957.
S(6000) = 1 - 0.8957 = 10.43%.
Comment: A two-point mixture of Normal Distributions.
2016-C-2, Loss Distributions, §38 N-Point Mixtures HCM 10/21/15, Page 860

38.92. A. Nonsmokers have an Exponential Survival function beyond age 30:
S(x)/S(30) = exp[-0.08(x - 30)]. Similarly, for smokers, S(x)/S(30) = exp[-0.16(x - 30)].
Since we have a 50-50 mixture starting at age 30:
S(x)/S(30) = 0.5 exp[-0.08(x - 30)] + 0.5 exp[-0.16(x - 30)].
S(80)/S(30) = 0.5 exp[-(0.08)(50)] + 0.5 exp[-(0.16)(50)] = 0.009326.
S(81)/S(30) = 0.5 exp[-(0.08)(51)] + 0.5 exp[-(0.16)(51)] = 0.008597.
p80 = S(81)/S(80) = {S(81)/S(30)} / {S(80)/S(30)} = 0.008597/0.009326 = 0.9218.
q80 = 1 - 0.9218 = 0.0782.
Alternately, assume that there are, for example, originally 2,000,000 individuals, 1,000,000
nonsmokers and 1,000,000 smokers, alive at age 30.
Then the expected nonsmokers alive at age 80 is: 1,000,000 exp[-(0.08)(80 - 30)] = 18,316.
The expected nonsmokers alive at age 81 is: 1,000,000 exp[-(0.08)(81 - 30)] = 16,907.
The expected smokers alive at age 80 is: 1,000,000 exp[-(0.16)(80 - 30)] = 335.
The expected smokers alive at age 81 is: 1,000,000 exp[-(0.16)(81 - 30)] = 286.
In total we expect 18,316 + 335 = 18,651 alive at age 80 and 16,907 + 286 = 17,193 alive at age
81. Therefore, q80 = (18,651 - 17,193)/18,651 = 0.0782.
Comment: Since the expected number of smokers alive by age 80 is so small, q80 is close to that
for nonsmokers: 1 - exp[-(0.08)(51)]/exp[-(0.08)(50)] = 1 - e^(-0.08) = 0.0769.

38.93. A. For the first LogNormal F(15000) = Φ[(ln(15000) - 7)/2] = Φ[1.31] = 0.9049.
The second LogNormal has µ = 7 + ln2 = 7.693 and σ = 2, and
F(15000) = Φ[(ln(15000) - 7.693)/2] = Φ[0.96] = 0.8315.
For the mixed distribution: S(15000) = 1 - {(0.75)(.9049) + (0.25)(.8315)} = 0.113.
Alternately, Prob[accident ≤ 15000] = Prob[standard stay ≤ 15000/2 = 7500] = Φ[(ln(7500) - 7)/2] =
Φ[0.96] = 0.8315. Proceed as before.

38.94. (i) (a) Let J be the amount paid back to the investor in the junior loan.
If the asset does not sustain a loss then they get paid the 50 plus 8% interest or 54.
If a loss is sustained they get no interest.
If the loss is more than 50 they get paid nothing.
If the loss is less than 50, they get paid 50 minus the loss.
Let U(0, 50) be uniform from 0 to 50.
J = 54 with probability 75%; 0 with probability 12.5%; 50 - U(0, 50) with probability 12.5%.
Since 50 - U(0, 50) has the same distribution as U(0, 50), equivalently:
J = 54 with probability 75%; 0 with probability 12.5%; U(0, 50) with probability 12.5%.
J is a mixture.
E[J] = (0.75)(54) + (0.125)(0) + (0.125)(25) = 43.625.
E[J²] = (0.75)(54²) + (0.125)(0²) + (0.125)(50²/12 + 25²) = 2291.
Var[J] = 2291 - 43.625² = 388.
(b) Let S be the amount paid back to the investor in the senior loan.
If the asset does not sustain a loss then they get paid the 50 plus 6% interest or 53.
If a loss is sustained they get no interest.
If the loss is less than 50, they get paid 50.
If the loss L is more than 50 they get paid: 50 - (L - 50) = 100 - L.
S = 53 with probability 75%; 50 with probability 12.5%; 100 - U(50, 100) with probability 12.5%.
Since 100 - U(50, 100) has the same distribution as U(0, 50), equivalently:
S = 53 with probability 75%; 50 with probability 12.5%; U(0, 50) with probability 12.5%.
S is a mixture.
E[S] = (0.75)(53) + (0.125)(50) + (0.125)(25) = 49.125.
E[S²] = (0.75)(53²) + (0.125)(50²) + (0.125)(50²/12 + 25²) = 2523.
Var[S] = 2523 - 49.125² = 110.
(ii) Prob(J < 50) = 0.25.
Prob(S < 50) = 0.125.
Comment: The variance of a uniform distribution from 0 to 50 is 50²/12.
Thus the second moment of a uniform distribution from 0 to 50 is: 50²/12 + 25².

38.95. (a) E[R] = (0.8)(0) + (0.2)(30%) = 6%.
E[R²] = (0.8)(0.1² + 0²) + (0.2)(0.1² + 0.3²) = 0.028.
Var[R] = 0.028 - 0.06² = 0.0244.
(b) Prob[R < 0] = 0.8 Φ[(0 - 0)/0.1] + 0.2 Φ[(0 - 0.3)/0.1] = (0.8)Φ[0] + (0.2)Φ[-3] =
(0.8)(0.5) + (0.2)(0.0013) = 0.40026.
Prob[S < 0] = Φ[(0 - 0.06)/√0.0244] = Φ[-0.38] = 0.3520.
Prob[R < -0.1] = 0.8 Φ[(-0.1 - 0)/0.1] + 0.2 Φ[(-0.1 - 0.3)/0.1] = (0.8)Φ[-1] + (0.2)Φ[-4] =
(0.8)(0.1587) + (0.2)(0) = 0.12696.
Prob[S < -0.1] = Φ[(-0.1 - 0.06)/√0.0244] = Φ[-1.02] = 0.1539.
Comment: Even though R and S have the same mean and variance, they have different
probabilities in the lefthand tail. F(0) is greater for R than S, while F(-0.1) is greater for S than R.
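These tail probabilities can be confirmed numerically (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check of 38.95: left-tail probabilities for the mixture R and the single Normal S
# with the same mean (6%) and variance (0.0244).
import math
from scipy.stats import norm

def F_R(x):
    return 0.8 * norm.cdf(x, 0.0, 0.1) + 0.2 * norm.cdf(x, 0.3, 0.1)

def F_S(x):
    return norm.cdf(x, 0.06, math.sqrt(0.0244))

for x in (0.0, -0.1):
    print(x, F_R(x), F_S(x))   # about (0.400, 0.352) and (0.127, 0.154)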

Section 39, Continuous Mixtures of Models319

Discrete mixtures can be extended to a continuous case such as the Inverse Gamma - Exponential
situation, to be discussed below. Instead of an n-point mixture, one can take a continuous mixture of
severity distributions.

Mixture Distribution ⇔ Continuous Mixture of Models.

Continuous mixtures can be performed of either frequency distributions320 or loss distributions.

For example assume that each individual's future lifetime is exponentially distributed with mean 1/λ,
and over the population, λ is uniformly distributed over (0.05, 0.15).
u(λ) = 1/0.1 = 10, 0.05 ≤ λ ≤ 0.15.

Then the probability that a person picked at random lives more than 20 years is:

S(20) = ∫_0.05^0.15 S(20; λ) u(λ) dλ = ∫_0.05^0.15 e^(-20λ) (1/0.10) dλ = (10/20)(e^(-1) - e^(-3)) = 15.9%.

The density at 20 of this mixture distribution is:


∫_0.05^0.15 f(20; λ) u(λ) dλ = ∫_0.05^0.15 10 λ e^(-20λ) dλ = (10){-λe^(-20λ)/20 - e^(-20λ)/400}, evaluated from λ = 0.05 to λ = 0.15,
= 0.0134.
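Both quantities can be checked by numerical integration (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check of the example above: Exponential lifetimes with hazard rate lambda,
# lambda uniform on (0.05, 0.15) with density u = 10.
import math
from scipy import integrate

S20, _ = integrate.quad(lambda lam: math.exp(-20 * lam) * 10, 0.05, 0.15)
f20, _ = integrate.quad(lambda lam: lam * math.exp(-20 * lam) * 10, 0.05, 0.15)
print(S20, f20)   # about 0.159 and 0.0134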

In general, one takes a mixture of the density functions for specific values of the parameter ζ:

g(x) = ∫ f(x; ζ) π(ζ) dζ .


via some mixing distribution π(ζ).

For example, in the case where the severity is Exponential and the mixing distribution of their means
is Inverse Gamma, we get the Inverse Gamma - Exponential process.
319
See Section 5.2.4 of Loss Models.
320
See the sections on Mixed Frequency Distributions and the Gamma-Poisson in “Mahlerʼs Guide to Frequency
Distributions”.

Inverse Gamma-Exponential:

The sizes of loss for a particular policyholder are assumed to be Exponential with mean δ. Given δ,
the distribution function of the size of loss is 1 - e^(−x/δ), while the density of the size of loss distribution
is (1/δ)e^(−x/δ). The mean of this Exponential is δ and its variance is δ².


Note that I have used δ rather than θ, so as to not confuse the scale parameter of the Exponential
with that of the Inverse Gamma which is θ.

So for example, the density of a loss being of size 8 is (1/δ)e^(-8/δ). If δ = 2 this density is:
(1/2)e^(-4) = 0.009, while if δ = 20 this density is: (1/20)e^(-0.4) = 0.034.

Assume that the values of δ across a portfolio of policyholders are given by an Inverse Gamma
distribution with α = 6 and θ = 15, with probability density function:
π(δ) = θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]} = 94,921.875 e^(-15/δ) / δ^7, 0 < δ < ∞.321

Note that this distribution has a mean of: θ / ( α-1) = 15 / (6-1) = 3.

If we have a policyholder and do not know its expected mean severity, in order to get the density
of the next loss being of size 8, one would weight together the densities of having a loss of size 8
given δ, using the a priori probabilities of δ: π(δ) = 94,921.875 e^(−15/δ) / δ^7, and integrating from zero
to infinity:
g(8) = ∫_0^∞ (e^(-8/δ)/δ) π(δ) dδ = ∫_0^∞ (e^(-8/δ)/δ) (94,921.875 e^(-15/δ)/δ^7) dδ = 94,921.875 ∫_0^∞ e^(-23/δ)/δ^8 dδ
= 94,921.875 (6!) / 23^7 = 0.0201.

Where we have used the fact that the density of the Inverse Gamma Distribution integrates to unity
over its support, and therefore: ∫_0^∞ e^(-θ/x) / x^(α+1) dx = Γ[α]/θ^α = (α − 1)!/θ^α.

321
The Inverse Gamma Distribution has density: f(x) = θ^α e^(−θ/x) / {Γ(α) x^(α+1)}.
In this case, the constant in front is: θ^α/Γ(α) = 15^6 / Γ(6) = 11,390,625 / 120 = 94,921.875.

More generally, if the distribution of Exponential means δ is given by an Inverse Gamma distribution
π(δ) = θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]}, then we compute the density of having a claim of size x by integrating from zero
to infinity:
g(x) = ∫_0^∞ (e^(-x/δ)/δ) π(δ) dδ = ∫_0^∞ (e^(-x/δ)/δ) θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]} dδ = {θ^α/Γ[α]} ∫_0^∞ e^(-(θ+x)/δ) / δ^(α+2) dδ
= {θ^α/Γ[α]} Γ[α + 1]/(θ + x)^(α+1) = α θ^α / (θ + x)^(α+1).322

Thus the (prior) mixed distribution is in the form of the Pareto distribution. Note that the shape
parameter and scale parameter of the mixed Pareto distribution are the same as those of the
Inverse Gamma distribution. For the specific example: α = 6 and θ = 15. Thus the mixed Pareto has
g(x) = 6(15^6)(15 + x)^(-7). g(8) = 6(15^6)(23)^(-7) = 0.0201, matching the previous result.

For the Inverse Gamma-Exponential the (prior) mixed distribution is always a Pareto,
with α = shape parameter of the (prior) Inverse Gamma
and θ = scale parameter of the (prior) Inverse Gamma.323

Note that for the particular case we get a mixed Pareto distribution with parameters of α = 6 and
θ = 15, which has a mean of 15/(6-1) = 3, which matches the result obtained above. Note that the
formula for the mean of an Inverse Gamma and a Pareto are both θ/(α−1).

Exercise: Each insured has an Exponential severity with mean δ. The values of δ are distributed via
an Inverse Gamma with parameters α = 2.3 and θ = 1200. An insured is picked at random.
What is the probability that its next claim will be greater than 1000?
[Solution: The mixed distribution is a Pareto with parameters α = 2.3 and θ = 1200.
S(1000) = {θ/(θ + x)}^α = {1200/(1000 + 1200)}^2.3 = 24.8%.]

322
Both the Exponential and the Inverse Gamma have terms involving powers of e−1/δ and 1/δ.
323
See Example 5.4 in Loss Models. See also 4B, 11/93, Q.26.
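The Inverse Gamma-Exponential result can also be checked by direct numerical integration (an illustrative Python/SciPy sketch, my addition, not part of the guide):

# Check that mixing Exponentials over an Inverse Gamma (alpha = 6, theta = 15) gives a Pareto.
import math
from scipy import integrate

alpha, theta = 6.0, 15.0

def prior(d):
    # Inverse Gamma density of the Exponential mean delta
    return theta**alpha * math.exp(-theta / d) / (math.gamma(alpha) * d**(alpha + 1))

g8, _ = integrate.quad(lambda d: (math.exp(-8 / d) / d) * prior(d), 0, math.inf)
print(g8)                                               # about 0.0201
print(alpha * theta**alpha / (theta + 8)**(alpha + 1))  # Pareto density at 8, the same value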

Hazard Rates of Exponentials Distributed via a Gamma:324

If the hazard rate of the Exponential, λ, is distributed via a Gamma(α, θ), then the mean 1/λ is
distributed via an Inverse Gamma(α, 1/θ), and therefore the mixed distribution is Pareto.
If the Gamma has parameters α and θ, then the mixed Pareto has parameters α and 1/θ.

324
See for example, SOA M, 11/05, Q.17.

Relationship of Inverse Gamma-Exponential to the Gamma-Poisson:

If δ, the mean of each Exponential, follows an Inverse Gamma Distribution with parameters α and θ,
F(δ) = 1 - Γ[α, θ/δ].
If λ = 1/δ, then F(λ) = Γ[α, θλ], and λ follows a Gamma with parameters α and 1/θ.
This is mathematically the same as Exponential interarrival times each with mean 1/λ, or a Poisson
Process with intensity λ.

Prob[X > x] ⇔ Prob[Waiting time to 1st claim > x] = Prob[no claims by time x].

From time 0 to x we have a Poisson Frequency with mean xλ. xλ has a Gamma Distribution with
parameters α and x/θ. This is mathematically a Gamma-Poisson, with mixed distribution that is
Negative Binomial with r = α and β = x/θ.

Prob[X > x] ⇔ Prob[no claims by time x] = f(0) = 1/(1 + x/θ)^α = θ^α/(θ + x)^α.
This is the survival function at x of a Pareto Distribution, with parameters α and θ, as obtained
previously.

Exercise: Each insured has an Exponential severity with mean δ.


The values of δ are distributed via an Inverse Gamma with parameters α = 2.3 and θ = 1200.
An insured is picked at random.
What is the probability that the sum of his next 3 claims will be greater than 6000?
[Solution: Prob[sum of 3 claims > 6000] ⇔ Prob[Waiting time to 3rd claim > 6000] =
Prob[at most 2 claims by time 6000].
The mixed distribution is Negative Binomial with r = α = 2.3, and β = x/θ = 6000/1200 = 5.
Prob[at most 2 claims by time 6000] = f(0) + f(1) + f(2)
= 1/6^2.3 + (2.3)(5)/6^3.3 + {(2.3)(3.3)/2}(5²)/6^4.3 = 9.01%.
Alternately, the sum of 3 independent Exponential Claims is a Gamma with α = 3 and θ = δ.
As listed subsequently, the mixture of a Gamma by an Inverse Gamma is a Generalized Pareto
Distribution with parameters, α = 2.3, θ = 1200, and τ = 3. F(x) = β[τ, α; x/(x + θ)].
S(6000) = 1 - β[3, 2.3; 6000/(6000 + 1200)] = 1 - β[3, 2.3; 1/1.2] = β[2.3, 3, 1 - 1/1.2] =
β[2.3, 3, 1/6]. Using a computer, β[2.3, 3, 1/6] = 9.01%.
Comment: As shown in “Mahlerʼs Guide to Frequency Distributions,” the distribution function of
a Negative Binomial is: F(x) = β[r, x+1; 1/(1+β)]. In this case, F(2) = β[2.3, 3; 1/6]. Thus, one
can compute β[a, b; x], for b integer, as a sum of Negative Binomial densities.]
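A quick numerical check of this probability (an illustrative Python/SciPy sketch, my addition, not part of the guide; SciPy's nbinom accepts a non-integer r):

# Check of the exercise above: Prob[at most 2 claims by time 6000],
# Negative Binomial with r = 2.3 and beta = 5.
from scipy.stats import nbinom
from scipy.special import betainc

r, beta = 2.3, 5.0
print(nbinom.cdf(2, r, 1 / (1 + beta)))   # about 0.0901
print(betainc(r, 3, 1 / (1 + beta)))      # the same value via the incomplete beta function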

Moments of Mixed Distributions:

The nth moment of a mixed distribution is the mixture of the nth moments for specific values of the
parameter ζ:

E[X^n] = E_ζ[E[X^n | ζ]].

Exercise: What is the mean for a mixture of Exponentials, mixed on the mean δ?
[Solution: For a given value of δ, the mean of a Exponential Distribution is δ. We need to weight
these first moments together via the density of delta, π(δ):

∫ δ π(δ) dδ = mean of π(δ), the distribution of δ.]


Thus the mean of a mixture of Exponentials is the mean of the mixing distribution. This result will hold
whenever the parameter being mixed is the mean, as it was in the case of the Exponential.

For the case of a mixture of Exponentials via an Inverse Gamma Distribution with parameters α and
θ, the mean of the mixed distribution is that of the Inverse Gamma, θ/(α-1).

Exercise: What is the Second Moment of Exponentials, mixed on the mean δ?

[Solution: For a given value of δ, the second moment of an Exponential Distribution is 2δ².
We need to weight these second moments together via the density of delta, π(δ):
∫ 2δ² π(δ) dδ = 2(second moment of π(δ), the distribution of δ).]


Exercise: What is the variance of Exponentials mixed on the mean δ via an Inverse Gamma
Distribution, as per Loss Models, with parameters α and θ?
[Solution: The second moment of the mixed distribution is:
2(second moment of the Inverse Gamma) = 2θ² / {(α − 1)(α − 2)}.
The mean of the mixed distribution is the mean of the Inverse Gamma: θ/(α − 1).
Thus the variance of the mixed distribution is: 2θ² / {(α − 1)(α − 2)} - {θ/(α − 1)}² = αθ² / {(α − 1)²(α − 2)}.
Comment: The mixed distribution is a Pareto and this is indeed its variance.]

Normal-Normal:

The sizes of claims a particular policyholder makes are assumed to be Normal with mean m and
known fixed variance s².325
Given m, the distribution function of the size of loss is: Φ[(x - m)/s],
while the density of the size of loss distribution is: φ[(x - m)/s] = exp[-(x - m)²/(2s²)] / (s√(2π)).
So for example if s = 3, then the probability density of a claim being of size 8 is: exp[-(8 - m)²/18] / (3√(2π)).
If m = 2 this density is: exp[-2]/(3√(2π)) = 0.018, while if m = 20 this density is: exp[-8]/(3√(2π)) = 0.000045.

Assume that the values of m are given by another Normal Distribution with mean 7 and standard
deviation of 2, with probability density function:326
π(m) = exp[-(m - 7)²/8] / (2√(2π)), -∞ < m < ∞.

Note that 7, the mean of this distribution, is the a priori mean claim severity.

325
Note Iʼve used roman letter for parameters of the Normal likelihood, in order to distinguish from those of the
Normal distribution of parameters discussed below.
326
There is a very small but positive chance that the mean severity will be negative.

Below is displayed this distribution of hypothetical mean severities:327

[Graph: a Normal density centered at 7 with standard deviation 2, over mean severities from roughly 0 to 14.]

If we have a risk and do not know what type it is, in order to get the chance of the next claim being of
size 8, one would weight together the chances of having a claim of size 8 given m:
exp[-(8 - m)²/18] / (3√(2π)), using the a priori probabilities of m:
π(m) = exp[-(m - 7)²/8] / (2√(2π)), and integrating from minus infinity to infinity:
∫_{-∞}^{∞} {exp[-(8 - m)²/18] / (3√(2π))} π(m) dm = ∫_{-∞}^{∞} {exp[-(8 - m)²/18] / (3√(2π))} {exp[-(m - 7)²/8] / (2√(2π))} dm
= {1/[(6)(2π)]} ∫_{-∞}^{∞} exp[-{(8 - m)²/18 + (m - 7)²/8}] dm = {1/[(6)(2π)]} ∫_{-∞}^{∞} exp[-{13m² - 190m + 697}/72] dm
= {1/[(6)(2π)]} ∫_{-∞}^{∞} exp[-{m² - (190/13)m + (95/13)² + 697/13 - (95/13)²}/(72/13)] dm
327
Note that there is a small probability that a hypothetical mean is negative. When this situation is discussed further
in “Mahlerʼs Guide to Conjugate Priors,” this will be called the prior distribution of hypothetical mean severities.


= {exp[-(36/13²)/(72/13)] / (6√(2π))} ∫_{-∞}^{∞} exp[-(m - 95/13)²/{(2)(6/√13)²}] / √(2π) dm
= {exp[-1/26] / (6√(2π))} (6/√13) = exp[-1/26] / (√13 √(2π)) = 0.1065.

Where we have used the fact that a Normal Density integrates to unity:328

∫_{-∞}^{∞} exp[-(m - 95/13)²/{(2)(6/√13)²}] / √(2π) dm = 6/√13.
-∞

More generally, for the Normal-Normal, the mixed distribution is another Normal, with
mean equal to that of the Normal distribution of parameters, and variance equal to the
sum of the variances of the Normal distribution of parameters and the Normal
likelihood.329

For the specific case dealt with previously: s = 3, µ = 7, and σ = 2, the mixed distribution is a
Normal Distribution with a mean of 7 and variance of: 3² + 2² = 13.
Thus the chance of having a claim of size x is: exp[-(x - 7)²/26] / (√13 √(2π)).
For x = 8 this chance is: exp[-1/26] / (√13 √(2π)) = 0.1065.
This is the same result as calculated above.

328
With mean of 95/13 and standard deviation of 6/√13.
329
The Expected Value of the Process Variance is the variance of the Normal Likelihood, the Variance of the
Hypothetical Means is the variance of the Normal distribution of parameters, and the total variance is the variance of
the mixed distribution. Thus this relationship follows from the general fact that the total variance is the sum of the
EPV and VHM. See “Mahlerʼs Guide to Buhlmann Credibility.”

Derivation of the Mixed Distribution for the Normal-Normal:

The sizes of loss for a particular policyholder are assumed to be Normal with mean m and known fixed
variance s².
Given m, the density of the size of loss distribution is: φ[(x - m)/s] / s = exp[-(x - m)²/(2s²)] / (s√(2π)).

The distribution of hypothetical means m is given by another Normal Distribution with mean µ and
variance σ²: π(m) = exp[-(m - µ)²/(2σ²)] / (σ√(2π)), -∞ < m < ∞.

We compute the mixed density at x, the chance of having a claim of size x, by integrating from
minus infinity to infinity:

∫ {exp[-(x - m)²/(2s²)] / (s√(2π))} π(m) dm = ∫ {exp[-(x - m)²/(2s²)] / (s√(2π))} {exp[-(m - µ)²/(2σ²)] / (σ√(2π))} dm

= {1/(sσ 2π)} ∫ exp[-{(x - m)²/(2s²) + (m - µ)²/(2σ²)}] dm

= {1/(sσ 2π)} ∫ exp[-{(s² + σ²)m² - 2(xσ² + µs²)m + x²σ² + µ²s²} / (2s²σ²)] dm.

Let ξ² = s²σ²/(s² + σ²), ν = (xσ² + µs²)/(s² + σ²), and δ = (x²σ² + µ²s²)/(s² + σ²).

Then the above integral is equal to:

{1/(sσ 2π)} ∫ exp[-(m² - 2νm + δ)/(2ξ²)] dm = {1/(sσ 2π)} ∫ exp[-(m² - 2νm + ν² - ν² + δ)/(2ξ²)] dm =

{exp[-(ν² - δ)/(2ξ²)] / (sσ 2π)} ∫ exp[-(m - ν)²/(2ξ²)] dm = {exp[-(ν² - δ)/(2ξ²)] / (sσ 2π)} ξ√(2π)

= exp[-(ν² - δ)/(2ξ²)] / {√(s² + σ²) √(2π)}.

Where we have used the fact that a Normal Density integrates to unity:330

∫ exp[-(m - ν)²/(2ξ²)] / (ξ√(2π)) dm = 1.

Note that ν² - δ = [x²σ⁴ + 2xµσ²s² + µ²s⁴ - {x²s²σ² + x²σ⁴ + µ²s⁴ + µ²σ²s²}] / (s² + σ²)²

= {2xµσ²s² - x²s²σ² - µ²σ²s²} / (s² + σ²)² = -(x - µ)² σ²s² / (s² + σ²)².

Thus, (ν² - δ)/ξ² = -{(x - µ)² σ²s² / (s² + σ²)²} {(s² + σ²)/(s²σ²)} = -(x - µ)²/(s² + σ²).

Thus the mixed distribution can be put back in terms of x, s, µ, and σ:

exp[-(ν² - δ)/(2ξ²)] / {√(s² + σ²) √(2π)} = exp[-(x - µ)²/{2(s² + σ²)}] / {√(s² + σ²) √(2π)}.

This is a Normal Distribution with mean µ and variance s2 + σ2 .

Thus if the likelihood is a Normal Distribution with variance s2 (fixed and known), and the distribution
of the hypothetical means of the likelihood is also a Normal, but with mean µ and variance σ2, then
the mixed distribution is yet a third Normal Distribution with mean µ and variance s2 + σ2 . The mean
of the likelihood is what is varying among the insureds in the portfolio. Therefore, the mean of the
mixed distribution is equal to that of the prior distribution, in this case µ.
330
With mean of ν and standard deviation of ξ.

Other Mixtures:

There are many other examples of continuous mixtures of severity distributions. Here are some
examples.331 In each case the scale parameter is being mixed, with the other parameters in the
severity distribution held fixed.

Severity                                   Mixing Distribution                        Mixed Distribution

Exponential                                Inverse Gamma: α, θ                        Pareto: α, θ
Inverse Exponential                        Gamma: α, θ                                Inverse Pareto: τ = α, θ
Weibull, τ = t                             Inverse Transformed Gamma: α, θ, τ = t     Burr: α, θ, γ = t
Inverse Weibull, τ = t                     Transformed Gamma: α, θ, τ = t             Inverse Burr: τ = α, θ, γ = t
Gamma, α = a                               Inverse Gamma: α, θ                        Generalized Pareto: α, θ, τ = a
Inverse Gamma, α = a                       Exponential: θ                             Pareto: α = a, θ
Inverse Gamma, α = a                       Gamma: α, θ                                Generalized Pareto: α = a, θ, τ = α
Transformed Gamma, α = a, τ = t            Inverse Transformed Gamma: α, θ, τ = t     Transformed Beta: α, θ, γ = t, τ = a
Inverse Transformed Gamma, α = a, τ = t    Transformed Gamma: α, θ, τ = t             Transformed Beta: α = a, θ, γ = t, τ = α

331
See the problems for illustrations of some of these additional examples.
Example 5.6 in Loss Models shows that mixing an Inverse Weibull via a Transformed Gamma gives an Inverse Burr.

For example, assume that the amount of an individual claim has an Inverse Gamma distribution with
shape parameter α fixed and scale parameter q (rather than θ to avoid later confusion.)
The parameter q is distributed via an Exponential Distribution with mean µ.

For the Inverse Gamma, f(x | q) = q^α e^(-q/x) / {Γ[α] x^(α+1)}. For the Exponential, π(q) = e^(-q/µ) / µ.

f(x) = ∫₀^∞ f(x | q) π(q) dq = ∫₀^∞ {q^α e^(-q/x) / (Γ[α] x^(α+1))} {e^(-q/µ) / µ} dq = ∫₀^∞ q^α e^(-q(1/x + 1/µ)) dq / {µ Γ[α] x^(α+1)}

= {Γ[α+1] / (1/x + 1/µ)^(α+1)} / {µ Γ[α] x^(α+1)} = α µ^α / (x + µ)^(α+1).


This is the density of a Pareto Distribution with parameters α and θ = µ.
This is an example of an Exponential-Inverse Gamma, an Inverse Gamma Severity with shape
parameter α, with its scale parameter mixed via an Exponential.332
The mixture is a Pareto Distribution, with shape parameter equal to that of the Inverse Gamma
severity, and scale parameter equal to the mean of the Exponential mixing distribution.

Exercise: The severity for each insured is an Inverse Gamma Distribution with parameters
α = 3 and q. Over the portfolio, q varies via an Exponential Distribution with mean 500.
What is the severity distribution for the portfolio as a whole?
[Solution: The mixed distribution is a Pareto Distribution with parameters α = 3, θ = 500.]

Exercise: In the previous exercise, what is the probability that a claim picked at random will be
greater than 400?
[Solution: S(400) = {500/(400 + 500)}3 = 17.1%.]

Exercise: In the previous exercise, what is the expected size of a claim picked at random?
[Solution: Mean of a Pareto with α = 3 and θ = 500 is: 500/(3 - 1) = 250.
Alternately, the mean of each Inverse Gamma is: E[X | q] = q/(3 - 1) = q/2.
E[X] = Eq [E[X | q]] = Eq [q/2] = Eq [q]/2 = (mean of the Exponential Dist.)/2 = 500/ 2 = 250.]
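As an illustration (my addition), here is a short Python simulation of this Exponential mixing of the Inverse Gamma scale parameter, checking the Pareto answers from the last three exercises; the sample size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n = 10**6
alpha = 3

# q ~ Exponential with mean 500; given q, X ~ Inverse Gamma with shape 3 and scale q.
q = rng.exponential(500, n)
# An Inverse Gamma(alpha, scale q) variate is q divided by a Gamma(alpha, 1) variate.
x = q / rng.gamma(alpha, 1, n)

print(np.mean(x > 400))          # about 0.171 = (500/900)^3
print(np.mean(x))                # about 250 = 500/(3 - 1)
print((500 / 900)**3, 500 / 2)   # the closed-form Pareto(3, 500) values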

Exercise: The severity for each insured is a Transformed Gamma Distribution with parameters
α = 3.9, q, and τ = 5. Over the portfolio, q varies via an Inverse Transformed Gamma
Distribution with parameters α = 2.4, θ = 17, and τ = 5.
What is the severity distribution for the portfolio as a whole?
[Solution: Using the above chart, the mixed distribution is a Transformed Beta Distribution with
parameters α = 2.4, θ = 17, γ = 5, and τ = 3.9.]
332
This differs from the more common Inverse Gamma-Exponential discussed previously, in which we have an
Exponential severity, whose mean is mixed via the Inverse Gamma.

Frailty Models:333

Frailty models are most commonly applied in the context of survival models, but mathematically
they are just examples of continuous mixtures. They involve a particular form of the hazard rate.

Recall that the hazard rate, h(x) = f(x) / S(x). Also S(x) = exp[-H(x)], where H(x) = ∫₀^x h(t) dt.

Assume h(x | λ) = λ a(x), where λ is a parameter which varies across the portfolio.334

a(x) is some function of x, and let A(x) = ∫₀^x a(t) dt.


Then H(x | λ) = λ A(x).
S(x | λ) = exp[-λ A(x)].
S(x) = Eλ[S(x | λ)] = Eλ[exp[-λ A(x)]] = Mλ[-A(x)],

where Mλ is the moment generating function of the distribution of λ.335 336

For an Exponential Distribution, the hazard rate is constant and equal to one over the mean.
Thus if each individual has an Exponential Distribution, a(x) = 1, and λ = 1/θ.
A(x) = x, and S(x) = Mλ[-x].

We have already discussed mixtures like this. For example, λ could be distributed uniformly from
0 to 2.337 In that case, the general mathematical structure does not help very much.

However, let us assume each individual is Exponential and that λ is Gamma Distributed with

parameters α and β. The Gamma has moment generating function M(t) = (1 - βt)−α.338

Therefore, S(x) = (1 + βx)−α. This is a Pareto Distribution with parameters α and θ = 1/β.339

333
Section 5.2.5 in Loss Models.
334
An individual with lambda larger than average would have a higher than average hazard rate. Lambda is sometimes
called the frailty random variable, while the distribution of lambda is called the frailty distribution.
335
The definition of the moment generating function is My(t) = Ey[exp[yt]].
See “Mahlerʼs Guide to Aggregate Distributions.”
336
The survival function of the mixture is the mixture of the survival functions.
337
See 3, 5/01, Q.28.
338
See Appendix A in the tables attached to the exam.
339
For the Pareto, S(x) = (1 + x/θ)−α.

This is mathematically equivalent to the Inverse Gamma-Exponential discussed previously.


If λ is Gamma Distributed with parameters α and β, then the means of the Exponentials,
δ = 1/λ are distributed via an Inverse Gamma with parameters α and 1/β.
The mixed distribution is Pareto with parameters α and θ = 1/β.
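A quick numerical check of S(x) = Mλ[-A(x)] for this Exponential-Gamma frailty model (my addition; the parameter values and the point x = 30 are arbitrary):

import math
from scipy import integrate

alpha, beta = 2.0, 0.04      # frailty: lambda ~ Gamma(shape alpha, scale beta)
x = 30.0

# S(x) = M_lambda[-A(x)] with A(x) = x, since an Exponential has h(x | lambda) = lambda.
s_via_mgf = (1 - beta * (-x)) ** (-alpha)

# Direct mixing: integrate exp(-lambda x) against the Gamma density of lambda.
gamma_pdf = lambda lam: lam**(alpha - 1) * math.exp(-lam / beta) / (beta**alpha * math.gamma(alpha))
s_direct, _ = integrate.quad(lambda lam: math.exp(-lam * x) * gamma_pdf(lam), 0, math.inf)

# Closed form: Pareto survival with shape alpha and theta = 1/beta.
s_pareto = (1 + beta * x) ** (-alpha)

print(s_via_mgf, s_direct, s_pareto)   # all about 0.2066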

Exercise: What is the hazard rate for a Weibull Distribution?


[Solution: h(x) = f(x)/S(x) = {τ(x/θ)τ exp(-(x/θ)τ) / x} / exp(-(x/θ)τ) = τ xτ−1 θ−τ.]

Therefore, we can put the Weibull for fixed τ into the form of a frailty model; h(x) = λ a(x),

by taking a(x) = τ xτ−1 and λ = θ−τ. Then, A(x) = xτ.

Therefore, if each insured is Weibull with fixed τ, with λ = θ−τ, then S(x) = Mλ[-A(x)] = Mλ[-xτ].

Exercise: Each insured has a Weibull Distribution with τ fixed. λ = θ−τ is Gamma distributed with
parameters α and β. What is the form of the mixed distribution?340
[Solution: The Gamma has moment generating function M(t) = (1 - βt)−α.341

Therefore, S(x) = (1 + βxτ)−α. This is a Burr Distribution with parameters α, θ = 1/β1/τ, and γ = τ.

Comment: The Burr Distribution has S(x) = (1 + (x /θ)γ)−α.]

If in this case α = 1, then λ has an Exponential Distribution, and the mixed distribution is a Loglogistic,
a special case of the Burr for α = 1.342 If instead τ = 1, then as discussed previously the mixed
distribution is a Pareto, a special case of the Burr.
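Similarly, the Weibull-Gamma frailty result S(x) = (1 + βx^τ)^(-α) can be checked numerically (my addition; the parameter values are arbitrary):

import math
from scipy import integrate

alpha, beta, tau = 2.0, 0.5, 3.0   # lambda ~ Gamma(shape alpha, scale beta); Weibull shape tau
x = 1.2

# Mix S(x | lambda) = exp(-lambda x^tau) over the Gamma distribution of lambda.
gamma_pdf = lambda lam: lam**(alpha - 1) * math.exp(-lam / beta) / (beta**alpha * math.gamma(alpha))
s_mixed, _ = integrate.quad(lambda lam: math.exp(-lam * x**tau) * gamma_pdf(lam), 0, math.inf)

# Closed form: Burr survival function (1 + beta x^tau)^(-alpha).
s_burr = (1 + beta * x**tau) ** (-alpha)

print(s_mixed, s_burr)   # both about 0.288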

In general for a frailty model, f(x | λ) = -dS(x | λ)/dx = λ a(x) exp[-λ A(x)].
Therefore, f(x) = Eλ[ λ a(x) exp[-λ A(x)]] = a(x) Eλ[ λ exp[-λ A(x)]] = a(x) Mλʼ[-A(x)], where Mλʼ is the

derivative of the moment generating function of the distribution of λ.343 344

340
See Example 5.7 in Loss Models. This result is mathematically equivalent to mixing the scale parameter of a
Weibull via an Inverse Transformed Gamma, resulting in a Burr, one of the examples listed previously.
341
See Appendix A in the tables attached to the exam.
342
See Exercise 5.13 in Loss Models.
343
Ey[y exp[yt]] = Myʼ(t). See “Mahlerʼs Guide to Aggregate Distributions.”
344
The density function of the mixture is the mixture of the density functions.

For example, in the previous exercise, M(t) = (1 - βt)−α, and Mʼ(t) = αβ (1 - βt)−(α+1).

f(x) = a(x) Mλʼ[-A(x)] = τ x^(τ−1) αβ (1 + βx^τ)^−(α+1).

The density of a Burr Distribution is: (αγ θ−γ xγ−1)(1 + (x /θ)γ)−(α + 1).

This is indeed the density of a Burr Distribution with parameters α, θ = 1/β1/τ, and γ = τ.

For a frailty model, h(x) = f(x)/S(x) = a(x) Mλʼ[-A(x)] / Mλ[-A(x)] = a(x) (ln Mλ)ʼ[-A(x)].345

Defining the cumulant generating function as ψX(t) = ln MX(t) = ln E[etx], then

h(x) = a(x) ψλʼ(-A(x)), where ψʼ is the derivative of the cumulant generating function.

For example in the previous exercise, M(t) = (1 - βt)−α, and ψ(t) = ln M(t) = -α ln (1 - βt).

ψʼ(t) = αβ /(1 - βt). h(x) = a(x) ψλʼ(-A(x)) = τ xτ−1 αβ /(1 + βxτ).

Exercise: What is the hazard rate for a Burr Distribution?


[Solution: h(x) = f(x)/S(x) = {(αγ θ−γ xγ−1)(1 + (x/θ)γ)−(α + 1)}/{1 + (x/θ)γ}−α =

(αγ θ−γ xγ−1)/(1 + (x/θ)γ).]

Thus the above h(x) is indeed the hazard rate of a Burr Distribution with parameters α,

θ = 1/β1/τ, and γ = τ.

345
The hazard rate of the mixture is not the mixture of the hazard rates.

Problems:

Use the following information for the next 3 questions:


Assume that the size of claims for an individual insured is given by an Exponential Distribution:
f(x) = e−x/δ /δ, with mean δ and variance δ2.
Also assume that the parameter δ varies for the different insureds, with δ following an
Inverse Gamma distribution: g(δ) = θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]}, for 0 < δ < ∞.

39.1 (2 points) An insured is picked at random and is observed until it has a loss.
What is the probability density function at 400 for the size of this loss?
A. 1/(θ + 400)^α   B. θ^α/(θ + 400)^α   C. αθ^α/(θ + 400)^α   D. θ^α/(θ + 400)^(α+1)   E. αθ^α/(θ + 400)^(α+1)

39.2 (2 points) What is the unconditional mean severity?


A. θ/(α - 1)   B. θ/α   C. (α - 1)/θ   D. α/θ   E. None of A, B, C, or D

39.3 (3 points) What is the unconditional variance?


A. θ²/(α - 1)   B. 2αθ²/(α - 1)   C. αθ²/{(α - 1)(α - 2)}   D. αθ²/{(α - 1)²(α - 2)}   E. 2αθ²/{(α - 1)²(α - 2)}

39.4 (3 points) The severity distribution of each risk in a portfolio is given by a Weibull Distribution,
with parameters τ = 1/3 and θ, with θ varying over the portfolio via an Inverse Transformed Gamma
Distribution: g(θ) = (7^2.5 / 3) exp[-7/θ^(1/3)] / {θ^(11/6) Γ(2.5)}. What is the mixed distribution?
A. Burr B. Generalized Pareto C. Inverse Burr D. LogLogistic E. ParaLogistic

39.5 (3 points) You are given the following:


• The amount of an individual claim, X, follows an exponential distribution function
with probability density function: f(x | λ) = λ e-λx, x, λ > 0.
• The parameter λ, follows a Gamma distribution with probability density function

π(λ) = (4/3) λ4 e-2λ, λ > 0.


Determine the unconditional probability that x > 7.
A. 0.00038 B. 0.00042 C. 0.00046 D. 0.00050 E. 0.00054

39.6 (2 points) Consider the following frailty model:


• h(x | λ) = λ a(x).
• a(x) = 4 x3 .
• Λ follows an Exponential Distribution with mean 0.007.
Determine S(6).
A. 7% B. 8% C. 9% D. 10% E. 11%

39.7 (3 points) The future lifetimes of a certain population consisting of 1000 people is modeled as
follows:
(i) Each individual's future lifetime is exponentially distributed with constant hazard rate λ.
(ii) Over the population, λ is uniformly distributed over (0.01, 0.11).
For this population, all of whom are alive at time 0, calculate the number of deaths expected
between times 3 and 5.
(A) 75 (B) 80 (C) 85 (D) 90 (E) 95

39.8 (2 points) You are given the following:


• The number of miles that an individual car is driven during a year is given by an
Exponential Distribution with mean µ.
• µ differs between cars.
• µ is distributed via an Inverse Gamma Distribution with parameters α = 3 and θ = 25,000.
What is the probability that a car chosen at random will be driven more than 20,000 miles during the
next year?
(A) 9% (B) 11% (C) 13% (D) 15% (E) 17%

39.9 (2 points) You are given the following:


• The IQs of actuaries are normally distributed with mean 135 and standard deviation 10.
• Each actuaryʼs score on an IQ test is normally distributed around his true IQ,
with standard deviation of 15.
What is the probability that Abbie the actuary scores between 145 and 155 on his IQ test?
A. 10% B. 12% C. 14% D. 16% E. 18%

39.10 (2 points) Consider the following frailty model:


• SX|Λ(x | λ) = e−λx.

• Λ follows a Gamma Distribution with α = 3 and θ = 0.01.


Determine S(250).
A. Less than 3%
B. At least 3%, but less than 4%
C. At least 4%, but less than 5%
D. At least 5%, but less than 6%
E. At least 6%

39.11 (3 points) The severity distribution of each risk in a portfolio is given by an Inverse Weibull
Distribution, F(x) = exp[-(q/x)4 ], with q varying over the portfolio via a Transformed Gamma
Distribution with parameters α = 1.3, θ = 11, and τ = 4.
What is the probability that the next loss will be of size less than 10?
Hint: For a Transformed Gamma Distribution, f(x) = τ x^(τα - 1) exp[-(x/θ)^τ] / {θ^(τα) Γ(α)}.

A. Less than 32%


B. At least 32%, but less than 34%
C. At least 34%, but less than 36%
D. At least 36%, but less than 38%
E. At least 38%
2016-C-2, Loss Distributions, §39 Continuous Mixtures HCM 10/21/15, Page 882

39.12 (3 points) You are given the following:


• The amount of an individual loss in 2002, follows an exponential distribution
with mean $3000.
• Between 2002 and 2007, losses will be multiplied by an inflation factor.
• You are uncertain of what the inflation factor between 2002 and 2007 will be,
but you estimate that it will be a random draw from an Inverse Gamma Distribution with
parameters α = 4 and θ = 3.5.
Estimate the probability that a loss in 2007 exceeds $5500.
A. Less than 18%
B. At least 18%, but less than 19%
C. At least 19%, but less than 20%
D. At least 20%, but less than 21%
E. At least 21%

Use the information on the following frailty model for the next two questions:
• Each insured has a survival function that is Exponential with hazard rate λ.
• The hazard rate varies across the portfolio via
an Inverse Gaussian Distribution with µ = 0.015 and θ = 0.005.

39.13 (3 points) Determine S(65).


A. 56% B. 58% C. 60% D. 62% E. 64%

39.14 (2 points) For the mixture what is the hazard rate at 40?
A. 0.0060 B. 0.0070 C. 0.0080 D. 0.0090 E. 0.0100

39.15 (4 points) You are given the following:


• The amount of an individual claim has an Inverse Gamma distribution with shape parameter α = 4
and scale parameter q (rather than θ to avoid later confusion.)
• The parameter q is distributed via an Exponential Distribution with mean 100.
What is the probability that a claim picked at random will be of size greater than 15?
A. Less than 50%
B. At least 50%, but less than 55%
C. At least 55%, but less than 60%
D. At least 60%, but less than 65%
E. At least 65%

39.16 (3 points) You are given the following:


• X is a Normal Distribution with mean zero and variance v.
• v is distributed via an Inverse Gamma Distribution with α = 10 and θ = 10.
Determine the form of the mixed distribution.

39.17 (2 points) You are given the following:


• The amount of an individual claim has an exponential distribution given by:
p(y) = (1/δ) e-y/δ, y > 0, δ > 0
• The parameter δ has a probability density function given by:

f(δ)= (4000/δ4 ) e-20/δ, δ > 0


Determine the variance of the claim severity distribution.
A. 150 B. 200 C. 250 D. 300 E. 350

39.18 (3 points) Consider the following frailty model:


• h(x | λ) = λ a(x).
• a(x) = 0.004 / √(1 + 0.008x).

• Λ follows a Gamma Distribution with α = 6 and θ = 1.


Determine S(11).
A. 70% B. 72% C. 74% D. 76% E. 78%

39.19 (3 points) You are given the following:


• The amount of an individual loss this year, follows an Exponential Distribution
with mean $8000.
• Between this year and next year, losses will be multiplied by an inflation factor.
• The inflation factor follows an Inverse Gamma Distribution with parameters
α = 2.5 and θ = 1.6.
Estimate the probability that a loss next year exceeds $10,000.
A. Less than 21%
B. At least 21%, but less than 22%
C. At least 22%, but less than 23%
D. At least 23%, but less than 24%
E. At least 24%

39.20 (2 points) Severity is LogNormal with parameters µ and 0.3.


µ varies across the portfolio via a Normal Distribution with parameters 5 and 0.4.
What is probability that a loss chosen at random exceeds 200?
(A) 25% (B) 27% (C) 29% (D) 31% (E) 33%

Use the following information for the next two questions:


For each class, sizes of loss are Exponential with mean µ.
Across a group of classes µ varies via an Inverse Gamma Distribution with parameters
α = 3 and θ = 1000.

39.21 (2 points) For a class picked at random, what is the expected value of the loss elimination
ratio at 500?
A. 50% B. 55% C. 60% D. 65% E. 70%

39.22 (6 points) What is the correlation across classes of the loss elimination ratio at 500 and the
loss elimination ratio at 200?
A. Less than 96%
B. At least 96%, but less than 97%
C. At least 97%, but less than 98%
D. At least 98%, but less than 99%
E. At least 99%

39.23 (3 points) Severity is uniformly distributed from 0 to √β.


β in turn is uniformly distributed from 10 to 50.
Determine the variance of the mixed distribution.
A. 2.2 B. 2.4 C. 2.6 D. 2.8 E. 3.0

39.24 (3 points) Consider the following frailty model:


• h(x | λ) = λ a(x).
• a(x) = 1.09^x.
• Λ follows a Gamma Distribution with α = 4 and θ = 1/50,000.
Determine S(90).
A. 14% B. 16% C. 18% D. 20% E. 22%

39.25 (2 points) A randomly chosen transformer has a lifetime in hours that is normally distributed
with a mean of α and a standard deviation of β.
While β is fixed, α is normally distributed with a mean of 6000 hours and a standard deviation of σ.
You are given that the 90th percentile of the lifetime of a randomly chosen transformer is 9200 hours.
Find the probability that a randomly selected transformer has a lifetime of at most 5000 hours.
A. 29% B. 32% C. 35% D. 38% E. 41%

39.26 (4B, 5/93, Q.19) (2 points) You are given the following:
• The amount of an individual claim has an exponential distribution given by:
p(y) = (1/µ) e-y/µ, y > 0, µ > 0.

• The parameter µ has a probability density function given by: f(µ)= (400/µ3 )e-20/µ, µ > 0.
Determine the mean of the claim severity distribution.
A. 10 B. 20 C. 200 D. 2000 E. 4000

39.27 (4B, 11/93, Q.26) (3 points) You are given the following:
• The amount of an individual claim, Y, follows an exponential distribution function
with probability density function f(y | δ) = (1/ δ) e-y/δ, y, δ > 0.

• The conditional mean and variance of Y given δ are E[Y | δ] = δ and Var[Y | δ] = δ2.
• The mean claim amount, δ, follows an Inverse Gamma distribution with density function

p(δ) = 4e-2/δ / δ4, δ > 0.


Determine the unconditional density of Y at y = 3.
A. Less than 0.01
B. At least 0.01, but less than 0.02
C. At least 0.02, but less than 0.04
D. At least 0.04, but less than 0.08
E. At least 0.08

39.28 (3, 5/00, Q.17) (2.5 points)


The future lifetimes of a certain population can be modeled as follows:
(i) Each individual's future lifetime is exponentially distributed with constant hazard rate θ.
(ii) Over the population, θ is uniformly distributed over (1, 11).
Calculate the probability of surviving to time 0.5, for an individual randomly selected at time 0.
(A) 0.05 (B) 0.06 (C) 0.09 (D) 0.11 (E) 0.12

39.29 (3, 5/01, Q.28) (2.5 points) For a population of individuals, you are given:
(i) Each individual has a constant force of mortality.
(ii) The forces of mortality are uniformly distributed over the interval (0, 2).
Calculate the probability that an individual drawn at random from this population dies
within one year.
(A) 0.37 (B) 0.43 (C) 0.50 (D) 0.57 (E) 0.63

39.30 (SOA M, 5/05, Q.10 & 2009 Sample Q.163) The scores on the final exam in Ms. Bʼs Latin
class have a normal distribution with mean θ and standard deviation equal to 8.
θ is a random variable with a normal distribution with mean equal to 75 and standard deviation equal
to 6.
Each year, Ms. B chooses a student at random and pays the student 1 times the studentʼs
score. However, if the student fails the exam (score ≤ 65), then there is no payment.
Calculate the conditional probability that the payment is less than 90, given that there is a
payment.
(A) 0.77 (B) 0.85 (C) 0.88 (D) 0.92 (E) 1.00

39.31 (SOA M, 11/05, Q.17 & 2009 Sample Q.204) (2.5 points)
The length of time, in years, that a person will remember an actuarial statistic is modeled by an
exponential distribution with mean 1/Y.
In a certain population, Y has a gamma distribution with α = θ = 2.
Calculate the probability that a person drawn at random from this population will remember
an actuarial statistic less than 1/2 year.
(A) 0.125 (B) 0.250 (C) 0.500 (D) 0.750 (E) 0.875

39.32 (SOA M, 11/05, Q.20) (2.5 points) For a group of lives age x, you are given:
(i) Each member of the group has a constant force of mortality that is drawn from the
uniform distribution on [0.01, 0.02].
(ii) δ = 0.01.
For a member selected at random from this group, calculate the actuarial present value of a
continuous lifetime annuity of 1 per year.
(A) 40.0 (B) 40.5 (C) 41.1 (D) 41.7 (E) 42.3

Solutions to Problems:

39.1. E. The conditional probability of a loss of size 400 given δ is: e−400/δ /δ.
The unconditional probability can be obtained by integrating the conditional probabilities versus the
distribution of δ:
f(400) = ∫₀^∞ f(400 | δ) g(δ) dδ = ∫₀^∞ {e^(-400/δ) / δ} θ^α δ^(-(α+1)) e^(-θ/δ) / Γ(α) dδ

= {θ^α/Γ(α)} ∫₀^∞ δ^(-(α+2)) e^(-(400+θ)/δ) dδ = {θ^α/Γ(α)} Γ(α+1) / (400 + θ)^(α+1) = αθ^α / (θ + 400)^(α+1).

39.2. A. The conditional mean given δ is: δ. The unconditional mean can be obtained by integrating
the conditional means versus the distribution of δ:
E[X] = ∫₀^∞ E[X | δ] g(δ) dδ = ∫₀^∞ δ θ^α δ^(-(α+1)) e^(-θ/δ) / Γ(α) dδ = {θ^α/Γ(α)} ∫₀^∞ δ^(-α) e^(-θ/δ) dδ

= {θ^α/Γ(α)} Γ(α-1) / θ^(α-1) = θ/(α-1).


Comment: The mean of a Pareto Distribution; the mixed distribution is a Pareto with scale parameter
θ and shape parameter α.

39.3. D. The conditional mean given δ is δ. The conditional variance given δ is δ2.

Thus the conditional second moment given δ is: δ2 + δ2 = 2δ2.


The unconditional second moment can be obtained by integrating the conditional second moments
versus the distribution of δ:
E[X²] = ∫₀^∞ E[X² | δ] g(δ) dδ = ∫₀^∞ (2δ²) θ^α δ^(-(α+1)) e^(-θ/δ) / Γ(α) dδ

= {2θ^α/Γ(α)} ∫₀^∞ δ^(-(α-1)) e^(-θ/δ) dδ = {2θ^α/Γ(α)} Γ(α-2) / θ^(α-2) = 2θ² / {(α-1)(α-2)}.

Since the mean is θ/(α-1), the variance is: 2θ² / {(α-1)(α-2)} - θ² / (α-1)²

= (θ² / {(α-1)²(α-2)}) {2(α-1) - (α-2)} = αθ² / {(α-1)²(α-2)}.


Comment: The variance of a Pareto Distribution.

39.4. A. The Weibull has density: τ(x/θ)^τ exp[-(x/θ)^τ] / x = x^(-2/3) θ^(-1/3) exp[-x^(1/3)/θ^(1/3)] / 3.
The density of the mixed distribution is obtained by integrating the Weibull density times g(θ):

∫₀^∞ f(x) g(θ) dθ = ∫₀^∞ {x^(-2/3) θ^(-1/3) exp[-x^(1/3)/θ^(1/3)] / 3} (7^2.5 / 3) exp[-7/θ^(1/3)] / {θ^(11/6) Γ(2.5)} dθ

= {7^2.5 x^(-2/3) / (9 Γ(2.5))} ∫₀^∞ θ^(-13/6) exp[-(7 + x^(1/3))/θ^(1/3)] dθ.

Make the change of variables y = (7 + x^(1/3))/θ^(1/3), so that θ = (7 + x^(1/3))³ y^(-3) and dθ = -3(7 + x^(1/3))³ y^(-4) dy:

= {7^2.5 x^(-2/3) / (3 Γ(2.5))} ∫₀^∞ (7 + x^(1/3))^(-13/2) y^(13/2) exp(-y) (7 + x^(1/3))³ y^(-4) dy

= {7^2.5 x^(-2/3) (7 + x^(1/3))^(-7/2) / (3 Γ(2.5))} ∫₀^∞ y^(5/2) exp(-y) dy = {7^2.5 x^(-2/3) (7 + x^(1/3))^(-7/2) / (3 Γ(2.5))} Γ(3.5)

= (2.5)(1/3)(1/7) x^(-2/3) {1 + (x/343)^(1/3)}^(-7/2).

This is the density of a Burr Distribution, αγ(x/θ)^γ {1 + (x/θ)^γ}^(-(α + 1)) / x, with parameters:
α = 2.5, θ = 343 = 7³, and γ = 1/3.
Comment: In general, if one mixes a Weibull with τ = t fixed, with its scale parameter varying via an
Inverse Transformed Gamma Distribution, with parameters α, θ, and τ = t,
then the mixed distribution is a Burr with parameters: α, θ, and γ = t.
This Inverse Transformed Gamma Distribution has parameters: α = 2.5, θ = 343 = 7³, and τ = 1/3.

39.5. E. If the hazard rate of an Exponential Distribution, λ, is distributed via a Gamma(α, θ),
then the mixed Pareto has parameters α and 1/θ.

π(λ) = (4/3) λ4 e-2λ, λ > 0. Substituting x for λ, this is proportional to: x5-1 e-x/(1/2).
Thus the distribution of lambda is Gamma with α = 5 and θ = 1/2.
Thus the mixed distribution is Pareto with α = 5 and θ = 2.
Thus, S(7) = 25 / (2+7)5 = 32 / 95 = 0.00054.
Alternately, this is the Exponential - Inverse Gamma, parameterized somewhat differently.
The mean of the Exponential is δ = 1/λ, and δ follows an Inverse Gamma.
Since dλ/dδ = -1/δ², g(δ) = π(λ) |dλ/dδ| = (4/3) δ^(-4) e^(-2/δ) / δ² = (4/3) δ^(-6) e^(-2/δ).
Substituting x for δ, this is proportional to: e-2/x / x(5+1).
Thus we have an Inverse Gamma with parameters α = 5 and θ = 2. Thus the mixed distribution is a
Pareto, with α = 5 and θ = 2. ⇒ S(x) = {2/(2+x)}⁵. ⇒ S(7) = (2/9)⁵ = 0.00054.


Alternately, one can compute the unconditional survival function at x = 7 via integration:
S(7) = ∫₀^∞ S(7 | λ) π(λ) dλ = ∫₀^∞ exp(-7λ) (4/3) λ⁴ e^(-2λ) dλ = (4/3) ∫₀^∞ λ⁴ e^(-9λ) dλ.
This is a “Gamma type” integral and thus: S(7) = (4/3) Γ(5) / 9⁵ = (4/3)(4!) / 9⁵ = 0.00054.
Alternately, one can compute the unconditional survival function at x via integration:
S(x) = ∫₀^∞ S(x | λ) π(λ) dλ = ∫₀^∞ exp(-xλ) (4/3) λ⁴ e^(-2λ) dλ = (4/3) ∫₀^∞ λ⁴ e^(-(2+x)λ) dλ

= (4/3) Γ(5) / (2+x)⁵ = (4/3)(4!) / (2+x)⁵ = 32 / (2+x)⁵. ⇒ S(7) = 32 / 9⁵ = 0.00054.
Comment: If one recognizes the mixed distribution as a Pareto with scale parameter of 2 and shape
parameter of 5, then one can determine the constant by looking in Appendix A of Loss Models
without doing the Gamma type integral.
For α integer, Γ[α] = (α-1)!.

The Gamma density in the Appendix of Loss Models is: θ−α xα−1 e−x/θ / Γ(α), x > 0.
Since this probability density function must integrate to unity from zero to infinity:
∫₀^∞ t^(α-1) e^(-t/θ) dt = Γ(α) θ^α, or for integer n: ∫₀^∞ t^n e^(-ct) dt = n! / c^(n+1).

39.6. D. A(x) = ∫₀^x a(t) dt = x⁴.

S(x) = Mλ[-A(x)].
The moment generating function of this Exponential Distribution is: M(t) = 1/(1 - 0.007t).
Thus S(x) = 1/(1 + 0.007x⁴).
Thus, S(6) = 1/{1 + 0.007(6⁴)} = 9.9%.
Comment: See Exercise 5.13 in Loss Models.
The mixture is Loglogistic.
For the Weibull as per Loss Models, h(x) = τ x^(τ−1) θ^(−τ).
Therefore, we can put the Weibull for fixed τ into the form of a frailty model;
h(x) = λ a(x), by taking a(x) = τ x^(τ−1) and λ = θ^(−τ). A(x) = x^τ.
Here τ = 4, a(x) = 4x³, and λ = θ^(-4).
Thus for a given value of theta or lambda, S(x | λ) = exp[-(x/θ)⁴] = exp[-λx⁴].
Therefore, S(6 | λ) = exp[-λ6⁴] = exp[-1296λ].

Thus, S(6) = ∫₀^∞ exp[-1296λ] exp[-λ/0.007] / 0.007 dλ = ∫₀^∞ exp[-1438.86λ] dλ / 0.007 = (1/1438.86)/0.007 = 9.9%.

39.7. D. The hazard rate for an Exponential is one over its mean. Therefore, the survival function is
S(t; λ) = e^(-λt). Mixing over the different values of λ:

S(t) = ∫ S(t; λ) f(λ) dλ = ∫ from 0.01 to 0.11 of e^(-tλ) (1/0.1) dλ = [(-10/t) e^(-tλ)] evaluated from λ = 0.01 to λ = 0.11 = (10/t)(e^(-0.01t) - e^(-0.11t)).

S(3) = (10/3)(e^(-0.03) - e^(-0.33)) = 0.8384. S(5) = (10/5)(e^(-0.05) - e^(-0.55)) = 0.7486.
The number of deaths expected between time 3 and time 5 is:
(1000){S(3) - S(5)} = (1000)(0.8384 - 0.7486) = 89.8.
Comment: Similar to 3, 5/00, Q.17.

39.8. E. For this Inverse Gamma-Exponential, the mixed distribution is a Pareto with α = 3 and
θ = 25,000. S(20000) = {25000/(25000 + 20000)}3 = (5/9)3 = 17.1%.

39.9. D. If the severity is Normal with fixed variance s2 , and the mixing distribution of their means is
also Normal with mean µ and variance σ2, then the mixed distribution is another Normal, with mean µ

and variance: s2 + σ2.


In this case, the mixed distribution is Normal with mean 135 and variance: 15² + 10² = 325.
Prob[145 ≤ score ≤ 155] = Φ[(155 - 135)/√325] - Φ[(145 - 135)/√325] = Φ[1.109] - Φ[0.555] = 0.8663 - 0.7106 = 0.156.

39.10. A. λ is the hazard rate of each Exponential, one over the mean. We are mixing λ via a
Gamma. Therefore, the mixed distribution is Pareto with parameters α = 3 and θ = 1/0.01 = 100.
(This is mathematically the same as the Inverse Gamma-Exponential.)
Thus, S(250) = {100/(100 + 250)}³ = 2.3%.
Alternately, h(x | λ) = λ a(x). For the Exponential, h(x) = λ.
Thus, this is a frailty model with a(x) = 1 and A(x) = x.
S(x) = Mλ[-A(x)].
The moment generating function of this Gamma Distribution is: M(t) = {1/(1 - 0.01t)}³.
Thus S(x) = {1/(1 + 0.01x)}³ = {100/(100 + x)}³.
Thus, S(250) = {100/(100 + 250)}³ = 2.3%.

39.11. A. The Inverse Weibull has density: 4x^(-5) q⁴ exp[-(q/x)⁴].
The density of q is that of the Transformed Gamma Distribution:
τ(q/θ)^(τα) exp[-(q/θ)^τ] / {q Γ(α)} = 4 q^4.2 11^(-5.2) exp[-(q/11)⁴] / Γ(1.3).
The density of the mixed distribution is obtained by integrating the Inverse Weibull density times
the density of q:

∫₀^∞ 4x^(-5) q⁴ exp[-(q/x)⁴] 4 q^4.2 11^(-5.2) exp[-(q/11)⁴] dq / Γ(1.3) = {16 (11^(-5.2)) x^(-5) / Γ(1.3)} ∫₀^∞ q^8.2 exp[-q⁴(11^(-4) + x^(-4))] dq.

Make the change of variables y = q⁴(11^(-4) + x^(-4)), so that q = (11^(-4) + x^(-4))^(-1/4) y^(1/4) and dq = (1/4)(11^(-4) + x^(-4))^(-1/4) y^(-3/4) dy:

= {16 (11^(-5.2)) x^(-5) / Γ(1.3)} ∫₀^∞ (11^(-4) + x^(-4))^(-2.05) y^2.05 exp(-y) (1/4)(11^(-4) + x^(-4))^(-1/4) y^(-3/4) dy

= {4 (11^(-5.2)) x^(-5) (11^(-4) + x^(-4))^(-2.3) / Γ(1.3)} ∫₀^∞ y^1.3 exp(-y) dy = {4 (11^(-5.2)) x^(-5) (11^(-4) + x^(-4))^(-2.3) / Γ(1.3)} Γ(2.3)

= (4)(1.3) 11^(-5.2) x^(-5) (11^(-4) + x^(-4))^(-2.3) = (4)(1.3) 11^(-5.2) x^(-5) x^9.2 {1 + (x/11)⁴}^(-2.3) = (4)(1.3) (x/11)^5.2 {1 + (x/11)⁴}^(-2.3) / x.

This is the density of an Inverse Burr Distribution, τγ(x/θ)^(γτ) {1 + (x/θ)^γ}^(-(τ + 1)) / x,
with parameters τ = 1.3, θ = 11, and γ = 4. Therefore, the mixed distribution is:
F(x) = {(x/θ)^γ / (1 + (x/θ)^γ)}^τ = {1 + (11/x)⁴}^(-1.3). F(10) = {1 + (11/10)⁴}^(-1.3) = 31.0%.
Comment: In general, if one mixes an Inverse Weibull with τ = t fixed, with its scale parameter
varying via a Transformed Gamma Distribution, with parameters α, θ, and τ = t, then the mixed
distribution is an Inverse Burr with parameters τ = α, θ, and γ = t.
For each Inverse Weibull, F(10) = exp[-(q/10)⁴]. One could instead average F(10) over the
Inverse Weibulls, in order to get F(10) for the mixed distribution:

∫₀^∞ exp[-(q/10)⁴] 4 q^4.2 11^(-5.2) exp[-(q/11)⁴] dq / Γ(1.3) = {4 (11^(-5.2)) / Γ(1.3)} ∫₀^∞ exp[-0.0001683q⁴] q^4.2 dq

= {11^(-5.2) / Γ(1.3)} ∫₀^∞ exp(-0.0001683y) y^0.3 dy = {11^(-5.2) / Γ(1.3)} Γ(1.3) (0.0001683)^(-1.3) = 31.0%.

39.12. B. Let the inflation factor be y. Then given y, in the year 2007 the losses have an
Exponential Distribution with mean 3000y. Let z = 3000y. Then since y follows an Inverse Gamma
with parameters α = 4 and scale parameter θ = 3.5, z follows an Inverse Gamma with parameters
α = 4 and θ = (3000)(3.5) = 10,500.
Thus in the year 2007, we have a mixture of Exponentials each with mean z, with z following an
Inverse Gamma. This is the (same mathematics as the) Inverse Gamma-Exponential.
For the Inverse Gamma-Exponential the mixed distribution is a Pareto, with α = shape parameter of
the Inverse Gamma and θ = scale parameter of the Inverse Gamma.
In this case the mixed distribution is a Pareto with α = 4 and θ = 10,500.
For this Pareto, S(5500) = {1 + (5500/10500)}-4 = 18.5%.
Comment: This is an example of “parameter uncertainty.” We assume that the loss distribution in
year 2007 will also be an Exponential, we just are currently uncertain of its parameter.

39.13. B. h(x | λ) = λ a(x). For an Exponential a(x) = 1, and A(x) = ∫₀^x a(t) dt = x.

The moment generating function of this Inverse Gaussian Distribution is:
M(t) = exp[(θ/µ)(1 - √(1 - 2tµ²/θ))] = exp[(1/3)(1 - √(1 - 0.09t))].

S(x) = Mλ[-A(x)] = exp[(1/3)(1 - √(1 + 0.09x))].

Thus, S(65) = exp[(1/3)(1 - √(1 + (0.09)(65)))] = 58%.



39.14. B. From the previous solution, S(x) = exp[(1/3)(1 - √(1 + 0.09x))].
S(40) = exp[(1/3)(1 - √(1 + (0.09)(40)))] = 0.683.
Differentiating, f(x) = -dS(x)/dx = exp[(1/3)(1 - √(1 + 0.09x))] (1/3)(0.09)(1/2) / √(1 + 0.09x).
f(40) = exp[(1/3)(1 - √(1 + (0.09)(40)))] (0.015) / √(1 + (0.09)(40)) = 0.00478.
h(40) = f(40)/S(40) = 0.00478/0.683 = 0.0070.
Alternately, for a frailty model, h(x) = a(x) ψλʼ(-A(x)), where ψλ(t) = ln Mλ(t).
M(t) = exp[(θ/µ)(1 - √(1 - 2tµ²/θ))] = exp[(1/3)(1 - √(1 - 0.09t))].
ln M(t) = (1/3)(1 - √(1 - 0.09t)).
d ln M(t) / dt = (1/3)(0.09)(1/2) / √(1 - 0.09t).
h(x | λ) = λ a(x). For an Exponential a(x) = 1, and A(x) = ∫₀^x a(t) dt = x.
h(x) = (1)(1/3)(0.09)(1/2) / √(1 + 0.09x).
h(40) = 0.015 / √(1 + (0.09)(40)) = 0.0070.

39.15. C. For the Inverse Gamma, f(x | q) = qα e-q/x / {Γ[α] xα+1} = q4 e-q/x / {6 x5 }.
For the Exponential, u(q) = e-q/100/100.
f(x) = ∫₀^∞ f(x | q) u(q) dq = ∫₀^∞ {q⁴ e^(-q/x) / (6x⁵)} {e^(-q/100) / 100} dq = ∫₀^∞ q⁴ e^(-q(1/x + 1/100)) dq / (600x⁵)

= {Γ[5] / (1/x + 1/100)⁵} / (600x⁵) = {(24)(100⁵) / (x + 100)⁵} / 600 = (4)(100⁴) / (x + 100)⁵.
This is the density of a Pareto Distribution with parameters α = 4 and θ = 100.

Therefore, F(x) = 1 - {θ/(x+θ)}α = 1 - {100/(x+ 100)}4 . S(15) = (100/115)4 = 57.2%.


Comment: An example of an Exponential-Inverse Gamma.

39.16. For a Normal Distribution with mean zero and variance v:
f(x | v) = exp[-x²/(2v)] / √(2πv).
An Inverse Gamma with α = 10 and θ = 10 has density:
g(v) = 10^10 e^(-10/v) v^(-11) / Γ(10), v > 0.
The mixed distribution has density:

∫₀^∞ {exp[-x²/(2v)] / √(2πv)} {10^10 e^(-10/v) v^(-11) / Γ(10)} dv = {10^10 / (9! √(2π))} ∫₀^∞ exp[-(10 + x²/2)/v] v^(-11.5) dv

= {10^10 / (9! √(2π))} Γ(10.5) / (10 + x²/2)^10.5 = {Γ(10.5) / (Γ(10) Γ(1/2) √20)} (1 + x²/20)^(-10.5).

This is a Studentʼs t distribution with 20 degrees of freedom.
Comment: Difficult! Since the Inverse Gamma Density integrates to one over its support,

∫₀^∞ exp[-θ/x] x^(-(α+1)) dx = Γ(α) / θ^α. Also, Γ(1/2) = √π.

A Studentʼs t distribution with ν degrees of freedom has density:
f(t) = {1 / (β[ν/2, 1/2] √ν)} {1 / (t²/ν + 1)^((ν+1)/2)}, where β[ν/2, 1/2] = Γ[1/2]Γ[ν/2] / Γ[(ν+1)/2].

39.17. D. The mixed distribution is a Pareto with shape parameter α = 3 and
scale parameter θ = 20, with variance: (2)(20²)/{(3-1)(3-2)} - {20/(3-1)}² = 400 - 100 = 300.
Alternately,Var[X | δ] = Variance[Exponential Distribution with mean δ] = δ2 .
f(δ) is an Inverse Gamma Distribution, with θ = 20 and α = 3.
E[Var[X | δ]] = E[δ2 ] = 2nd moment of Inverse Gamma = 202 /{(3-1)(3-2)} = 200.
Var[E[X | δ]] = Var[δ] = variance of Inverse Gamma =
2nd moment of Inverse Gamma - (mean Inverse Gamma)2 = 200 - {20 / (3-1)}2 = 100.
Var[X] = E[Var[X | δ]] + Var[E[X | δ]] = 200 + 100 = 300.

39.18. E. A(x) = ∫₀^x a(t) dt = √(1 + 0.008x) - 1.

The moment generating function of a Gamma Distribution with α = 6 and θ = 1 is: M(t) = {1/(1 - t)}⁶.

S(x) = Mλ[-A(x)] = {1/(1 + A(x))}⁶ = {1/√(1 + 0.008x)}⁶ = {1/(1 + 0.008x)}³.

Thus, S(11) = {1/(1 + (0.008)(11))}³ = 77.6%.
Comment: See Exercise 5.14 in Loss Models.
The mixture is a Pareto Distribution with α = 3, and θ = 1/0.008 = 125.

39.19. D. Let the inflation factor be y. Then given y, in the next year the losses have an
Exponential Distribution with mean 8000y. Let z = 8000y. Then since y follows an Inverse Gamma
with parameters α = 2.5 and scale parameter θ = 1.6, z follows an Inverse Gamma with parameters
α = 2.5 and θ = (8000)(1.6) = 12,800. Thus next year, we have a mixture of Exponentials each with
mean z, with z following an Inverse Gamma. This is the (same mathematics as the) Inverse Gamma-
Exponential. For the Inverse Gamma-Exponential the mixed distribution is a Pareto, with α = shape
parameter of the Inverse Gamma, and
θ = scale parameter of the Inverse Gamma.
In this case the mixed distribution is a Pareto with α = 2.5 and θ = 12,800.
For this Pareto Distribution, S(10,000) = {1 + (10000/12800)}-2.5 = 23.6%.

39.20. B. ln[x] follows a Normal with parameters µ and 0.3.


Therefore, we are mixing a Normal with fixed variance via another Normal.
Therefore, the mixture of ln[x] is Normal with parameters 5 and √(0.3² + 0.4²) = 0.5.
Thus the mixture of x is LogNormal with parameters 5 and 0.5.
S(200) = 1 - Φ[{ln(200) - 5}/0.5] = 1 - Φ[0.60] = 27.43%.

39.21. E. & 39.22. B.
For the Exponential, the loss elimination ratio is equal to the distribution function: LER(x) = 1 - e^(-x/µ).
The mixed distribution of the size of loss is Pareto with the same parameters, α = 3 and θ = 1000.

Eµ[LER(x)] = Eµ[1 - e^(-x/µ)] = Eµ[F(x; µ)] = ∫ π[µ] F(x; µ) dµ

= distribution function of the mixture = distribution function of the Pareto = 1 - {1000/(x + 1000)}³.

Eµ[LER(500)] = 1 - (10/15)³ = 0.70370.
Eµ[LER(200)] = 1 - (10/12)³ = 0.42130.

Eµ[LER(x) LER(y)] = Eµ[(1 - e^(-x/µ))(1 - e^(-y/µ))] = Eµ[1 - e^(-x/µ) - e^(-y/µ) + e^(-(x+y)/µ)]

= 1 - Eµ[S(x; µ)] - Eµ[S(y; µ)] + Eµ[S(x + y; µ)]

= 1 - {1000/(x + 1000)}³ - {1000/(y + 1000)}³ + {1000/(x + y + 1000)}³.

Eµ[LER(200) LER(500)] = 1 - (10/12)³ - (10/15)³ + (10/17)³ = 0.32854.

Covµ[LER(200), LER(500)] = Eµ[LER(200) LER(500)] - Eµ[LER(200)] Eµ[LER(500)] =
0.32854 - (0.42130)(0.70370) = 0.03207.

Eµ[LER(x)²] = Eµ[LER(x) LER(x)] = 1 - 2{1000/(x + 1000)}³ + {1000/(2x + 1000)}³.

Eµ[LER(200)²] = 1 - (2)(10/12)³ + (10/14)³ = 0.20702.
Varµ[LER(200)] = 0.20702 - 0.42130² = 0.02953.
Eµ[LER(500)²] = 1 - (2)(10/15)³ + (10/20)³ = 0.53241.
Varµ[LER(500)] = 0.53241 - 0.70370² = 0.03722.

Corrµ[LER(200), LER(500)] = 0.03207 / √{(0.02953)(0.03722)} = 96.73%.
Comment: The loss elimination ratios for deductibles of somewhat similar sizes are highly correlated
across classes.
For a practical example for excess ratios, see Tables 3 and 4 in “NCCIʼs 2007 Hazard Group
Mapping,” by John P. Robertson, Variance, Vol. 3, Issue 2, 2009, not on the syllabus of this exam.
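A simulation cross-check of these correlation figures (my addition; it draws µ from the Inverse Gamma and evaluates the two loss elimination ratios directly; sample size and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
n = 10**6
alpha, theta = 3, 1000

# mu ~ Inverse Gamma(3, 1000): theta divided by a Gamma(3, 1) draw.
mu = theta / rng.gamma(alpha, 1, n)

ler200 = 1 - np.exp(-200 / mu)
ler500 = 1 - np.exp(-500 / mu)

print(ler200.mean(), ler500.mean())       # about 0.421 and 0.704
print(np.corrcoef(ler200, ler500)[0, 1])  # about 0.967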

39.23. D. Given β the mean severity is: √β / 2.

Thus the mean of the mixed distribution is: ∫ from 10 to 50 of (√β / 2) / 40 dβ = [β^(3/2) / 120] evaluated from β = 10 to 50 = 2.683.

Given β the second moment of severity is: (√β)²/12 + (√β/2)² = β/3.

Thus the 2nd moment of the mixed distribution is: ∫ from 10 to 50 of (β/3) / 40 dβ = [β²/240] evaluated from β = 10 to 50 = 10.

The variance of the mixed distribution is: 10 - 2.683² = 2.80.

39.24. C. A(x) = ∫₀^x a(t) dt = 1.09^x / ln(1.09) - 1/ln(1.09) = (1.09^x - 1) / ln(1.09).
The moment generating function of a Gamma Distribution with α = 4 and θ = 1/50,000 is:
M(t) = {1/(1 - t/50,000)}⁴ = {50,000/(50,000 - t)}⁴.

S(x) = Mλ[-A(x)] = {50,000/(50,000 + A(x))}⁴ = {50,000/(50,000 + (1.09^x - 1)/ln(1.09))}⁴.

Thus, S(90) = {50,000/(50,000 + (1.09^90 - 1)/ln(1.09))}⁴ = 17.70%.

Comment: The form of the hazard rate is that for Gompertz law; h(x) = Bcx, with B = λ and c = 1.09.
The B parameter varies across a group of individuals via a Gamma Distribution; some individuals in
the group have higher hazard rates while others have lower hazard rates. Although not mentioned in
Loss Models, this “Gamma-Gompertz” frailty model is used in Survival Analysis.
In general, the mixed distribution has survival function: S(x) = {1 + θ (cx - 1)/ ln[c]}−α.
The mean value of B is: 4/50,000 = 0.00008.
For Gompertz Law: S(x) = exp[(1 - cx) B/ ln(c)].
For B = 0.00008 and c = 1.09, S(90) = exp[(1 - 1.0990) (0.00008) / ln(1.09)] = 11.45%.
This is less than S(90) for the mixture; mixing has made the righthand tail heavier.

39.25. C. The mixed distribution is Normal with mean 6000 and variance: σ² + β².
Thus the 90th percentile of the lifetime of a randomly chosen transformer is:
9200 = 6000 + 1.282 √(σ² + β²). ⇒ √(σ² + β²) = 2496.
Thus the mixed distribution is Normal with mean 6000 and standard deviation 2496.
Thus Prob[lifetime ≤ 5000] = Φ[(5000 - 6000)/2496] = Φ[-0.40] = 34.5%.

39.26. B. f(µ) is an Inverse Gamma Distribution, with θ = 20 and α = 2.
p(y) is an Exponential Distribution with E[Y | µ] = µ.
Therefore the mean severity = Eµ[E[Y | µ]] = Eµ[µ] = ∫ µ f(µ) dµ = mean of Inverse Gamma = θ/(α-1) = 20/(2-1) = 20.
Alternately, the mixed distribution is a Pareto with shape parameter α = 2,
and scale parameter θ = 20. Therefore this Pareto has mean 20/(2-1) = 20.
Comment: One can do the relevant integral via the substitution x = 1/µ, dx = -dµ/µ²:

∫ µ f(µ) dµ = ∫₀^∞ µ (400/µ³) e^(-20/µ) dµ = 400 ∫₀^∞ e^(-20/µ)/µ² dµ = 400 ∫₀^∞ e^(-20x) dx = 400/20 = 20.

39.27. C. This is an Exponential mixed via an Inverse Gamma. The Inverse Gamma has
parameters α = 3 and θ = 2. Therefore the (prior) mixed distribution is a Pareto, with α = 3 and θ = 2.
Thus f(x) = (3)(2³)(2 + x)^(-4). f(3) = (3)(8)/5⁴ = 0.0384.
Alternately, one can compute the unconditional density at y = 3 via integration:

f(3) = ∫₀^∞ f(3 | δ) p(δ) dδ = ∫₀^∞ (1/δ) exp(-3/δ) (4/δ⁴) exp(-2/δ) dδ = ∫₀^∞ 4δ^(-5) exp(-5/δ) dδ.

Let x = 5/δ and dx = (-5/δ²)dδ in the integral:

f(3) = (4/5⁴) ∫₀^∞ x³ exp(-x) dx = (4/625) Γ(4) = (4/625)(3!) = (4/625)(6) = 0.0384.

Alternately, one can compute the unconditional density at y via integration:

f(y) = ∫₀^∞ (1/δ) exp(-y/δ) (4/δ⁴) exp(-2/δ) dδ = ∫₀^∞ 4δ^(-5) exp(-(2 + y)/δ) dδ.

Let x = (2+y)/δ and dx = -((2+y)/δ²)dδ in the integral:

f(y) = {4/(2+y)⁴} ∫₀^∞ x³ exp(-x) dx = {4/(2+y)⁴} Γ(4) = {4/(2+y)⁴}(3!) = 24(2+y)^(-4). f(3) = 0.0384.

Comment: If one recognizes this as a Pareto with θ = 2 and α = 3, then one can determine the
constant by looking in Appendix A of Loss Models, rather than doing the Gamma integral.

39.28. E. The hazard rate for an Exponential is one over its mean. The mean is 1/θ, not θ.
The survival function is S(t; θ) = e^(-θt). S(0.5; θ) = e^(-0.5θ). Mixing over the different values of θ:

S(0.5) = ∫ from 1 to 11 of S(0.5; θ) f(θ) dθ = ∫ from 1 to 11 of e^(-0.5θ) (1/10) dθ = [(-1/5) e^(-0.5θ)] evaluated from θ = 1 to θ = 11

= (e^(-0.5) - e^(-5.5))/5 = (0.607 - 0.004)/5 = 0.12.
Comment: The mean future lifetime given θ is 1/θ. The overall mean future lifetime is:

∫ from 1 to 11 of (1/θ) f(θ) dθ = ∫ from 1 to 11 of (1/θ)(1/10) dθ = [ln(θ)/10] evaluated from θ = 1 to θ = 11 = 0.24.

39.29. D. For a constant force of mortality, λ, the distribution function is Exponential:
F(t | λ) = 1 - e^(-λt). F(1 | λ) = 1 - e^(-λ).
The forces of mortality are uniformly distributed over the interval (0, 2). ⇒ π(λ) = 1/2, 0 ≤ λ ≤ 2.
Taking the average over the values of λ:

F(1) = ∫₀² F(1 | λ) π(λ) dλ = ∫₀² (1 - e^(-λ))/2 dλ = 1 - (1 - e^(-2))(1/2) = 0.568.

Alternately, one can work with the means θ = 1/λ, which is harder.
λ is uniform from 0 to 2. ⇒ The distribution function of λ is: Fλ(λ) = λ/2, 0 ≤ λ ≤ 2.
⇒ The distribution function of θ is: Fθ(θ) = 1 - Fλ(λ) = 1 - Fλ(1/θ) = 1 - 1/(2θ), 1/2 ≤ θ < ∞.
⇒ The density function of θ is: 1/(2θ²), 1/2 ≤ θ < ∞.
Given θ, the probability of death by time 1 is: 1 - e^(-1/θ).
Taking the average over the values of θ:

F(1) = ∫ from 1/2 to ∞ of (1 - e^(-1/θ))/(2θ²) dθ = 1 - ∫ from 1/2 to ∞ of e^(-1/θ)/(2θ²) dθ = 1 - (1/2)[e^(-1/θ)] evaluated from θ = 1/2 to ∞

= 1 - (1 - e^(-2))/2 = 0.568.
Comment: F(1 | λ = 1) = 1 - e^(-1) = 0.632. Thus choices A and B are unlikely to be correct.

39.30. D. If the severity is Normal with fixed variance s2 , and the mixing distribution of their means
is also Normal with mean µ and variance σ2, then the mixed distribution is another Normal, with mean

µ and variance: s2 + σ2.


In this case, the mixed distribution is Normal with mean 75 and variance: 82 + 62 = 100.
Prob[there is a payment] = Prob[Score > 65] = 1 - Φ[(65 - 75)/10] = 1 - Φ[-1] = .8413.
Prob[90 > Score > 65] = Φ[(90 - 75)/10] - Φ[(65 - 75)/10] = Φ[1.5] - Φ[-1] = .9332 - .1587 = .7745.
Prob[payment < 90 | payment > 0] = Prob[90 > Score > 65 | Score > 65] =
Prob[90 > Score > 65]/Prob[Score > 65] = .7745/.8413 = 0.9206.

39.31. D. Y is Gamma with α = θ = 2. Therefore, F(y) = Γ[2; y/2].


Let the mean of each Exponential Distribution be δ = 1/y. Then F(δ) = Γ[2; (1/2)/δ].
Therefore, δ has an Inverse Gamma Distribution with α = 2 and θ = 1/2.
This is an Inverse Gamma - Exponential with mixed distribution a Pareto with α = 2 and θ = 1/2.

F(x) = 1 - {θ/(x + θ)}α = 1 - {0.5/(x + 0.5)}2 . F(1/2) = 1 - (0.5/1)2 = 0.75.

Alternately, Prob[T > t | y] = e-yt. Prob[T > t] = ∫ e - yt f(y) dy = MY[-t].


The moment generating function of a Gamma Distribution is: 1/(1 - θt)α.
Therefore, the moment generating function of Y is: 1/(1 - 2t)2 .
Prob[T > 1/2] = MY[-1/2] = 1/22 = 1/4. Prob[T < 1/2] = 1 - 1/4 = 3/4.

Alternately, f(y) = y e-y/2 /(Γ(2) 22 ) = y e-y/2 /4. Therefore, the mixed distribution is:
F(x) = ∫₀^∞ (1 - e^(-xy)) y e^(-y/2)/4 dy = 1 - (1/4) ∫₀^∞ y e^(-y(x + 0.5)) dy = 1 - 0.25/(x + 0.5)².
F(1/2) = 1 - 0.25 = 0.75.
Alternately, the length of time until the forgetting is analogous to the time until the first claim.
This time is Exponential with mean 1/Y and is mathematically the same as a Poisson Process with
intensity Y. Since Y has a Gamma Distribution, this is mathematically the same as a
Gamma-Poisson. Remembering less than 1/2 year, is analogous to at least one claim by time 1/2.
Over 1/2 year, Y has a Gamma Distribution with α = 2 and instead θ = 2/2 = 1.
The mixed distribution is Negative Binomial, with r = α = 2 and β = θ = 1.
1 - f(0) = 1 - 1/(1 + 1)2 = 3/4.

39.32. B. The present value of a continuous annuity of length t is: (1 - e^(-δt))/δ.
Given constant force of mortality λ, the lifetimes are exponential with density f(t) = λe^(-λt).

For fixed λ, APV = ∫₀^∞ {(1 - e^(-δt))/δ} λe^(-λt) dt = {1 - λ/(λ + δ)}/δ = 1/(λ + δ) = 1/(λ + 0.01).

λ in turn is uniform from 0.01 to 0.02 with density 100.

Mixing over λ, Actuarial Present Value = ∫ from 0.01 to 0.02 of 100/(λ + 0.01) dλ = 100 ln(3/2) = 40.55.

Section 40, Spliced Models346

A spliced model allows one to have different behaviors for different sizes of loss. For example as
discussed below, one could splice together an Exponential Distribution for small losses and a Pareto
Distribution for large losses. This would differ from a two-point mixture of an Exponential and Pareto,
in which each distribution would contribute its density to all sizes of loss.

A Simple Example of a Splice:

Assume f(x) = 0.01 for 0 < x < 10, and f(x) = 0.009 e^0.1 e^(-x/100) for x > 10.

Exercise: Show that this f(x) is a density.


[Solution: f(x) ≥ 0.

∫₀^∞ f(x) dx = ∫₀^10 0.01 dx + ∫ from 10 to ∞ of 0.009 e^0.1 e^(-x/100) dx = 0.1 + 0.9 e^0.1 e^(-10/100) = 0.1 + 0.9 = 1.]
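A one-line numerical confirmation that this spliced density integrates to one (my addition):

import math
from scipy import integrate

f = lambda x: 0.01 if x < 10 else 0.009 * math.exp(0.1) * math.exp(-x / 100)
total = integrate.quad(f, 0, 10)[0] + integrate.quad(f, 10, math.inf)[0]
print(total)   # 1.0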

Here is a graph of this density:

[Figure: graph of this spliced density for x from 0 to 200; it is constant at 0.01 on (0, 10), drops to about 0.009 just above 10, and then decays exponentially.]

We note that this density is discontinuous at 10.

This is an example of a 2-component spliced model. From 0 to 10 it is proportional to a uniform


density and above 10 it is proportional to an Exponential density.

346
See Section 5.2.6 of Loss Models.

Two-Component Splices:

In general a 2-component spliced model would have density:


f(x) = w1 f1 (x) on (a1 , b1 ) and f(x) = w2 f2 (x) on (a2 , b2 ), where f1 (x) is a density with support (a1 ,
b 1 ), f2 (x) is a density with support (a2 , b2 ), and w1 + w2 = 1.

In the example, f1 (x) = 1/10 on (0, 10), f2 (x) = e0.1e-x/100/100 on (10, ∞), w1 = 0.1, and w2 = 0.9.
f1 is the uniform distribution on (0, 10). f2 is proportional to an Exponential with θ = 100.

In order to make f2 a density on (10, ∞), we have divided by S(10) = e-10/100 = e-0.1.347

A Splice of an Exponential and a Pareto:

Assume an Exponential with θ = 50.


On the interval (0, 100) this would have probability F(100) = 1 - e-2 = 0.8647.
In order to turn this into a density on (0, 100), we would divide this Exponential density by 0 .8647:
(e-x/50/50) / 0.8647 = 0.02313 e-x/50.
This integrates to one from 0 to 100.

Assume a Pareto Distribution with α = 3 and θ = 200, with density (3)(2003 ) / (200 + x)4 =
0.015/(1 + x/200)4 .
On the interval (100, ∞) this would have probability S(100) = (θ/(θ + x))α = (200/300)3 = 8/27.
In order to turn this into a density on (100, ∞), we would multiply by 27/8:
(27/8) {.015/(1 + x/200)4 } = 0.050625 / (1 + x/200)4 .
This integrates to one from 100 to ∞.

So we would have f1 (x) = 0.02313e-x/50 on (0, 100), f2 (x) = 0.050625/(1 + x/200)4 on (100, ∞).
We could use any weights w1 and w2 as long as they add to one, so that the spliced density will
integrate one.

If we took for example, w1 = 70% and w2 = 30%, then the spliced density would be:
(0.7)(0.02313 e-x/50) = 0.01619 e-x/50 on (0, 100), and
(0.3) {0.050625/(1 + x/200)4 } = 0.0151875 / (1 + x/200)4 on (100, ∞).

347
This is how one alters the density in the case of truncation from below.

This 2-component spliced density looks as follows, with 70% of the probability below 100,
and 30% of the probability above 100:

[Figure: graph of this spliced density for sizes from 0 to 400; the Exponential piece runs from 0 to 100, there is a jump at the breakpoint of 100, and the Pareto piece declines thereafter.]

It is not continuous at 100.

This spliced density is: 0.01619e-x/50 on (0, 100), and 0.01519/(1 + x/200)4 on (100, ∞) ⇔
(0.7)(0.02313e-x/50) on (0, 100), and (0.3)(0.050625/(1 + x/200)4 ) on (100, ∞) ⇔
(0.7)(e-x/50/50)/(1 - e-100/50) on (0, 100), (0.3){(3)2003 /(200 + x)4 }/{(200/300)3 } on (100, ∞) ⇔
(0.8096) (Exponential[50]) on (0, 100), and (1.0125) (Pareto[3, 200]) on (100, ∞).

Exercise: What is the distribution function of this splice at 80?


[Solution: (0.8096)(Exponential Distribution function at 80) = (0.8096) (1 - e-80/50) = 0.6461.]

Exercise: What is the survival function of the splice at 300?


[Solution: (1.0125)(Pareto Survival function at 300) = (1.0125) {200/(200 + 300)}3 = 0.0648.]

In general, it is easier to work with the distribution function of the first component of the splice below
the breakpoint and the survival function of the second component above the breakpoint.
Note that at the breakpoint of 100, the distribution function of the splice is:
(0.8096)(Exponential Distribution function at 100) = (0.8096) (1 - e-100/50) = 0.700 =
1 - (1.0125)(Pareto Survival function at 100) = 1 - (1.0125) {200/(200 + 100)}3 .
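To make the bookkeeping concrete, here is a small Python sketch (my addition) of this particular Exponential-Pareto splice, using the distribution function of the Exponential piece below the breakpoint and the survival function of the Pareto piece above it; it reproduces the numbers in the exercises above.

import math

THETA_EXP, ALPHA, THETA_PAR, BREAK = 50.0, 3.0, 200.0, 100.0
W1, W2 = 0.7, 0.3

def exp_cdf(x):        # Exponential with theta = 50
    return 1 - math.exp(-x / THETA_EXP)

def pareto_sf(x):      # Pareto with alpha = 3, theta = 200
    return (THETA_PAR / (THETA_PAR + x)) ** ALPHA

C1 = W1 / exp_cdf(BREAK)     # 0.8096
C2 = W2 / pareto_sf(BREAK)   # 1.0125

def splice_cdf(x):
    if x <= BREAK:
        return C1 * exp_cdf(x)
    return 1 - C2 * pareto_sf(x)

print(splice_cdf(80))        # about 0.6461
print(1 - splice_cdf(300))   # about 0.0648
print(splice_cdf(BREAK))     # 0.700 at the breakpoint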

Assume we had originally written the splice as:348
c1 Exponential[50] on (0, 100), and c2 Pareto[3, 200] on (100, ∞).

Since we chose weights of 70% & 30%, we want:

70% = ∫₀^100 c1 Exponential[50] dx = c1 (1 - e^(-100/50)). ⇒ c1 = 70% / (1 - e^(-100/50)) = 0.8096.

30% = ∫ from 100 to ∞ of c2 Pareto[3, 200] dx = c2 {200/(200 + 100)}³ = 0.29630 c2. ⇒ c2 = 30% / 0.29630 = 1.0125.

Therefore, as shown previously, the spliced density can be written as:
(0.8096) (Exponential[50]) on (0, 100), and (1.0125) (Pareto[3, 200]) on (100, ∞).


348
As discussed, while this is mathematically equivalent, this is not the manner in which the splice would be written in
Loss Models, which uses f1 , f2 , w1 , and w2 .

Continuity:

With appropriate values of the weights, a splice will be continuous at its breakpoint.

Exercise: Choose w1 and w2 so that the above spliced density would be continuous at 100.
[Solution: f1 (100) = 0.02313e-100/50 = 0.00313. f2 (100) = 0.050625 / (1 + 100/200)4 = 0.01.
In order to be continuous at 100, we need w1 f1 (100) = w2 f2 (100) = (1- w1 )f2 (100).
w1 = f2 (100) / {f1 (100) + f2 (100)} = 0.01 / (0.00313 + 0.01) = 0.762. w2 = 1 - w1 = 0.238.]

If we take f(x): (0.762) (0.02313e-x/50) = 0.01763e-x/50 on (0, 100), and


(0.238) {0.050625 / (1 + x/200)4 } = 0.01205 / (1 + x/200)4 on (100, ∞), then f(x) is continuous:
[Figure: graph of the continuous spliced density for sizes from 0 to 400; it declines smoothly from about 0.0176 at 0, with the two pieces matching at the breakpoint of 100.]

The density of an Exponential Distribution with mean 50 is: e^(-x/50) / 50.
The density of a Pareto Distribution with α = 3 and θ = 200 is: (3)(200³)/(200 + x)⁴ = (3/200)/(1 + x/200)⁴.

(0.01763)(50) = 0.8815. (0.01205)(200/3) = 0.8033.

Therefore, similar to the noncontinuous splice, we could rewrite this continuous splice as:
(0.8815) (Exponential[50]) on (0, 100), and (0.8033) (Pareto[3, 200]) on (100, ∞).

In general, a 2-component spliced density will be continuous at the breakpoint b,


provided the weights are inversely proportional to the component densities at the breakpoint:
w 1 = f2 (b) / {f1 (b) + f2 (b)}, w2 = f1 (b) / {f1 (b) + f2 (b)}.349
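A small helper (my addition) that computes these continuity weights for any two component densities evaluated at the breakpoint; the Exponential-Pareto example above is used as a check.

import math

def continuity_weights(f1_at_b, f2_at_b):
    # Weights making a 2-component splice continuous at the breakpoint b.
    w1 = f2_at_b / (f1_at_b + f2_at_b)
    return w1, 1 - w1

# Component densities at b = 100 from the example above.
f1_100 = (math.exp(-100 / 50) / 50) / (1 - math.exp(-100 / 50))   # 0.02313 e^-2 = 0.00313
f2_100 = (3 * 200**3 / 300**4) / (200 / 300)**3                   # 0.01

print(continuity_weights(f1_100, f2_100))   # about (0.762, 0.238)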

349
While this spliced density will be continuous at the breakpoint, it will not be differentiable at the breakpoint.

Moments:

One could compute the moments of a spliced density by integrating x^n f(x).

For example, the mean of this continuous 2-component spliced density is:

∫₀^100 x (0.01763) e^(-x/50) dx + ∫ from 100 to ∞ of x (0.01205) / (1 + x/200)⁴ dx.

Exercise: Compute the integral ∫₀^100 x (0.01763) e^(-x/50) dx.

[Solution: ∫₀^100 x (0.01763) e^(-x/50) dx = [-50x (0.01763) e^(-x/50) - 50² (0.01763) e^(-x/50)] evaluated from x = 0 to 100 = 26.18.

Alternately, as discussed previously, the first component of the continuous splice is:
(0.8815) Exponential[50], on (0, 100).

Now ∫₀^100 x f_exp(x) dx = E[X ∧ 100] - 100 S_exp(100) = 50(1 - e^(-100/50)) - 100 e^(-100/50) = 29.700.

Thus ∫₀^100 x (0.01763) e^(-x/50) dx = 0.8815 ∫₀^100 x f_exp(x) dx = (0.8815)(29.700) = 26.18.

Comment: ∫ x e^(-x/θ) dx = -θx e^(-x/θ) - θ² e^(-x/θ).]

Exercise: Compute the integral ∫_100^∞ x (0.01205) / (1 + x/200)^4 dx.

[Solution: One can use integration by parts.
∫_100^∞ x (0.01205) / (1 + x/200)^4 dx = 0.01205 x (-200/3) / (1 + x/200)^3 ]_x=100^x=∞ - ∫_100^∞ 0.01205 (-200/3) / (1 + x/200)^3 dx
= (0.01205) (20,000/3) (1/1.5^3 + 1/1.5^2) = 59.51.
Alternately, as discussed previously, the second component of the continuous splice is:
(0.8033) Pareto[3, 200], on (100, ∞).
Now ∫_100^∞ x f_Pareto(x) dx = ∫_100^∞ (x - 100) f_Pareto(x) dx + 100 ∫_100^∞ f_Pareto(x) dx
= e_Pareto(100) S_Pareto(100) + 100 S_Pareto(100) = {(100 + 200)/(3 - 1) + 100} {200/(200 + 100)}^3 = 74.074.
Thus ∫_100^∞ x (0.01205) / (1 + x/200)^4 dx = 0.8033 ∫_100^∞ x f_Pareto(x) dx = (0.8033)(74.074) = 59.51.
Comment: e(x) = ∫_x^∞ (t - x) f(t) dt / S(x). For a Pareto Distribution, e(x) = (x + θ)/(α - 1).]

Thus the mean of this continuous splice is:
∫_0^100 x (0.01763) e^(-x/50) dx + ∫_100^∞ x (0.01205) / (1 + x/200)^4 dx = 26.18 + 59.51 = 85.69.

More generally, assume we have a splice which is w1 h1(x) on (0, b) and w2 h2(x) on (b, ∞),
where h1(x) = f1(x)/F1(b) and h2(x) = f2(x)/S2(b). Then the mean of this spliced density is:

∫_0^b x w1 h1(x) dx + ∫_b^∞ x w2 h2(x) dx = {w1/F1(b)} ∫_0^b x f1(x) dx + {w2/S2(b)} ∫_b^∞ x f2(x) dx

= {w1/F1(b)} {E[X1 ∧ b] - b S1(b)} + {w2/S2(b)} {E[X2] - ∫_0^b x f2(x) dx}

= {w1/F1(b)} {E[X1 ∧ b] + b F1(b) - b} + {w2/S2(b)} {E[X2] + b S2(b) - E[X2 ∧ b]}

= {w1/F1(b)} {E[X1 ∧ b] - b} + b w1 + b w2 + {w2/S2(b)} {E[X2] - E[X2 ∧ b]}

= b + {w1/F1(b)} {E[X1 ∧ b] - b} + {w2/S2(b)} {E[X2] - E[X2 ∧ b]}.

For the example of the continuous splice, b = 100, F1 (100) = 1 - e-100/50 = 0.8647,

E[X1 ∧ b] = 50(1 - e-100/50) = 43.235,350 S 2 (b) = (200/300)3 = 8/27, E[X2 ] = 200/(3-1) = 100,

E[X2 ∧ b] = 100(1 - (2/3)2 ) = 55.556.351 Therefore, for w1 = 0.762 and w2 = 0.238, the mean is:
100 + (0.762/0.8647)(43.235 - 100) + (0.238)(27/8)(100 - 55.556) = 85.68,
matching the previous result subject to rounding.
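The mean of 85.7 can be verified numerically. The following hedged Python sketch (the function names are mine, not from the Guide) integrates x f(x) by a crude midpoint rule and also evaluates the general formula just derived.

```python
import math

# The continuous splice from the example: 0.01763 e^(-x/50) below 100,
# and 0.01205 / (1 + x/200)^4 above 100.
def f(x):
    if x <= 100.0:
        return 0.01763 * math.exp(-x / 50.0)
    return 0.01205 / (1.0 + x / 200.0) ** 4

# Brute-force midpoint integration of x f(x); 100,000 is effectively infinity here.
def mean_by_integration(upper=100_000.0, n=1_000_000):
    h = upper / n
    return sum((i + 0.5) * h * f((i + 0.5) * h) * h for i in range(n))

# The general formula: b + w1/F1(b) (E[X1 ^ b] - b) + w2/S2(b) (E[X2] - E[X2 ^ b]).
b, w1, w2 = 100.0, 0.762, 0.238
F1b = 1.0 - math.exp(-b / 50.0)                        # Exponential[50]
limex1 = 50.0 * (1.0 - math.exp(-b / 50.0))            # E[X1 ^ b]
S2b = (200.0 / 300.0) ** 3                             # Pareto[3, 200]
EX2 = 200.0 / (3.0 - 1.0)
limex2 = (200.0 / 2.0) * (1.0 - (200.0 / 300.0) ** 2)  # E[X2 ^ b]
mean_by_formula = b + (w1 / F1b) * (limex1 - b) + (w2 / S2b) * (EX2 - limex2)

print(round(mean_by_integration(), 2))   # about 85.7
print(round(mean_by_formula, 2))         # about 85.7
```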

n-Component Splices:

In addition to 2-component splices, one can have 3-components, 4-components, etc.

In a three component splice there are three intervals and the spliced density is:
w1 f1 (x) on (a1 , b1 ), w2 f2 (x) on (a2 , b2 ), and w3 f3 (x) on (a3 , b3 ), where f1 (x) is a density with
support (a1 , b1 ), f2 (x) is a density with support (a2 , b2 ), f3 (x) is a density with support (a3 , b3 ), and
w1 + w2 + w3 = 1.

Previously, when working with grouped data we had discussed assuming a uniform distribution on
each interval. This is an example of an n-component splice, with n equal to the number of intervals
for the grouped data, and with each component of the splice uniform.

350
For the Exponential Distribution, E[X ∧ d] = θ(1 - e-d/θ).
351
For the Pareto Distribution, E[X ∧ d] = (θ/(α−1))(1 - (θ/(d+θ))α-1).

Using the Empirical Distribution:

One common use of splicing is to use the Empirical Distribution function (or a smoothed version of it) for small losses, and some parametric distribution to model large losses.352

For example, take the ungrouped data in Section 1. We could model the losses of size less than
100,000 using the Empirical Distribution Function, and use a Pareto Distribution to model the losses
of size greater than 100,000. There are 57 out of 130 losses of size less than 100,000.
Therefore, the Empirical Distribution Function at 100,000 is: 57/130 = 0.4385.
A Pareto Distribution with α = 2 and θ = 298,977, has F(100000) = 1 - (298,977/398,977)2 =
0.4385, matching the Empirical Distribution Function.

Thus one could splice together this Pareto Distribution from 100,000 to ∞, and the Empirical
Distribution Function from 0 to 100,000. Here is what this spliced survival function looks like:

[Graph of the spliced survival function: the Empirical piece below 100,000 and the Pareto piece above 100,000, plotted against size of loss up to 900,000.]
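In practice such a splice is easy to evaluate. The sketch below (mine, not from the Guide) shows one way to code the spliced survival function; the `losses` argument stands in for the 130 ungrouped losses of Section 1, which are not reproduced here, and the Pareto parameters are those quoted above.

```python
def spliced_sf(x, losses, breakpoint=100_000.0, alpha=2.0, theta=298_977.0):
    """Survival function: empirical below the breakpoint, Pareto above it.

    `losses` is the full list of observed losses; the Pareto parameters are
    assumed to have been chosen so the two pieces match at the breakpoint."""
    if x < breakpoint:
        return sum(1 for loss in losses if loss > x) / len(losses)   # empirical survival function
    return (theta / (theta + x)) ** alpha                            # Pareto survival function

# With 57 of the 130 losses below 100,000, both pieces give S(100,000) = 1 - 57/130 = 0.5615.
```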

352
A variation of this technique is used in “Workersʼ Compensation Excess Ratios, an Alternative Method,” by
Howard C. Mahler, PCAS 1998.

Using a Kernel Smoothed Density:353

Rather than use the Empirical Distribution, one could use a kernel smoothed version of the Empirical
Distribution. For example, one could splice together the same Pareto Distribution above 100,000,
and below 100,000 the kernel smoothed density for the ungrouped data in Section 1, using a
uniform kernel with a bandwidth of 5000.

Here is one million times this spliced density:

[Graph of one million times this spliced density: the kernel smoothed piece below 100,000 and the Pareto piece above 100,000, plotted against size of loss up to 900,000.]

353
Kernel Smoothing is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”

Problems:

Use the following information for the next 11 questions:


f(x) = 0.12 for x ≤ 5, and f(x) = 0.06595e-x/10 for x > 5.

40.1 (1 point) What is the Distribution Function at 10?


A. less than 0.60
B. at least 0.60 but less than 0.65
C. at least 0.65 but less than 0.70
D. at least 0.70 but less than 0.75
E. at least 0.75

40.2 (3 points) What is the mean?


A. less than 5
B. at least 5 but less than 6
C. at least 6 but less than 7
D. at least 7 but less than 8
E. at least 8

40.3 (4 points) What is the variance?


A. less than 70
B. at least 70 but less than 75
C. at least 75 but less than 80
D. at least 80 but less than 85
E. at least 85

40.4 (5 points) What is the skewness?


A. less than 2.6
B. at least 2.6 but less than 2.7
C. at least 2.7 but less than 2.8
D. at least 2.8 but less than 2.9
E. at least 2.9

40.5 (3 points) What is E[X ∧ 3]?


A. less than 2.2
B. at least 2.2 but less than 2.3
C. at least 2.3 but less than 2.4
D. at least 2.4 but less than 2.5
E. at least 2.5

40.6 (3 points) What is the loss elimination ratio at 5?


A. less than 40%
B. at least 40% but less than 45%
C. at least 45% but less than 50%
D. at least 50% but less than 55%
E. at least 55%

40.7 (3 points) What is E[(X-20)+]?


A. less than 0.75
B. at least 0.75 but less than 0.80
C. at least 0.80 but less than 0.85
D. at least 0.85 but less than 0.90
E. at least 0.90

40.8 (2 points) What is e(20)?


A.7 B. 8 C. 9 D. 10 E. 11

40.9 (1 point) What is the median?


A. 4.2 B. 4.4 C. 4.6 D. 4.8 E. 5.0

40.10 (2 points) What is the 90th percentile?


A. 18 B. 19 C. 20 D. 21 E. 22

40.11 (3 points) The size of loss for the Peregrin Insurance Company follows the given f(x).
The average annual frequency is 138.
Peregrin Insurance buys reinsurance from the Meriadoc Reinsurance Company for 5 excess of 15.
How much does Meriadoc expect to pay per year for losses from Peregrin Insurance?
A. 60 B. 70 C. 80 D. 90 E. 100

40.12 (3 points) For a two-component spliced model:


(i) Up to 500 it is proportional to a Weibull Distribution with θ = 600 and τ = 3.
(ii) Above 500 it is proportional to a Weibull Distribution with θ = 400 and τ = 2.
(iii) It is continuous.
Calculate the probability in the interval from 400 to 600.
A. 39% B. 41% C. 43% D. 45% E. 47%

Use the following information for the next 3 questions:


One has a two component splice, which is proportional to an Exponential Distribution with mean 3
for loss sizes less than 5, and is proportional to a Pareto Distribution with α = 4 and θ = 60 for loss
sizes greater than 5. The splice is continuous at 5.

40.13 (2 points) What is the distribution function of the splice at 2?


A. less than 20%
B. at least 20% but less than 25%
C. at least 25% but less than 30%
D. at least 30% but less than 35%
E. at least 35%

40.14 (2 points) What is the survival function of the splice at 10?


A. less than 35%
B. at least 35% but less than 40%
C. at least 40% but less than 45%
D. at least 45% but less than 50%
E. at least 50%

40.15 (3 points) What is the mean of this splice?


A. less than 14.0
B. at least 14.0 but less than 14.5
C. at least 14.5 but less than 15.0
D. at least 15.0 but less than 15.5
E. at least 15.5

40.16 (4 points) In 2008, the size of monthly pension payments for a group of retired municipal
employees follows a Single Parameter Pareto Distribution, with α = 2 and θ = $1000.
The city announces that for 2009, there will be a 5% cost of living adjustment (COLA).
However, the COLA will only apply to the first $2000 in monthly payments.
What is the probability density function of the size of monthly pension payments in 2009?

40.17 (2 points) X follows a two-component splice with a constant density from zero to 200.
From 200 to infinity the splice is proportional to an Exponential Distribution with mean 500.
If the splice is continuous at 200, what is the probability that X is less than or equal to 300?
A. 40% B. 42% C. 44% D. 46% E. 48%

40.18 (3 points) You are given the following grouped data:


Range # of claims loss
0-1 6300 3000
1-2 2350 3500
2-3 850 2000
3-4 320 1000
4-5 110 500
over 5 70 500

10,000 10,500
What is the mean of a 2-component splice between the empirical distribution below 4 and an
Exponential with θ = 1.5?
A. less than 1.045
B. at least 1.045 but less than 1.050
C. at least 1.050 but less than 1.055
D. at least 1.055 but less than 1.060
E. at least 1.060

Use the following information for the next 2 questions:


f(x) = 617,400 / {218 (10 + x)^4} for 0 < x ≤ 4, and f(x) = 3920 / {25 (10 + x)^3} for x > 4.

40.19 (2 points) Determine the probability that X is greater than 2.


A. 54% B. 56% C. 58% D. 60% E. 62%

40.20 (4 points) Determine E[X].


A. 4 B. 5 C. 6 D. 7 E. 8

40.21 (SOA M, 11/05, Q.35 & 2009 Sample Q.211) (2.5 points)
An actuary for a medical device manufacturer initially models the failure time for a particular device with
an exponential distribution with mean 4 years.
This distribution is replaced with a spliced model whose density function:
(i) is uniform over [0, 3]
(ii) is proportional to the initial modeled density function after 3 years
(iii) is continuous
Calculate the probability of failure in the first 3 years under the revised distribution.
(A) 0.43 (B) 0.45 (C) 0.47 (D) 0.49 (E) 0.51

40.22 (CAS3, 11/06, Q.18) (2.5 points) A loss distribution is a two-component spliced model
using a Weibull distribution with θ1 = 1,500 and τ = 1 for losses up to $4,000, and a Pareto

distribution with θ2 = 12,000 and α = 2 for losses $4,000 and greater.


The probability that losses are less than $4,000 is 0.60.
Calculate the probability that losses are less than $25,000.
A. Less than 0.900
B. At least 0.900, but less than 0.925
C. At least 0.925, but less than 0.950
D. At least 0.950, but less than 0.975
E. At least 0.975

Solutions to Problems:

40.1. E. F(10) = ∫_0^5 0.12 dx + ∫_5^10 0.06595 e^(-x/10) dx = (5)(0.12) + 0.6595 (e^(-5/10) - e^(-10/10)) = 0.757.
Alternately, S(10) = ∫_10^∞ 0.06595 e^(-x/10) dx = 0.6595 e^(-10/10) = 0.243. ⇒ F(10) = 1 - 0.243 = 0.757.

40.2. D. mean = ∫_0^5 x 0.12 dx + ∫_5^∞ x 0.06595 e^(-x/10) dx
= 0.06 x^2 ]_x=0^x=5 - 0.06595 {10x e^(-x/10) + 100 e^(-x/10)} ]_x=5^x=∞ = 1.5 + 6.0 = 7.5.

40.3. C. 2nd moment = ∫_0^5 x^2 0.12 dx + ∫_5^∞ x^2 0.06595 e^(-x/10) dx
= 0.04 x^3 ]_x=0^x=5 - 0.06595 {10x^2 e^(-x/10) + 200x e^(-x/10) + 2000 e^(-x/10)} ]_x=5^x=∞ = 5 + 130 = 135.
Variance = 135 - 7.5^2 = 78.75.

40.4. A. 3rd moment = ∫_0^5 x^3 0.12 dx + ∫_5^∞ x^3 0.06595 e^(-x/10) dx
= 0.03 x^4 ]_x=0^x=5 - 0.06595 {10x^3 e^(-x/10) + 300x^2 e^(-x/10) + 6000x e^(-x/10) + 60,000 e^(-x/10)} ]_x=5^x=∞
= 18.75 + 3950 = 3968.75.
Skewness = {3968.75 - (3)(7.5)(135) + (2)(7.5^3)} / 78.75^1.5 = 2.54.

40.5. D. E[X ∧ 3] = ∫_0^3 x 0.12 dx + 3S(3) = 0.06 x^2 ]_x=0^x=3 + (3){1 - (0.12)(3)} = 0.54 + 1.92 = 2.46.

40.6. C. E[X ∧ 5] = ∫_0^5 x 0.12 dx + 5S(5) = 0.06 x^2 ]_x=0^x=5 + (5){1 - (0.12)(5)} = 1.5 + 2 = 3.5.
E[X ∧ 5] / E[X] = 3.5 / 7.5 = 46.7%.
Alternately, since for x > 5 the density is proportional to an Exponential,
f(x) = 0.06595 e^(-x/10) for x > 5, S(x) = 0.6595 e^(-x/10) for x > 5.
The layer from 5 to infinity is: ∫_5^∞ 0.6595 e^(-x/10) dx = 4.00.
The loss elimination ratio at 5 is: 1 - 4/7.5 = 46.7%.
Comment: 1 - (0.12)(5) = S(5) = 0.6595 e^(-5/10) = 0.400.

40.7. D. E[X ∧ 20] = ∫_0^5 x 0.12 dx + ∫_5^20 x 0.06595 e^(-x/10) dx + 20S(20)
= 0.06 x^2 ]_x=0^x=5 - 0.06595 {10x e^(-x/10) + 100 e^(-x/10)} ]_x=5^x=20 + (20)(0.06595)(10e^-2)
= 1.5 + 3.323 + 1.785 = 6.608. E[(X-20)+] = E[X] - E[X ∧ 20] = 7.5 - 6.608 = 0.892.
Alternately, E[(X-20)+] = ∫_20^∞ 0.06595 e^(-x/10) (x - 20) dx = 0.06595 {-10x e^(-x/10) + 100 e^(-x/10)} ]_x=20^x=∞ = 0.8925.
Alternately, E[(X-20)+] = ∫_20^∞ S(x) dx = ∫_20^∞ 0.6595 e^(-x/10) dx = -6.595 e^(-x/10) ]_x=20^x=∞ = 0.8925.

40.8. D. Beyond 5 the density is proportional to an Exponential density with mean 10, and
therefore, beyond 5, the mean residual life is a constant 10. Alternately,
S(20) = (0.06595)(10e-2) = 0.08925. e(20) = E[(X-20)+] /S(20) = 0.8925/0.08925 = 10.0.

40.9. A. f(x) = 0.12 for x ≤ 5. ⇒ F(5) = 0.60. ⇒ median = 0.5 / 0.12 = 4.167.

40.10. B. f(x) = 0.12 for x ≤ 5. ⇒ F(5) = 0.6.
f(x) = 0.06595 e^(-x/10) for x > 5.
⇒ F(x) = 0.6 + ∫_5^x 0.06595 e^(-t/10) dt = 0.6 + 0.6595 e^(-5/10) - 0.6595 e^(-x/10), x > 5.
Require that: 0.9 = 0.6 + 0.6595 e^(-5/10) - 0.6595 e^(-x/10). ⇒ e^(-x/10) = 0.15164. ⇒ x = 18.86.

40.11. C. E[X ∧ 20] - E[X ∧ 15] = ∫_15^20 S(x) dx = ∫_15^20 0.6595 e^(-x/10) dx = -6.595 e^(-x/10) ]_x=15^x=20 = 0.579.
Meriadoc reinsures the layer from 15 to 20, so it expects to pay: (138)(0.579) = 79.9.

40.12. D. Let the splice be: α Weibull[600, 3] below 500, and β Weibull[400, 2] above 500.
We will need to solve for these proportionality constants.
For a Weibull Distribution with θ = 600 and τ = 3: F(500) = 1 - exp[-(500/600)3 ] = 0.4394.
For a Weibull Distribution with θ = 400 and τ = 2: S(500) = exp[-(500/400)2 ] = 0.2096.
Thus, for the total probability to be one: α 0.4394 + β 0.2096 = 1.
For a Weibull Distribution with θ = 600 and τ = 3:
f(500) = (3)(5002 )exp[-(500/600)3 ] / 6003 = 0.0019466.
For a Weibull Distribution with θ = 400 and τ = 2:
f(500) = (2)(500) exp[-(500/400)2 ] / 4002 = 0.0013101.
Thus, for the splice to be continuous at 500: α 0.0019466 = β 0.0013101. ⇒ β = 1.4858α.

⇒ α 0.4394 + (1.4858α) 0.2096 = 1. ⇒ α = 1.332. ⇒ β = 1.979.


For a Weibull Distribution with θ = 600 and τ = 3: F(400) = 1 - exp[-(400/600)3 ] = 0.2564.
For a Weibull Distribution with θ = 400 and τ = 2: S(600) = exp[-(600/400)2 ] = 0.1054.
Thus for the splice, the probability in the interval from 400 to 600 is:
(1.332)(0.4394 - 0.2564) + (1.979)(0.2096 - 0.1054) = 45.0%.
Alternately, for the splice: F(400) = (1.332)(0.2564) = 0.3415.
S(600) = (1.979)(0.1054) = 0.2086. ⇒ F(600) = 0.7914

⇒ Probability in the interval from 400 to 600 is: 0.7914 - 0.3415 = 45.0%.
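As an aside (not part of the original solution), the constants α and β and the final probability can be reproduced with a short Python sketch; below they are called c1 and c2, and the function names are mine.

```python
import math

def weibull_cdf(x, theta, tau):
    return 1.0 - math.exp(-(x / theta) ** tau)

def weibull_pdf(x, theta, tau):
    return (tau / x) * (x / theta) ** tau * math.exp(-(x / theta) ** tau)

b = 500.0
F1 = weibull_cdf(b, 600.0, 3.0)            # 0.4394
S2 = 1.0 - weibull_cdf(b, 400.0, 2.0)      # 0.2096
ratio = weibull_pdf(b, 600.0, 3.0) / weibull_pdf(b, 400.0, 2.0)   # beta / alpha, from continuity

c1 = 1.0 / (F1 + ratio * S2)               # alpha: makes the total probability one
c2 = ratio * c1                            # beta
prob_400_600 = (c1 * (F1 - weibull_cdf(400.0, 600.0, 3.0))
                + c2 * (weibull_cdf(600.0, 400.0, 2.0) - weibull_cdf(b, 400.0, 2.0)))
print(round(c1, 3), round(c2, 3), round(prob_400_600, 3))   # 1.332 1.979 0.45
```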

40.13. C. Let the splice be: a(Exponential), x < 5, and b(Pareto), x > 5.
The splice must integrate to unity from 0 to ∞:
1 = a(Exponential Distribution at 5) + b(1 - Pareto Distribution at 5). ⇒

1 = a(1 - e-5/3) + b(60/65)4 . ⇒ 1 = 0.8111a + 0.7260b.


The density of the Exponential is: e-x/3/3. f(5) = e-5/3/3 = 0.06296.
The density of the Pareto is: (4)(604 )/(x+60)5 . f(5) = (4)(604 )/(655 ) = 0.04468
Also in order for the splice to be continuous at 5:
a(Exponential density @ 5) = b(Pareto density @5) ⇒ a(0.06296) = b(0.04468).

⇒ b = 1.4091a. ⇒ 1 = 0.8111a + 0.7260(1.4091a). ⇒ a = 0.545.


⇒ the distribution function at 2 is: .545(Exponential Distribution at 2) = 0.545(1 - e-2/3) = 0.265.

40.14. C. Continuing the previous solution, b = 1.4091a = 0.768.


⇒ the survival function at 10 is: 0.768(Pareto survival function at 10) = 0.768(60/70)4 = 0.415.
Alternately, the distribution function at 10 is:
0.545 ∫_0^5 e^(-x/3)/3 dx + 0.768 ∫_5^10 (4)(60^4)/(x + 60)^5 dx =

0.545 (Exponential distribution function at 5) +


0.768 (Pareto distribution function at 10 - Pareto distribution function at 5) =
0.545 (1 - e-5/3) + 0.768{(1 - (60/70)4 ) - (1 - (60/65)4 )}
= (0.545)(0.811) + (0.768)(0.186) = 0.585.
Therefore, the survival function at 10 is: 1 - 0.585 = 0.415.

40.15. E. mean = 0.545 ∫_0^5 x e^(-x/3)/3 dx + 0.768 ∫_5^∞ x (4)(60^4)/(x + 60)^5 dx.
The first integral is for an Exponential Distribution: E[X ∧ 5] - 5S(5) = 3(1 - e^(-5/3)) - 5e^(-5/3) = 1.49.
The second integral is for a Pareto Distribution: E[X] - ∫_0^5 x f_Pareto(x) dx =
E[X] - {E[X ∧ 5] - 5S(5)} = E[X] - E[X ∧ 5] + 5S(5) = (60/3)(60/65)^3 + (5)(60/65)^4 = 19.36.
Thus the mean of the splice is: (0.545)(1.49) + (0.768)(19.36) = 15.68.
Alternately, the second integral is for a Pareto:
∫_5^∞ x f_Pareto(x) dx = ∫_5^∞ (x - 5) f_Pareto(x) dx + 5 ∫_5^∞ f_Pareto(x) dx = S_Pareto(5) e_Pareto(5) + 5 S_Pareto(5)
= {(5 + 60)/(4 - 1)} (60/65)^4 + (5)(60/65)^4 = 19.36. Proceed as before.
Comment: e(x) = ∫_x^∞ (t - x) f(t) dt / S(x). For a Pareto Distribution, e(x) = (x + θ)/(α - 1).

40.16. In 2008, S(2000) = (1000/2000)2 = 1/4.


For a Single Parameter Pareto Distribution, f(x) = αθα / xα+1, x > θ.
For those whose payments are less than $2000 per month, the payment is multiplied by 1.05.
Thus in 2009 they follow a Single Parameter Pareto with α = 2 and θ = $1050.
The density is proportional to: f(x) = 2 (10502 ) / x3 , x > 1050.
For those whose payments are $2000 or more per month, their payment is increased by
(2000)(5%) = 100. S(x) = {1000/(x-100)}2 , x > 2100.
The density is proportional to: f(x) = 2 (10002 ) / (x - 100)3 .
In 2009, the density is a splice, with 3/4 weight to the first component and 1/4 weight to the second
component. Someone with $2000 in 2008, will get $2100 in 2009; $2100 is the breakpoint of the
splice.
f(x) = 2 (10502 ) / x3 , x > 1050, would integrate from 1050 to 2100 to the distribution function of
at 2100 of a Single Parameter Pareto with α = 2 and θ = $1050, 1 - (1050/2100)2 = 3/4.
This is the desired weight for the first component, so this is OK.
f(x) = 2 (10002 ) / (x - 100)3 , would integrate from 2100 to ∞: (10002 ) / (2100 - 100)2 = 1/4.
This is the desired weight for the second component, so this is OK.
The probability density function of the size of monthly pension payments in 2009 is a splice:
f(x) = 2 (1050^2) / x^3 for 1050 < x < 2100, and f(x) = 2 (1000^2) / (x - 100)^3 for x > 2100.

Comment: Coming from the left, f(2100) = 2 (10502 ) / 21003 = 1/4200.


Coming from the right, f(2100) = 2 (10002 ) / (2100 - 100)3 = 1/4000. Thus the density of this splice
is not (quite) continuous at the breakpoint of 2100. A graph of this splice:
[Graph of this spliced density f(x), for payment sizes from 1050 past the breakpoint at 2100, showing the small jump at 2100.]

40.17. B. In order to integrate to one from 200 to infinity, the density of the exponential piece
would have to be: (exp[-x/500] / 500) / exp[-200/500] = exp[-(x-200)/500] / 500.
Let w be the weight given to the uniform piece and 1- w be the weight given to the exponential
piece. To be continuous at 200: w (1/200) = (1-w) exp[-(200-200)/500] / 500 = (1-w)/500. ⇒

w = 0.2857. ⇒ S(300) = (1 - 0.2857) exp[-(300-200)/500] = 0.5848. ⇒ F(300) = 0.4152.

40.18. B. The empirical survival function at 4 is: (110 + 70)/10,000 = 0.018.


Above 4 the splice is proportional to an Exponential with survival function e-x/1.5.
Let w be the weight applied to this Exponential. Matching S(4), set 0.018 = we-4/1.5. w = 0.259.
The contribution to the mean from the losses of size less than 4 is:
(3000 + 3500 + 2000 + 1000)/10,000 = 0.9500.
The contribution to the mean from the losses of size greater than 4 is:
0.259 ∫_4^∞ x e^(-x/1.5)/1.5 dx = -0.259 (x e^(-x/1.5) + 1.5 e^(-x/1.5)) ]_x=4^x=∞ = 0.0990.

mean = 0.9500 + 0.0990 = 1.0490.

40.19. D. F(2) = ∫_0^2 617,400 / {218 (10 + x)^4} dx = (617,400/218) {1/{(3)(10^3)} - 1/{(3)(12^3)}} = 0.3977.
S(2) = 1 - 0.3977 = 0.6023.
Comment: Via integration, one can determine that the first component of the splice has a total probability of 3/5, while the second component of the splice has a total probability of 2/5.

40.20. E. One can use integration by parts.
∫_0^4 x / (10 + x)^4 dx = -x / {3 (10 + x)^3} ]_x=0^x=4 + (1/3) ∫_0^4 1 / (10 + x)^3 dx
= -4/{(3)(14^3)} - 1/{(2)(3)(14^2)} + 1/{(2)(3)(10^2)} = 0.00033042.
Thus ∫_0^4 x 617,400 / {218 (10 + x)^4} dx = (617,400/218) (0.00033042) = 0.9358.
∫_4^∞ x / (10 + x)^3 dx = -x / {2 (10 + x)^2} ]_x=4^x=∞ + (1/2) ∫_4^∞ 1 / (10 + x)^2 dx = 4/{(2)(14^2)} + 1/{(2)(14)} = 9/196.
Therefore, ∫_4^∞ x 3920 / {25 (10 + x)^3} dx = (3920/25) (9/196) = 7.2000.
Thus, E[X] = ∫_0^4 x 617,400 / {218 (10 + x)^4} dx + ∫_4^∞ x 3920 / {25 (10 + x)^3} dx = 0.9358 + 7.2000 = 8.1358.

Alternately, each component of the splice is proportional to the density of a Pareto Distribution.
The density of a Pareto with α = 3 and θ = 10 is: (3)(10^3) / (10 + x)^4.
Thus the first component of the splice is: (617,400/218) (1/3000) Pareto[3, 10] = (2058/2180) Pareto[3, 10].
Now ∫_0^4 x f_Pareto(x) dx = E[X ∧ 4] - 4 S(4) = (10/2) {1 - (10/14)^2} - (4)(10/14)^3 = 0.99125.
Therefore, ∫_0^4 x 617,400 / {218 (10 + x)^4} dx = (2058/2180) (0.99125) = 0.9358.
The density of a Pareto with α = 2 and θ = 10 is: (2)(10^2) / (10 + x)^3.
Thus the second component of the splice is: (3920/25) (1/200) Pareto[2, 10] = 0.784 Pareto[2, 10].

Now ∫_4^∞ x f_Pareto(x) dx = ∫_4^∞ (x - 4) f_Pareto(x) dx + ∫_4^∞ 4 f_Pareto(x) dx = e(4) S(4) + 4 S(4)
= (14/1) (10/14)^2 + (4) (10/14)^2 = 9.1837.
Therefore, ∫_4^∞ x 3920 / {25 (10 + x)^3} dx = (0.784) (9.1837) = 7.2000.
Thus, E[X] = ∫_0^4 x 617,400 / {218 (10 + x)^4} dx + ∫_4^∞ x 3920 / {25 (10 + x)^3} dx = 0.9358 + 7.2000 = 8.1358.
Comment: e(x) = ∫_x^∞ (t - x) f(t) dt / S(x). For a Pareto Distribution, e(x) = (x + θ)/(α - 1).

40.21. A. A uniform on [0, 3] has density of 1/3.


On the interval 3 to ∞, we want something proportional to an Exponential with θ = 4.
From 3 to ∞ this Exponential density would integrate to S(3) = e-3/4.
Therefore, something proportional that would integrate to one is: 0.25e-x/4/e-3/4 = 0.25e-(x-3)/4.
Thus the density of the splice is: w(1/3) from 0 to 3, and (1 - w)0.25e-(x-3)/4 from 3 to ∞.
In order to be continuous, the two densities must match at 3:
w(1/3) = (1 - w)0.25e-(3-3)/4. ⇒ 4w = 3(1 - w). ⇒ w = 3/7 = 0.429.
Probability of failure in the first 3 years is the integral of the splice from 0 to 3: w = 0.429.

40.22. C. For the Pareto, S(4000) = {12/(12 + 4)}2 = 9/16.


The portion of the splice above $4000 totals 1 - 0.6 = 40% probability.
Therefore, the portion of the splice above 4000 is: (0.4) Pareto[2, 12,000] / (9/16).
For the Pareto, S(25000) = {12/(12 + 25)}2 = 0.1052.
Therefore, for the splice the probability that losses are greater than $25,000 is:
(.4)(0.1052)/(9/16) = 0.0748. 1 - 0.0748 = 0.9252.
Comment: A Weibull with τ = 1 is an Exponential.
It is easier to calculate S(25,000), and then F(25,000) = 1 - S(25,000).
When working above the breakpoint of $4000, work with the Pareto.
If working below the breakpoint of $4000, work with the Weibull.
It might have been better if the exam question had read instead:
“A loss distribution is a two-component spliced model using a density proportional to a Weibull
distribution with θ1 = 1,500 and τ = 1 for losses up to $4,000, and a density proportional to a Pareto

distribution with θ2 = 12,000 and α = 2 for losses $4,000 and greater.” 


The density above 4000 is proportional to a Pareto.
The original Pareto integrates to 9/16 from 4000 to infinity.
In order to get the density from 4000 to infinity to integrate to the desired 40%, we need to multiply
the density of the original Pareto by: 40% / (9/16).

Section 41, Extreme Value Distributions354

This section discusses two results in extreme value theory.


First I will go over some preliminary material on the maximum of a sample.

Distribution Function of the Maximum:

Assume one has a sample of N independent, identically distributed variables.


Then one might be interested in either the minimum or the maximum of the sample.
The minimum and maximum are examples of what are called order statistics.355

Maximum ≤ 1000. ⇔ All items are ≤ 1000.

If one draws 7 independent claims from the distribution function F(x), then the chance that all the
claims will be less than or equal to 1000, is F(1000)7 .
⇒ The chance that the maximum of the 7 claims is less than or equal to 1000, is F(1000)7 .

In general, given N claims, the chance that the maximum claim is less than or equal to x is: F(x)N.
The distribution for the maximum of N claims sizes is F(x)N .

Distribution Function of the Minimum:

Minimum ≥ 100. ⇔ All items are ≥ 100.

If one draws 7 independent claims from the distribution function F(x), then the chance that all the
claims will be greater than 100, is S(100)7 .
⇒ The chance that the minimum of the 7 claims is greater than 100, is S(100)7 .
Therefore, the chance that the minimum of the 7 claims is less than or equal to 100 is:
1 - S(100)7 .

The distribution of the minimum of the N claims sizes is: 1 - S(x)N.
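These two results are easy to confirm by simulation. The hedged Python sketch below (the names and the particular Exponential example are mine, not from the Guide) compares simulated probabilities for the maximum and minimum of a sample of 7 Exponential claims with F(x)^N and 1 - S(x)^N.

```python
import math, random

random.seed(1)
theta, n, trials = 50.0, 7, 100_000    # Exponential severity with mean 50, samples of size 7

count_max = count_min = 0
for _ in range(trials):
    sample = [random.expovariate(1.0 / theta) for _ in range(n)]
    count_max += max(sample) <= 100.0
    count_min += min(sample) <= 10.0

print(count_max / trials, round((1.0 - math.exp(-100.0 / theta)) ** n, 4))  # both about 0.361
print(count_min / trials, round(1.0 - math.exp(-10.0 / theta) ** n, 4))     # both about 0.753
```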

354
See Section 5.3.4. of Loss Models, Two Heavy-Tailed Distributions, which was added to the syllabus for the
October 2013 sitting.
There is one corresponding new learning outcome, A8: Identify and describe two extreme value distributions.
This subsection in the fourth edition of Loss Models is less than a page. I do not think that they can ask very much.
The 3rd edition had section 5.6, about 18 pages long, that covered this material in detail; it was not on the syllabus.
355
Order Statistics is on the syllabus of CAS Exam 3ST.

Inverse Weibull Distribution:356

The Inverse Weibull has scale parameter θ and shape parameter τ.


The Inverse Weibull is heavier-tailed than the Weibull; the moments of the Inverse Weibull exist only for k < τ, while all of the (positive) moments of the Weibull exist.

If X follows a Weibull Distribution with parameters 1 and τ, then θ/X follows an Inverse Weibull with parameters θ and τ. The Inverse Weibull Distribution is a special case of the Inverse Transformed Gamma Distribution with α = 1.

F(x) = exp[-(θ/x)^τ], x > 0.

f(x) = {τ θ^τ / x^(τ+1)} exp[-(θ/x)^τ].

E[X] = θ Γ[1 - 1/τ], 1 < τ.

E[X^k] = θ^k Γ[1 - k/τ], k < τ.

VaRp[X] = θ (-ln[p])^(-1/τ).

mode = θ {τ/(τ + 1)}^(1/τ).

Exercise: What is the distribution of the maximum of a sample of size N from an Inverse Weibull?
[Solution: F(x)^N = exp[-(θ/x)^τ]^N = exp[-N (θ/x)^τ] = exp[-(θ N^(1/τ) / x)^τ].
Another Inverse Weibull, but with τ and θ N^(1/τ).]
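This can be checked by simulation. In the hedged Python sketch below (mine, not from Loss Models), Inverse Weibull values are generated by inversion using the VaR formula above, and the simulated probability for the maximum of 25 draws is compared with the Inverse Weibull with scale θ N^(1/τ).

```python
import math, random

random.seed(7)
theta, tau, n, trials = 1000.0, 3.0, 25, 100_000

def inverse_weibull_draw():
    # Inversion: theta * (-ln U)^(-1/tau) has the Inverse Weibull distribution.
    return theta * (-math.log(random.random())) ** (-1.0 / tau)

x = 3000.0
count = sum(max(inverse_weibull_draw() for _ in range(n)) <= x for _ in range(trials))

theta_max = theta * n ** (1.0 / tau)   # scale parameter of the maximum, per the exercise
print(count / trials)                                # simulated Prob[maximum <= 3000]
print(round(math.exp(-(theta_max / x) ** tau), 4))   # 0.3961, the Inverse Weibull value
```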

The Frechet Distribution is an Inverse Weibull, with the addition of a location parameter, µ:357
F(x) = exp[-{θ/(x - µ)}^τ], x > µ.
356
See Appendix A in Loss Models.
357
If µ = 0, then this reduces to the Inverse Weibull Distribution as per Appendix A of Loss Models.

Fisher-Tippett Theorem:

The limit as the sample size increases of the maximum (properly scaled) is one of three
possible distributions; for actuarial applications the one of interest is the Frechet
Distribution, which is the Inverse Weibull Distribution from Appendix A of Loss Models.

This result can be applied to either severity or aggregate losses.


Even if you do not know exactly what distribution severity (or aggregate loss) follows, as the
sample size gets large, the maximum is approximately distributed as per an Inverse Weibull.
Thus, the heavy-tailed Inverse Weibull Distribution is a good candidate to model the maximum.

Fisher-Tippett Theorem, More Detail:358

According to the 3rd edition of Loss Models, for a sample from an Exponential Distribution, in the
limit the maximum (properly scaled) follows a Gumbel Distribution, F(x) = exp[-exp(-x/θ)], not the
Frechet Distribution.

Samples from lighter tailed distributions such as the Exponential, Gamma, Weibull, LogNormal, and
Inverse Gaussian have the distribution of their maximum (properly scaled) approach the Gumbel
Distribution.

Samples from heavier tailed distributions such as the Pareto and Inverse Gamma have the
distribution of their maximum (properly scaled) approach the Frechet Distribution, in other words the
Inverse Weibull Distribution. We saw that for the Inverse Weibull Distribution, the distribution of the
maximum was another Inverse Weibull.

358
Not on the syllabus.

Fisher-Tippett Theorem, An Example:359

Take a sample of size n from a Pareto Distribution: F(x) = 1 - {θ/(θ + x)}^α = 1 - (1 + x/θ)^(-α).

Let Mn be the maximum for a sample of size n.
We will take the limit of the distribution function of (Mn - bn)/an, where an and bn are normalizing constants.

For the Pareto Distribution, we take: an = θ n^(1/α) / α, and bn = θ n^(1/α) - θ.

We will show that the limit as n approaches infinity of the distribution of (Mn - bn)/an is a Frechet Distribution.

Prob[(Mn - bn)/an ≤ x] = Prob[Mn ≤ an x + bn] = F(an x + bn)^n.

F(an x + bn) = 1 - (1 + {an x + bn}/θ)^(-α) = 1 - {1 + x n^(1/α)/α + n^(1/α) - 1}^(-α)
= 1 - {x n^(1/α)/α + n^(1/α)}^(-α) = 1 - (1/n) {1 + x/α}^(-α).

Thus F(an x + bn)^n = (1 - (1/n) {1 + x/α}^(-α))^n.

Now in general the limit as n approaches infinity of (1 + c/n)^n is exp[c].

Thus the limit as n approaches infinity of Prob[(Mn - bn)/an ≤ x] = F(an x + bn)^n is:
exp[-{1 + x/α}^(-α)] = exp[-{α/(x + α)}^α].

This is a Frechet Distribution, including location parameter, as was to be shown.
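A numerical illustration of this limit (not in the Guide) is below: for a Pareto with α = 3 and θ = 200, F(a_n x + b_n)^n is evaluated at x = 1.5 for increasing n and compared with the Frechet limit; all names are mine.

```python
import math

alpha, theta = 3.0, 200.0     # Pareto parameters; any alpha > 0, theta > 0 work
x = 1.5                       # point at which the convergence is checked

def pareto_cdf(y):
    return 1.0 - (theta / (theta + y)) ** alpha

for n in (10, 100, 10_000, 1_000_000):
    a_n = theta * n ** (1.0 / alpha) / alpha      # normalizing constants from the derivation
    b_n = theta * n ** (1.0 / alpha) - theta
    print(n, round(pareto_cdf(a_n * x + b_n) ** n, 5))

print("limit", round(math.exp(-(alpha / (x + alpha)) ** alpha), 5))   # about 0.7436
```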

359
The 4th edition of Loss Models provides no details or examples. This example is from the 3rd edition.

Excess Loss Variable:

As discussed previously, the Excess Loss Variable for d is defined for X > d as X-d,
and is undefined for X ≤ d.
Excess Loss Variable for d. ⇔ The nonzero payments excess of a deductible of d.

If the ground-up losses follow an Exponential, then the Excess Loss Variable follows the same
Exponential.

If the ground-up losses follow a Pareto Distribution, then the Excess Loss Variable follows another
Pareto Distribution, but with θ + d rather than θ.

The mean of the excess loss variable is the mean excess loss, e(d).

The Tail Value at Risk, TVaRp [X] = πp + e[πp ], where πp is the pth percentile.360

Balkema-de Haan-Pickands Theorem:

As the truncation point d increases towards infinity, the excess loss variable (properly
scaled) approaches one of three distributions; for insurance applications this will be
either the Exponential Distribution or the Pareto Distribution.361 362

If we take samples from a lighter-tailed distribution, then the limit of the excess loss variable is an
Exponential Distribution.
If we instead take samples from a heavier-tailed distribution, then the limit of the excess loss variable
is a Pareto Distribution.

Therefore, for excess-of-loss reinsurance contracts over a high retention (deductible),


a Pareto Distribution is likely to be a good model of the payments made by the reinsurer prior to the
effect of any limit.

360
See “Mahlerʼs Guide to Risk Theory.”
361
The third possibility is a Beta Distribution, including a location parameter added.
362
Note that the Exponential and Pareto each had the property that they were preserved when one took the excess
loss variable; in other words the excess loss variable was the same type of distribution.

Problems:

41.1 (2 points) You take a sample of size 10 from an Exponential Distribution with θ = 50.
Determine the probability that the maximum is less than 90.
A.12% B. 14% C. 16% D. 18% E. 20%

41.2 (2 points) You take a sample of size N from a distribution.


Discuss the distribution of the maximum of the sample as N approaches infinity.

41.3 (1 point) You model the maximum of a sample via an Inverse Weibull Distribution with
θ = 1000 and τ = 3. Determine the probability that the maximum is greater than 2000.
A.4% B. 6% C. 8% D. 10% E. 12%

41.4 (2 points) Discuss the distribution of the excess loss variable as the left truncation point
approaches infinity.

41.5 (2 points) For a sample of size 6 from a Pareto Distribution with α = 3 and θ = 80,
determine the probability that the maximum is more than 100.
A.42% B. 44% C. 46% D. 48% E. 50%

Solutions to Problems:

41.1. C. For this Exponential, F(90) = 1 - e-90/50 = 0.8347.


Prob[maximum < 90] = Prob[all 10 items are less than 90] = F(90)10 = 16.4%.

41.2. According to the Fisher-Tippett Theorem, the limit as the sample size increases of the
maximum (properly scaled) is one of three possible distributions; for actuarial applications the one of
interest is the Frechet Distribution, which is the Inverse Weibull Distribution from Appendix A of
Loss Models.
Thus, the heavy-tailed Inverse Weibull Distribution is a good candidate to model the maximum.

41.3. E. F(2000) = exp[-(1000/2000)3 ] = 0.8825. S(2000) = 1 - 0.8825 = 11.75%.

41.4. According to the Balkema-de Haan-Pickands Theorem, as the left truncation point increases
towards infinity, the excess loss variable (properly scaled) approaches one of three distributions; for
insurance applications this will be either the Exponential Distribution or the Pareto Distribution. If we
take samples from a lighter-tailed distribution, then the limit is an Exponential.
If we instead take samples from a heavier-tailed distribution, then the limit is a Pareto.

41.5. A. For this Pareto, F(100) = 1 - (80/180)3 = 0.9122.


Prob[maximum ≤ 100] = Prob[all 6 items are ≤ 100] = F(100)6 = 0.5762.
Prob[maximum > 100] = 1 - 0.5762 = 42.4%.

Section 42, Relationship to Life Contingencies

Many of the ideas discussed with respect to Loss Distributions apply to Life Contingencies and
vice-versa. For example, as discussed previously, the mean residual life (complete expectation of
life) and the mean excess loss are mathematically equivalent. Similarly, as discussed previously, the
hazard rate and force of mortality are two names for the same thing.

One can relate the notation used in Life Contingencies to that used in Loss Distributions.

ps and qs:363

The probability of survival past time 70 + 10 = 80, given survival past time 70, is 10p 70.
The probability of failing at or before time 70 + 10 = 80, given survival past time 70, is 10q70.

10p 70 + 10q70 = 1.

In general, y-xpx ≡ Prob[Survival past y | Survival past x] = S(y)/S(x).

y-xqx ≡ Prob[Not Surviving past y | Survival past x] = {S(x) - S(y)}/S(x) = 1 - y-xpx .

Also px ≡ 1 px = Prob[Survival past x+1 | Survival past x] = S(x+1)/S(x).

qx ≡ 1 qx = Prob[Death within one year | Survival past x] = 1 - S(x+1)/S(x).

Exercise: Estimate 100p 50 and 300q100, given the following 10 values:


22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: S(50) = 8/10. S(150) = 3/10. 100p 50 = S(150)/S(50) = (3/10)/(8/10) = 3/8.
S(100) = 4/10. S(400) = 1/10. 300q100 = 1 - S(400)/S(100) = 3/4.]

t|uqx ≡ Prob[x+t < time of death ≤ x+t+u | Survival past x]


= {S(x+t) - S(x+t+u)}/S(x).
Note that t is the time delay, while u is the length of the interval whose probability we measure.

Exercise: In the previous exercise, estimate 100|200q70.


[Solution: 100|200q70 = {S(170) - S(370)} / S(70) = {(3/10) - (1/10)} / (6/10) = 1/3.]
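These empirical estimates are simple to reproduce. The short Python sketch below (mine, not from the Guide) computes the quantities in the two exercises from the 10 observed values.

```python
values = [22, 35, 52, 69, 86, 90, 111, 254, 362, 746]

def S(x):   # empirical survival function
    return sum(1 for v in values if v > x) / len(values)

p_100_50 = S(150) / S(50)                      # 100_p_50 = 3/8
q_300_100 = 1.0 - S(400) / S(100)              # 300_q_100 = 3/4
deferred_q_70 = (S(170) - S(370)) / S(70)      # 100|200_q_70 = 1/3

print(p_100_50, q_300_100, deferred_q_70)
```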

363
See Section 3.2.2 of Actuarial Mathematics.

Variance of ps and qs:364

With the 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746, the estimate of
100p 50 = S(150) / S(50) = (3/10) / (8/10) = 3/8 = (number > 150)/(number > 50).

Conditional on having 8 values greater than 50, the number of values greater than 150 is Binomial
with m = 8 and q = 100p 50, and variance: 8 100p 50(1 - 100p 50) = 8 100p 50 100q50.

However, given 8 values greater than 50, 100p 50 = (number > 150)/8. ⇒

Var[100p 50 | S(50) = 8/10] = 8 100p 50 100q50 /82 = 100p 50 100q50 / 8 = (3/8)(5/8)/8 = (3)(5)/83 .

Let nx ≡ number of values greater than x.

Then by the above reasoning, y-xp x = ny/nx, and Var[y-xp x | nx] = ny(nx - ny)/nx3 .

Since y-xqx = 1 - y-xp x, Var[y-xqx | nx] = Var[y-xp x | nx] = ny(nx - ny)/nx3 .

Exercise: Estimate Var[30q70 | 6 values greater than 70], given the following 10 values:
22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: 30p 70 = S(100)/S(70) = (4/10)/(6/10) = 2/3. 30q70 = 1/3.
Var[30q70 | 6 values greater than 70] = 30p 70 30q70 /6 = (2/3)(1/3)/6 = 1/27.
Alternately, Var[30q70 | n70 = 6] = n100(n70 - n100)/n703 = (4)(6 - 4)/63 = 1/27.]

Central Death Rate:365

The central death rate, mx = (Probability of dying from age x to age x + 1) / (expected years lived from x to x + 1)
= {S(x) - S(x+1)} / ∫_x^(x+1) S(t) dt = (Probability of loss of size x to x + 1) / (layer of loss from x to x + 1).

nmx = (Probability of dying from age x to age x + n) / (expected years lived from x to x + n)
= {S(x) - S(x+n)} / ∫_x^(x+n) S(t) dt = (Probability of loss of size x to x + n) / (layer of loss from x to x + n).
364
See Example 12.4 in Loss Models.
365
See page 70 of Actuarial Mathematics.

Problems:

Use the following information for the next 5 questions:


Mortality follows a Weibull Distribution with θ = 70 and τ = 4.

42.1 (1 point) Determine q60.


A. 0.028 B. 0.030 C. 0.032 D. 0.034 E. 0.036

42.2 (1 point) Determine p80.


A. 0.90 B. 0.92 C. 0.94 D. 0.96 E. 0.98

42.3 (1 point) Determine 10q65.


A. 0.42 B. 0.44 C. 0.46 D. 0.48 E. 0.50

42.4 (1 point) Determine 13p 74.


A. 0.28 B. 0.30 C. 0.32 D. 0.34 E. 0.36

42.5 (2 points) Determine 10|5q62.


A. 0.18 B. 0.20 C. 0.22 D. 0.24 E. 0.26

42.6 (165, 5/87, Q.9) (2.1 points) Mortality follows a Weibull Distribution with parameters θ and τ.
q0 = 0.09516. q1 = 0.25918. Determine q2 .
A. 0.37 B. 0.39 C. 0.41 D. 0.43 E. 0.45

42.7 (CAS3, 11/07, Q.30) (2.5 points) Survival follows a Weibull Distribution.
Given the following:
• µ(x) = kx2 , k > 0, x ≥ 0 defines the hazard rate function.
• 3q2 = 0.68963.
Calculate 2|q2 .

Solutions to Problems:

42.1. E. q 60 = 1 - p60 = 1 - S(61)/S(60) = 1 - exp[-(61/70)4 ]/exp[-(60/70)4 ] =

1 - e-0.03689 = 0.0362.

42.2. B. p 80 = S(81)/S(80) = exp[-(81/70)4 ]/exp[-(80/70)4 ] = e-0.08691 = 0.917.

42.3. B. 10q65 = 1 - 10p 65 = 1 - S(75)/S(65) = 1 - exp[-(75/70)4 ]/exp[-(65/70)4 ] =

1 - e-.5743 = 0.437.

42.4. C. 13p 74 = S(87)/S(74) = exp[-(87/70)4 ]/exp[-(74/70)4 ] = e-1.1372 = 0.321.

42.5. A. 10|5q62 = {S(72) - S(77)}/S(62) = {exp[-(72/70)4 ] - exp[-(77/70)4 ]}/exp[-(62/70)4 ] =


(0.32652 - 0.23129) / 0.54041 = 0.176.

42.6. B. S(x) = exp[-(x/θ)τ].

q0 = {S(0) - S(1)}/S(0) = 1 - exp[ -1/θτ]. ⇒ exp[ -1/θτ] = 1 - 0.09516 = 0.90484.

q1 = {S(1) - S(2)}/S(1) = 1 - exp[-(2/θ)τ]/exp[ -1/θτ].

⇒ exp[-(2/θ)τ]/exp[ -1/θτ] = 1 - 0.25918 = 0.74082.


⇒ exp[-(2/θ)τ] = (0.90484)(0.74082) = 0.67032.
Therefore, 1/θτ = -ln(0.90484) = 0.100, and (2/θ)τ = -ln(0.67032) = 0.400.

Dividing the two equations: 2^τ = 4. ⇒ τ = 2. ⇒ θ = √10.


S(x) = exp[-x2 /10]. q2 = 1 - S(3)/S(2) = 1 - e-0.9/e-0.4 = 1 - e-0.5 = 0.3935.


42.7. D. H(x) = ∫_0^x h(t) dt = k x^3/3. S(x) = exp[-H(x)] = exp[-k x^3/3].
0.68963 = 3q2 = 1 - S(5)/S(2) = 1 - exp[-125k/3]/exp[-8k/3].
⇒ 0.31037 = exp[-117k/3]. ⇒ k = 0.03. ⇒ S(x) = exp[-x^3/100].
2|q2 = {S(4) - S(5)}/S(2) = (e^(-0.64) - e^(-1.25)) / e^(-0.08) = 0.261.

Section 43, Gini Coefficient366

The Gini Coefficient or coefficient of concentration is a concept that comes up for example in
economics, when looking at the distribution of incomes. This section will discuss the Gini coefficient
and relate it to the Relative Mean Difference.

The Gini coefficient is a measure of inequality. For example if all of the individuals in a group have the
same income, then the Gini coefficient is zero. As incomes of the individuals in a group became
more and more unequal, the Gini coefficient would increase towards a value of 1. The Gini coefficient
has found application in many different fields of study.

Mean Difference:

Define the mean difference as the average absolute difference between two random draws from a
distribution.

Mean Difference = ∫ ∫ | x - y | f(x) f(y) dx dy ,


where the double integral is taken over the support of f.

For example, for a uniform distribution from 0 to 10:


Mean Difference = ∫_0^10 ∫_0^10 |x - y| (1/10)(1/10) dx dy
= (1/100) ∫_0^10 ∫_x=y^10 (x - y) dx dy + (1/100) ∫_0^10 ∫_x=0^y (y - x) dx dy
= (1/100) ∫_0^10 (50 - 10y + y^2/2) dy + (1/100) ∫_0^10 y^2/2 dy
= (1/100)(500 - 500 + 1000/6) + (1/100)(1000/6) = 10/3.

In a similar manner, in general for the continuous uniform distribution, the mean difference is:
(width)/3.367

366
Not on the syllabus of your exam.
367
For a sample of size two from a uniform, the expected value of the minimum is the bottom of the interval plus
(width)/3, while the expected value of the maximum is the top of the interval - (width)/3. Thus the expected absolute
difference is (width)/3. This is discussed in order statistics, on the Syllabus of Exam CAS ST.

Exercise: Compute the mean difference for an Exponential Distribution.

[Solution: Mean difference = ∫ ∫ |x - y| {e^(-x/θ)/θ} {e^(-y/θ)/θ} dx dy
= (1/θ^2) ∫_0^∞ e^(-y/θ) ∫_x=0^y (y - x) e^(-x/θ) dx dy + (1/θ^2) ∫_0^∞ e^(-y/θ) ∫_x=y^∞ (x - y) e^(-x/θ) dx dy
= (1/θ) ∫_0^∞ e^(-y/θ) {y(1 - e^(-y/θ)) + θ e^(-y/θ) + y e^(-y/θ) - θ} dy + (1/θ) ∫_0^∞ e^(-y/θ) {θ e^(-y/θ) + y e^(-y/θ) - y e^(-y/θ)} dy
= ∫_0^∞ y e^(-y/θ)/θ + 2 e^(-2y/θ) - e^(-y/θ) dy = θ + θ - θ = θ.

Alternately, by symmetry the contributions from when x > y and when y > x must be equal.
Thus, the mean difference is: (2) (1/θ^2) ∫_0^∞ e^(-y/θ) ∫_x=0^y (y - x) e^(-x/θ) dx dy
= (2/θ) ∫_0^∞ e^(-y/θ) {y(1 - e^(-y/θ)) + θ e^(-y/θ) + y e^(-y/θ) - θ} dy
= 2 ∫_0^∞ y e^(-y/θ)/θ + e^(-2y/θ) - e^(-y/θ) dy = (2)(θ + θ/2 - θ) = θ.

Comment: ∫ x e - x / θ dx = -θ (x + θ) e-x/θ.
For a sample of size two from an Exponential Distribution, the expected value of the minimum
is θ/2, while the expected value of the maximum is 3θ/2.
Therefore, the expected value of the difference is θ.]
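Both results can be confirmed by Monte Carlo. The hedged Python sketch below (mine; the particular seed and sample sizes are arbitrary) estimates the mean difference for a uniform distribution on (0, 10) and for an Exponential with mean 25.

```python
import random

random.seed(3)
trials = 200_000

# Uniform(0, 10): the mean difference should be width/3 = 10/3.
unif = sum(abs(random.uniform(0, 10) - random.uniform(0, 10)) for _ in range(trials)) / trials

# Exponential with mean 25: the mean difference should be theta = 25.
theta = 25.0
expo = sum(abs(random.expovariate(1.0 / theta) - random.expovariate(1.0 / theta))
           for _ in range(trials)) / trials

print(round(unif, 2), round(expo, 2))   # about 3.33 and 25
```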

Mean Relative Difference:

The mean relative difference of a distribution is defined as: (mean difference) / mean.

For the uniform distribution, the mean relative difference is: {(width)/3} / {(width)/2} = 2/3.
For the Exponential Distribution, the mean relative difference is: θ/θ = 1.

Exercise: Derive the form of the Mean Relative Difference for a Pareto Distribution.
Hint: ∫ x / (x + θ)^a dx = -{x(a - 1) + θ} / {(x + θ)^(a-1) (a - 1)(a - 2)}.
[Solution: For α > 1, E[X] = θ/(α-1). f(x) = α θ^α / (θ + x)^(α+1).
Mean difference = ∫ ∫ |x - y| {α θ^α / (θ + x)^(α+1)} dx {α θ^α / (θ + y)^(α+1)} dy.
By symmetry the contributions from when x > y and when y > x must be equal.
Therefore, mean difference = 2 α^2 θ^(2α) ∫_y=0^∞ ∫_x=y^∞ {(x - y) / (θ + x)^(α+1)} {1 / (θ + y)^(α+1)} dx dy.

Now using the hint: ∫_x=y^∞ x / (θ + x)^(α+1) dx = {yα + θ} / {α (α - 1) (θ + y)^α}.
∫_x=y^∞ 1 / (θ + x)^(α+1) dx = 1 / {α (θ + y)^α}.
Therefore, ∫_x=y^∞ (x - y) / (θ + x)^(α+1) dx = {yα + θ} / {α (α - 1) (θ + y)^α} - y / {α (θ + y)^α} = 1 / {α (α - 1) (θ + y)^(α-1)}.

Thus, mean difference = {2 α θ^(2α) / (α - 1)} ∫_0^∞ 1 / (θ + y)^(2α) dy = {2 α θ^(2α) / (α - 1)} {1 / {(2α - 1) θ^(2α-1)}} = 2 α θ / {(α - 1)(2α - 1)}.

E[X] = θ/(α-1). Thus, the mean relative difference is: 2α / (2α - 1), α > 1.]

Lorenz Curve:

Assume that the incomes in a country follow a distribution function F(x).368


Then F(x) is the percentage of people with incomes less than x.
The income earned by such people is: ∫_0^x t f(t) dt = E[X ∧ x] - x S(x) = ∫_0^x S(t) dt - x S(x).

The percentage of total income earned by such people is:
∫_0^x y f(y) dy / E[X] = {E[X ∧ x] - x S(x)} / E[X].

Define G(x) = ∫_0^x y f(y) dy / E[X] = {E[X ∧ x] - x S(x)} / E[X].369

For example, assume an Exponential Distribution. Then F(x) = 1 - e^(-x/θ).
G(x) = {E[X ∧ x] - x S(x)} / E[X] = {θ(1 - e^(-x/θ)) - x e^(-x/θ)} / θ = 1 - e^(-x/θ) - (x/θ) e^(-x/θ).

Let t = F(x) = 1 - e^(-x/θ). Therefore, x/θ = -ln(1 - t).370

Then, G(t) = t - {-ln(1-t)} (1-t) = t + (1-t) ln(1-t).

368
Of course, the mathematics applies regardless of what is being modeled.
The distribution of incomes is just the most common context.
369
This is not standard notation. I have just used G to have some notation.
370
This is just the VaR formula for the Exponential Distribution.

Then we can graph G as a function of F:


[Graph of G(x) plotted against F(x), each running from 0 to 1.]

This curve is referred to as the Lorenz curve or the concentration curve.

Since F(0) = 0 = G(0) and F(∞) = 1 = G(∞), the Lorenz curve passes through the points (0, 0) and
(1, 1). Usually one would also include in the graph the 45° reference line
connecting (0, 0) and (1, 1), as shown below:

[Graph of the Lorenz curve (percent of income versus percent of people), together with the 45° reference line from (0, 0) to (1, 1).]

G(t) = G[F(x)] = ∫_0^x y f(y) dy / E[X].

dG/dt = (dG/dx) / (dF/dx) = {x f(x) / E[X]} / f(x) = x / E[X] > 0.

d^2G/dt^2 = (1/E[X]) dx/dt = (1/E[X]) / (dF/dx) = 1 / {E[X] f(x)} > 0.

Thus, in the above graph, as well as in general, the Lorenz curve is increasing and concave up.
The Lorenz curve is below the 45° reference line, except at the endpoints when they are equal.

The vertical distance between the Lorenz curve and the 45° comparison line is: F - G.
Thus, this vertical distance is a maximum when: 0 = dF/dF - dG/dF.
⇒ dG/dF = 1. ⇒ x/E[X] = 1. ⇒ x = E[X].

Thus the vertical distance between the Lorenz curve and the 45° comparison line is a maximum at
the mean income.

Exercise: If incomes follow an Exponential Distribution, what is this maximum vertical distance
between the Lorenz curve and the 45° comparison line?
[Solution: The maximum occurs when x = θ.

F(x) = 1 - e-x/θ. From previously, G(x) = 1 - e-x/θ - (x/θ) e-x/θ.

F - G = (x/θ) e-x/θ. At x = θ, this is: e-1 = 0.3679.]



Exercise: Determine the form of the Lorenz Curve, if the distribution of incomes follows a Pareto Distribution, with α > 1.
[Solution: F(x) = 1 - {θ/(θ + x)}^α. E[X] = θ/(α - 1). E[X ∧ x] = {θ/(α - 1)} [1 - {θ/(θ + x)}^(α-1)].
G(x) = {E[X ∧ x] - x S(x)} / E[X] = [{θ/(α - 1)}{1 - (θ/(θ + x))^(α-1)} - x S(x)] / {θ/(α - 1)}
= 1 - {θ/(θ + x)}^(α-1) - (α - 1)(x/θ) S(x).

Let t = F(x) = 1 - {θ/(θ + x)}^α. ⇒ {θ/(θ + x)}^α = S(x) = 1 - t. Also, x/θ = (1 - t)^(-1/α) - 1.371

Therefore, G(t) = 1 - (1 - t)^((α-1)/α) - (α - 1){(1 - t)^(-1/α) - 1}(1 - t) = t + α - tα - α(1 - t)^(1 - 1/α), 0 ≤ t ≤ 1.

Comment: G(0) = α - α = 0. G(1) = 1 + α - α - 0 = 1.]

Here is a graph comparing the Lorenz curves for Paretos with α = 2 and α = 5:

[Graph of the Lorenz curves (percent of income versus percent of people) for the two Paretos: the curve for α = 2 lies below the curve for α = 5.]

371
This is just the VaR formula for the Pareto Distribution.

The Pareto with α = 2 has a heavier righthand tail than the Pareto with α = 5. If incomes follow a
Pareto with α = 2, then there are more extremely high incomes compared to the mean, than if
incomes follow a Pareto with α = 5. In other words, if α = 2, then income is more concentrated in the
high income individuals than if α = 5.372

The Lorenz curve for α = 2 is below that for α = 5. In general, the lower curve corresponds to a
higher concentration of income. In other words, a higher concentration of income corresponds to a
smaller area under the Lorenz curve. Equivalently, a higher concentration of income corresponds to a
larger area between the Lorenz curve and the 45° reference line.

Gini Coefficient:

This correspondence between areas on the graph of the Lorenz curve and the concentration of income is the idea behind the Gini Coefficient.

Let us label the areas in the graph of a Lorenz Curve, in this case for an Exponential Distribution:

[Graph of the Lorenz curve for an Exponential Distribution, with area A between the 45° line and the Lorenz curve, and area B below the Lorenz curve.]

Gini Coefficient = Area A / (Area A + Area B).
372
An Exponential Distribution has a lighter righthand tail than either Pareto. Thus if income followed an Exponential, it would be less concentrated than if it followed any Pareto.

However, Areas A and B add up to a triangle with area 1/2.

Therefore, Gini Coefficient = Area A / (Area A + Area B) = 2A = 1 - 2B.

For the Exponential Distribution, the Lorenz curve was: G(t) = t + (1-t) ln(1-t).
Thus, Area B = area under the Lorenz curve = ∫_0^1 t + (1-t) ln(1-t) dt = 1/2 + ∫_0^1 s ln(s) ds.
Applying integration by parts,
∫_0^1 s ln(s) ds = (s^2/2) ln(s) ]_s=0^s=1 - ∫_0^1 (s^2/2)(1/s) ds = 0 - 1/4 = -1/4.
Thus Area B = 1/2 - 1/4 = 1/4.

Therefore, for the Exponential Distribution, the Gini Coefficient is: 1 - (2)(1/4) = 1/2.

Recall that for the Exponential Distribution, the mean relative difference was 1.
As will be shown subsequently, in general, Gini Coefficient = (mean relative difference)/2.

Therefore, for the Uniform Distribution, the Gini Coefficient is: (1/2)(2/3) = 1/3.
Similarly, for the Pareto Distribution, the Gini Coefficient is: (1/2){2α / (2α - 1)} = α / (2α - 1), α > 1.

We note that the Uniform with the lightest righthand tail of the three has the smallest Gini coefficient,
while the Pareto with the heaviest righthand tail of the three has the largest Gini coefficient.
Among Pareto Distributions, the smaller alpha, the heavier the righthand tail, and the larger the
Gini Coefficient.373
The more concentrated the income is among the higher earners, the larger the Gini coefficient.
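The relationship Gini Coefficient = 1 - 2B can be checked numerically from the Lorenz curves derived above. The Python sketch below (mine, not from the Guide) integrates the Exponential and Pareto Lorenz curves by the midpoint rule, reproducing 1/2 for the Exponential and α/(2α - 1) for the Pareto.

```python
import math

def gini_from_lorenz(G, n=1_000_000):
    # Gini Coefficient = 1 - 2 (area under the Lorenz curve), by the midpoint rule.
    h = 1.0 / n
    area = sum(G((i + 0.5) * h) for i in range(n)) * h
    return 1.0 - 2.0 * area

def exponential_lorenz(t):
    return t + (1.0 - t) * math.log(1.0 - t)

alpha = 2.0
def pareto_lorenz(t):
    return t + alpha - t * alpha - alpha * (1.0 - t) ** (1.0 - 1.0 / alpha)

print(round(gini_from_lorenz(exponential_lorenz), 4))   # 0.5
print(round(gini_from_lorenz(pareto_lorenz), 4))        # alpha/(2 alpha - 1) = 0.6667
```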

373
As alpha approaches one, the Gini coefficient approaches one.

LogNormal Distribution:

For the LogNormal Distribution: E[X] = exp[µ + σ^2/2].

E[X ∧ x] = exp(µ + σ^2/2) Φ[{ln(x) - µ - σ^2}/σ] + x {1 - Φ[{ln(x) - µ}/σ]} = E[X] Φ[{ln(x) - µ - σ^2}/σ] + x S(x).

Therefore, G(x) = {E[X ∧ x] - x S(x)} / E[X] = Φ[{ln(x) - µ - σ^2}/σ] = Φ[{ln(x) - µ}/σ - σ].

Let t = F(x) = Φ[{ln(x) - µ}/σ].

Then the Lorenz Curve is: G(t) = Φ[Φ^(-1)[t] - σ].

For example, here is a graph of the Lorenz curves for LogNormal Distributions with σ = 1 and σ = 2:

[Graph of the Lorenz curves (percent of income versus percent of people) for LogNormal Distributions with σ = 1 and σ = 2; the curve for σ = 2 lies below the curve for σ = 1.]

As derived subsequently, for a LogNormal Distribution, the Gini Coefficient is: 2Φ[σ/√2] - 1.

Here is a graph of the Gini Coefficient as a function of sigma:


[Graph of the Gini Coefficient as a function of sigma, increasing from 0 toward 1 as sigma runs from 0 to 5.]

As sigma increases, the LogNormal has a heavier tail, and the Gini Coefficient increases towards 1.

The mean relative difference is twice the Gini Coefficient: 4Φ[σ/√2] - 2.



Derivation of the Gini Coefficient for the LogNormal Distribution:

In order to compute the Gini Coefficient, we need to compute area B.

[Graph of the Lorenz curve, with area A between the 45° line and the Lorenz curve, and area B below the Lorenz curve.]

B = ∫_0^1 G(t) dt = ∫_0^1 Φ[Φ^(-1)[t] - σ] dt.
Let y = Φ^(-1)[t]. Then t = Φ[y]. dt = φ[y] dy.

B = ∫_-∞^∞ Φ[y - σ] φ[y] dy.

Now B is some function of σ.

B(σ) = ∫_-∞^∞ Φ[y - σ] φ[y] dy.

B(0) = ∫_-∞^∞ Φ[y] φ[y] dy = Φ[y]^2 / 2 ]_y=-∞^y=∞ = 1/2.


B(σ) = ∫-∞ Φ[y - σ] φ[y] dy . Taking the derivative of B with respect to sigma:
∞ ∞

∫ ∫-∞ exp[-(y - σ) 2 / 2] exp[-y 2 / 2] dy


1
Bʼ(σ) = - φ[y - σ] φ[y] dy = -

-∞

∫-∞ exp[-(2y2 - 2σy) / 2] dy


1
=- exp[-σ2/2]

∫-∞ exp[-{(
1
=- exp[-σ2/4] 2 y) 2 - 2( 2 y)(σ / 2 ) + (σ / 2 ) 2} / 2] dy

∫-∞ exp[-{
1
=- exp[-σ2/4] 2 y - σ / 2 } 2 / 2] dy .

Let x = 2 y-σ/ 2 . ⇒ dy = dx / 2.

exp[-x 2 / 2]

1 1
Bʼ(σ) = - exp[-σ2/4] dx = - exp[-σ2/4].374
2 π 2π 2 π
-∞

Now assume that B(σ) = c - Φ[σ/ 2 ], for some constant c.


1 1
Then Bʼ(σ) = -φ[σ/ 2 ] / 2 = - exp[-(σ/ 2 )2 /2] / 2 =- exp[-σ2/4], matching above.
2π 2 π

Therefore, we have shown that B(σ) = c - Φ[σ/ 2 ].

However, B(0) = 1/2. ⇒ 1/2 = c - 1/2. ⇒ c = 1. ⇒ B(σ) = 1 - Φ[σ/ 2 ]. 375

Thus the Gini Coefficient is: 1 - 2B = 2Φ[σ/ 2 ] - 1.

374
Where I have used the fact that the density of the Standard Normal integrates to one over its support from -∞ to ∞.

375
In general, ∫-∞ Φ[a + by] φ[y] dy = Φ[a / 1 + b2 ).

For a list of similar integrals, see http://en.wikipedia.org/wiki/List_of_integrals_of_Gaussian_functions



Proof of the Relationship Between the Gini Index and the Mean Relative Difference:376

I will prove that: Gini Coefficient = (mean relative difference) / 2.

As a first step, let us look at a graph of the Lorenz Curve with areas labeled:

[Graph of the Lorenz curve with three areas labeled: C above the 45° reference line, A between the 45° line and the Lorenz curve, and B below the Lorenz curve.]

A + B = 1/2 = C.

B is the area under the Lorenz curve: ∫ G dF.
Area B is the area between the Lorenz curve and the horizontal axis.

We can instead look at: C + A = area between the Lorenz curve and the vertical axis = ∫ F dG.
Therefore, we have that: ∫ F dG - ∫ G dF = C + A - B = 1/2 + A - (1/2 - A) = 2A.
⇒ Area A = (1/2) {∫ F dG - ∫ G dF}.

⇒ Gini Coefficient = Area A / (Area A + Area B) = 2A = ∫ F dG - ∫ G dF.
376
Based on Section 2.25 of Volume I of Kendallʼs Advanced Theory of Statistics, not on the syllabus.

Recall that G(x) = ∫_0^x y f(y) dy / E[X]. ⇒ dG = x f(x) dx / E[X].

Therefore, Gini Coefficient = ∫ F dG - ∫ G dF = (1/E[X]) ∫_0^∞ F(s) s f(s) ds - ∫_0^∞ G(s) f(s) ds

= (1/E[X]) ∫_0^∞ ∫_0^s s f(t) dt f(s) ds - (1/E[X]) ∫_0^∞ ∫_0^s t f(t) dt f(s) ds = (1/E[X]) ∫_0^∞ ∫_0^s (s - t) f(t) dt f(s) ds.

∫_0^∞ ∫_0^s (s - t) f(t) dt f(s) ds is the contribution to the mean difference from when s > t.
By symmetry it is equal to the contribution to the mean difference from when t > s.
Therefore, 2 ∫_0^∞ ∫_0^s (s - t) f(t) dt f(s) ds = mean difference.
⇒ Gini Coefficient = {(mean difference)/2} / E[X] = (mean relative difference) / 2.

Problems:

43.1 (15 points) The distribution of incomes follows a Single Parameter Pareto Distribution, α > 1.
a. (3 points) Determine the mean relative distance.
b. (3 points) Determine the form of the Lorenz curve.
c. (3 points) With the aid of a computer, draw and compare the Lorenz curves for α = 1.5 and α = 3.
d. (3 points) Use the form of the Lorenz curve to compute the Gini coefficient.
e. (3 points) If the Gini coefficient is 0.47, what percent of total income is earned
by the top 1% of earners?

43.2 (5 points) For a Gamma Distribution with α = 2, determine the mean relative distance.
Hint: Calculate the contribution to the mean difference from when x < y.

∫ x e- x / θ dx = -x e-x/θ θ - e-x/θ θ2.


∫ x2 e- x / θ dx = -x2 e-x/θ θ - 2x e-x/θ θ2 - 2e-x/θ θ3.

Solutions to Problems:

43.1. a. f(x) = α θ^α / x^(α+1), x > θ.
The contribution to the mean difference from when x > y is:

∫_θ^∞ {∫_y^∞ (x - y) (α θ^α / x^(α+1)) dx} (α θ^α / y^(α+1)) dy
= ∫_θ^∞ α² θ^(2α) {1/((α-1) y^(α-1)) - (1/α) (y/y^α)} (1/y^(α+1)) dy
= α² θ^(2α) {1/(α-1) - 1/α} ∫_θ^∞ (1/y^(2α)) dy
= θ^(2α) {α/(α-1)} {1/((2α-1) θ^(2α-1))} = {α / ((α-1)(2α-1))} θ.

By symmetry this is equal to the contribution to the mean difference from when y > x.
Therefore, the mean difference is: 2θ α/{(α-1)(2α-1)}.

E[X] = αθ/(α-1), α > 1.

Therefore, the mean relative difference is: 2/(2α-1), α > 1.

b. G(x) = ∫_θ^x y f(y) dy / E[X] = {∫_θ^x (α θ^α / y^α) dy} / {αθ/(α-1)} = (α-1) θ^(α-1) ∫_θ^x (1/y^α) dy = 1 - θ^(α-1)/x^(α-1), x > θ.

Now let t = F(x) = 1 - θ^α/x^α, x > θ. ⇒ θ/x = (1-t)^(1/α).

Then G(t) = 1 - (1-t)^(1-1/α), 0 ≤ t ≤ 1.

c. For α = 1.5, G(t) = 1 - (1-t)^(1/3), 0 ≤ t ≤ 1. For α = 3, G(t) = 1 - (1-t)^(2/3), 0 ≤ t ≤ 1.

Here is a graph of these two Lorenz curves:
[Figure: both Lorenz curves, with % of people on the horizontal axis and % of income on the vertical axis; the curve labeled alpha = 1.5 lies below the curve labeled alpha = 3.]
The Lorenz curve for α = 1.5 is below that for α = 3.
The incomes are more concentrated for α = 1.5 than for α = 3.
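For part (c), a short Python sketch such as the following (my illustration, assuming matplotlib is available) is one way to draw and compare the two Lorenz curves:

```python
# Plot the Lorenz curves G(t) = 1 - (1-t)^(1 - 1/alpha) from part (b) for alpha = 1.5 and alpha = 3.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0.0, 1.0, 501)
for alpha in (1.5, 3.0):
    plt.plot(t, 1.0 - (1.0 - t) ** (1.0 - 1.0 / alpha), label=f"alpha = {alpha}")

plt.plot(t, t, linestyle="--", label="line of equality")   # 45 degree reference line
plt.xlabel("% of people")
plt.ylabel("% of income")
plt.legend()
plt.show()
```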

d. The Lorenz curve is: G(t) = 1 - (1-t)^(1-1/α), 0 ≤ t ≤ 1.

Integrating, the area under the Lorenz curve is: B = 1 - 1/(2 - 1/α) = 1 - α/(2α-1) = (α-1)/(2α-1).
Gini coefficient is: 1 - 2B = 1 - 2(α-1)/(2α-1) = 1/(2α-1), α > 1.
Note that the Gini Coefficient = (mean relative difference)/2 = (1/2) {2/(2α-1)} = 1/(2α-1).

e. 0.47 = 1/(2α-1). ⇒ α = 1.564. E[X] = θα/(α-1) = 2.773 θ.
The 99th percentile is: θ (1 - 0.99)^(-1/1.564) = 19.00 θ.
The income earned by the top 1% is:

∫_19θ^∞ x {1.564 θ^1.564 / x^2.564} dx = (1.564/0.564) θ^1.564 / (19θ)^0.564 = 0.527 θ.

Thus the percentage of total income earned by the top 1% is: 0.527 θ / (2.773 θ) = 19.0%.
Comment: The mean relative difference and the Gini coefficient have the same form as for the
two-parameter Pareto Distribution.
The distribution of incomes in the United States has a Gini coefficient of about 0.47.
For a sample of size two from a Single Parameter Pareto Distribution with α > 1, it turns out that:
E[Min] = 2αθ/(2α-1). E[Max] = 2α²θ/{(α-1)(2α-1)}.

Therefore, the mean difference is: 2α²θ/{(α-1)(2α-1)} - 2αθ/(2α-1) = 2θ α/{(α-1)(2α-1)}.

Since E[X] = αθ/(α-1), the mean relative difference is 2/(2α-1).
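The arithmetic in part (e) can be reproduced with a few lines of Python; this is my check, not part of the original solution:

```python
# Part (e): solve for alpha from the Gini coefficient, then the income share of the top 1%.
alpha = (1 / 0.47 + 1) / 2            # from Gini = 1/(2*alpha - 1); about 1.564
x99 = 0.01 ** (-1 / alpha)            # 99th percentile in units of theta; about 19.0
top_share = x99 ** (1 - alpha)        # income above the 99th percentile divided by total income
print(alpha, x99, top_share)          # top_share is about 0.19
```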

43.2. f(x) = x^(α-1) e^(-x/θ) / {θ^α Γ(α)} = x e^(-x/θ) / θ².
The contribution to the mean difference from when x > y is:

(1/θ⁴) ∫_0^∞ {∫_y^∞ (x - y) x e^(-x/θ) dx} y e^(-y/θ) dy
= (1/θ⁴) ∫_0^∞ {∫_y^∞ x² e^(-x/θ) dx - y ∫_y^∞ x e^(-x/θ) dx} y e^(-y/θ) dy
= (1/θ⁴) ∫_0^∞ {y² e^(-y/θ) θ + 2y e^(-y/θ) θ² + 2 e^(-y/θ) θ³ + y(-y e^(-y/θ) θ - e^(-y/θ) θ²)} y e^(-y/θ) dy
= (1/θ⁴) ∫_0^∞ {y² e^(-2y/θ) θ² + 2y e^(-2y/θ) θ³} dy = (1/θ⁴) {θ² 2(θ/2)³ + 2θ³ (θ/2)²} = 3θ/4.

By symmetry this is equal to the contribution to the mean difference from when x < y.
Therefore, the mean difference is: 3θ/2. E[X] = αθ = 2θ.
Therefore, the mean relative difference is: (3θ/2) / (2θ) = 3/4.

Comment: The Gini Coefficient is half the mean relative difference, or 3/8.
One can show in general that for the Gamma the mean relative difference is 2 - 4 β(α, α+1; 1/2).
Then in turn it can be shown that for alpha integer, the mean relative difference is the binomial coefficient C(2α, α) divided by 2^(2α-1).

For example, for α = 4, the mean relative difference is: C(8, 4) / 2^7 = 70/128 = 35/64.

The Gini Coefficient is half the mean relative difference, and is graphed below as a function of alpha:
[Figure: the Gini Coefficient for the Gamma Distribution as a function of alpha, for alpha from 0 to 10; it decreases toward zero as alpha increases.]
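As with the LogNormal, the Gamma results above can be checked numerically. The sketch below is mine, with an arbitrary seed and sample size; it compares a simulated mean relative difference for α = 2 against the closed form for integer alpha:

```python
# For alpha = 2 the mean relative difference should be 3/4 = C(2*alpha, alpha) / 2^(2*alpha - 1).
import numpy as np
from math import comb

rng = np.random.default_rng(seed=0)
alpha, theta = 2, 1.0                          # theta cancels out of the relative difference
x = rng.gamma(shape=alpha, scale=theta, size=400_000)
y = rng.gamma(shape=alpha, scale=theta, size=400_000)

simulated = np.abs(x - y).mean() / (alpha * theta)            # mean difference / mean
closed_form = comb(2 * alpha, alpha) / 2 ** (2 * alpha - 1)   # 6/8 = 0.75
print(simulated, closed_form)
```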

Section 44, Important Ideas & Formulas

Here are what I believe are the most important formulas and ideas from this study guide to know for
the exam.

Statistics Ungrouped Data (Section 2):

Average of X = 1st moment = E[X].


Average of X2 = 2nd moment about the origin = E[X2 ].

Mean = E[X].

Mode = the value most likely to occur.


Median = the value at which the distribution function is 50% = 50th percentile.

Variance = second central moment = E[(X - E[X])2 ] = E[X2 ] - E[X]2 .


Standard Deviation = √Variance.

Var[kX] = k2 Var[X].

For independent random variables the variances add.

The average of n independent, identically distributed variables has a variance of


Var[X] / n.

Var[X+Y] = Var[X] + Var[Y] + 2Cov[X,Y].

Cov[X,Y] = E[XY] - E[X]E[Y]. Corr[X, Y] = Cov[X, Y] / √(Var[X] Var[Y]).

Sample Mean = ∑ Xi / N = X̄.
The sample variance is an unbiased estimator of the variance of the distribution from which a data set
was drawn: Sample Variance ≡ ∑ (Xi - X̄)² / (N - 1).

Coefficient of Variation and Skewness (Section 3):

Coefficient of Variation (CV) = Standard Deviation / Mean.


1 + CV² = E[X²] / E[X]² = 2nd moment divided by the square of the mean.

Average of X3 = 3rd moment about the origin = E[X3 ].


Third Central Moment = E[(X - E[X])3 ] = E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 .

Skewness = γ1 = E[(X - E[X])³] / STDDEV³. A symmetric distribution has zero skewness.

Kurtosis = E[(X - E[X])⁴] / Variance² = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance².

When computing the empirical coefficient of variation, skewness, or kurtosis, we use the biased
estimate of the variance, with n in the denominator, rather than the sample variance.
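To make the distinction concrete, here is a small illustrative computation (my own, with made-up data) that uses N - 1 in the denominator for the sample variance, but the biased variance for the empirical coefficient of variation and skewness:

```python
import numpy as np

x = np.array([100.0, 250.0, 400.0, 1200.0, 3050.0])   # made-up losses

sample_variance = x.var(ddof=1)        # unbiased estimator: divides by N - 1
biased_variance = x.var(ddof=0)        # divides by N; used for the empirical CV, skewness, kurtosis

mean = x.mean()
cv = np.sqrt(biased_variance) / mean
skewness = np.mean((x - mean) ** 3) / biased_variance ** 1.5
print(sample_variance, biased_variance, cv, skewness)
```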

Empirical Distribution Function (Section 4):

The Empirical Distribution Function at x: (# of losses ≤ x)/(# losses).


The Empirical Distribution Function has mean of F(x) and a variance of: F(x){1-F(x)}/N.

S(x) = 1 - F(x) = the Survival Function.

Limited Losses (Section 5):

X ∧ L ≡ Minimum of x and L = Limited Loss Variable.

The Limited Expected Value at L = E[X ∧ L] = E[Minimum[L, x]].

E[X ∧ L] = ∫_0^L x f(x) dx + L S(L)

= contribution of small losses + contribution of large losses.

mean = E[X ∧ ∞]. E[X ∧ x] ≤ x. E[X ∧ x] ≤ mean.



Losses Eliminated (Section 6):

N = the total number of accidents or loss events.


Losses Eliminated by a deductible of size d = N ∫_0^d x f(x) dx + N d S(d) = N E[X ∧ d].

Loss Elimination Ratio (LER) = (Losses Eliminated by a deductible of size d) / (Total Losses).

LER(x) = E[X ∧ x] / E[X].

Excess Losses (Section 7):

(X - d)+ ≡ 0 when X ≤ d, X - d when X > d ⇔ left censored and shifted variable at d ⇔


amounts paid to insured with a deductible of d.

Excess Ratio = R(x) = (Losses Excess of x) / (total losses) = E[(X - x)+] / E[X].
R(x) = 1 - LER(x) = 1 - { E[X ∧ x] / mean }.

Total Losses = Limited Losses + Excess Losses: X = (X ∧ d) + (X - d)+.


E[(X - d)+] = E[X] - E[(X ∧ d)].

Excess Loss Variable (Section 8):

Excess Loss Variable for d ≡ X - d for X > d, undefined for X ≤ d ⇔


the nonzero payments excess of deductible d.

Mean Residual Life or Mean Excess Loss = e(x)


= the average dollars of loss above x on losses of size exceeding x.

e(x) = {E[X] − E[X ∧ x]} / S(x).

e(x) = (average size of those claims of size greater than x) - x.

Failure rate, force of mortality, or hazard rate = h(x) = f(x)/S(x) = - d ln(S(x)) / dx .



Layers of Loss (Section 9):

The percentage of losses in the layer from d to u =

{∫_d^u (x - d) f(x) dx + S(u) (u - d)} / ∫_0^∞ x f(x) dx = {E[X ∧ u] − E[X ∧ d]} / E[X] = LER(u) - LER(d) = R(d) - R(u).

Layer Average Severity (LAS) for the layer from d to u =


The mean losses in the layer from d to u = E[X ∧ u] - E[X ∧ d] =
{LER(u) - LER(d)} E[X] = {R(d) - R(u)} E[X].
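For example (my illustration, not the author's), for an Exponential severity these layer formulas can be evaluated directly, since E[X ∧ x] = θ(1 - e^(-x/θ)) and E[X] = θ:

```python
import math

theta = 2000.0                     # illustrative Exponential mean
d, u = 1000.0, 5000.0              # bottom and top of the layer

def lev(x):                        # E[X ^ x] for the Exponential
    return theta * (1.0 - math.exp(-x / theta))

layer_average_severity = lev(u) - lev(d)
percent_of_losses_in_layer = layer_average_severity / theta
print(layer_average_severity, percent_of_losses_in_layer)
```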

Average Size of Losses in an Interval (Section 10):

The average size of loss for those losses of size between a and b is:
∫_a^b x f(x) dx / {F(b) - F(a)} = [{E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}] / {F(b) - F(a)}.

The Proportion of Total Losses from Losses in the Interval [a, b] is:
[{E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}] / E[X].

Working with Grouped Data (Section 12):

For Grouped Data, if one is given the dollars of loss for claims in each interval,
then one can compute E[X ∧ x], LER(x), R(x), and e(x), provided x is an endpoint of an interval.

Uniform Distribution (Section 13):

Support: a ≤ x ≤ b Parameters: None


D. f. : F(x) = (x-a) / (b-a) P. d. f. : f(x) = 1/ (b-a)
Moments: E[X^n] = (b^(n+1) - a^(n+1)) / {(b - a) (n + 1)}

Mean = (b+a)/2 Variance = (b-a)2 /12



Statistics of Grouped Data (Section 14):

One can estimate moments of Grouped Data by assuming the losses are uniformly distributed on
each interval and then weighting together the moments for each interval by the number of claims
observed in each interval.

Policy Provisions (Section 15):

An ordinary deductible is a provision which states that when the loss is less than or
equal to the deductible, there is no payment and when the loss exceeds the deductible,
the amount paid is the loss less the deductible.

The Maximum Covered Loss is the size of loss above which no additional payments are
made.

A coinsurance factor is the proportion of any loss that is paid by the insurer after any other
modifications (such as deductibles or limits) have been applied.
A coinsurance is a provision which states that a coinsurance factor is to be applied.

The order of operations is:


1. Limit the size of loss to the maximum covered loss.
2. Subtract the deductible. If the result is negative, set the payment equal to zero.
3. Multiply by the coinsurance factor.

A policy limit is the maximum possible payment on a single claim.


Policy Limit = c(u - d). Maximum Covered Loss = u = d + (Policy Limit)/c.
With no deductible and no coinsurance, the policy limit ⇔ the maximum covered loss.

Under a franchise deductible the insurer pays nothing if the loss is less than the deductible
amount, but ignores the deductible if the loss is > the deductible amount.

Name Description
ground-up loss Losses prior to the impact of any deductible or maximum covered loss;
the full economic value of the loss suffered by the insured
regardless of how much the insurer is required to pay
in light of any deductible, maximum covered loss, coinsurance, etc.

Truncated Data (Section 16):

Ground-up, unlimited losses have distribution function F(x).


G(x) is what one would see after the effects of either a deductible or maximum covered loss.

Left Truncated ⇔ Truncation from Below at d ⇔


deduct. d & record size of loss when size > d.
G(x) = {F(x) - F(d)} / S(d), x > d.    1 - G(x) = S(x) / S(d), x > d.

g(x) = f(x) / S(d), x > d x ⇔ the size of loss.


Truncation & Shifting from Below at d ⇔
deductible d & record non-zero payment ⇔ amount paid per (non-zero) payment.
G(x) = {F(x + d) - F(d)} / S(d), x > 0.    g(x) = f(x+d) / S(d), x > 0.
x ⇔ the size of (non-zero) payment. x+d ⇔ the size of loss.
When data is truncated from above at the value L, claims of size greater than L are not in the
reported data base. G(x) = F(x) / F(L), x ≤ L g(x) = f(x) / F(L), x ≤ L.

Censored Data (Section 17):

Right Censored ⇔ Censored from Above at u ⇔


Maximum Covered Loss u & donʼt know exact size of loss, when ≥ u.
G(x) = F(x) for x < u; G(u) = 1.

g(x) = f(x) for x < u; point mass of probability S(u) at x = u.

The revised Distribution Function and density under censoring from above at u and truncation from
below at d are:
G(x) = {F(x) - F(d)} / S(d) for d < x < u; G(u) = 1.

g(x) = f(x) / S(d) for d < x < u; point mass of probability S(u)/S(d) at x = u.

Left Censored and Shifted at d ⇔ (X - d)+ ⇔ losses excess of d ⇔

0 when X ≤ d, X - d when X > d ⇔ amounts paid to insured with a deductible of d


⇔ payments per loss, including when the insured is paid nothing due to the deductible
⇔ amount paid per loss.
G(0) = F(d) ; G(x) = F(x+d), x > 0. g(0) point mass of F(d) ; g(x) = f(x+d), x > 0.

Average Sizes (Section 18):

Type of Data Average Size


Ground-up, Total Limits E[X]
Censored from Above at u E[X ∧ u]
Truncated from Below at d e(d) + d = {E[X] - E[X ∧ d]}/S(d) + d
Truncated and Shifted from Below at d e(d) = {E[X] - E[X ∧ d]}/S(d)
Left Censored and Shifted E[(X - d)+] = E[X] - E[X ∧ d]
Censored from Above at u and {E[X ∧ u] - E[X ∧ d]} / S(d)
Truncated and Shifted from Below at d

With Maximum Covered Loss of u and an (ordinary) deductible of d, the average amount
paid by the insurer per loss is: E[X ∧ u] - E[X ∧ d].

With Maximum Covered Loss of u and an (ordinary) deductible of d, the average amount
paid by the insurer per non-zero payment to the insured is:
{E[X ∧ u] - E[X ∧ d]} / S(d); with no maximum covered loss (u = ∞), this is e(d).
A coinsurance factor of c, multiplies the average payment, either per loss or per non-zero payment
by c.
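The per-loss and per-payment averages in this section can be checked against simulated ground-up losses. The sketch below is mine; the Exponential severity, deductible, maximum covered loss, and coinsurance factor are made-up values:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
losses = rng.exponential(scale=2000.0, size=100_000)   # illustrative ground-up losses

d, u, c = 500.0, 10_000.0, 0.8
payments = c * np.clip(np.minimum(losses, u) - d, 0.0, None)   # limit, subtract deductible, coinsure

per_loss = payments.mean()                     # close to c (E[X ^ u] - E[X ^ d])
per_payment = payments[payments > 0].mean()    # close to c (E[X ^ u] - E[X ^ d]) / S(d)
print(per_loss, per_payment)
```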

Percentiles (Section 19):

For a continuous distribution, the 100pth percentile is the first value at which F(x) = p.
For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p.

Definitions (Section 20):

A loss event or claim is an incident in which an insured or group of insureds suffers damages which
are potentially covered by their insurance contract.
The loss is the dollar amount of damage suffered by an insured or group of insureds as a result of a
loss event. The loss may be zero.

A payment event is an incident in which an insured or group of insureds receives a payment as a


result of a loss event covered by their insurance contract.
The amount paid is the actual dollar amount paid to the policyholder(s) as a result of a loss event or a
payment event. If it is as the result of a loss event, the amount paid may be zero.
A loss distribution is the probability distribution of either the loss or the amount paid from a loss
event or of the amount paid from a payment event.
The severity can be either the loss or amount paid random variable.
The exposure base is the basic unit of measurement upon which premiums are determined.
The frequency is the number of losses or number of payments random variable.

Parameters of Distributions (Section 21):

For a given type of distribution, in addition to the size of loss x, F(x) depends on
what are called parameters. The numerical values of the parameter(s) distinguish among the
members of a parametric family of distributions.
It is useful to group families of distributions based on how many parameters they have.
A scale parameter is a parameter which divides x everywhere it appears in the distribution function.
A scale parameter will appear to the nth power in the formula for the nth moment of the distribution.
A shape parameter affects the shape of the distribution and appears in the coefficient of variation
and the skewness.

Exponential Distribution (Section 22):

Support: x > 0 Parameter: θ > 0 ( scale parameter)

F(x) = 1 - e-x/θ f(x) = e-x/θ / θ

Mean = θ Variance = θ2 2nd moment = 2θ2


Coefficient of Variation = 1. Skewness = 2.
e(x) = Mean Excess Loss = θ

When an Exponential Distribution is truncated and shifted from below,


one gets the same Exponential Distribution, due to its memoryless
property.

Single Parameter Pareto Distribution (Section 23):

Support: x > θ Parameter: α > 0 (shape parameter)


F(x) = 1 - (θ/x)^α    f(x) = α θ^α / x^(α+1)    Mean = αθ/(α-1), α > 1.

Common Two Parameter Distributions (Section 24):

Pareto: α is a shape parameter and θ is a scale parameter.


The Pareto is a heavy-tailed distribution. Higher moments may not exist.
F(x) = 1 - {θ/(θ + x)}^α = 1 - (1 + x/θ)^(-α)    f(x) = α θ^α / (θ + x)^(α+1)    Mean = θ/(α-1), α > 1.

E[X^n] = n! θ^n / {(α − 1)...(α − n)}, α > n.    Mean Excess Loss = (θ + x)/(α − 1), α > 1.

If losses prior to any deductible follow a Pareto Distribution with parameters α and θ, then after
truncating and shifting from below by a deductible of size d, one gets another Pareto Distribution,
but with parameters α and θ + d.

Gamma: α is a shape parameter and θ is a scale parameter. Note the factors of θ in the moments.
For α = 1 you get the Exponential.
The sum of n independent identically distributed variables which are Gamma with parameters α and
θ is a Gamma distribution with parameters nα and θ. For α = a positive integer, the Gamma
distribution is the sum of α independent variables each of which follows an Exponential distribution.
F(x) = Γ(α; x/θ)    f(x) = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}

Mean = αθ    Variance = αθ²    E[X^n] = θ^n (α)...(α + n - 1).


The skewness for the Gamma distribution is always twice the coefficient of variation.

LogNormal: If ln(x) follows a Normal, then x itself follows a LogNormal.

F(x) = Φ[(ln(x) − µ)/σ]    f(x) = exp[-(ln(x) − µ)² / (2σ²)] / {x σ √(2π)}

Mean = exp[µ + 0.5 σ²]    Second Moment = exp[2µ + 2σ²]



Weibull: τ is a shape parameter, while θ is a scale parameter.


F(x) = 1 - exp[-(x/θ)^τ]    f(x) = (τ/θ^τ) x^(τ−1) exp[-(x/θ)^τ]
For τ = 1 you get the Exponential Distribution.

Other Two Parameter Distributions (Section 25):

Inverse Gaussian: Mean = µ Variance = µ3 / θ


F(x) = Φ[(x/µ − 1) √(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1) √(θ/x)].    f(x) = √θ exp[-θ (x/µ − 1)² / (2x)] / {√(2π) x^1.5}.

LogLogistic: F(x) = (x/θ)^γ / {1 + (x/θ)^γ}.    f(x) = γ x^(γ−1) / {θ^γ (1 + (x/θ)^γ)²}.

Inverse Gamma : If X follows a Gamma Distribution with parameters α and 1, then θ/x follows an
Inverse Gamma Distribution with parameters τ = α and θ. α is the shape parameter and θ is the
scale parameter. The Inverse Gamma is heavy-tailed.
F(x) = 1 - Γ(α; θ/x)    f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]}

Mean = θ/(α − 1), α > 1.    E[X^n] = θ^n / {(α − 1)...(α − n)}, α > n.

Producing Additional Distributions (Section 29):

Introduce a scale parameter by "multiplying by a constant".


Let G(x) = 1 - F(1/x). One gets the Inverse Gamma from the Gamma.
Let G(x) = F(ln(x)). One gets the LogNormal from the Normal by “exponentiating.”
Add up independent identical copies. One gets the Gamma from the Exponential.
Let G(x) = F(xτ). One gets a Weibull from the Exponential by "raising to a power."
One can get a new distribution as a continuous mixture of distributions.
The Pareto can be obtained as a mixture of Exponentials via an Inverse Gamma.
Another method of getting new distributions is via two-point or n-point mixtures.

Tails of Loss Distributions (Section 30):

If S(x) goes to zero slowly as x approaches ∞, this is a "heavy-tailed distribution."


The righthand tail is thick.
If S(x) goes to zero quickly as x approaches ∞, this is a "light-tailed distribution."
The righthand tail is thin.
The Pareto Distribution is heavy-tailed.
The Exponential distribution is light-tailed.
The Pareto Distribution is heavier-tailed than the LogNormal Distribution.

The Gamma, Pareto and LogNormal all have positive skewness.

Heavier Tailed Lighter Tailed

f(x) goes to zero more slowly f(x) goes to zero more quickly
Few Moments exist All (positive) moments exist
Larger Coefficient of Variation Smaller Coefficient of Variation
Higher Skewness Lower Skewness
e(x) Increases to Infinity e(x) goes to a constant
Decreasing Hazard Rate Increasing Hazard Rate

Here is a list of some loss distributions, arranged in increasing heaviness of the tail:
Distribution Mean Excess Loss All Moments Exist
Weibull for τ > 1 decreases to zero less quickly than 1/x Yes
Gamma for α > 1 decreases to a constant Yes
Exponential constant Yes
Gamma for α < 1 increases to a constant Yes
Inverse Gaussian increases to a constant Yes
Weibull for τ < 1 increases to infinity less than linearly Yes
LogNormal increases to infinity just less than linearly Yes
Pareto increases to infinity linearly No

Let f(x) and g(x) be the two densities; then if:

lim(x→∞) f(x)/g(x) = ∞, f has a heavier tail than g.

lim(x→∞) f(x)/g(x) = 0, f has a lighter tail than g.

lim(x→∞) f(x)/g(x) = a positive constant, f has a similar tail to g.

Limited Expected Values (Section 31):

E[X ∧ x] = ∫_0^x t f(t) dt + x S(x).
Rather than calculating this integral, make use of Appendix A of Loss Models, which has formulas for
the limited expected value for each distribution.

mean = E[X ∧ infinity].


e(x) = { mean - E[X ∧ x] } / S(x).

LER(x) = E[X ∧ x] / mean.


Layer Average Severity = E[X ∧ top of Layer] - E[X ∧ bottom of layer].

Expected Losses Excess of d: E[(X - d)+] = E[X] - E[X ∧ d].

Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then
the average payment per non-zero payment by the insurer is:
c {E[X ∧ u] - E[X ∧ d]} / S(d).
Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then
the insurerʼs average payment per loss to the insured is:
c (E[X ∧ u] - E[X ∧ d]).

E[X ∧ x] = ∫_0^x S(t) dt.    E[X] = ∫_0^∞ S(t) dt.

The Losses in a Layer can be written as an integral of the Survival Function from the bottom of the
Layer to the top of the Layer:
E[X ∧ b] - E[X ∧ a] = ∫_a^b S(t) dt.

The expected amount by which losses are less than d is: E[(d - X)+ ] = d - E[X ∧ d].

E[Max[X, a]] = a + E[X] - E[X ∧ a]. E[Min[Max[X , a] , b]] = a + E[X ∧ b] - E[X ∧ a].

Limited Higher Moments (Section 32):

E[(X ∧ u)²] = ∫_0^u t² f(t) dt + S(u) u²
The second moment of the average payment per loss under a Maximum Covered Loss u and a
deductible of d = the second moment of the layer from d to u is:
E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d {E[X ∧ u] - E[X ∧ d]}.
Given a deductible of d and a Maximum Covered Loss u, the second moment of the non-zero
payments is : (2nd moment of the payments per loss)/S(d).
If one has a coinsurance factor of c, then each payment is multiplied by c, therefore the second
moment and the variance are each multiplied by c2 .

Mean Excess Loss (Mean Residual Life) (Section 33):

e(x) = E[X - x | X > x] = ∫_x^∞ (t - x) f(t) dt / S(x) = {∫_x^∞ t f(t) dt / S(x)} - x = ∫_x^∞ S(t) dt / S(x).

e(x) = { mean - E[X ∧ x] } / S(x).


e(d) = average payment per payment with a deductible d.

It should be noted that for heavier-tailed distributions, just as with the mean, the Mean Excess Loss
only exists for certain values of the parameters. Otherwise it is infinite.

Distribution Behavior of e(x) as x→∞


Exponential constant
Pareto increases linearly
LogNormal increases to infinity less than linearly
Gamma, α > 1 decreases towards a horizontal asymptote
Gamma, α < 1 increases towards a horizontal asymptote
Weibull, τ > 1 decreases to zero
Weibull, τ < 1 increases to infinity less than linearly

Hazard Rate (Section 34):

The Hazard Rate, force of mortality, or failure rate, is defined as:


h(x) = f(x)/S(x), x ≥ 0. h(x) = -d ln(S(x)) / dx
S(x) = exp[-H(x)], where H(x) = ∫_0^x h(t) dt, the cumulative hazard rate.

h(x) defined for x > 0 is a legitimate hazard rate, if and only if h(x) ≥ 0 and the integral of h(x) from 0 to
infinity is infinite.

For the Exponential, h(x) = 1/θ = constant.

lim(x→∞) e(x) = lim(x→∞) 1/h(x).

Loss Elimination Ratios and Excess Ratios (Section 35):

Loss Elimination Ratio = LER(x) = E[X ∧ x] / mean = 1 - R(x)


Excess Ratio = R(x) = (mean - E[X ∧ x]) / mean = 1 - { E[X ∧ x] / mean } = 1 - LER(x).

LER(x) = ∫_0^x S(t) dt / E[X] = ∫_0^x S(t) dt / ∫_0^∞ S(t) dt.

The percent of losses in a layer can be computed either as the difference in Loss Elimination Ratios
or the difference of Excess Ratios in the opposite order.

The Effects of Inflation (Section 36):

Uniform Inflation ⇔ Every size of loss increases by a factor of 1+r.

Under uniform inflation, for a fixed limit the excess ratio increases and for a fixed deductible amount
the loss elimination ratio declines.
In order to keep up with inflation either the deductible or the limit must be increased at the rate of
inflation, rather than being held constant.
Under uniform inflation the dollars limited by a fixed limit increase slower than the overall
rate of inflation. Under uniform inflation the dollars excess of a fixed limit increase faster
than the overall rate of inflation.
Limited Losses plus Excess Losses = Total Losses.

Common ways to express the amount of inflation:


1. State the total amount of inflation from the earlier year to the later year.
2. Give a constant annual inflation rate.
3. Give the different amounts of inflation during each annual period between the earlier and later year.
4. Give the value of some consumer price index in the earlier and later year.
In all cases you want to determine the total inflation factor, (1+r), to get from the earlier year to the
later year.

The Mean, Mode, Median, and the Standard Deviation are each multiplied by (1+r).

Any percentile of the distribution is multiplied by (1+r);


in fact this is the definition of inflation uniform by size of loss.

The Variance is multiplied by (1+r)2 . The nth moment is multiplied by (1+r)n .

The Coefficient of Variation, the Skewness, and the Kurtosis are each unaffected by
uniform inflation.

Provided the limit keeps up with inflation, the Limited Expected Value, in dollars,
is multiplied by the inflation factor.
In the later year, the mean excess loss, in dollars, is multiplied by the inflation factor, provided the
limit has been adjusted to keep up with inflation.
The Loss Elimination Ratio, dimensionless, is unaffected by uniform inflation, provided the
deductible has been adjusted to keep up with inflation.
In the later year, the Excess Ratio, dimensionless, is unaffected by uniform inflation, provided the
limit has been adjusted to keep up with inflation.

Most of the size of loss distributions are scale families; under uniform inflation one gets the same
type of distribution. If there is a scale parameter, it is revised by the inflation factor.
For the Pareto, Single Parameter Pareto, Gamma, Weibull, and Exponential
Distributions, θ becomes θ(1+r).
Under uniform inflation for the LogNormal, µ becomes µ + ln(1+r).

For distributions in general, one can determine the behavior under uniform inflation as follows.
One makes the change of variables Z = (1+r) X.
For the Distribution Function one just sets FZ(z) = FX(x); one substitutes for x = z / (1+r).
Alternately, for the density function fZ(z) = fX(x) / (1+r).

The domain [a, b] becomes under uniform inflation [(1+r)a, (1+r)b].


The uniform distribution on [a, b] becomes under uniform inflation the uniform
distribution on [a(1+r), b(1+r)].

There are two alternative ways to solve many problems involving uniform inflation:
1. Adjust the size of loss distribution in the earlier year to the later year based on the amount of
inflation. Then calculate the quantity of interest in the later year.
2. Calculate the quantity of interest in the earlier year at its deflated value, and then adjust it to
the later year for the effects of inflation.

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered
Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the
insurerʼs average payment per loss in the later year is:
(1+r) c {E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]}.

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered
Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the
average payment per (non-zero) payment by the insurer in the later year is:
(1+r) c {E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]} / S(d/(1+r)).

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u,
and coinsurance factor c, then in terms of the values in the earlier year,
the second moment of the insurerʼs payment per loss in the later year is:

(1+r)² c² {E[(X ∧ u/(1+r))²] - E[(X ∧ d/(1+r))²] - 2 (d/(1+r)) (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)])}.

Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u,
and coinsurance factor c, then in terms of the values in the earlier year,
the second moment of the insurerʼs payment per (non-zero) payment in the later year is:

(1+r)² c² {E[(X ∧ u/(1+r))²] - E[(X ∧ d/(1+r))²] - 2 (d/(1+r)) (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)])} / S(d/(1+r)).

If one has a mixed distribution, then under uniform inflation each of the component distributions acts
as it would under uniform inflation.
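To illustrate the first of these payment formulas, and the two alternative solution methods listed above, here is a short sketch of mine for an Exponential severity, using E[X ∧ x] = θ(1 - e^(-x/θ)); the inflation rate, deductible, limit, and coinsurance factor are made-up values:

```python
import math

theta, r, c = 1000.0, 0.10, 0.9     # earlier-year Exponential mean, inflation rate, coinsurance
d, u = 500.0, 5000.0                # deductible and maximum covered loss (fixed in dollars)

def lev(x, scale):                  # E[X ^ x] for an Exponential with mean `scale`
    return scale * (1.0 - math.exp(-x / scale))

# Average payment per loss in the later year, in terms of the earlier year's distribution:
per_loss_later = (1 + r) * c * (lev(u / (1 + r), theta) - lev(d / (1 + r), theta))

# Equivalent check: inflate the distribution instead (theta becomes theta*(1+r)).
per_loss_check = c * (lev(u, theta * (1 + r)) - lev(d, theta * (1 + r)))
print(per_loss_later, per_loss_check)   # the two agree
```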

Lee Diagrams (Section 37):

Put the size of loss on the y-axis and probability on the x-axis.
The mean is the area under the curve.
Layers of loss correspond to horizontal strips.
Restricting attention to only certain sizes of loss corresponds to vertical strips.

N-Point Mixtures of Models (Section 38):

Mixing models is a technique that provides a greater variety of loss distributions.

One can take a weighted average of any two Distribution Functions:


G(x) = pA(x) + (1-p)B(x). This is called a 2-point mixture of models.

The Distribution Function of the mixture is the mixture of the Distribution Functions.
The Survival Function of the mixture is the mixture of the Survival Functions.
The density of the mixture is the mixture of the densities.

The mean of the mixture is the mixture of the means.


The moment of the mixture is the mixture of the moments:
E G[Xn ] = p EA [Xn ] + (1-p) EB [Xn ].

Limited Moments of the mixed distribution are the weighted average of the limited moments of the
individual distributions: EG[X ∧ x] = p EA[X ∧ x] + (1-p) EB[X ∧ x].

In general, one can weight together any number of distributions, rather than just
two. These are called n-point mixtures.

Sometimes the mixture of models is just a mathematical device with no physical significance.
However, it can also be useful when the data results from different perils.

Variable Mixture ⇔ weighted average of unknown # of distributions of the same family but differing

parameters ⇔ F(x) = ∑ wi Fi(x) , with each Fi of the same family and ∑ wi = 1.



Continuous Mixtures of Models (Section 39):

Mixture Distribution ⇔ Continuous Mixture of Models.


One takes a mixture of the density functions for specific values of the parameter ζ via some mixing

distribution u: g(x) = ∫ f(x; ζ) u(ζ) dζ .


The nth moment of a mixed distribution is the mixture of the nth moments for specific values of the
parameter ζ: E[Xn ] = Eζ[E[Xn | ζ]].

If the severity is Exponential and the mixing distribution of their means is Inverse Gamma,
then the mixed distribution is a Pareto, with
α = shape parameter of the Inverse Gamma and θ = scale parameter of the Inverse Gamma.

If the hazard rate of the Exponential, λ, is distributed via a Gamma(α, θ), then the mean 1/λ is
distributed via an Inverse Gamma(α, 1/θ), and therefore the mixed distribution is Pareto.
If the Gamma has parameters α and θ, then the mixed Pareto has parameters α and 1/θ.

If the severity is Normal with fixed variance s2 , and the mixing distribution of their means is also
Normal with mean µ and variance σ2, then the mixed distribution is another Normal,

with mean µ and variance: s2 + σ2.

In a Frailty Model, the hazard rate is of the form: h(x | λ) = λ a(x), where λ is a parameter which varies
across the portfolio, and a(x) is some function of x.
Let A(x) = ∫_0^x a(t) dt.
Then S(x) = Mλ[-A(x)].
For an Exponential Distribution: a(x) = 1, and A(x) = x.
For a Weibull Distribution: λ = θ−τ, a(x) = τxτ−1, and A(x) = xτ.

Spliced Models (Section 40):

A 2-component spliced model has: f(x) = w1 f1 (x) on (a1 , b1 ) and f(x) = w2 f2 (x) on (a2 , b2 ),
where f1 (x) is a density with support (a1 , b1 ), f2 (x) is a density with support (a2 , b2 ),
and w1 + w2 = 1.

A 2-component spliced density will be continuous at the breakpoint b,


provided the weights are inversely proportional to the component densities at the breakpoint:
w1 = f2(b) / {f1(b) + f2(b)},    w2 = f1(b) / {f1(b) + f2(b)}.

Extreme Value Distributions (Section 41):

The distribution for the maximum of N claims sizes is F(x)N.

Fisher-Tippett Theorem: The limit as the sample size increases of the maximum (properly scaled) is
one of three possible distributions; for actuarial applications the one of interest is the Frechet
Distribution, which is the Inverse Weibull Distribution from Appendix A of Loss Models.

Balkema-de Haan-Pickands Theorem: As the truncation point d increases towards infinity, the excess
loss variable (properly scaled) approaches one of three distributions; for insurance applications this
will be either the Exponential Distribution or the Pareto Distribution.

Relationship to Life Contingencies (Section 42):

y-xp x ≡ Prob[Survival past y | Survival past x] = S(y)/S(x).

y-xq x ≡ Prob[Not Surviving past y | Survival past x] = {S(x) - S(y)} / S(x) = 1 - y-xp x.

p x ≡ 1 p x = Prob[Survival past x+1 | Survival past x] = S(x+1) / S(x).

qx ≡ 1 qx = Prob[Death within one year | Survival past x] = 1 - S(x+1) / S(x).

t|uq x ≡ Prob[x+t < time of death ≤ x+t+u | Survival past x] = {S(x+t) - S(x+t+u)} / S(x).
Mahlerʼs Guide to
Aggregate Distributions
Exam C

prepared by
Howard C. Mahler, FCAS
Copyright 2016 by Howard C. Mahler.

Study Aid 2016-C-3

Howard Mahler
hmahler@mac.com
www.howardmahler.com/Teaching

Mahlerʼs Guide to Aggregate Distributions


Copyright 2016 by Howard C. Mahler.

The Aggregate Distribution concepts in Loss Models are demonstrated.

Information in bold or sections whose title is in bold are more important for passing the exam.
Larger bold type indicates it is extremely important.
Information presented in italics (and sections whose titles are in italics) should not be needed to
directly answer exam questions and should be skipped on first reading. It is provided to aid the
readerʼs overall understanding of the subject, and to be useful in practical applications.

Highly Recommended problems are double underlined.


Recommended problems are underlined.
Solutions to problems are given at the end of each section.1

Section # Pages Section Name


A 1 3-26 Introduction
2 27-58 Convolutions
3 59-73 Using Convolutions
B 4 74-109 Generating Functions
5 110-203 Moments of Aggregate Losses
6 204-215 Individual Risk Model
C 7 216-249 Recursive Method / Panjer Algorithm
8 250-264 Recursive Method / Panjer Algorithm, Advanced
D 9 265-286 Discretization
10 287-307 Analytic Results
11 308-349 Stop Loss Premiums
12 350-355 Important Formulas & Ideas

1
Note that problems include both some written by me and some from past exams. The latter are copyright by the
CAS and/or SOA and are reproduced here solely to aid students in studying for exams. The solutions and
comments are solely the responsibility of the author; the CAS/SOA bear no responsibility for their accuracy. While
some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no
way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In
some cases Iʼve rewritten past exam questions in order to match the notation in the current Syllabus. In some cases
the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam
questions, but it will not be specifically tested.

Past Exam Questions by Section of this Study Aid2

Course 3 Course 3 Course 3 Course 3 Course 3Course 3 CAS 3 SOA 3 CAS 3


Section Sample 5/00 11/00 5/01 11/01 11/02 11/03 11/03 5/04
1 26 36 19
2
3 37
4
5 20 25 16 19 8 32 29 7 6 24, 25 4, 33 19 22 38 39
6
7 41 42 36 40
8
9
10
11 14-15 11 19 30 18 16

CAS 3 SOA 3 CAS 3 SOA M CAS 3 SOA M CAS 3 CAS 3 SOA M 4/C
Section 11/04 11/04 5/05 5/05 11/05 11/05 5/06 11/06 11/06 5/07
1 17 6 40
2
3
4
5 31, 32 15 8, 9, 40 17, 31, 40 30, 34 34, 38, 40 29 21, 32 17
6
7 8
8
9
10
11 18 19 7

The CAS/SOA did not release the 5/02 and 5/03 exams.
The SOA did not release its 5/04 and 5/06 exams.
From 5/00 to 5/03, the Course 3 Exam was jointly administered by the CAS and SOA.
Starting in 11/03, the CAS and SOA gave separate exams.
The CAS/SOA did not release the 11/07 and subsequent exams 4/C.

2
Excluding any questions that are no longer on the syllabus.

Section 1, Introduction

Aggregate losses are the total dollars of loss. Aggregate losses are the product of: the number of
exposures, frequency per exposure, and severity.

Aggregate Losses = (# of Exposures) (# of Claims / # of Exposures) ($ of Loss / # of Claims) =
(Exposures) (Frequency) (Severity).

If one is not given the frequency per exposure, but is rather just given the frequency for the whole
number of exposures,3 whatever they are for the particular situation, then
Aggregate Losses = (Frequency) (Severity).

Definitions:

The Aggregate Loss is the total dollars of loss for an insured or set of an insureds. If not stated
otherwise, the period of time is one year.

For example, during 1999 the MT Trucking Company may have had $152,000 in aggregate losses
on its commercial automobile collision insurance policy. All of the trucking firms insured by the
Fly-by-Night Insurance Company may have had $16.1 million dollars in aggregate losses for
collision. The dollars of aggregate losses are determined by how many losses there are and the
severity of each one.

Exercise: During 1998 MT Trucking suffered three collision losses for $8,000, $13,500, and
$22,000. What are its aggregate losses?
[Solution: $8,000 + $13,500 + $22,000 = $43,500.]

The Aggregate Payment is the total dollars paid by an insurer on an insurance policy or set of
insurance policies. If not stated otherwise, the period of time is one year.

Exercise: During 1998 MT Trucking suffered three collision losses for $8,000, $13,500, and
$22,000. MT Trucking has a $10,000 per claim deductible on its policy with the Fly-by-Night
Insurance Company. What are the aggregate payments by Fly-by-Night?
[Solution: $0 + $3,500 + $12,000 = $15,500.]

3
For example, the expected annual number claims from a large commercial insured is 27.3 per year or the expected
annual number of Homeownerʼs claims expected by XYZ Insurer in the State of Florida is 12,310.

Loss Models uses many different terms for each of the important concepts:4
aggregate losses, frequency and severity.

aggregate losses ⇔ aggregate loss random variable ⇔ total loss random variable ⇔ aggregate

payments ⇔ total payments ⇔ S.

frequency ⇔ frequency distribution ⇔ number of claims ⇔ claim count distribution ⇔

claim count random variable ⇔ N.

severity ⇔ severity distribution ⇔ single loss random variable ⇔

individual loss random variable ⇔ loss random variable ⇔ X.

Collective Risk Model:

There are two different types of risk models discussed in Loss Models in order to calculate
aggregate losses or payments.

The collective risk model adds up the individual losses.5 Frequency is independent of severity
and the sizes of loss are independent, identically distributed variables. Exam questions almost
always involve the collective risk model.

1. Conditional on the number of losses, the sizes of loss are independent, identically
distributed variables.
2. The size of loss distribution is independent of the number of losses.
3. The distribution of the number of claims is independent of the sizes of loss.

For example, one might look at the aggregate losses incurred this year by Few States Insurance
Company on all of its Private Passenger Automobile Bodily Injury Liability policies in the State of
West Carolina. Under a collective risk model one might model the number of losses via a Negative
Binomial and the size of loss via a Weibull Distribution. In such a model one is not modeling what
happens on each individual policy.6

If we have 1000 independent, identical policies, then the mean of the sum of the aggregate loss is
1000 times the mean of the aggregate loss for one policy. If we have 1000 independent, identical
policies, then the variance of the sum of the aggregate losses is 1000 times the variance of the
aggregate loss for one policy.
4
This does not seem to add any value for the reader.
5
See Definition 9.1 in Loss Models.
6
In any given year, almost all of these policies would have no Bodily Injury Liability claim.

Individual Risk Model:

In contrast the individual risk model adds up the amount paid on each insurance policy.7
The amounts paid on the different insurance policies are assumed to be independent of each other.8

For example, we have 10 life insurance policies with death benefit of $100,000, and 5 with a death
benefit of $250,000. Each policy could have a different mortality rate. Then one adds the modeled
payments on these policies. This is an example of an individual risk model.

Advantages of Analyzing Frequency and Severity Separately:9

Loss Models lists seven advantages of separately analyzing frequency and severity, which allow a
more accurate and flexible model of aggregate losses:

1. The number of claims changes as the volume of business changes.

Aggregate frequency may change due to changes in exposures. For example, if exposures are in
car years, the expected frequency per car-year might stay the same, while the number of caryears
insured increases somewhat. Then the expected total frequency would increase.

For example, if the frequency per car-year were 3%, and the insured caryears increased from
100,000 to 110,000, then the expected number of losses would increase from 3000 to 3300.10

2. The effects of inflation can be incorporated.

3. One can adjust the severity distribution for changes in deductibles, maximum covered loss, etc.

4. One can adjust frequency for changes in deductibles.


7
See Definition 9.2 in Loss Models.
8
Unless specifically told to, do not assume that the amount of loss on the different policies are identically distributed.
For example, the different policies might represent the different employees under a group life or health contract.
Each employee might have different amounts of coverage and/or frequencies.
9
See Section 9.1 of Loss Models.
10
In spite of what Loss Models says, in my opinion there is no significant advantage to looking at frequency and
severity separately in this case. One could just multiply expected aggregate losses by 10% in this example. Nor is
this example likely to be a big concern financially, if as usual the premiums collected increase in proportion to the
increase in the number of caryears. Nor is this situation relatively hard to keep track of and/or predict. It should be
noted that insurers make significant efforts to keep track of the volume of business they are insuring. In most cases it
is directly related to collecting appropriate premiums.
Of far greater concern is when the expected frequency per car-year changes significantly. For example, if the
expected frequency per car-year went from 3.0% to 3.3%, this would also increase the expected total number of
losses, but without any automatic increase in premiums collected. This might have occurred when speed limits were
increased from 55 m.p.h. or when lawyers were first allowed to advertise on television. Being able to separately
adjust historical frequency and severity for the expected impacts of such changes would be an advantage.

5. One can appropriately combine data from policies with different deductibles and maximum
covered losses into a single severity distribution.

6. One can create consistent models for the insurer, insured, and reinsurer.

7. One can analyze the tail of the aggregate losses by separately analyzing the tails of the
frequency and severity.11

A separate analysis allows an actuary to estimate the parameters of the frequency and severity from
separate sources of information.12

Deductibles and Maximum Covered Losses:13

Just as when dealing with loss distributions, aggregate losses may represent somewhat different
mathematical and real world quantities. They may relate to the total economic loss, i.e., no deductible
and no maximum covered loss. They may relate to the amount paid by an insurer after deductibles
and/or maximum covered losses.14 They may relate to the amount paid by the insured due to a
deductible and/or other policy modifications. They may relate to the amount the insurer pays net of
reinsurance. They may relate to the amount paid by a reinsurer.

In order to get the aggregate losses in these different situations, one has to adjust the severity
distribution and then add up the number of payments or losses.15 One can either add up the
non-zero payments or one can add up all the payments, including zero payments. One needs to
be careful to use the corresponding frequency and severity distributions.

For example, assume that losses are Poisson with λ = 3, and severity is Exponential with
θ = 2000. Frequency and severity are independent.

11
See “Mahlerʼs Guide to Frequency Distributions” and “Mahler's Guide to Loss Distributions.” For most casualty
lines, the tail behavior of the aggregate losses is determined by that of the severity distribution rather than the
frequency distribution.
12
This can be either an advantage or a disadvantage. Using inconsistent data sources or models may produce a
nonsensical estimate of aggregate losses.
13
See Sections 8.5 and 8.6 of Loss Models and “Mahlerʼs Guide to Loss Distributions.”
14
Of course in practical applications, we may have a combination of different deductibles, maximum covered losses,
coinsurance clauses, reinsurance agreements, etc. However, the concept is still the same.
15
See “Mahlerʼs Guide to Loss Distributions.” Policy provisions that deals directly with the aggregate losses, such as
the aggregate deductible for stop loss insurance to be discussed subsequently, must be applied to the aggregate
losses at the end.

If there is a deductible of 1000, then the (non-zero) payments are also Exponential with θ = 2000.16
S(1000) = e-1000/2000 = 60.65%; this is the proportion of losses that result in (non-zero) payments.

Therefore the aggregate payments can be modeled as either:


1. Number of (non-zero) payments is Poisson with λ = (0.6065)(3) = 1.8195 and the size of
(non-zero) payments are also Exponential with θ = 2000.17
2. Number of losses is Poisson with λ = 3 and the size of payments per loss is a two-point
mixture, with 39.35% chance of zero and 60.65% of an Exponential with θ = 2000.
Average aggregate payments are: (1.8195)(2000) = 3639 = (3){(0)(0.3935) + (2000)(0.6065)}.
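A quick simulation (my addition, with an arbitrary seed and number of trials) confirms that modeling the payments either way reproduces the average aggregate payment of about 3639:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
lam, theta, d = 3.0, 2000.0, 1000.0

totals = []
for _ in range(20_000):
    n = rng.poisson(lam)                                # number of losses in a year
    losses = rng.exponential(scale=theta, size=n)
    totals.append(np.maximum(losses - d, 0.0).sum())    # payments after the per-claim deductible

print(np.mean(totals))                                  # close to 0.6065 * 3 * 2000 = 3639
```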

Exercise: Frequency is Negative Binomial with β = 4 and r = 3. Severity is Pareto with


α = 2 and θ = 10,000. Frequency and severity are independent. What are the average aggregate
losses?
The insurer buys $25,000 per claim reinsurance; the reinsurer will pay the portion of each claim
greater than $25,000.
What are the insurerʼs aggregate annual losses after reinsurance?
How would one model the insurerʼs aggregate losses after reinsurance?
[Solution: Average frequency = (4)(3) = 12. Average severity = 10,000/(2-1) = 10,000.
Average aggregate losses (prior to reinsurance) = (12)(10,000) = 120,000.
After reinsurance, the average severity is:
E[X ∧ 25000] = {(10000)/(2-1)} {1 - (1+25/10)-(2-1)} = 7143.
Average aggregate losses, after reinsurance = (12)(7143) = 85,716.
After reinsurance, the frequency distribution is the same, while the severity distribution is censored at
25,000:
G(x) = 1 - {1+(x/10000)}-2, for x < 25000; G(25,000) = 1.]

16
Due to the memoryless property of the Exponential Distribution. See “Mahlerʼs Guide to Loss Distributions.”
17
When one thins a Poisson, one gets another Poisson. See “Mahlerʼs Guide to Frequency Distributions.”

Exercise: Frequency is Negative Binomial with β = 4 and r = 3. Severity is Pareto with


α = 2 and θ = 10,000. Frequency and severity are independent. The insurer buys $25,000 per
claim reinsurance. What is the average aggregate loss for the reinsurer?
How would one model the reinsurerʼs aggregate losses?
[Solution: From the previous exercise, the reinsurer pays on average 120,000 - 85,716 = 34,284.
Alternately, the distribution function of the Pareto at $25000 is: 1 - (1 + 2.5)-2 = 0.918367.
Thus only 8.1633% of the insurers losses lead to a non-zero payment by the reinsurer.
Thus the reinsurer sees a frequency distribution for non-zero payments which is Negative Binomial
with β = (4)(8.1633%) = 0.32653, r = 3, and mean (0.32653)(3) = 0.97959.18
The severity distribution for non-zero payments by the reinsurer is truncated and shifted from below
at 25,000. G(x) = {F(x+25000)-F(25000)}/S(25000) =
{(1+2.5)-2 - (1+ (x+25000)/10000)-2} /(1+2.5)-2 = 1 - {3.5 + (x/10000)}-2 3.52 =
1 - (1 + (x/35,000))-2.
Thus the distribution of the reinsurerʼs non-zero payments is Pareto with α = 2 and θ = 35000, and
mean 35000/(2-1) = 35,000.19
Thus the reinsurerʼs expected aggregate loses are: (0.97959)(35,000) = 34,286.
Alternately, we can model the reinsurer including its non-zero payments. In that case, the frequency
distribution is the original Negative Binomial with β = 4 and r = 3. The severity would be a
91.8367% and 8.1633% mixture of zero and a Pareto with α = 2 and θ = 35,000;
G(0) = 0.918367, and G(x) = 1 - (0.081633){1+ (x/35000)}-2, for x > 0.]
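The split of expected aggregate losses between the insurer and the reinsurer in these two exercises can be reproduced with the limited expected value of the Pareto; this check is mine, not part of the original solutions:

```python
# E[X ^ x] for a Pareto: {theta/(alpha-1)} {1 - (theta/(theta + x))^(alpha-1)}, from the Appendix A tables.
alpha, theta = 2.0, 10_000.0
mean_frequency = 3 * 4                              # Negative Binomial mean = r * beta = 12
retention = 25_000.0

lev = theta / (alpha - 1) * (1 - (theta / (theta + retention)) ** (alpha - 1))   # about 7143
mean_severity = theta / (alpha - 1)                                              # 10,000

insurer = mean_frequency * lev                      # about 85,714 (the text rounds to 85,716)
reinsurer = mean_frequency * (mean_severity - lev)  # about 34,286
print(lev, insurer, reinsurer)
```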

Model Choices:

Severity distributions that are members of scale families have the advantage that they are easy to
adjust for inflation and/or changes in currency.20 Infinitely divisible frequency distributions have the
advantages that they are easy to adjust for changes in level of exposures and/or time period.21
Loss Models therefore recommends the use of infinitely divisible frequency distributions,
unless there is a specific reason not to do so.22

18
If one thins a Negative Binomial variable one gets the same form, but with β multiplied by the thinning factor.
See “Mahlerʼs Guide to Frequency Distributions.”
19
The mean excess loss for a Pareto is e(x) = (θ+x)/(α-1). e(25000) = (10,000 + 25,000)/(2-1) =35,000.
See “Mahlerʼs Guide to Loss Distributions.”
20
See “Mahlerʼs Guide to Loss Distributions.” If X is a member of a scale family, then for any c > 0, cX is also a
member of that family.
21
If a distribution is infinitely divisible, then if one takes the probability generating function to any positive power, one
gets the probability generating function of another member of the same family of distributions. See “Mahlerʼs Guide
to Frequency Distributions.” Infinitely divisible frequency distributions include the Poisson and Negative Binomial.
Compound distributions with a primary distribution that is infinitely divisible are also infinitely divisible.
22
For example, the Binomial may be appropriate when there is some maximum possible number of claims.

Problems:

1.1 (1 point) According to Loss Models, which of the following are advantages of the separation of
the aggregate loss process into frequency and severity?
1. Allows the actuary to estimate the parameters of the frequency and severity from separate
sources of information.
2. Allows the actuary to adjust for inflation.
3. Allows the actuary to adjust for the effects of deductibles.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A,B, C, or D.

Use the following information for the next two questions:


• Frequency is Poisson with λ = 130, prior to the effect of any deductible.
• Loss amounts have a LogNormal Distribution with µ = 6.5 and σ = 1.3, prior to the
effect of any deductible.
• Frequency and loss amounts are independent.

1.2 (1 point) Calculate the expected aggregate amount paid by the insurer.
(A) 160,000 (B) 170,000 (C) 180,000 (D) 190,000 (E) 200,000

1.3 (3 points) Calculate the expected aggregate amount paid by the insurer, if there is a deductible
of 500 per loss.
(A) 140,000 (B) 150,000 (C) 160,000 (D) 170,000 (E) 180,000

1.4 (1 point)
Which of the following are advantages of the use of infinitely divisible frequency distribution?
1. The type of distribution selected does not depend on whether one is working with months
or years.
2. Allows the actuary to retain the same type of distribution after adjusting for inflation.
3. Allows the actuary to adjust for changes in exposure levels while using the same type of
distribution.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A,B, C, or D.

1.5 (1 point) Aggregate losses for a portfolio of policies are modeled as follows:
(i) The number of losses before any coverage modifications follows a distribution
with mean 30.
(ii) The severity of each loss before any coverage modifications is uniformly distributed
between 0 and 1000.
The insurer would like to model the impact of imposing an ordinary deductible of 100 on each loss
and reimbursing only 80% of each loss in excess of the deductible.
It is assumed that the coverage modifications will not affect the loss distribution.
The insurer models its claims with modified frequency and severity distributions.
The modified claim amount is uniformly distributed on the interval [0, 720].
Determine the mean of the modified frequency distribution.
(A) 3 (B) 21.6 (C) 24 (D) 27 (E) 30

1.6 (3 points) The amounts of loss have a Pareto Distribution with α = 4 and θ = 3000, prior to any
maximum covered loss or deductible. Frequency is Negative Binomial with r = 32 and β = 0.5, prior
to any maximum covered loss or deductible. If there is a 1000 deductible and 5000 maximum
covered loss, what is the expected aggregate amount paid by the insurer?
(A) 6000 (B) 6500 (C) 7000 (D) 7500 (E) 8000

1.7 (3 points) The Boxborough Box Company owns three factories.


It buys insurance to protect itself against major repair costs.
Profit = 45 less the sum of insurance premiums and retained major repair costs.
The Boxborough Box Company will pay a dividend equal to half of the profit, if it is positive.
You are given:
(i) Major repair costs at the factories are independent.
(ii) The distribution of major repair costs for each factory is:
k Prob(k)
0 0.6
20 0.3
50 0.1
(iii) At each factory, the insurance policy pays the major repair costs in excess of that factoryʼs
ordinary deductible of 10.
(iv) The insurance premium is 25.
Calculate the expected dividend.
(A) 3.9 (B) 4.0 (C) 4.1 (D) 4.2 (E) 4.3

1.8 (2 points) Lucky Tom always gives money to one beggar on his walk to work and money to
another beggar on his walk home from work.
There is a 3/4 chance he gives a beggar $1 and a 1/4 chance he gives a beggar $10.
However, 1/8 of the time Tom has to stay late at work and takes a cab home rather than walk.
What is the 90th percentile of the amount of money Tom gives away on a work day?
A. 1 B. 2 C. 10 D. 11 E. 20

1.9 (2 points) A restaurant has tables that seat two people. 30% of the time one person eats at a
table, while 70% of the time two people eat at a table. After eating their meal, tips are left.
50% of the time the tip at a table is $1 per person, 40% of the time the tip at a table is $2 per
person, and 10% of the time the tip at a table is $3 per person.
What is the 70th percentile of the distribution of tips left at a table?
A. 2 B. 3 C. 4 D. 5 E. 6

1.10 (2 points) A coffee shop has tables that seat two people. 30% of the time one person sits at
a table, while 70% of the time two people sit at a table. There are a variety of beverages which cost
either $1, $2, or $3. Each person buys a beverage, independently of anyone else. 50% of the
time the beverage costs $1, 40% of the time the beverage costs $2, and 10% of the time the
beverage costs $3.
Determine the probability that the total cost of beverages at a table is either $2, $3, or $4.
A. 78% B. 79% C. 80% D. 81% E. 82%

Use the following information for the next four questions:


The losses for the Mockingbird Tequila Company have a Poisson frequency distribution with λ = 5
and a Weibull severity distribution with τ = 1/2 and θ = 50,000.
The Mockingbird Tequila Company buys insurance from the Atticus Insurance Company, with a
deductible of $5000, maximum covered loss of $250,000, and coinsurance factor of 90%.
The Atticus Insurance Company buys reinsurance from the Finch Reinsurance Company.
Finch will pay Atticus for the portion of any payment in excess of $100,000.

1.11 (3 points) Construct a model for the aggregate payments retained by the Mockingbird Tequila
Company.

1.12 (3 points) Construct a model for the aggregate payments made by Atticus Insurance
Company to the Mockingbird Tequila Company, prior to the impact of reinsurance.

1.13 (3 points) Construct a model for the aggregate payments made by the Finch Reinsurance
Company to the Atticus Insurance Company.

1.14 (3 points) Construct a model for the aggregate payments made by the Atticus Insurance
Company to the Mockingbird Tequila Company net of reinsurance.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 12

1.15 (2 points) The number of claims is Binomial with m = 2 and q = 0.1.


The size of claims is Normal with µ = 1500 and σ = 400.
Frequency and severity are independent.
Determine the probability that the aggregate loss is greater than 2000.
A. Less than 1.0%
B. At least 1.0%, but less than 1.5%
C. At least 1.5%, but less than 2.0%
D. At least 2.0%, but less than 2.5%
E. 2.5% or more

1.16 (5A, 11/95, Q.21) (1 point) Which of the following assumptions are made in the collective risk
model?
1. The individual claim amounts are identically distributed random variables.
2. The distribution of the aggregate losses generated by the portfolio is continuous.
3. The number of claims and the individual claim amounts are mutually independent.
A. 1 B. 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

1.17 (Course 151 Sample Exam #1, Q.15) (1.7 points) An insurer issues a portfolio of 100
automobile insurance policies. Of these 100 policies, one-half have a deductible of 10 and the other
half have a deductible of zero. The insurance policy pays the amount of damage in excess of the
deductible subject to a maximum of 125 per accident. Assume:
(i) the number of automobile accidents per year per policy has a Poisson distribution
with mean 0.03
(ii) given that an accident occurs, the amount of vehicle damage has the distribution:
x p(x)
30 1/3
150 1/3
200 1/3
Compute the total amount of claims the insurer expects to pay in a single year.
(A) 270 (B) 275 (C) 280 (D) 285 (E) 290

1.18 (Course 151 Sample Exam #2, Q.21) (1.7 points) Aggregate claims, X, is uniformly
distributed over (0, 20). Complete insurance is available with a premium of 11.6.
If X is less than k, a dividend of k - X is payable.
Determine k such that the expected cost of this insurance is equal to the expected claims without
insurance.
(A) 2 (B) 4 (C) 6 (D) 8 (E) 10
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 13

1.19 (Course 151 Sample Exam #3, Q.18) (1.7 points)


Aggregate claims has a compound Poisson distribution with:
(i) λ = 1.0
(ii) severity distribution: p(1) = p(2) = 0.5
For a premium of 4.0, an insurer will pay total claims and a dividend equal to the excess of 75% of
premium over claims. Determine the expected dividend.
(A) 1.5 (B) 1.7 (C) 2.0 (D) 2.5 (E) 2.7

1.20 (5A, 11/96, Q.37) (2 points) Claims arising from a particular insurance policy have a
compound Poisson distribution. The expected number of claims is five.
The claim amount density function is given by P(X = 1,000) = 0.8 and P(X = 5,000) = 0.2
Compute the probability that losses from this policy will total 6,000.

1.21 (Course 1 Sample Exam, Q.10) (1.9 points) An insurance policy covers the two
employees of ABC Company, Bob and Carol. The policy will reimburse ABC for no more than
one loss per employee in a year. It reimburses the full amount of the loss up to an annual
company-wide maximum of 8000. The probability of an employee incurring a loss in a year is 40%.
The probability that an employee incurs a loss is independent of the other employeeʼs losses.
The amount of each loss is uniformly distributed on [1000, 5000].
Given that Bob has incurred a loss in excess of 2000, determine the probability that losses will
exceed reimbursements.
A. 1/20 B. 1/15 C. 1/10 D. 1/8 E. 1/6
Note: The original exam question has been rewritten.

1.22 (IOA 101, 4/01, Q.8) (5.25 points) Consider two independent lives A and B.
The probabilities that A and B die within a specified period are 0.1 and 0.2 respectively.
If A dies you lose 50,000, whether or not B dies.
If B dies you lose 30,000, whether or not A dies.
(i) (3 points) Calculate the mean and standard deviation of your total losses in the period.
(ii) (2.25 points) Calculate your expected loss within the period, given that one, and only one,
of A and B dies.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 14

1.23 (3, 5/01, Q.26 & 2009 Sample Q.109) (2.5 points) A company insures a fleet of vehicles.
Aggregate losses have a compound Poisson distribution.
The expected number of losses is 20.
Loss amounts, regardless of vehicle type, have exponential distribution with θ = 200.
In order to reduce the cost of the insurance, two modifications are to be made:
(i) a certain type of vehicle will not be insured. It is estimated that this will
reduce loss frequency by 20%.
(ii) a deductible of 100 per loss will be imposed.
Calculate the expected aggregate amount paid by the insurer after the modifications.
(A) 1600 (B) 1940 (C) 2520 (D) 3200 (E) 3880

1.24 (1, 11/01, Q.16) (1.9 points) Let S denote the total annual claim amount for an insured.
There is a probability of 1/2 that S = 0.
There is a probability of 1/3 that S is exponentially distributed with mean 5.
There is a probability of 1/6 that S is exponentially distributed with mean 8.
Determine the probability that 4 < S < 8.
(A) 0.04 (B) 0.08 (C) 0.12 (D) 0.24 (E) 0.25
Note: This past exam question has been rewritten.

1.25 (3, 11/01, Q.36 & 2009 Sample Q.102) (2.5 points) WidgetsRUs owns two factories. It
buys insurance to protect itself against major repair costs. Profit equals revenues, less the sum of
insurance premiums, retained major repair costs, and all other expenses.
WidgetsRUs will pay a dividend equal to the profit, if it is positive.
You are given:
(i) Combined revenue for the two factories is 3.
(ii) Major repair costs at the factories are independent.
(iii) The distribution of major repair costs for each factory is
k Prob(k)
0 0.4
1 0.3
2 0.2
3 0.1
(iv) At each factory, the insurance policy pays the major repair costs in excess of that factoryʼs
ordinary deductible of 1. The insurance premium is 110% of the expected claims.
(v) All other expenses are 15% of revenues.
Calculate the expected dividend.
(A) 0.43 (B) 0.47 (C) 0.51 (D) 0.55 (E) 0.59
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 15

1.26 (SOA3, 11/03, Q.19 & 2009 Sample Q.86) (2.5 points)
Aggregate losses for a portfolio of policies are modeled as follows:
(i) The number of losses before any coverage modifications follows a Poisson
distribution with mean λ.
(ii) The severity of each loss before any coverage modifications is uniformly distributed
between 0 and b.
The insurer would like to model the impact of imposing an ordinary deductible, d (0 < d < b), on each
loss and reimbursing only a percentage, c (0 < c < 1), of each loss in excess of the
deductible.
It is assumed that the coverage modifications will not affect the loss distribution.
The insurer models its claims with modified frequency and severity distributions.
The modified claim amount is uniformly distributed on the interval [0, c(b - d)].
Determine the mean of the modified frequency distribution.
(A) λ (B) λc (C) λd/b (D) λ(b-d)/b (E) λc(b-d)/b

1.27 (SOA3, 11/04, Q.17 & 2009 Sample Q.126) (2.5 points)
The number of annual losses has a Poisson distribution with a mean of 5.
The size of each loss has a two-parameter Pareto distribution with θ = 10 and α = 2.5.
An insurance for the losses has an ordinary deductible of 5 per loss.
Calculate the expected value of the aggregate annual payments for this insurance.
(A) 8 (B) 13 (C) 18 (D) 23 (E) 28

1.28 (CAS3, 5/05, Q.6) (2.5 points) For a portfolio of 2,500 policies, claim frequency is 10% per
year and severity is distributed uniformly between 0 and 1,000. Each policy is independent and has
no deductible. Calculate the reduction in expected annual aggregate payments, if a deductible of
$200 per claim is imposed on the portfolio of policies.
A. Less than $46,000
B. At least $46,000, but less than $47,000
C. At least $47,000, but less than $48,000
D. At least $48,000, but less than $49,000
E. $49,000 or more

1.29 (SOA M, 11/06, Q.40 & 2009 Sample Q.289) (2.5 points)
A compound Poisson distribution has λ = 5 and claim amount distribution as follows:
x p(x)
100 0.80
500 0.16
1000 0.04
Calculate the probability that aggregate claims will be exactly 600.
(A) 0.022 (B) 0.038 (C) 0.049 (D) 0.060 (E) 0.070
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 16

Solutions to Problems:

1.1. D. 1. True. 2. True. May also adjust for expected changes in frequency. 3. True.

1.2. E. E[X] = exp(µ + σ^2/2) = exp(6.5 + 1.3^2/2) = 1548.


Mean aggregate loss = (130)(1548) = 201,240.

1.3. B. E[X] = exp(µ + σ^2/2) = e^7.345 = 1548.
E[X ∧ 500] = exp(µ + σ^2/2) Φ[(ln500 − µ − σ^2)/σ] + 500{1 - Φ[(ln500 − µ)/σ]} =
1548 Φ[-1.52] + 500{1 - Φ[-0.22]} = (1548)(0.0643) + (500)(0.5871) = 393.
The mean payment per loss is: E[X] - E[X ∧ 500] = 1548 - 393 = 1155.
Mean aggregate loss = (130)(1155) = 150,150.
Alternately, the nonzero payments are Poisson with mean: (130)S(500).
The average payment per nonzero payment is: (E[X] - E[X ∧ 500])/S(500).
Therefore, the mean aggregate loss is: (130)S(500)(E[X] - E[X ∧ 500])/S(500) =
(130)(E[X] - E[X ∧ 500]) = (130)(1155) = 150,150.

1.4. B. 1. True. For example, if the frequency over one year is Negative Binomial with parameters
β = 0.3 and r = 2, then (assuming the months are independent) the frequency is Negative Binomial
over one month, with parameters β = 0.3 and r = 2/12. This follows from the form of the Probability
Generating Function, which is: P(z) = {1 - β(z-1)}^-r.
2. False. The frequency distribution has no effect on this. 3. True.

1.5. D. For the uniform distribution on (0, 1000), S(100) = 90%.
The frequency distribution of non-zero payments has mean: (90%)(30) = 27.
Comment: Similar to SOA3, 11/03, Q.19.
The mean aggregate payment after modifications is: (27)(720/2) = 9720.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 17

1.6. A. The mean frequency is: (32)(.5) = 16.


The mean payment per loss is: E[X ∧ 5000] - E[X ∧ 1000]

= {θ/(α-1)} {1 - (θ/(θ+5000))^(α−1)} - {θ/(α-1)} {1 - (θ/(θ+1000))^(α−1)} =
(3000/3){(3000/4000)^3 - (3000/8000)^3} = (1000)(0.4219 - 0.0527) = 369.2.
Mean aggregate loss = (16)(369.2) = 5907.
Alternately, the nonzero payments have mean frequency: (16)S(1000).
The average payment per nonzero payment is: (E[X ∧ 5000] - E[X ∧ 1000])/S(1000).
Therefore, the mean aggregate loss is: (16)S(1000)(E[X ∧ 5000] - E[X ∧ 1000])/S(1000) =
(16)(E[X ∧ 5000] - E[X ∧ 1000]) = (16)(947.3 - 578.1) = 5907.

1.7. E. There is 10 deductible per factory. If the cost is either 20 or 50 the insured retains 10.
Thus each factory has retained costs that are 0 60% of the time, and 10 40% of the time.
Profit = 45 - 25 - retained costs = 20 - retained costs.
Prob[retained cost = 0] = Prob[all 3 factories have 0 retained costs] = 0.6^3 = 0.216.
Prob[retained cost = 10] = Prob[2 factories have 0 retained costs and other has 10] =
(3)(0.6^2)(0.4) = 0.432.
Probability Retained Cost Profit Dividend
0.216 0 20 10
0.432 10 10 5
0.288 20 0 0
0.064 30 -10 0
Average 12 8 4.32
Comment: Similar to 3, 11/01, Q.36.

1.8. D. There is a 7/8 chance he donates to two beggars and a 1/8 chance he donates to one
beggar. Prob[Agg = 1] = (1/8)(3/4) = 3/32. Prob[Agg = 2] = (7/8)(3/4)^2 = 63/128.
Prob[Agg = 10] = (1/8)(1/4) = 1/32. Prob[Agg = 11] = (7/8)(2)(3/4)(1/4) = 42/128.
Prob[Agg = 20] = (7/8)(1/4)^2 = 7/128. Prob[Agg ≤ 10] = 79/128 = 61.7% < 90%.
Prob[Agg ≤ 11] = 121/128 = 94.5% ≥ 90%. The 90th percentile is 11.
Comment: The 90th percentile is the first value such that F ≥ 90%.

1.9. C. Prob[Agg = 1] = (30%)(50%) = 15%.


Prob[Agg = 2] = (30%)(40%) + (70%)(50%) = 47%.
Prob[Agg = 3] = (30%)(10%) = 3%.
Prob[Agg = 4] = (70%)(40%) = 28%.
Prob[Agg = 6] = (70%)(10%) = 7%.
Prob[Agg ≤ 3] = 65% < 70%.
Prob[Agg ≤ 4] = 93% ≥ 70%. The 70th percentile is 4.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 18

1.10. B. Prob[Agg = 2] = Prob[1 person] Prob[drink = $2] + Prob[2 persons] Prob[both $2] =
(30%)(40%) + (70%)(50%)^2 = 29.5%.
Prob[Agg = 3] = Prob[1 person] Prob[drink = $3] + Prob[2 persons] Prob[one at $1 and one at $2]
= (30%)(10%) + (70%)(2)(50%)(40%) = 31%.
Prob[Agg = 4] = Prob[2 persons] Prob[each $2] + Prob[2 persons] Prob[one at $1 and one at $3]
= (70%)(40%)^2 + (70%)(2)(50%)(10%) = 18.2%.
Prob[Agg = 2] + Prob[Agg = 3] + Prob[Agg = 4] = 29.5% + 31% + 18.2% = 78.7%.
Comment: Prob[Agg = 1] = (30%)(50%) = 15%.
Prob[Agg = 5] = (70%)(2)(40%)(10%) = 5.6%. Prob[Agg = 6] = (70%)(10%)^2 = 0.7%.

1.11. Mockingbird retains all of any loss less than $5000.


For a loss of size greater than $5000, it retains $5000 plus 10% of the portion above $5000.
Mockingbird retains the portion of any loss above the maximum covered loss of $250,000.
Let X be the size of loss and Y be the amount retained.
Let F be the Weibull Distribution of X and G be the distribution of Y.
y = x, for x ≤ 5000.
y = 5000 + (0.1)(x - 5000) = 4500 + 0.1x, for 5000 ≤ x ≤ 250,000.
Therefore, x = 10y - 45000, for 5000 ≤ y ≤ 29,500.
y = 4500 + (.1)(250000) + (x - 250000) = x - 220,500, for 250,000 ≤ x.
Therefore, x = y + 220,500, for 29,500 ≤ y.
G(y) = F(y) = 1 - exp[-(y/50000)^(1/2)], for y ≤ 5000.
G(y) = F(10y - 45000) = 1 - exp[-((10y - 45000)/50000)^(1/2)]
= 1 - exp[-(y/5000 - 0.9)^(1/2)], 5000 ≤ y ≤ 29,500.
G(y) = F(y + 220,500) = 1 - exp[-((y + 220,500)/50000)^(1/2)]
= 1 - exp[-(y/50000 + 4.41)^(1/2)], 29,500 ≤ y.
The number of losses is Poisson with λ = 5.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 19

1.12. Let X be the size of loss and Y be the amount paid on that loss.
Let F be the Weibull Distribution of X and G be the distribution of Y.
Atticus Insurance pays nothing for a loss less than $5000. For a loss of size greater than $5000,
Atticus Insurance pays 90% of the portion above $5000.
For a loss of size 250,000, Atticus Insurance pays: (0.9)(250,000 - 5000) = 220,500.
Atticus Insurance pays no more for a loss larger than the maximum covered loss of $250,000.
y = 0, for x ≤ 5000.
y = (0.9)(x - 5000) = 0.9x - 4500, for 5000 ≤ x ≤ 250,000.
Therefore, x = (y + 4500)/.9, y < 220,500.
y = 220,500, for 250,000 ≤ x.
G(0) = F(5000) = 1 - exp[-(5000/50000)^(1/2)] = 0.2711.
G(y) = F[(y + 4500)/0.9] = 1 - exp[-((y + 4500)/45000)^(1/2)], 0 < y < 220,500.
G(220,500) = 1.
The number of losses is Poisson with λ = 5.
Alternately, let Y be the non-zero payments by Atticus Insurance.
Then G(y) = {F((y + 4500)/0.9) - F(5000)} / S(5000)
= {1 - exp[-((y + 4500)/45000)^(1/2)] - 0.2711} / 0.7289
= 1 - exp[-((y + 4500)/45000)^(1/2)] / 0.7289, 0 < y < 220,500.
G(220,500) = 1.
The number of non-zero payments is Poisson with λ = (0.7289)(5) = 3.6445.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 20

1.13. Finch Reinsurance pays something when the loss results in a payment by Atticus of more
than $100,000. Solve for the loss that results in a payment of $100,000:
100000 = (0.9)(x - 5000). ⇒ x = 116,111.
Let X be the size of loss and Y be the amount paid by Finch Reinsurance.
Let F be the Weibull Distribution of X and G be the distribution of Y.
y = 0, for x ≤ 116,111.
y = (.9)(x - 116,111) = 0.9x - 104,500, for 116,111 < x ≤ 250,000.
Therefore, x = (y + 104500)/.9, for 0 < y < 120,500.
y = 120,500, for 250,000 ≤ x.
G(0) = F(116111) = 1 - exp[-(116111/50000)^(1/2)] = 0.7821.
G(y) = F((y + 104500)/0.9) = 1 - exp[-((y + 104500)/45000)^(1/2)], for 0 < y < 120,500.
G(120500) = 1.
The number of losses is Poisson with λ = 5.
Alternately, let Y be the non-zero payments by Finch.
Then G(y) = {F((y + 104500)/0.9) - F(116111)}/S(116111)
= {1 - exp[-((y + 104500)/45000)^(1/2)] - 0.7821}/0.2179
= 1 - exp[-((y + 104500)/45000)^(1/2)]/0.2179, for 0 < y < 120,500.
G(120500) = 1.
The number of non-zero payments by Finch is Poisson with λ = (0.2179)(5) = 1.0895.

1.14. Let X be the size of loss and Y be the amount paid on that loss net of reinsurance.
Let F be the Weibull Distribution of X and G be the distribution of Y.
For a loss greater than 116,111, Atticus Insurance pays 100,000 net of reinsurance.
y = 0, for x ≤ 5000.
y = (0.9)(x - 5000) = 0.9x - 4500, for 5000 ≤ x ≤ 116,111.
y = 100,000, for 116,111 < x.
G(0) = F(5000) = 1 - exp[-(5000/50000)^(1/2)] = 0.2711.
G(y) = F[(y + 4500)/0.9] = 1 - exp[-((y + 4500)/45000)^(1/2)], 0 < y < 100,000.
G(100,000) = 1.
The number of losses is Poisson with λ = 5.
Alternately, let Y be the non-zero payments by Atticus Insurance net of reinsurance.
Then G(y) = {F((y + 4500)/0.9) - F(5000)}/S(5000)
= {1 - exp[-((y + 4500)/45000)^(1/2)] - 0.2711}/0.7289
= 1 - exp[-((y + 4500)/45000)^(1/2)]/0.7289, 0 < y < 100,000.
G(100,000) = 1.
The number of non-zero payments is Poisson with λ = (0.7289)(5) = 3.6445.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 21

1.15. E. Prob[N = 0] = 0.9^2 = 0.81. Prob[N = 1] = (2)(0.1)(0.9) = 0.18.


Prob[N = 2] = 0.1^2 = 0.01.
Let X be the size of a single loss.
Prob[X > 2000] = 1 - Φ[(2000 - 1500)/400] = 1 - Φ[1.25] = 0.1056.
X + X is also Normal, but with µ = 3000 and σ = 400√2 = 565.685.
Prob[X + X > 2000] = 1 - Φ[(2000 - 3000)/565.685] = 1 - Φ[-1.77] = 0.9616.
Prob[aggregate > 2000] = Prob[N = 1] Prob[X > 2000] + Prob[N = 2] Prob[X + X > 2000] =
(0.18)(0.1056) + (0.01)(0.9616) = 2.86%.
Comment: Do not use the Normal Approximation.
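As an aside (not part of the original solution), the same calculation is quick to verify in Python using the exact Normal distribution function rather than rounded table values; normal_cdf below is just a hypothetical helper built on math.erf.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # Cumulative distribution function of a Normal(mu, sigma).
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

p1, p2 = 0.18, 0.01  # Prob[N = 1] and Prob[N = 2] from the Binomial frequency
prob = (p1 * (1 - normal_cdf(2000, 1500, 400))
        + p2 * (1 - normal_cdf(2000, 3000, 400 * sqrt(2))))
print(round(prob, 4))  # about 0.0286, matching the 2.86% above
```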

1.16. C. 1. True. 2. False. The aggregate losses are continuous if the severity distribution is
continuous and there is no probability of zero claims. The aggregate losses are discrete if the
severity distribution is discrete. 3. True.

1.17. B. The expected number of accidents is: (.03)(100) = 3. The mean payment with no
deductible is: (30 + 125 + 125)/3 = 93.333. The mean payment with a deductible of 10 is:
(20 + 125 + 125)/3 = 90. The overall mean payment is: (1/2)(93.333) + (1/2)(90) = 91.667.
Therefore, the mean aggregate loss is: (3)(91.667) = 275.
Comment: The maximum amount paid is 125; in one case there is no deductible and a maximum
covered loss of 125, while in the other case there is a deductible of 10 and a maximum covered loss
of 135.

1.18. D. The expected aggregate losses are: (0+20)/2 = 10. The expected dividend is:
∫_0^k (k - x) (1/20) dx = [-(k - x)^2 / 40] evaluated from x = 0 to x = k, which is k^2/40.
Setting, as stated, premiums - expected dividends = expected aggregate losses:
11.6 - k^2/40 = 10. Therefore, k = 8.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 22

1.19. B. Let A be the aggregate claims. 75% of premiums is 3. If A ≥ 3, the dividend is 0;


if A ≤ 3, the dividend is 3 - A. Thus the dividend is: 3 - Minimum(3, A).
Therefore, the expected dividend is 3 - E[A ∧ 3].
Prob(A = 0) = Prob(0 claims) = e^-1.
Prob(A = 1) = Prob(1 claim) Prob(claim size = 1) = (e^-1)(0.5).
Prob(A = 2) = Prob(1 claim) Prob(claim size = 2) + Prob(2 claims) Prob(claim sizes are both 1) =
(e^-1)(0.5) + (e^-1 / 2)(0.5^2) = 0.625 e^-1. Prob(A ≥ 3) = 1 - 2.125 e^-1.
Thus E[A ∧ 3] = (0)(e^-1) + (1)(0.5 e^-1) + (2)(0.625 e^-1) + (3)(1 - 2.125 e^-1) = 3 - 4.625 e^-1.
Therefore the expected dividend is: 3 - E[A ∧ 3] = 4.625 e^-1 = 1.70.
Alternately, if A = 0 the dividend is 3, if A = 1 the dividend is 2, if A = 2 the dividend is 1,
and if A ≥ 3, the dividend is zero. Therefore, the expected dividend is:
(3)(e^-1) + (2)(0.5 e^-1) + (1)(0.625 e^-1) + (0)(1 - 2.125 e^-1) = 4.625 e^-1 = 1.70.

1.20. For the aggregate losses to be 6000, either there are two claims with one of size 1000 and
one of size 5000, or there are six claims each of size 1000.
The probability is: (2)(0.8)(0.2)(5^2 e^-5/2!) + (0.8^6)(5^6 e^-5/6!) = 6.53%.
Alternately, thin the Poisson Distribution. The number of claims of size 1000 is Poisson with λ = 4.
The number of claims of size 5000 is Poisson with λ = 1. The number of small and large claims is
independent. For the aggregate losses to be 6000, either there are two claims with one of size
1000 and one of size 5000, or there are six claims each of size 1000.
The probability is: Prob[1 claim of size 1000]Prob[1 claim of size 5000] +
Prob[6 claims of size 1000]Prob[no claim of size 5000] = (4e^-4)(e^-1) + (4^6 e^-4/6!)(e^-1) = 6.53%.
Comment: One could use either convolution or the Panjer Algorithm (recursive method), but they
would take longer in this case.
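As an illustrative aside (not part of the original solution), the thinning calculation above is easy to check with a few lines of Python; the helper name poisson_pmf is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    # Probability of exactly k events for a Poisson with mean lam.
    return exp(-lam) * lam**k / factorial(k)

# Thinned processes: claims of size 1000 are Poisson with lambda = 4,
# claims of size 5000 are Poisson with lambda = 1, and they are independent.
# Aggregate losses of 6000 require either 6 small claims and no large claim,
# or 1 small claim and 1 large claim.
prob = poisson_pmf(4, 6) * poisson_pmf(1, 0) + poisson_pmf(4, 1) * poisson_pmf(1, 1)
print(round(prob, 4))  # approximately 0.0653, i.e. 6.53%
```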

1.21. B. The only way that reimbursements can be greater than 8000 is if both employees have a
claim, and they sum to more than 8000.
Let x be the size of Bobʼs claim and y be the size of the Carolʼs claim, then x + y > 8000
⇔ y > 8000 - x. Prob[ y > 8000 - x] = {5000 - (8000 - x)}/4000 = (x - 3000)/4000, x > 3000.
Given Bob has had a claim of size > 2000, f(x) = 1/(5000-2000) = 1/3000.
Prob[reimbursements > 8000 | Bob has a claim > 2000] =
Prob[Carol has a claim] Prob[x + y > 8000] =
0.4 ∫_2000^5000 (Prob[y > 8000 - x] / 3000) dx = (4/30,000) ∫_3000^5000 {(x - 3000)/4000} dx
= (2000^2 / 2) / 30,000,000 = 1/15.


2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 23

1.22. (i) A has mean: (0.1)(50,000) = 5000, and variance: (50,000^2)(0.1)(0.9) = 225 million.
B has mean: (0.2)(30,000) = 6000, and variance: (30,000^2)(0.2)(0.8) = 144 million.
The total has mean: 5000 + 6000 = 11,000,
and variance: 225 million + 144 million = 369 million.
The standard deviation of the total is: √(369 million) = 19,209.
(ii) Prob[A and not B] = (0.1)(0.8) = 0.08. Prob[B and not A] = (0.2)(0.9) = 0.18.
{(0.08)(50000) + (0.18)(30000)} / (0.08 + 0.18) = 36,154.

1.23. B. After the modifications, the mean frequency is (80%)(20) = 16.


The mean payment per loss is: E[X] - E[X ∧ 100] = θ - θ(1 - e^(-100/θ)) = (200)e^(-100/200) = 121.31.
After the modifications, the mean aggregate loss is: (16)(121.31) = 1941.
Alternately, given a loss, the probability of a non-zero payment given a deductible of size 100 is:
S(100) = e^(-100/200) = 0.6065.
Thus the mean frequency of non-zero payments is: (0.6065)(16) = 9.704.
Due to the memoryless property of the Exponential, the non-zero payments excess of the
deductible are also exponentially distributed with θ = 200.
Thus the mean aggregate loss is: (9.704)(200) = 1941.

1.24. C. Prob(4 < S < 8) = (1/3)Prob[4 < S < 8 | θ = 5] + (1/6)Prob[4 < S < 8 | θ = 8] =
(1/3)(e^(-4/5) - e^(-8/5)) + (1/6)(e^(-4/8) - e^(-8/8)) = 0.122.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 24

1.25. E. E[X] = (0.4)(0) + (0.3)(1) + (0.2)(2) + (0.1)(3) = 1.


E[X ∧ 1] = (0.4)(0) + (0.3)(1) + (0.2)(1) + (0.1)(1) = 0.6.
Expected losses paid by the insurer per factory = E[X] - E[X ∧ 1] = 1 - 0.6 = 0.4.
Insurance Premium = (110%)(2 factories)(0.4 / factory) = 0.88.
Profit = 3 - (0.88 + retained costs + (0.15)(3)) = 1.67 - retained costs.
For each factory independently, the retained costs are either zero 40% of the time, or one 60% of
the time. Therefore, total retained costs are: zero 16% of the time, one 48% of the time, and two
36% of the time.
Probability Retained Losses Profit Dividend
16% 0 1.67 1.67
48% 1 0.67 0.67
36% 2 -0.33 0
Average 1.20 0.47 0.589
Expected Dividend = (0.16)(1.67) + (0.48)(0.67) + (0.36)(0) = 0.589.
Alternately, expected losses paid by the insurer per factory = E[(X-1)+] =
(0.4)(0) + (0.3)(0) + (0.2)(2 - 1) + (0.1)(3 -1) = 0.4. Proceed as before.
Alternately, the dividends are the amount by which the retained costs are less than:
revenues - insurance premiums - all other expenses = 3 - 0.88 - 0.45 = 1.67.
Expected Dividend = expected amount by which retained costs are less than 1.67 =
1.67 - E[retained costs ∧ 1.67].
Probability Retained Costs Retained Costs Limited to 1.67
16% 0 0
48% 1 1
36% 2 1.67
E[retained costs ∧ 1.67] = (.16)(0) + (.48)(1) + (.36)(1.67) = 1.081.
Expected Dividend = 1.67 - 1.081 = 0.589.
Comment: Note that since no dividend is paid when the profit is negative, the average dividend is
not equal to the average profit.
E[Profit] = 1.67 - E[retained costs] = 1.67 - 1.20 = 0.47 ≠ 0.589.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 25

1.26. D. For the uniform distribution on (0, b), S(d) = (b-d)/b, for d < b.
The frequency distribution of non-zero payments is Poisson with mean: S(d)λ = λ(b-d)/b.
Comment: The severity distribution has been truncated and shifted from below, which would have
been uniform on [0, b - d], and then all the values were multiplied by c. ⇒ uniform on [0, c(b -d)].
The mean aggregate payment per year after the modifications is:
{λ(b-d)/b}{c(b-d)/2} = λc(b-d)^2/(2b).
For example, assume b = 5000, so that the severity of each loss before any coverage
modifications is uniformly distributed between 0 and 5000, d = 3000, and c = 90%.
Then 40% of the losses exceed the deductible of 3000.
Thus the modified frequency is Poisson with mean: 0.4λ = λ(5000 - 3000)/5000.
The modified severity is uniform from 0 to (90%)(5000 - 3000) = 1800.
The mean aggregate payment per year after the modifications is: (0.4λ)(1800/2) = 360λ.

1.27. C. E[X ∧ 5] = {θ/(α-1)}{1 - (θ/(θ+x))^(α−1)} = (10/1.5)(1 - (10/15)^1.5) = 3.038.
E[X] = θ/(α-1) = 10/1.5 = 6.667. Expected aggregate: (5)(6.667 - 3.038) = 18.1.
Alternately, the expected number of (nonzero) payments is: (5)S(5) = (5)(10/15)^2.5 = 1.81.
The average payment per (nonzero) payment is for the Pareto Distribution:
e(5) = (5 + θ)/(α - 1) = (5 + 10)/(2.5 - 1) = 10. 
Expected aggregate loss is: (10)(1.81) = 18.1.

1.28. A. Prior to the deductible, we expect: (2500)(10%) = 250 losses.


The expected losses eliminated by the deductible: the 250/5 = 50 losses below the deductible average 100 each,
while the remaining (250)(4/5) = 200 losses each have 200 eliminated. (50)(100) + (200)(200) = 45,000.
Alternately, the losses eliminated are: (expected number of losses) E[X ∧ 200]
= (250){∫_0^200 (1/1000) x dx + 200(4/5)} = (250){(200^2/2)/1000 + 160} = (250)(180) = 45,000.
2016-C-3, Aggregate Distributions §1 Introduction, HCM 10/21/15, Page 26

1.29. D. One can either have six claims each of size 100, or two claims with one of size 100 and
the other of size 500, in either order.
Prob[n = 6] Prob[x = 100]^6 + Prob[n = 2] 2 Prob[x = 100] Prob[x = 500] =
(5^6 e^-5/6!)(0.8^6) + (5^2 e^-5/2!)(2)(0.8)(0.16) = (0.1462)(0.2621) + (0.0842)(0.2560) = 0.0599.
Alternately, the claims of size 100 are Poisson with λ = (0.8)(5) = 4, the claims of size 500 are
Poisson with λ = (0.16)(5) = 0.8, and the claims of size 1000 are Poisson with λ = (0.04)(5) = 0.2,
and the three processes are independent.
Prob[6 @ 100]Prob[0 @ 500]Prob[0 @ 1000] + Prob[1 @ 100]Prob[1 @ 500]Prob[0 @ 1000] =
(4^6 e^-4/6!)(e^-0.8)(e^-0.2) + (4e^-4)(0.8e^-0.8)(e^-0.2) = 0.0599.
Comment: One could instead use the Panjer Algorithm, but that would be much longer.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 27

Section 2, Convolutions23

Quite often one has to deal with the sum of two independent variables. One way to do so is via the
so-called convolution formula.24 You are unlikely to be tested directly on convolutions on your exam.
As will be discussed in the next section, convolutions can be useful for computing either aggregate
distributions or compound distributions.

Six-sided Dice:

If one has a variable with density f, then the convolution of f with itself, f*f, is the density of the sum of
two such independent, identically distributed variables.

Exercise: Let f be a distribution which has 1/6 chance of a 1, 2, 3, 4, 5, or 6.


One can think of this as the result of rolling a six-sided die. What is f*f at 4?
[Solution: Prob[X1 + X2 = 4] = Prob[X1 = 1] Prob[X2 = 3] + Prob[X1 = 2] Prob[X2 = 2]
+ Prob[X1 = 3] Prob[X2 = 1] = (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) = 3/36.]

One can think of f*f as the sum of the rolls of two six-sided dice.
Then f*f*f = f*3 can be thought of as the sum of the rolls of three six-sided dice.

Adding Variables versus Multiplying by a Constant:

Let X be the result of rolling a six-sided die:


Prob[X = 1] = Prob[X = 2] = Prob[X = 3] = Prob[X = 4] = Prob[X = 5] = Prob[X = 6] = 1/6.

Exercise: What are the mean and variance of X?


[Solution: The mean is 3.5, the second moment is: (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6, and the
variance is: 91/6 - 3.5^2 = 35/12.]

Then X + X is the sum of rolling two dice.

Exercise: What is the distribution of X + X?


[Solution:
Result: 2 3 4 5 6 7 8 9 10 11 12
Prob.: 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36 ]

23
See Section 9.3 of Loss Models. Also see Section 2.3 of Actuarial Mathematics, not on the Syllabus,
or An Introduction to Probability Theory and Its Applications by William Feller.
24
Another way is to work with Moment Generating Functions or other generating functions.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 28

In general, X + X means the sum of two independent, identically distributed variables.

In contrast 2X means twice the value of a single variable.


In this example, 2X is twice the result of rolling a single die.

Exercise: What is the distribution of 2X?


[Solution:
Result: 2 4 6 8 10 12
Prob.: 1/6 1/6 1/6 1/6 1/6 1/6 ]

We note that the distributions of 2X and X + X are different.


For example X + X can be 3, while 2X can not.

Exercise: What are the mean and variance of X + X?


[Solution: Mean = (2)(3.5) = 7. Variance = (2)(35/12) = 35/6.]

The variances of independent variables add.

Exercise: What are the mean and variance of 2X?


[Solution: Mean = (2)(3.5) = 7. Variance = (2^2)(35/12) = 35/3.]

Multiplying a variable by a constant, multiplies the variance by the square of that constant.

Note that while X + X and 2X each have mean 2 E[X], they have different variances.
Var[X + X] = 2 Var[X].
Var[2X] = 4 Var[X].
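As a quick check of this distinction, here is a short, purely illustrative Python sketch (not from the text) that enumerates the distributions of X + X and 2X for a six-sided die and compares their means and variances.

```python
from itertools import product

die = {face: 1/6 for face in range(1, 7)}  # a fair six-sided die

# X + X: the sum of two independent rolls.
sum_two = {}
for (a, pa), (b, pb) in product(die.items(), die.items()):
    sum_two[a + b] = sum_two.get(a + b, 0.0) + pa * pb

# 2X: twice the result of a single roll.
double = {2 * face: p for face, p in die.items()}

def mean_and_variance(dist):
    mean = sum(k * p for k, p in dist.items())
    second_moment = sum(k * k * p for k, p in dist.items())
    return mean, second_moment - mean**2

print(mean_and_variance(sum_two))  # (7.0, 5.83...), that is (7, 35/6)
print(mean_and_variance(double))   # (7.0, 11.66...), that is (7, 35/3)
```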

Convoluting Discrete Distributions:

Let X have a discrete distribution such that f(0) = 0.7, f(1) = 0.3.
Let Y have a discrete distribution such that g(0) = 0.9, g(1) = 0.1.
If X and Y are independent, we can calculate the distribution of Z = X + Y as follows:

If X is 0 and Y is 0, then Z = 0. This has a probability of (0.7)(0.9) = 0.63.


If X is 0 and Y is 1, then Z = 1. This has a probability of (0.7)(0.1) = 0.07.
If X is 1 and Y is 0, then Z = 1. This has a probability of (0.3)(0.9) = 0.27.
If X is 1 and Y is 1, then Z = 2. This has a probability of (0.3)(0.1) = 0.03.

Thus Z has a 63% chance of being 0, a 34% chance of being 1, and a 3% chance of being 2.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 29

One can put this solution in terms of formulas. Let z be the outcome, Z = X + Y.
Let f(x) be the density of X. Let g(y) be the density of Y.
Then since y = z - x and x = z - y one can sum over the possible outcomes in either of two ways:

(f*g)(z) = Σ_x f(x) g(z - x) = Σ_y f(z - y) g(y).

Exercise: Let h(z) be the density of Z. Use the above formulas to calculate h(1).
[Solution: h(1) = Σ_{x=0}^{1} f(x) g(1 - x) = f(0)g(1) + f(1)g(0) = (0.7)(0.1) + (0.3)(0.9) = 0.34.
Alternately, h(1) = Σ_{y=0}^{1} f(1 - y) g(y) = f(1)g(0) + f(0)g(1) = (0.3)(0.9) + (0.7)(0.1) = 0.34.]

One could arrange this type of calculation in a spreadsheet:


x f(x) Product g(1-x) 1-x
0 0.7 0.07 0.1 1
1 0.3 0.27 0.9 0
Sum 1 0.34 1

One has to write g in the reverse order, so as to line up the appropriate entries. Then one takes
products and sums them. Let us see how this works for a more complicated case.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 30

Exercise: Let X have a discrete distribution such that f(2) = 0.3, f(3) = 0.4, f(4) = 0.1, and f(5) = 0.2.
Let have Y have a discrete distribution such that g(0) = 0.5, g(1) = 0.2, g(2) = 0, and g(3) = 0.3.
X and Y are independent. Z = X +Y. Calculate the density at 4 of Z.
[Solution: Since we want x + y = 4, we put f(3) next to g(1), etc.
x f(x) Product g(4-x) 4-x
1 0 0.3 3
2 0.3 0 0 2
3 0.4 0.08 0.2 1
4 0.1 0.05 0.5 0
5 0.2 0
Sum 1 0.13 1
Alternately, we can list f(4-x).
x g(x) Product f(4-x) 4-x
-1 0 0.2 5
0 0.5 0.05 0.1 4
1 0.2 0.08 0.4 3
2 0 0 0.3 2
3 0.3 0
Sum 1 0.13 1
Alternately, list the possible ways the two variables can add to 4:
X = 2 and Y = 2, with probability: (0.3)(0) = 0,
X = 3 and Y = 1, with probability: (0.4)(0.2) = 0.08,
X = 4 and Y = 0, with probability: (0.1)(0.5) = 0.05.
The density at 4 of X + Y is the sum of these probabilities: 0 + 0.08 + 0.05 = 0.13.
Comment: f*g = g*f.]

In a similar manner we can calculate the whole distribution of Z:


z 2 3 4 5 6 7 8
h(z) 0.15 0.26 0.13 0.21 0.16 0.03 0.06

Note that the probabilities sum to one: 0.15 + 0.26 + 0.13 + 0.21 + 0.16 + 0.03 + 0.06 = 1.
This is one good way to check the calculation of a convolution.
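The reverse-and-multiply calculation above is easy to automate. The following Python sketch (an illustration only; the function name convolve is not from the text) recomputes the full distribution of Z = X + Y and the check that the probabilities sum to one.

```python
def convolve(f, g):
    # Density of the sum of two independent discrete variables,
    # where f and g are dicts mapping outcomes to probabilities.
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, 0.0) + fx * gy
    return h

f = {2: 0.3, 3: 0.4, 4: 0.1, 5: 0.2}
g = {0: 0.5, 1: 0.2, 2: 0.0, 3: 0.3}

h = convolve(f, g)
print(h)  # matches the table above (up to floating-point rounding): 0.15, 0.26, 0.13, 0.21, 0.16, 0.03, 0.06
print(round(sum(h.values()), 10))  # 1.0, the check that the probabilities sum to one
```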

One could arrange this whole calculation in spreadsheet form as follows:


2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 31

The possible sums of X and Y are:


Y
X 0 1 2 3
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8

With the corresponding probabilities:


Probabilities of X (rows) versus Probabilities of Y (columns):
X \ Y 0.5 0.2 0 0.3
0.3 15% 6% 0% 9%
0.4 20% 8% 0% 12%
0.1 5% 2% 0% 3%
0.2 10% 4% 0% 6%

Then adding up probabilities:


sum = 2: 15%. sum = 6: 4% + 0% + 12% = 16%.
sum = 3: 20% + 6% = 26%. sum = 7: 0% + 3% = 3%.
sum = 4: 5% + 8% + 0% = 13%. sum = 8: 6%.
sum = 5: 10% + 2% + 0% + 9%= 21%.

Convoluting Three or More Variables:

If one wants to add up three numbers, one can sum the first two and then add in the third number.
For example 3 + 5 + 12 = (3 + 5) + 12 = 8 + 12 = 20. Similarly, if one wants to add three variables
one can sum the first two and then add in the third variable. In terms of convolutions, one can first
convolute the first two densities and then convolute this result with the third density.

Continuing the previous example, once one has the distribution of Z = X + Y, then we could
compute the densities of X + Y + Y = Z + Y, by performing another convolution. For example,
here is how one could compute the density of X + Y + Y = Z + Y at 6:

x h(x) Product g(6-x) 6-x


2 0.15 0
3 0.26 0.078 0.3 3
4 0.13 0 0 2
5 0.21 0.042 0.2 1
6 0.16 0.08 0.5 0
7 0.03 0
8 0.06 0
Sum 1 0.2 1
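Under the same approach, repeated convolution is just a fold of the two-variable calculation; this small, purely illustrative Python sketch (reusing the hypothetical convolve helper) reproduces the density of X + Y + Y at 6 computed above.

```python
from functools import reduce

def convolve(f, g):
    # Density of the sum of two independent discrete variables given as dicts.
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, 0.0) + fx * gy
    return h

f = {2: 0.3, 3: 0.4, 4: 0.1, 5: 0.2}
g = {0: 0.5, 1: 0.2, 2: 0.0, 3: 0.3}

# Convolute the first two densities, then convolute the result with the third.
h3 = reduce(convolve, [f, g, g])
print(round(h3[6], 3))  # 0.2, matching the spreadsheet calculation above
```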
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 32

Notation:

We use the notation h = f * g, for the convolution of f and g.

Repeated convolutions are indicated by powers, in the same manner as is continued multiplication.
f * f = f*2 . f*f*f = f*3 .
Sum of 2 independent, identically distributed variables: f* f = f*2 .

Sum of 3 independent, identically distributed variables: f* f*f = f*3 .

Loss Models employs the convention that f*0 (x) = 1 if x = 0, and 0 otherwise.25
f*1 (x) = f(x).

Similar notation is used for the distribution of the sum of two independent variables.
If X follows F and Y follows G, then the distribution of X + Y is F * G.

(F*G)(z) = Σ_x F(x) g(z - x) = Σ_y f(z - y) G(y) = Σ_x f(x) G(z - x) = Σ_y F(z - y) g(y).

Exercise: Assume that all variables have support on the nonnegative integers.
Use the first of the above formulas to write an expression for (F*G)(2).
[Solution:  (F*G)(2) = F(0)g(2) + F(1)g(1) + F(2)g(0).
Comment: F(0)g(2) + F(1)g(1) + F(2)g(0) = f(0)g(2) + {f(0) + f(1)}g(1) + {f(0) + f(1) + f(2)}g(0) 
= f(0)g(2) + f(0)g(1) + f(1)g(1) + f(0)g(0) + f(1)g(0) + f(2)g(0),
which does indeed cover all six possible ways we can have the two variables sum to 2 or less.
Personally, I use the formulas or common sense to calculate densities, and then cumulate to get the
distribution function if necessary.]

Again repeated convolutions are indicated by a power: F*F*F = F*3 .

Loss Models employs the convention that F*0 (x) = 0 if x < 0 and 1 if x ≥ 0;
F*0 has a jump discontinuity at 0.
F*1 (x) = F(x).

25
This will be used when one writes aggregate or compound distributions in terms of convolutions.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 33

Properties of Convolutions:

The convolution operator is commutative and associative.


f*g = g*f. (f*g)*h = f* (g*h).

Note that the moment generating function of the convolution f* g is the product of the moment
generating functions of f and g. Mf*g = Mf Mg . This follows from the fact that the moment
generating function for a sum of independent variables is the product of the moment generating
functions of each of the variables. Thus if one takes the sum of n independent identically distributed
variables, the Moment Generating Function is taken to the power n.

Similarly, the Probability Generating Function of the convolution f* g is just of the product of the
Probability Generating Functions of f and g. Pf*g = Pf Pg .

This follows from the fact the Probability Generating Function for a sum of independent variables is
the product of the Probability Generating Functions of each of the variables. Thus if one takes the
sum of n independent identically distributed variables, the Probability Generating Function is taken to
the power n.
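As a small, purely illustrative check of this property (not from the text), one can evaluate the probability generating function of the sum of two six-sided dice at a point and compare it with the square of the single-die p.g.f.; the helper name pgf is arbitrary.

```python
def pgf(dist, z):
    # Probability generating function E[z^N] of a discrete distribution given as a dict.
    return sum(p * z**k for k, p in dist.items())

die = {face: 1/6 for face in range(1, 7)}
sum_two = {}
for a, pa in die.items():
    for b, pb in die.items():
        sum_two[a + b] = sum_two.get(a + b, 0.0) + pa * pb

z = 0.7
print(pgf(sum_two, z))   # equals ...
print(pgf(die, z) ** 2)  # ... the square of the single-die p.g.f.
```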

Convolution of Continuous Distributions:

The same convolution formulas apply if one is dealing with continuous rather than
discrete distributions, with integration taking the place of summation. One can get the density function
for the sum of two independent variables X + Y:

h(z) = ∫ f(x) g(z - x) dx = ∫ f(z - y) g(y) dy .


Similar formulas apply to get the Distribution Function of X + Y:

H(z) = ∫ f(x)G(z - x)dx = ∫ F(z - y)g(y)dy = ∫ F(x)g(z - x)dx = ∫ f(z - y)G(y)dy .


For example, let X have a uniform distribution on [1,4], while Y has a uniform distribution on [7,12].
If X and Y are independent and Z = X + Y, here is how one can use the convolution formulas to
compute the density of Z.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 34

In this case f(x) = 1/3 for 1 < x < 4 and g(y) = 1/5 for 7 < y < 12, so the density of the sum is:
∫_{x=1}^{x=4} f(x) g(z - x) dx = ∫_{x=1}^{x=4} (1/3) g(z - x) dx = (1/15) Length[{7 < z-x < 12} and {1 < x < 4}].
Length[{7 < z-x < 12} and {1 < x < 4}] = Length[{z-12 < x < z-7} and {1 < x < 4}].
If z < 8, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 0.
If 8 ≤ z ≤ 11, then Length[{z-12 < x < z-7} and {1 < x < 4}] = z - 8.
If 11 ≤ z ≤ 13, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 3.
If 13 ≤ z ≤ 16, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 16 - z.
If 16 < z, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 0.

Thus the density of the sum is:


0 z≤8
(z-8)/15 8 ≤ z ≤ 11
3/15 11 ≤ z ≤ 13
(16-z)/15 13 ≤ z ≤ 16
0 16 ≤ z

A graph of this density (not reproduced here) is a trapezoid: it rises linearly from 8 to 11, is flat from 11 to 13, and declines linearly from 13 to 16.

For example, the density of the sum at 10 is:
∫_{x=1}^{x=4} f(x) g(10 - x) dx = ∫_{x=1}^{x=4} (1/3) g(10 - x) dx = (1/15) Length[{7 < 10-x < 12} and {1 < x < 4}] =
(1/15) Length[{1 < x < 3}] = (1/15)(2) = 2/15.
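As a rough numerical check of this piecewise density (an illustrative sketch under the stated uniform assumptions, not part of the text), a simulation of the sum should put roughly 2/15 ≈ 0.133 of its probability mass per unit length near 10.

```python
import random

random.seed(1)
trials = 1_000_000
# X is uniform on [1, 4] and Y is uniform on [7, 12]; estimate the density of X + Y near 10.
hits = sum(1 for _ in range(trials)
           if 9.95 < random.uniform(1, 4) + random.uniform(7, 12) < 10.05)
print(hits / (trials * 0.1))  # roughly 0.133 = 2/15
```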

Note the convolution is in this case a continuous density function. Generally the convolution will
behave “better” than the original distributions, so convolution serves as a smoothing operator.26

26
See An Introduction to Probability Theory and Its Applications by William Feller.
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 35

Exercise: X has density f(x) = e^-x, x > 0. Y has density g(y) = e^-y, y > 0. If X and Y are
independent, use the convolution formula to calculate the density function for their sum.
[Solution: The density of X + Y is a Gamma Distribution with α = 2 and θ = 1.
(f*g)(z) = ∫_{x=0}^{∞} f(x) g(z - x) dx = ∫_{x=0}^{z} e^-x e^-(z-x) dx = e^-z ∫_{x=0}^{z} dx = z e^-z.
Note that the integral extends only over the domain of g, so that z - x > 0 or x < z.]

Exercise: X has a Gamma Distribution with α = 1 and θ = 0.1.


Y has a Gamma Distribution with α = 5 and θ = 0.1.
If X and Y are independent, use the convolution formula to calculate the density function for their sum.
[Solution: The density of X + Y is a Gamma Distribution with α = 1 + 5 = 6 and θ = 0.1.
(f*g)(z) = ∫_{y=0}^{∞} f(z - y) g(y) dy = ∫_{y=0}^{z} (10 e^(-10(z-y))) (10^5 y^4 e^(-10y) / 4!) dy
= 10^6 e^(-10z) ∫_{y=0}^{z} y^4 / 4! dy = 10^6 e^(-10z) z^5 / 5!.
Note that the integral extends only over the domain of f, so that z - y > 0 or y < z.]

Thus we have shown that adding an independent Exponential to a Gamma with the same scale
parameter, θ, increases the shape parameter of the Gamma, α, by 1. In general, the sum of two
independent Gammas with the same scale parameter, θ is another Gamma with the same θ and the
sum of the two alphas.
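A quick simulation sketch (illustrative only, assuming independence as above) checks this moment-wise: the sum of a Gamma with α = 1 and a Gamma with α = 5, both with θ = 0.1, should have the mean 0.6 and variance 0.06 of a Gamma with α = 6 and θ = 0.1.

```python
import random

random.seed(1)
n = 200_000
# X ~ Gamma(alpha = 1, theta = 0.1), i.e. an Exponential; Y ~ Gamma(alpha = 5, theta = 0.1).
samples = [random.gammavariate(1, 0.1) + random.gammavariate(5, 0.1) for _ in range(n)]
mean = sum(samples) / n
variance = sum((s - mean) ** 2 for s in samples) / n
print(round(mean, 3), round(variance, 3))  # roughly 0.6 and 0.06
```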
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 36

Problems:

Use the following information for the next 10 questions:


Let X have density: f(0) = 0.6, f(1) = 0.3, and f(2) = 0.1.
Let Y have density: g(0) = 0.2, g(1) = 0.5, and g(2) = 0.3.

2.1 (1 point) What is f*f at 2?


A. Less than 0.15
B. At least 0.15 but less than 0.20
C. At least 0.20 but less than 0.25
D. At least 0.25 but less than 0.30
E. At least 0.30

2.2 (1 point) What is the cumulative distribution function of X + X at 2?


A. Less than 0.80
B. At least 0.80 but less than 0.85
C. At least 0.85 but less than 0.90
D. At least 0.90 but less than 0.95
E. At least 0.95

2.3 (2 points) What is f*3 = f*f*f at 3?


A. Less than 0.12
B. At least 0.12 but less than 0.14
C. At least 0.14 but less than 0.16
D. At least 0.16 but less than 0.18
E. At least 0.18

2.4 (3 points) What is f*4 = f*f*f*f at 5?


A. Less than 0.04
B. At least 0.04 but less than 0.06
C. At least 0.06 but less than 0.08
D. At least 0.08 but less than 0.10
E. At least 0.10

2.5 (1 point) What is g*g at 3?


A. Less than 0.20
B. At least 0.20 but less than 0.25
C. At least 0.25 but less than 0.30
D. At least 0.30 but less than 0.35
E. At least 0.35
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 37

2.6 (1 point) What is the cumulative distribution function of Y + Y at 1?


A. Less than 0.10
B. At least 0.10 but less than 0.15
C. At least 0.15 but less than 0.20
D. At least 0.20 but less than 0.25
E. At least 0.25

2.7 (1 point) What is g*3 = g*g*g at 4?


A. Less than 0.25
B. At least 0.25 but less than 0.30
C. At least 0.30 but less than 0.35
D. At least 0.35 but less than 0.40
E. At least 0.40

2.8 (1 point) What is g*4 = g*g*g*g at 3?


A. Less than 0.10
B. At least 0.10 but less than 0.15
C. At least 0.15 but less than 0.20
D. At least 0.20 but less than 0.25
E. At least 0.25

2.9 (1 point) What is f*g at 3?


A. Less than 0.15
B. At least 0.15 but less than 0.20
C. At least 0.20 but less than 0.25
D. At least 0.25 but less than 0.30
E. At least 0.30

2.10 (1 point) What is the cumulative distribution function of X + Y at 2?


A. Less than 0.70
B. At least 0.70 but less than 0.75
C. At least 0.75 but less than 0.80
D. At least 0.80 but less than 0.85
E. At least 0.85
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 38

2.11 (1 point) Which of the following are true?


1. If f has a Gamma Distribution with α = 3 and θ = 7,
then f*10 has a Gamma Distribution with α = 30 and θ = 70.
2. If f has a Normal Distribution with µ = 3 and σ = 7,
then f*10 has a Normal Distribution with µ = 30 and σ = 70.
3. If f has a Negative Binomial Distribution with β = 3 and r = 7,
then f*10 has a Negative Binomial Distribution with β = 30 and r = 70.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D.

2.12 (2 points) The severity distribution is: f(1) = 40%, f(2) = 50% and f(3) = 10%.
There are three claims. What is the chance they sum to 6?
A. Less than 0.24
B. At least 0.24 but less than 0.25
C. At least 0.25 but less than 0.26
D. At least 0.26 but less than 0.27
E. At least 0.27

2.13 (3 points) The waiting time, x, from the date of an accident to the date of its report to an
insurance company is exponential with mean 1.7 years. The waiting time, y, in years, from the
beginning of an accident year to the date of an accident is a random variable with density
f(y) = 0.9 + 0.2y, 0 ≤ y ≤ 1. Assume x and y are independent. What is the expected portion of the
total number of accidents for an accident year reported to the insurance company by one half year
after the end of the accident year?
A. Less than 0.41
B. At least 0.41 but less than 0.42
C. At least 0.42 but less than 0.43
D. At least 0.43 but less than 0.44
E. At least 0.44

2.14 (2 points) Let f(x) = 0.02(10 - x), 0 ≤ x ≤ 10. What is the density of X + X at 7?
A. Less than 0.07
B. At least 0.07 but less than 0.08
C. At least 0.08 but less than 0.09
D. At least 0.09 but less than 0.10
E. At least 0.10
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 39

Use the following information for the next two questions:


X is the sum of a random variable with a uniform distribution on [1, 6] and an independent random
variable with a uniform distribution on [3, 11].

2.15 (1 point) What is the density of X at 14?


A. Less than 0.08
B. At least 0.08 but less than 0.09
C. At least 0.09 but less than 0.10
D. At least 0.10 but less than 0.11
E. At least 0.11

2.16 (2 point) What is the Distribution Function of X at 14?


A. Less than 0.87
B. At least 0.87 but less than 0.88
C. At least 0.88 but less than 0.89
D. At least 0.89 but less than 0.90
E. At least 0.90

2.17 (2 points) X follows an Exponential Distribution with mean 7. Y follows an Exponential


Distribution with mean 17. X and Y are independent. What is the density of Z = X + Y?
A. e-z/12 / 12
B. ze-z/12 / 144
C. ( e-z/17 - e-z/7) / 10
D. ( e-z/17 + e-z/7) / 24
E. None of the above

2.18 (3 points) Tom, Dick, and Harry are actuaries working on the same project.
Each actuary performs his calculations with no intermediate rounding. Each result is a large number,
which the actuary rounds to the nearest integer. If without any rounding Tomʼs and Dickʼs results
would sum to Harryʼs, what is the probability that they do so after rounding?
A. 1/2 B. 3/5 C. 2/3 D. 3/4 E. 7/8

2.19 (2 points) Let X be the results of rolling a 4-sided die (1, 2, 3 or 4), and let Y be the result of
rolling a 6-sided die. X and Y are independent. What is the distribution of X + Y?

2.20 (2 points) The density function for X is f(1) = 0.2, f(3) = 0.5, f(4) = 0.3.
The density function for Y is g(0) = 0.1, g(2) = 0.7, g(3) = 0.2.
X and Y are independent. Z = X + Y. What is the density of Z at 4?
A. 7% B. 10% C. 13% D. 16% E. 19%
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 40

Use the following information for the next three questions:


• The Durham Bulls and Toledo Mud Hens baseball teams will play a series of games against each
other.
• Each game will be played either in Durham or Toledo.
• Each team has a 55% chance of winning and 45% chance of losing any game at home.
• The outcome of each game is independent of the outcome of any other game.
• In the series, the Durham Bulls will play one more game at home than the Toledo Mud Hens.

2.21 (2 points) If the series consists of 3 games, what is probability that the Durham Bulls win the
series; in other words win more games than their opponents the Toledo Mud Hens?

2.22 (3 points) If the series consists of 5 games, what is probability that the Durham Bulls win the
series; in other words win more games than their opponents the Toledo Mud Hens?

2.23 (5 points) If the series consists of 7 games, what is probability that the Durham Bulls win the
series; in other words win more games than their opponents the Toledo Mud Hens?

2.24 (1 point) Which of the following are true?


1. If f has a Binomial Distribution with m = 2 and q = 0.07,
then f*5 has a Binomial Distribution with m = 10 and q = 0.07.
2. If f has a Pareto Distribution with α = 4 and θ = 8,
then f*5 has a Pareto Distribution with α = 40 and θ = 8.
3. If f has a Poisson Distribution with λ = 0.2, then f*5 has a Poisson with λ = 1.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D.

2.25 (3 points) The frequency distribution is p1 = 0.6 and p2 = 0.4.


Severity is uniform from 0 to 10.
Frequency and severity are independent.
Determine the probability that the aggregate loss is greater than 8.
A. Less than 0.36
B. At least 0.36 but less than 0.38
C. At least 0.38 but less than 0.40
D. At least 0.40 but less than 0.42
E. At least 0.44
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 41

2.26 (4B, 5/85, Q.49) (2 points) The waiting time, x, in years, from the date of an
accident to the date of its report to an insurance company is a random variable with probability
density function (p.d.f.) f(x), 0 < x < ∞. The waiting time, y, in years, from the beginning of an
accident year to the date of an accident is a random variable with p.d.f.
g(y), 0 < y < 1. Assuming x and y are independent, which of the following expressions represents
the expected proportion of the total number of accidents for an accident year reported to the
insurance company by the end of the accident year?
F(x), 0 < x < ∞, and G(y), 0 < y < 1 represent respectively the distribution functions of x and y.
A. ∫_0^1 f(t) G(1-t) dt    B. ∫_0^1 f(t) G(t) dt    C. ∫_0^1 f(t) g(1-t) dt
D. ∫_0^1 F(t) G(t) dt    E. ∫_0^1 F(t) G(1-t) dt
2.27 (5A, 11/94, Q.21) (1 point) Let S = X1 + X2 , where X1 and X2 are independent random
variables with distribution functions defined below:
X F1 (X) F2 (X)
0 0.3 0.6
1 0.4 0.8
2 0.6 1.0
3 0.7
4 1.0
Calculate Pr(S≤ 2).
A. Less than 0.25
B. At least 0.25, but less than 0.35
C. At least 0.35, but less than 0.45
D. At least 0.45, but less than 0.55
E. Greater than or equal to 0.55

2.28 (5A, 11/94, Q.23) (1 point) X1 , X2 , X3 ,and X4 are independent random variables for a
Gamma distribution with the parameters α = 2.2 and θ = 0.2.
If S = X1 + X2 + X3 + X4 , then what is the distribution function for S?
A. Gamma distribution with the parameters α = 8.8 and θ = 0.8.
B. Gamma distribution with the parameters α = 8.8 and θ = 0.2.
C. Gamma distribution with the parameters α = 2.2 and θ = 0.8.
D. Gamma distribution with the parameters α = 2.2 and θ = 0.2.
E. None of the above
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 42

2.29 (5A, 5/95, Q.19) (1 point) Assume S = X1 + X2 + ... + XN, where X1 , X2 , ... XN are
identically distributed and N, X1 , X2 , ... XN are mutually independent random variables.
Which of the following statements are true?
1. If the distribution of the Xiʼs is continuous and the Prob(N = 0) > 0, the distribution of S will be
continuous.
2. If the distribution of the Xiʼs is normal, then the nth convolution of the Xiʼs is normal.
3. If the distribution of the Xiʼs is exponential, then the nth convolution of the Xiʼs is exponential.
A. 1 B. 2 C. 1, 2 D. 2, 3 E. 1, 2, 3

2.30 (5A, 11/95, Q.19) (1 point) Let S = X1 + X2 , where X1 and X2 are independent random
variables with the following distribution functions:
X F1 (X) F2 (X)

0 0.5 0.3
1 0.8 0.6
2 1 1
What is the probability that S > 2?
A. Less than 0.20
B. At least 0.20 but less than 0.40
C. At least 0.40 but less than 0.60
D. At least 0.60 but less than 0.80
E. At least 0.80

2.31 (5A, 11/97, Q.22) (1 point) The following information is given regarding three mutually
independent random variables:
x f1(x) f2(x) f3(x)
0 0.5 0.2 0.1
1 0.4 0.2 0.9
2 0.1 0.2 0.0
3 0.0 0.2 0.0
4 0.0 0.2 0.0
If S = x1 + x2 + x3 , calculate the probability that S = 5.
A. Less than 0.10
B. At least 0.10, but less than 0.15
C. At least 0.15, but less than 0.20
D. At least 0.20, but less than 0.25
E. 0.25 or more
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 43

2.32 (5A, 11/98, Q.24) (1 point) Assume that S = X1 +X2 +X3 +...+XN where X1 , X2 , X3 , ...XN are
identically distributed and N, X1 , X2 , X3 , ... XN are mutually independent random variables. Which of
the following statements is true?
1. If the distribution of the Xi's is continuous and the Pr[N=0] > 0, the distribution of S will be
continuous.
2. The nth convolution of a normal distribution with parameters µ and σ is also normal
with mean nµ and variance nσ^2.


3. If the individual claim amount distribution is discrete, the distribution of S is also discrete.
A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A, B, C, or D

2.33 (5A, 5/99, Q.37) (2 points) ABC Insurance Company writes liability coverage with one
maximum covered loss of $90,000 offered to all insureds. Only two types of losses to the insurer
arise out of this coverage:
(1) total limits plus expenses: $100,000
(2) loss expenses only: $50,000
You are given the following distribution of aggregate losses that applies in years when the insurer
faces 2 claims.
x f(x)
100,000 90.25%
150,000 9.5%
200,000 0.25%
If, next year, the insurer faces 3 claims, what is the likelihood that the aggregate losses will exceed
$150,000?

2.34 (5A, 11/99, Q.23) (1 point) X1 , X2 , X3 are mutually independent random variables with
probability functions as follows:
x f1 (X) f2 (X) f3 (X)
0 0.9 0.5 0.25
1 0.1 0.3 0.25
2 0.0 0.2 0.25
3 0.0 0.0 0.25
S = X1 + X2 + X3 . Find fS(2).
A. Less than 0.25
B. At least 0.25 but less than 0.26
C. At least 0.26 but less than 0.27
D. At least 0.27 but less than 0.28
E. At least 0.28
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 44

2.35 (Course 151 Sample Exam #1, Q.8) (1.7 points)


Aggregate claims S = X1 + X2 + X3 , where X1 , X2 and X3 are mutually independent random
variables with probability functions as follows:
x f1 (x) f2 (x) f3 (x)
0 0.6 p 0.0
1 0.4 0.3 0.5
2 0.0 0.0 0.5
3 0.0 0.0 0.0
4 0.0 0.7-p 0.0
You are given FS(4) = 0.6.
Determine p.
(A) 0.0 (B) 0.1 (C) 0.2 (D) 0.3 (E) 0.4

2.36 (Course 151 Sample Exam #3, Q.11) (1.7 points)


S = X1 + X2 + X3 where X1 , X2 and X3 are independent random variables distributed as follows:
x f1 (X) f2 (X) f3 (X)
0 0.2 0 0.5
1 0.3 0 0.5
2 0.5 p 0.0
3 0.0 1-p 0.0
4 0.0 0 0.0
You are given FS(4) = 0.43.
Determine p.
(A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5

2.37 (1, 11/01, Q.37) (1.9 points) A device containing two key components fails when, and only
when, both components fail. The lifetimes, T1 and T2 , of these components are independent with
common density function f(t) = e^-t, t > 0.
The cost, X, of operating the device until failure is 2T1 + T2 .
Which of the following is the density function of X for x > 0?
(A) e^(-x/2) - e^(-x)   (B) 2(e^(-x/2) - e^(-x))   (C) x^2 e^(-x/2)   (D) e^(-x/2)/2   (E) e^(-x/3)/3
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 45

Solutions to Problems:

2.1. C.
x f(x) f(2-x) Product
0 0.6 0.1 0.06
1 0.3 0.3 0.09
2 0.1 0.6 0.06
Sum 1 1 0.21
Comment: f*f(0) = f(0) f(0) = (0.6)(0.6) = 0.36.
f*f(1) = f(0) f(1) + f(1) f(0) = (0.6)(0.3) + (0.3)(0.6) = 0.36.
f*f(3) = f(1) f(2) + f(2) f(1) = (0.1)(0.3) + (0.3)(0.1) = 0.06. f*f(4) = f(2)f(2) = (0.1)(0.1) = 0.01.

2.2. D.
x F(x) f(2-x) Product
0 0.6 0.1 0.06
1 0.9 0.3 0.27
2 1 0.6 0.6
Sum 1 0.93
Comment: Alternately, f*f(0) + f*f(1) + f*f(2) = 0.36 + 0.36 + 0.21 = 0.93.

2.3. B. One can use the fact that f*f*f = (f*f)*f.


x f*f(x) f(3-x) Product
-1
0 0.36
1 0.36 0.1 0.036
2 0.21 0.3 0.063
3 0.06 0.6 0.036
4 0.01
Sum 1 1 0.135

2.4. A. One can use the fact that f*4 = (f*f)*(f*f).


x f*f(x) f*f(5-x) Product
0 0.36 0
1 0.36 0.01 0.0036
2 0.21 0.06 0.0126
3 0.06 0.21 0.0126
4 0.01 0.36 0.0036
5 0.36
Sum 0.0324
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 46

2.5. D.
x g(x) g(3-x) Product
0 0.2 0
1 0.5 0.3 0.15
2 0.3 0.5 0.15
3 0.2 0
Sum 0.3

2.6. D.
x G(x) g(1-x) Product
0 0.2 0.5 0.1
1 0.7 0.2 0.14
2 1 0
Sum 0.24

2.7. B.
x g*g(x) g(4-x) Product
0 0.04 0
1 0.2 0
2 0.37 0.3 0.111
3 0.3 0.5 0.15
4 0.09 0.2 0.018
Sum 0.279

2.8. C. One can use the fact that g*4 = (g*g)*(g*g).


x g*g(x) g*g(3-x) Product
0 0.04 0.3 0.012
1 0.2 0.37 0.074
2 0.37 0.2 0.074
3 0.3 0.04 0.012
4 0.09 0
Sum 0.172

2.9. A.
x f(x) g(3-x) Product
0 0.6 0
1 0.3 0.3 0.09
2 0.1 0.5 0.05
Sum 0.14
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 47

2.10. D. One can calculate the answer in either of two ways.


x F(x) g(2-x) Product
0 0.6 0.3 0.18
1 0.9 0.5 0.45
2 1 0.2 0.2
Sum 0.83
x G(x) f(2-x) Product
0 0.2 0.1 0.02
1 0.7 0.3 0.21
2 1 0.6 0.6
Sum 0.83

2.11. E. 1. False, the sum of 10 independent Gammas is a Gamma with parameters 10α and θ.
2. False. The variances add, so that the new variance is 10σ^2 and the new standard deviation is
σ√10, not 10σ. The sum of 10 independent Normals is a Normal with parameters 10µ and σ√10.
3. False. The sum of 10 independent Negative Binomials is a Negative Binomial with parameters β
and 10r.

2.12. B. First one can compute f*f. f*f(2) = 0.16. f*f(3) = 0.40.
f*f(4) = (0.4)(0.1) + (0.5)(0.5) + (0.1)(0.4) = 0.33. f*f(5) = 0.10. f*f(6) = 0.01.
Then use the fact that f*f*f = (f*f)*f.
x f*f(x) f(6-x) Product
2 0.16 0
3 0.4 0.1 0.04
4 0.33 0.5 0.165
5 0.1 0.4 0.04
6 0.01 0
Sum 1 1 0.245
Comment: The mean of f is 0.4 + 1 + 0.3 = 1.7. If one computes f*f*f and computes the mean, one
gets 5.1 = (3)(1.7). The mean of the sum of 3 claims is three times the mean of a single claim.
x f*f*f(x) Product
3 0.064 0.192
4 0.24 0.96
5 0.348 1.74
6 0.245 1.47
7 0.087 0.609
8 0.015 0.12
9 0.001 0.009
Sum 1 5.1
2016-C-3, Aggregate Distributions §2 Convolutions, HCM 10/21/15, Page 48

2.13. D. For date of accident 0 < y < 1, the expected portion of accidents reported by time 1.5 is:
1 - e^(-(1.5-y)/1.7). We can integrate over the dates of accident:
H(1.5) = ∫ G(1.5 - y) f(y) dy = ∫_0^1 (1 - e^(-(1.5-y)/1.7)) (0.9 + 0.2y) dy
= ∫_0^1 {0.9 - 0.9 e^(-(1.5-y)/1.7) + 0.2y - 0.2y e^(-(1.5-y)/1.7)} dy
= [0.9y - 1.53 e^(-(1.5-y)/1.7) + 0.1y^2 - 0.34y e^(-(1.5-y)/1.7) + 0.578 e^(-(1.5-y)/1.7)], evaluated from y = 0 to y = 1,
= 0.9 - 0.5070 + 0.1 - 0.2534 + 0.1915 = 0.4311.
Comment: One could instead use: H(1.5) = ∫ g(x) F(1.5 - x) dx.


2.14. E. Now f(7-x) > 0 for 0 ≤ 7-x < 10, which implies -3 < x ≤ 7.
In addition f(x) > 0 when 0 ≤ x < 10. Thus f(x)f(7-x) > 0 when 0 ≤ x ≤ 7.
f*f(7) = ∫[0 to 7] (0.02)(10 - x) (0.02){10 - (7 - x)} dx = 0.0004 ∫[0 to 7] (30 + 7x - x²) dx
= (0.0004) {210 + (7/2)(49) - (343/3)} = 0.1069.

2.15. A. Let f(y) = 1/5 for 1 < y < 6 and g(z) = 1/8 for 3 < z < 11; then the density of the sum is:
h(14) = ∫ f(y) g(14 - y) dy = ∫[3 to 6] (1/5)(1/8) dy = 3/40 = 0.075.
Comment: The integrand is zero unless 1 ≤ y ≤ 6 and 3 ≤ 14 - y ≤ 11.
Therefore, we only integrate from y = 3 to y = 6.

2.16. C. Let F(y) = (y-1)/5 for 1 ≤ y ≤ 6, F(y) = 1 for y > 6, and g(z) = 1/8 for 3 ≤ z ≤ 11; then the
distribution of the sum is:
H(14) = ∫ F(14 - z) g(z) dz = ∫[3 to 8] (1/8) dz + ∫[8 to 11] (13 - z)/40 dz = 5/8 + 21/80 = 0.8875.
Alternately, X has a density function:
(x - 4)/40 for 4 ≤ x ≤ 9,
5/40 for 9 ≤ x ≤ 12,
(17 - x)/40 for 12 ≤ x ≤ 17.
Thus F(14) = ∫[4 to 9] (x - 4)/40 dx + ∫[9 to 12] (1/8) dx + ∫[12 to 14] (17 - x)/40 dx = 0.3125 + 0.375 + 0.200 =
0.8875.
Comment: In general, if X is the sum of two independent uniform variables on (a, b) and (c, d), with
d - c ≥ b - a, then X has a density:
{x - (a + c)} / {(b - a)(d - c)} for a + c ≤ x ≤ b + c,
(b - a) / {(b - a)(d - c)} = 1/(d - c) for b + c ≤ x ≤ a + d,
{(b + d) - x} / {(b - a)(d - c)} for a + d ≤ x ≤ b + d.

2.17. C. f(x) = e^(-x/7)/7. g(y) = e^(-y/17)/17. Using the convolution formula:
h(z) = ∫ f(t) g(z - t) dt = ∫[0 to z] (e^(-t/7)/7)(e^(-(z-t)/17)/17) dt = (e^(-z/17)/119) ∫[0 to z] e^(-0.0840336t) dt
= (e^(-z/17)/119)(e^(-0.0840336z) - 1)/(-0.0840336) = (e^(-z/17) - e^(-z/7))/10.
Alternately, the Moment Generating Functions are: 1/(1 - 7t) and 1/(1 - 17t).
Their product is: 1/{(1 - 7t)(1 - 17t)} = (17/10)/(1 - 17t) - (7/10)/(1 - 7t). This is:
(17/10)(m.g.f. of an Exponential with mean 17) - (7/10)(m.g.f. of an Exponential with mean 7),
so the density of Z is the corresponding combination of Exponential densities:
(17/10)(e^(-z/17)/17) - (7/10)(e^(-z/7)/7) = (e^(-z/17) - e^(-z/7))/10.
Comment: In general, the sum of two independent Exponentials with different means θ1 and θ2
has a density of: {exp(-x/θ1) - exp(-x/θ2)} / (θ1 - θ2).
If θ1 = θ2, then one would instead get a Gamma Distribution with parameters α = 2 and θ = θ1 = θ2.
In this case with differing means, the density is closely approximated by a Gamma Distribution with
α = 2 and θ = (7+17)/2 = 12, but it is not a Gamma Distribution.
The sum of n independent Exponentials with different means θi has density:
Σ_{i=1}^{n} θi^(n-2) exp[-x/θi] / Π_{j≠i} (θi - θj), for n ≥ 2 and all θi distinct.
See Example 2.3.3 in Actuarial Mathematics, not on the Syllabus.
This mathematics is used in “Aids: Survival Analysis of Persons Testing HIV Positive,”
by Harry H. Panjer, TSA 1988.

2.18. D. Let x = (Tomʼs unrounded result) - (Tomʼs rounded result).


Then x is uniformly distributed from -0.5 to 0.5.
Let y = Dickʼs unrounded result - Dickʼs rounded result.
Then y is uniformly distributed from -0.5 to 0.5.
Then z = x + y has a triangle distribution:
f(z) = 1 + z for -1 ≤ z ≤ 0, and f(z) = 1 - z for 1 ≥ z ≥ 0.
F(z) = (1 + z)2 /2 for -1 ≤ z ≤ 0, and F(z) = 1 - (1 - z)2 /2 for 1 ≥ z ≥ 0.
The sum of the rounded results equals the rounding of the sum provided z is between -0.5 and +0.5.
The probability that -0.5 < z < 0.5 is: F(0.5) - F(-0.5) = 7/8 - 1/8 = 3/4.
Alternately, let t = decimal portion of Tomʼs unrounded result and d = decimal portion of Dickʼs
unrounded result. Divide into cases:
t < 1/2, d < 1/2, and 0 ≤ t + d < 1/2; OK; Prob = 1/8.
t < 1/2, d < 1/2, and 1 > t + d ≥ 1/2; not OK; Prob = 1/8.
t < 1/2, d ≥ 1/2; OK; Prob = 1/4.
t ≥ 1/2, d < 1/2; OK; Prob = 1/4.
t ≥ 1/2, d ≥ 1/2, and 1 ≤ t + d < 3/2; not OK; Prob = 1/8.
t ≥ 1/2, d ≥ 1/2, and 2 ≥ t + d ≥ 3/2; OK; Prob = 1/8.
Total probability where OK is: 1/8 + 1/4 + 1/4 + 1/8 = 3/4.
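The 3/4 can also be checked by simulation; a minimal Python sketch (assuming numpy; the seed and sample size are arbitrary):
```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Only the fractional parts matter, so uniform draws on any interval will do.
tom = rng.uniform(0, 10, n)
dick = rng.uniform(0, 10, n)

agree = np.round(tom) + np.round(dick) == np.round(tom + dick)
print(agree.mean())   # close to 0.75
```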

2.19. The possible results range from 2 to 10. There are (4)(6) = 24 equally likely results.
For example, a 4 can result from 1, 3; 2, 2; or 3, 1. Thus the chance of a 4 is: 3/24.
A 7 can result from 1, 6; 2, 5; 3, 4; or 4, 3. Thus the chance of a 7 is 4/24.
The probability density function is:
Result 2 3 4 5 6 7 8 9 10
Number of Chances 1 2 3 4 4 4 3 2 1
Probability 0.042 0.083 0.125 0.167 0.167 0.167 0.125 0.083 0.042

2.20. A. h(4) = Σ_x f(x) g(4 - x) = f(1)g(3) + f(3)g(1) + f(4)g(0)
= (0.2)(0.2) + (0.5)(0) + (0.3)(0.1) = 0.07.
Or h(4) = Σ_y f(4 - y) g(y) = f(4)g(0) + f(2)g(2) + f(1)g(3) = (0.3)(0.1) + (0)(0.7) + (0.2)(0.2) = 0.07.

One can arrange this calculation in a spreadsheet:


x f(x) g(4-x) f(x)g(4-x)
1 0.2 0.2 0.04
2 0 0.7 0
3 0.5 0 0
4 0.3 0.1 0.03
0.07
Comment: Similarly, h(5) = Σ f(x)g(5-x) = 0.35.
x     f(x)   g(5-x)   f(x)g(5-x)
1     0.2    0        0
2     0      0.2      0
3     0.5    0.7      0.35
4     0.3    0        0
5     0      0.1      0
Sum                   0.35
h(6) = Σ f(x)g(6-x) = (0.5)(0.2) + (0.3)(0.7) = 0.31.
The entire convolution of f and g is shown below:
z 1 2 3 4 5 6 7
h(z) 0.02 0 0.19 0.07 0.35 0.31 0.06
Note that being a distribution, h = f*g sums to 1.

2.21. The number of home games the Durham Bulls win is Binomial with m = 2 and q = 0.55:
f(0) = 0.45² = 0.2025. f(1) = (2)(0.55)(0.45) = 0.4950. f(2) = 0.55² = 0.3025.
The number of road games the Durham Bulls win is Binomial with m = 1 and q = 0.45.
Prob[3 wins] = Prob[2 home wins]Prob[1 road win] = (0.3025)(0.45) = 0.1361.
Prob[2 wins] = Prob[2 home wins]Prob[0 road win] + Prob[1 home wins]Prob[1 road win] =
(0.3025)(0.55) + (0.4950)(0.45) = 0.3891.
Prob[at least 2 wins] = 0.1361 + 0.3891 = 52.52%.
Comment: The total number of games they win is the convolution of the two Binomials.
In a three game championship series, if one team won the first 2 games, then the final game
would not be played. However, this does not affect the answer to the question.

2.22. The number of home games the Durham Bulls win is Binomial with m = 3 and q = 0.55:
f(0) = 0.45³ = 0.0911. f(1) = (3)(0.55)(0.45²) = 0.3341.
f(2) = (3)(0.55²)(0.45) = 0.4084. f(3) = 0.55³ = 0.1664.
The number of road games the Durham Bulls win is Binomial with m = 2 and q = 0.45:
g(0) = 0.55² = 0.3025. g(1) = (2)(0.45)(0.55) = 0.4950. g(2) = 0.45² = 0.2025.
Prob[5 wins] = Prob[3 home wins]Prob[2 road wins] = (0.1664)(0.2025) = 0.0337.
Prob[4 wins] = Prob[3 home wins]Prob[1 road win] + Prob[2 home wins]Prob[2 road wins] =
(0.1664)(0.4950) + (0.4084)(0.2025) = 0.1651.
Prob[3 wins] = Prob[3 home wins]Prob[0 road win] + Prob[2 home wins]Prob[1 road win]
+ Prob[1 home win]Prob[2 road wins] = (0.1664)(0.2025) + (0.4084)(0.4950) + (0.3341)(0.2025)
= 0.3201. Prob[at least 3 wins] = 0.0337 + 0.1651 + 0.3201 = 51.89%.
Comment: The longer the series, the less advantage the Durham Bulls get from the extra home
game.

2.23. The number of home games the Durham Bulls win is Binomial with m = 4 and q = 0.55:
f(0) = 0.45⁴ = 0.0410. f(1) = (4)(0.55)(0.45³) = 0.2005. f(2) = (6)(0.55²)(0.45²) = 0.3675.
f(3) = (4)(0.55³)(0.45) = 0.2995. f(4) = 0.55⁴ = 0.0915.
The number of road games the Durham Bulls win is Binomial with m = 3 and q = 0.45:
g(0) = 0.55³ = 0.1664. g(1) = (3)(0.55²)(0.45) = 0.4084.
g(2) = (3)(0.55)(0.45²) = 0.3341. g(3) = 0.45³ = 0.0911.
Prob[7 wins] = Prob[4 home wins]Prob[3 road wins] = (0.0915)(0.0911) = 0.0083.
Prob[6 wins] = Prob[4 home wins]Prob[2 road wins] + Prob[3 home wins]Prob[3 road wins] =
(0.0915)(0.3341) + (0.2995)(0.0911) = 0.0579.
Prob[5 wins] = Prob[4 home wins]Prob[1 road win] + Prob[3 home wins]Prob[2 road wins]
+ Prob[2 home wins]Prob[3 road wins]
= (0.0915)(0.4084) + (0.2995)(0.3341) + (0.3675)(0.0911) = 0.1709.
Prob[4 wins] = Prob[4 home wins]Prob[0 road wins] + Prob[3 home wins]Prob[1 road win]
+ Prob[2 home wins]Prob[2 road wins] + Prob[1 home win]Prob[3 road wins]
= (0.0915)(0.1664) + (0.2995)(0.4084) + (0.3675)(0.3341) + (0.2005)(0.0911) = 0.2786.
Prob[at least 4 wins] = 0.0083 + 0.0579 + 0.1709 + 0.2786 = 51.57%.
Comment: The probabilities for the number of games won by the Durham Bulls are:
0.68%, 5.01%, 15.67%, 27.06%, 27.86% , 17.09%, 5.79%, 0.83%.

2.24. B. The sum of 5 independent, identically distributed Binomials is another Binomial with q the
same and m multiplied by 5. Statement #1 is true.
The sum of independent, identically distributed Paretos is not another Pareto.
The sum of 5 independent, identically distributed Poissons is another Poisson with λ multiplied by
5. Statement #3 is true.

2.25. C. If there is one claim, then S(8) = 0.2.
For example, f*f(2) = ∫ f(y) f(2 - y) dy.
For the integrand to be positive, we need 0 ≤ y ≤ 10 and 0 ≤ 2 - y ≤ 10. Thus we need 0 ≤ y ≤ 2.
More generally, for x ≤ 10, we need 0 ≤ y ≤ x.
For x ≤ 10, f*f(x) = ∫ f(y) f(x - y) dy = ∫[0 to x] (0.1)(0.1) dy = 0.01x.
Thus F*F(8) = ∫[0 to 8] 0.01x dx = (0.01)(8²/2) = 0.32.
Thus if there are 2 claims, the probability that the aggregate is greater than 8 is: 1 - 0.32 = 0.68.
In total, the probability that the aggregate is greater than 8 is: (0.6)(0.2) + (0.4)(0.68) = 0.392.

2.26. A. We can add up over all possible reporting delays the chance that for a given reporting
delay the accident date is such that the accident will be reported by the end of the accident year. For
a given reporting delay x, the accident will be reported by the end of the accident year (time = 1) if
and only if the accident date is ≤ 1 - x. This only holds for x ≤ 1; if the reporting delay is greater than 1,
then the accident can not be reported before the end of the accident year, regardless of the accident
date. The chance that the accident date is ≤ 1 - x is: G(1-x). So the chance that we have a reporting
delay of x and that the accident is reported by the end of the accident year is the product: f(x)G(1-x),
since X and Y are given to be independent. Integrating over all reporting delays less than or equal
to 1, we get the chance that the accident is reported by the end of the accident year:
∫[0 to 1] f(x) G(1 - x) dx.
Comment: More generally, if x is the time from the accident date to the reporting date and y is the
time from the beginning of the accident year to the accident date, then the time from the beginning of
the accident year to the reporting date is x + y. An accident is reported by time z from the beginning
of the accident year if x + y ≤ z. So the distribution of Z is the distribution of the sum of X and Y. If X
and Y are independent, then the probability density function of their sum is given by the convolution
formula: ∫ f(x) g(z - x) dx = ∫ f(z - y) g(y) dy. The distribution function is given by:
∫[x = -∞ to ∞] ∫[z′ = -∞ to z] f(x) g(z′ - x) dz′ dx = ∫[-∞ to ∞] f(x) G(z - x) dx.
In this case we are asked for the chance that the time between the beginning of the accident year
and the date of reporting is less than 1 (year), so we want the chance that z = x + y ≤ 1. Since in this
case 0 < x < ∞, the integral only goes at most from 0 to infinity. Also since in this case 0 < y < 1, we
have 0 < (z - x) < 1, so that (z - 1) < x < z. Thus the integral goes at most from z - 1 to z. Thus the
chance that the accident is reported by time z from the beginning of the accident year is, for z > 0:
∫[max(0, z-1) to z] f(x) G(z - x) dx.
Thus the chance that the accident is reported by the end of the accident year (z = 1) is:
∫[0 to 1] f(x) G(1 - x) dx.
An important actuarial idea, very unlikely to be asked about on this exam.
Alternately to the intended solution, we can add up over all possible accident dates the chance that
for a given accident date the reporting delay is less than or equal to the time until the end of the
accident year. For a given accident date y, the time until the end of the accident year is 1 - y. The
chance that the reporting delay is ≤ 1 - y is: F(1-y). So the chance that we have an accident date of y
and that it is reported by the end of the accident year is the product g(y)F(1-y), since X and Y are
given to be independent. Integrating over all possible accident dates (0 < y < 1), we get an alternate
form of the chance that the accident is reported by the end of the accident year:
∫[0 to 1] g(y) F(1 - y) dy.
Convolutions can generally be computed in either of these two alternate forms.

2.27. D. S > 2 when: X2 = 0 and X1 > 2, X2 = 1 and X1 > 1, X2 = 2 and X1 > 0.


This has probability: (0.6)(1 - 0.6) + (0.2)(1 - 0.4) + (0.2)(1 - 0.3) = 0.50. Pr(S≤ 2) = 1 - 0.5 = 0.5.
Alternately, FS(2) = Σ f1 (x)F2 (2-x) = (0.3)(1) + (0.1)(0.8) + (0.2)(0.6) = 0.5.

FS(2) = Σ f1 (2-x)F2 (x) = (0.2)(0.6) + (0.1)(0.8) + (0.3)(0.1) = 0.5.

FS(2) = Σ F1 (x)f2 (2-x) = (0.3)(0.2) + (0.4)(0.2) + (0.6)(0.6) + (0.7)(0) + (1)(0) = 0.5.

FS(2) = Σ F1 (2-x)f2 (x) = (0.6)(0.6) + (0.4)(0.2) + (0.3)(0.2) = 0.5.

2.28. B. The sum of 4 independent, identical Gamma Distributions is another Gamma Distribution
with the same θ parameter and 4 times the α parameter, in this case: α = 8.8 and θ = 0.2.

2.29. B. 1. False. There will be a point mass of probability at zero. 2. True.


3. False. The nth convolution of an Exponential is a Gamma with shape parameter α = n.

2.30. B. S > 2 if X1 = 1 and X2 ≥ 2, or X1 = 2 and X2 ≥ 1.


This has probability: (0.3)(0.4) + (0.2)(0.7) = 0.26.

Comment: Iʼve used the formula: (F*G)(z) = Σ f(x) G(z-x).



2.31. C. Ways in which S can be 5: (0, 4, 1), (1, 4, 0), (1, 3, 1), (2, 3, 0), (2, 2, 1) with probability:
(0.5)(0.2)(0.9) + (0.4)(0.2)(0.1) + (0.4)(0.2)(0.9) + (0.1)(0.2)(0.1) + (0.1)(0.2)(0.9) = 0.19.
Alternately, f1 (x)*f3 (x) = 0.05 @ 0, 0.49 @ 1, 0.37 @ 2, and 0.09 @ 3.
f1 *f3 *f2 (5) = (0.49)(0.2) + (0.37)(0.2) + (0.09)(0.2) = 0.19.

2.32. E. If there is a chance of no claims, then there is an extra point mass of probability at zero in
the aggregate distribution, and the distribution of aggregate losses is not continuous at zero, so
Statement #1 is False. The sum of n independent Normal Distributions is also Normal, with n times
the mean and n times the variance, so statement #2 is true.
Statement #3 is True.

2.33. With two losses, there is a 90.25% = 0.95² chance that both losses are $50,000, and a
0.25% = 0.05² chance that both losses are $100,000. The aggregate distribution with three claims is
that for two claims convoluted with that for one claim; it has density at $150,000 of
(0.95)(0.9025) = 0.857375. With 3 claims the aggregate distribution is ≥ $150,000, so the chance
of exceeding $150,000 is: 1 - 0.857375 = 14.2625%.

2.34. A. f1 *f2 is: (0.9)(0.5) = 0.45@0, (0.9)(0.3) + (0.1)(0.5) = 0.32 @1,


(0.9)(0.2) + (0.1)(0.3) = 0.21@2, and (0.1)(0.2) = 0.02 @ 3.
fS = f1 *f2 *f3 is: (0.45)(0.25) = 0.1125 @ 0, (0.45)(0.25) + (0.32)(0.25) = 0.1925 @ 1,
(0.45)(0.25) + (0.32)(0.25) + (0.21)(0.25) = 0.245 @ 2,
(0.45)(0.25) + (0.32)(0.25) + (0.21)(0.25) + (0.02)(0.25) = 0.25 @ 3,
(0.32)(0.25) + (0.21)(0.25) + (0.02)(0.25) = 0.1375 @ 4,
(0.21)(0.25) + (0.02)(0.25) = 0.0575 @ 5, and (0.02)(0.25) = 0.005 @ 6.

2.35. D. We are given that the chance that S is greater than 4 is: 1- 0.6 = 0.4. Since the sum of
severities one and three is either 1, 2 or 3, and since the second severity is either 0, 1, or 4, S is
greater than 4 if and only if the second severity is 4. Thus 0.7 - p = 0.4. p = 0.3.
Alternately, we can compute the distribution function of S, FS, via convolution. First convolute F1 and
f3. F1 * f3 is: (0.6)(0.5) = 0.3 @ 1, (0.6)(0.5) + (1)(0.5) = 0.8 @ 2, and (1)(0.5) + (1)(0.5) = 1 @ 3.
Next convolute by f2. FS = F1 * f3 * f2 is:
0.3p @ 1, 0.8p + (0.3)(0.3) @ 2, p + (0.3)(0.8) + (0)(0.3) @ 3,
p + (0.3)(1) + (0)(0.8) + (0)(0.3) @ 4, p + 0.3 + 0 + 0 + (0.7-p)(0.3) @ 5, p + 0.3 + (0.7-p)(0.8) @ 6,
and 1 @ 7. We are given FS(4) = 0.6, so that 0.6 = p + (0.3)(1) + (0)(0.8) + (0)(0.3).
Therefore p = 0.3.
Comment: In order to compute FS, one can do the convolutions in any order.
I did it in the order I found easiest.

In general, FX+Y(z) = Σ_x FX(x) fY(z - x) = Σ_y FY(y) fX(z - y) = Σ_y FX(z - y) fY(y) = Σ_x FY(z - x) fX(x).

2.36. B. f1 *f3 is: (0.2)(0.5) = 0.1@0, (0.2)(0.5) + (0.3)(0.5) = 0.25 @1,


(0.3)(0.5) + (0.5)(0.5) = 0.4@2, and (0.5)(0.5) = 0.25 @ 3.
fS = f2 *f1 *f3 is: 0.1p @2, 0.1(1 - p) + 0.25p @3, and (0.25)(1-p) + (0.4)(p) @4.
Since S≥ 2, FS(4) = fS(2) + fS(3) +fS(4) =
0.1p + 0.1(1 - p) + 0.25p + (0.25)(1-p) + (0.4)(p) = 0.35 + 0.4p.
Setting FS(4) = 0.43: 0.35 + 0.4p = 0.43. ⇒ p = 0.2.
Comment: Although it is not needed to solve the problem: f2 *f1 *f3 is:
0.25p + 0.4(1-p) @5, and 0.25(1-p) @6. One can verify that the density of S sums to one.

2.37. A. T1 is Exponential with mean 1. When we multiply by 2, we get another Exponential with
mean 2. Let 2T1 = U. Then U is Exponential with θ = 2.
Density of U: e^(-u/2)/2. X = U + V, where V = T2. Density of V is: e^(-v) = e^(-(x-u)).
Density of X = ∫[0 to x] (e^(-u/2)/2) e^(-(x-u)) du = e^(-x) ∫[0 to x] (e^(u/2)/2) du = (e^(-x))(e^(x/2) - 1)
= e^(-x/2) - e^(-x), x > 0.
Comment: The sum of an Exponential with θ = 2 and an Exponential with θ = 1 is not a Gamma
Distribution.

Section 3, Using Convolutions

Convolutions can be useful for computing either aggregate distributions or compound distributions.27

Aggregate Distributions:

Exercise: Frequency is given by a Poisson with mean 7. Severity is given by an Exponential with
mean 1000. Frequency and Severity are independent.
Write the Distribution Function for the aggregate losses.
[Solution: The chance of n claims is e^(-7) 7^n / n!.
If one has n claims, then the Distribution of Aggregate Losses is the sum of n independent
Exponentials, or a Gamma with parameters α = n and θ = 1000.
Let FA(x) be the Distribution of Aggregate Losses.
FA(x) = Σ (Probability of n claims)(Aggregate Distribution given n claims) = Σ (e^(-7) 7^n / n!) Γ(n; x/1000).]

We note that each Gamma Distribution was the nth convolution of the Exponential.
Each term of the sum is the density of the frequency distribution at n times the nth convolution of the
severity distribution.

More generally, if frequency is FN, severity is FX, frequency and severity are independent, and the
aggregate losses are FAgg, then:

FAgg(x) = Σ_{n=0}^{∞} fN(n) FX^*n(x).

fAgg(x) = Σ_{n=0}^{∞} fN(n) fX^*n(x), recalling that f^*0(0) ≡ 1.

If one has discrete severity distributions, one can employ these formulas to directly calculate the
distribution of aggregate losses.28

27
The same mathematics applies to aggregate distributions (independent frequency and severity) and compound
distributions. One can think of a compound distribution as an aggregate model with a discrete severity.
28
If the severity is continuous, as will be discussed in a subsequent section, then one could approximate it by a
discrete distribution.
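When the severity sits on a discrete lattice, these sums translate directly into a few lines of code. Here is a minimal Python sketch (assuming numpy is available; freq_pmf, sev_pmf, nmax, and xmax are hypothetical inputs, and the sum over n is truncated at nmax):
```python
import numpy as np

def aggregate_density(freq_pmf, sev_pmf, nmax, xmax):
    """f_Agg(x) = sum over n of f_N(n) times the n-fold convolution of the severity at x.

    freq_pmf: callable giving the frequency density at n.
    sev_pmf:  numpy array; sev_pmf[x] is the severity density at x (x = 0, 1, 2, ...).
    """
    agg = np.zeros(xmax + 1)
    conv = np.zeros(xmax + 1)
    conv[0] = 1.0                                       # f^*0 is a point mass at 0
    for n in range(nmax + 1):
        agg += freq_pmf(n) * conv                       # add f_N(n) * f_X^*n
        conv = np.convolve(conv, sev_pmf)[: xmax + 1]   # advance to the next convolution
    return agg
```
For a Poisson frequency, for example, one might pass a lambda computing exp(-λ) λ^n / n!; truncating at a suitably large nmax leaves out only a negligible tail, just as the spreadsheet example later in this section does.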

An Example with a Discrete Severity:

Exercise: Let a discrete severity distribution be: f(10) = 0.4, f(20) = 0.5, f(30) = 0.1. What is f*f?
[Solution: List the possible ways the two variables can add to 20:
10 and 10, with probability: (0.4)(0.4) = 0.16. f*f(20) = 0.16.
List the possible ways the two variables can add to 30:
10 and 20, with probability: (0.4)(0.5) = 0.20, or 20 and 10, with probability: (0.5)(0.4) = 0.20.
f*f(30) = 0.20 + 0.20 = 0.40. List the possible ways the two variables can add to 40:
10 and 30, with probability: (0.4)(0.1) = 0.04, or 20 and 20, with probability: (0.5)(0.5) = 0.25,
or 30 and 10, with probability: (0.1)(0.4) = 0.04. f*f(40) = 0.04 + 0.25 + 0.04 = 0.33.
List the possible ways the two variables can add to 50:
20 and 30, with probability: (0.5)(0.1) = 0.05, or 30 and 20, with probability: (0.1)(0.5) = 0.05.
f*f(50) = 0.05 + 0.05 = 0.10. List the possible ways the two variables can add to 60:
30 and 30, with probability: (0.1)(0.1) = 0.01. f*f(60) = 0.01.]

Exercise: For the distribution in the previous exercise, what is f*f*f?


[Solution: One can use the fact that f*f*f = (f*f)*f. f*f*f(30) = 0.064. f*f*f(40) = 0.240.
f*f*f(50) = 0.348. f*f*f(60) = 0.245. f*f*f(70) = 0.087. f*f*f(80) = 0.015. f*f*f(90) = 0.001.
x f*f(x) Product f(50-x) 50-x
10 40
20 0.16 0.016 0.1 30
30 0.4 0.2 0.5 20
40 0.33 0.132 0.4 10
50 0.1 0 0
60 0.01 0
Sum 1 0.348 1
x f*f(x) Product f(60-x) 60-x
10 50
20 0.16 0 40
30 0.4 0.04 0.1 30
40 0.33 0.165 0.5 20
50 0.1 0.04 0.4 10
60 0.01 0 0
Sum 1 0.245 1
x f*f(x) Product f(70-x) 70-x
10 60
20 0.16 0 50
30 0.4 0 40
40 0.33 0.033 0.1 30
50 0.1 0.05 0.5 20
60 0.01 0.004 0.4 10
Sum 1 0.087 1

Exercise: Let frequency be Binomial with parameters m = 3 and q = 0.2.


Let the severity have a discrete distribution such that f(10) = 0.4, f(20) = 0.5, f(30) = 0.1.
Calculate the distribution of aggregate losses, using the convolutions calculated in the previous
exercises.
[Solution: Recall that f*0 (0) ≡ 1.
n 0 1 2 3 Aggregate
Binomial 0.512 0.384 0.096 0.008 Density
x f*0 f f*f f*f*f
0 1 0.512000
10 0.4 0.153600
20 0.5 0.16 0.207360
30 0.1 0.40 0.064 0.077312
40 0.33 0.240 0.033600
50 0.10 0.348 0.012384
60 0.01 0.245 0.002920
70 0.087 0.000696
80 0.015 0.000120
90 0.001 0.000008
Sum 1 1 1 1 1
The aggregate density at 30 is: (0.384)(0.1) + (0.096)(0.40) + (0.008)(0.064) = 0.077312.
Comment: Since the Binomial Distribution and severity distribution have finite support, so does the
aggregate distribution. In this case the aggregate losses can only take on the values 0 through 90.]
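The table above can be reproduced with the same logic; a minimal Python sketch (assuming numpy, with severity recorded in units of 10):
```python
import numpy as np
from math import comb

sev = np.array([0.0, 0.4, 0.5, 0.1])    # severity density at 0, 10, 20, 30 (in units of 10)
m, q = 3, 0.2
freq = [comb(m, n) * q**n * (1 - q)**(m - n) for n in range(m + 1)]   # Binomial(3, 0.2)

agg = np.zeros(10)       # the aggregate can only be 0, 10, ..., 90 (in units of 10)
conv = np.zeros(10)
conv[0] = 1.0            # f^*0: point mass at 0
for n in range(m + 1):
    agg += freq[n] * conv
    conv = np.convolve(conv, sev)[:10]

print(agg[3])            # 0.077312, the aggregate density at 30
```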

In general, when the frequency distribution is Binomial, there are only a finite number of terms in the
sum used to get the aggregate density via convolutions:

fA(x) = Σ_{n=0}^{m} {m! / (n! (m - n)!)} q^n (1 - q)^(m-n) fX^*n(x).

Density of a Compound Distribution in terms of Convolutions:

For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is
Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel
by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by
each taxicab is independent of the number of taxicabs that arrive and is independent of the number
of passengers dropped off by any other taxicab. Then the aggregate number of passengers
dropped off per minute at the Heartbreak Hotel is a compound Poisson-Binomial distribution, with
parameters λ = 1.3, q = 0.4, m = 5.

Let the primary distribution be p and the secondary distribution be s and let c be the compound
distribution. Then we can write the density of c, in terms of a weighted average convolutions of s.

For example, assume we have 4 taxis. Then the distribution of the number of people is given by
the sum of 4 independent variables each distributed as per the secondary distribution, s.
This sum is distributed as the four-fold convolution of s: s* s * s* s = s* 4 .
The chance of having four taxis is the density of the primary distribution at 4: p(4).
Thus this possibility contributes p(4)s* 4 to the compound distribution c.

The possibility of n taxis contributes p(n)s* n to the compound distribution c.

Therefore, the compound distribution is the sum of such terms:29

c(x) = Σ_{n=0}^{∞} p(n) s^*n(x).

Compound Distribution =
Sum over n of: (Density of primary distribution at n)(n-fold convolution of secondary distribution).

29
See Equation 9.3 in Loss Models. The same formula holds for the distribution of aggregate losses, where severity
takes the place of the secondary distribution.

Exercise: What is the four-fold convolution of a Binomial distribution, with parameters q = 0.4, m = 5.
[Solution: The sum of 4 independent Binomials each with parameters q = 0.4, m = 5 is a Binomial
with parameters q = 0.4, m = (4)(5) = 20.]

The n-fold convolution of a Binomial distribution, with parameters q = 0.4, m = 5, is a Binomial with
parameters q = 0.4, m = 5n. It has density at x of: {(5n)! / (x! (5n - x)!)} 0.4^x 0.6^(5n-x).

Exercise: Write a formula for the density of a compound Poisson-Binomial distribution, with
parameters λ = 1.3, q = 0.4, m = 5.
[Solution: c(x) = Σ_{n=0}^{∞} p(n) s^*n(x) = Σ_{n=0}^{∞} (e^(-1.3) 1.3^n / n!) {(5n)! / (x! (5n - x)!)} 0.4^x 0.6^(5n-x).]

One could perform this calculation in a spreadsheet as follows:



n 0 1 2 3 4 5
Poisson 0.27253 0.35429 0.23029 0.099792 0.032432 0.008432
x    Binomial m=0   Binomial m=5   Binomial m=10   Binomial m=15   Binomial m=20   Binomial m=25   Compound Poisson-Binomial
0 1 0.07776 0.00605 0.000470 0.000037 0.000003 0.301522089
1 0.25920 0.04031 0.004702 0.000487 0.000047 0.101600875
2 0.34560 0.12093 0.021942 0.003087 0.000379 0.152585482
3 0.23040 0.21499 0.063388 0.012350 0.001937 0.137881306
4 0.07680 0.25082 0.126776 0.034991 0.007104 0.098817323
5 0.01024 0.20066 0.185938 0.074647 0.019891 0.070981212
6 0.11148 0.206598 0.124412 0.044203 0.050696418
7 0.04247 0.177084 0.165882 0.079986 0.033505760
8 0.01062 0.118056 0.179706 0.119980 0.021065986
9 0.00157 0.061214 0.159738 0.151086 0.012925619
10 0.00010 0.024486 0.117142 0.161158 0.007625757
11 0.007420 0.070995 0.146507 0.004278394
12 0.001649 0.035497 0.113950 0.002276687
13 0.000254 0.014563 0.075967 0.001138213
14 0.000024 0.004854 0.043410 0.000525897
15 0.000001 0.001294 0.021222 0.000221047
16 0.000270 0.008843 0.000083312
17 0.000042 0.003121 0.000027689
18 0.000005 0.000925 0.000007950
19 0.000000 0.000227 0.000001926
20 0.000000 0.000045 0.000000383
21 0.000007 0.000000061
22 0.000001 0.000000007
23 0.000000 0.000000001
24 0.000000 0.000000000
25 0.000000 0.000000000
Sum 1 1 1 1 1 1 0.997769395

For example, the density at 2 of the compound distribution is calculated as:


(0.27253)(0) + (0.35429)(0.34560) + (0.23029)(0.12093) + (0.099792)(0.021942) +
(0.032432)(0.003087) + (0.008432)(0.000379) = 0.1526.

Thus, there is a 15.26% chance that two passengers will be dropped off at the Heartbreak Hotel
during the next minute. Note that by not including the chance of more than 5 taxicabs in our
spreadsheet, we have allowed the calculation to fit in a finite sized spreadsheet, but have also left
out some possibilities.30

30
As can be seen, the computed compound densities only add to 0.998 < 1. The approximate compound densities
at x < 10 are fairly accurate; for larger x one would need a bigger spreadsheet.
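The same spreadsheet can be produced programmatically; a minimal Python sketch (assuming numpy, and truncating at 5 taxicabs just as the spreadsheet does):
```python
import numpy as np
from math import comb, exp, factorial

lam, q, m = 1.3, 0.4, 5
binom5 = np.array([comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)])

compound = np.zeros(26)
conv = np.zeros(26)
conv[0] = 1.0                                   # s^*0: point mass at 0
for n in range(6):                              # 0 through 5 taxicabs
    compound += exp(-lam) * lam**n / factorial(n) * conv
    conv = np.convolve(conv, binom5)[:26]       # s^*(n+1), a Binomial(5(n+1), 0.4)

print(compound[2])   # about 0.1526, as computed above
```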

Practical Issues:

When one has a frequency with infinite support and a discrete severity, while these calculations of
the aggregate distribution via convolutions are straightforward to perform on a computer, they can
get rather lengthy.31 Also if the severity distribution has a positive density at zero, then each
summation contains an infinite number of terms.32

When the frequency or primary distribution is a member of the (a, b, 0) class or (a, b, 1) class,
aggregate and compound distributions can also be computed via the Panjer Algorithm (Recursive
Method), to be discussed in a subsequent section. The Panjer Algorithm avoids some of these
practical issues.33

31
As stated in Section 9.5 of Loss Models, in order to compute the aggregate distribution up to n using
convolutions, the number of calculations goes up as n3 .
32
One can get around this difficulty when the frequency distribution can be “thinned”.
33
As stated in Section 9.5 of Loss Models, in order to compute the aggregate distribution up to n using the Panjer
Algorithm to be discussed subsequently, the number of calculations goes up as n2 .

Problems:

3.1 (3 points) There are either one, two or three claims, with probabilities of 60%, 30% and 10%,
respectively.
Each claim is of size $100 or $200, with probabilities of 80% and 20% respectively, independent of
the size of any other claim.
Frequency and severity are independent.
Calculate the aggregate distribution.

3.2 (2 points) The number of claims in a period has a Geometric distribution with mean 2.
The amount of each claim X follows P(X = x) = 0.50, x = 1, 2.
The number of claims and the claim amounts are independent.
S is the aggregate claim amount in the period.
Calculate Fs(3).
(A) 0.66 (B) 0.67 (C) 0.68 (D) 0.69 (E) 0.70

3.3 (3 points) The number of accidents per year follows a Binomial distribution with m = 2 and
q = 0.7. The number of claims per accident is Geometric with β = 1.
The number of claims for each accident is independent of the number of claims for any other accident
and of the total number of accidents.
Calculate the probability of 2 or fewer claims in a year.
A. Less than 80%
B. At least 80% but less than 82%
C. At least 82% but less than 84%
D. At least 84% but less than 86%
E. At least 86%

3.4 (3 points) For a certain company, losses follow a Poisson frequency distribution with mean 2 per
year, and the amount of a loss is 1, 2, or 3, each with probability 1/3.
Loss amounts are independent of the number of losses, and of each other.
What is the probability of 4 in annual aggregate losses?
A. 7% B. 8% C. 9% D. 10% E. 11%

3.5 (8 points) The number of accidents is either 0, 1, 2, or 3 with probabilities 50%, 20%, 20%, and
10% respectively.
The number of claims per accident is 0, 1, 2 or 3 with probabilities 30%, 40%, 20%, and 10%
respectively.
Calculate the distribution of the total number of claims.

3.6 (5A, 11/94, Q.37) (2 points) Let N = number of claims and S = X1 + X2 + ... + XN.
Suppose S has a compound Poisson distribution with Poisson parameter λ = 0.6.
The only possible individual claim amounts are $2,000, $5,000, and $10,000 with probabilities 0.6,
0.3, and 0.1, respectively. Calculate Prob[S ≤ $7000 | N ≤ 2].

3.7 (CAS3, 5/04, Q.37) (2.5 points)


An insurance portfolio produces N claims with the following distribution:
n P(N = n)
0 0.1
1 0.5
2 0.4
Individual claim amounts have the following distribution:
x fX(x)
0 0.7
10 0.2
20 0.1
Individual claim amounts and claim counts are independent.
Calculate the probability that the ratio of aggregate claim amounts to expected aggregate claim
amounts will exceed 4.
A. Less than 3%
B. At least 3%, but less than 7%
C. At least 7%, but less than 11%
D. At least 11%, but less than 15%
E. At least 15%

Solutions to Problems:

3.1. The severity distribution is: f(100) = 0.8 and f(200) = 0.2.
f*f is: 200@64%, 300@32%, 400@4%, since the possible sums of two claims are:
        100   200
100     200   300
200     300   400
with the corresponding probabilities:
        0.8   0.2
0.8     64%   16%
0.2     16%   4%
f*f*f = f*(f*f) is: 300@51.2%, 400@38.4%, 500@9.6%, 600@0.8%,
since the possible sums of three claims are:
        200   300   400
100     300   400   500
200     400   500   600
with the corresponding probabilities:
        0.64   0.32   0.04
0.8     51.2%  25.6%  3.2%
0.2     12.8%  6.4%   0.8%

The aggregate distribution is Σ Prob(N = n) f*n .


n 0 1 2 3 Aggregate
0.00 0.60 0.30 0.10 Distribution
x f*0 f f*f f*f*f
0 1 0.0000
100 0.8 0.4800
200 0.2 0.64 0.3120
300 0.32 0.512 0.1472
400 0.04 0.384 0.0504
500 0.096 0.0096
600 0.008 0.0008
Sum 1 1 1 1 1.0000
For example, the probability that the aggregate distribution is 300 is:
(0.3)(0.32) + (0.1)(0.512) = 14.72%.
The aggregate distribution is:
100@48%, 200@31.2%, 300@14.72%, 400@5.04%, 500@.96%, 600@.08%.
Comment: One could instead use semi-organized reasoning. For example, the aggregate can be
300 if either one has 2 claims of sizes 100 and 200, or one has 3 claims each of size 100.
This has probability of: (30%)(2)(80%)(20%) + (10%)(80%)(80%)(80%) = 14.72%.

3.2. C. For the Geometric with β = 2: f(0) = 1/3, f(1) = 2f(0)/3 = 2/9,
f(2) = 2f(1)/3 = 4/27, f(3) = 2f(2)/3 = 8/81.
The ways in which the aggregate is ≤ 3:
0 claims: 1/3 = 0.3333. 1 claim: 2/9 = 0.2222.
2 claims of sizes 1 & 1, 1 & 2, or 2 & 1: (3/4)(4/27) = 1/9 = 0.1111.
3 claims of sizes 1 & 1 & 1: (1/8)(8/81) =1/81 = 0.0123.
Distribution of aggregate at 3 is: 0.3333 + 0.2222 + 0.1111 + 0.0123 = 0.679.
Alternately, using convolutions:
n 0 1 2 3 Aggregate
Geometric 0.3333 0.2222 0.1481 0.0988 Distribution
x f*0 f f*f f*f*f
0 1 0.3333
1 0.50 0.1111
2 0.50 0.250 0.1481
3 0.500 0.1250 0.0864
Distribution of aggregate at 3 is: 0.3333 + 0.1111 + 0.1481 + 0.0864 = 0.679.
Comment: Similar to but easier than 3, 11/02, Q.36.
One could also use the Panjer Algorithm (Recursive Method).

3.3. A. For the Binomial with m = 2 and q = 0.7: f(0) = 0.32 = 0.09, f(1) = (2)(0.3)(0.7) = 0.42,
f(2) = 0.72 = 0.49.
For a Geometric with β = 1, f(0) = 1/2, f(1) = 1/4, f(2) = 1/8.
The number of claims with 2 accidents is the sum of two independent Geometrics, which is a
Negative Binomial with r = 2 and β =1, with:
f(0) = 1/(1+ β)r = 1/4. f(1) = rβ/(1+ β)r+1 = 1/4. f(2) = {r(r+1)/2}β2 /(1+ β)r+2 = 3/16.
Using convolutions:
n 0 1 2 Compound
Binomial 0.09 0.42 0.49 Distribution
x f*0 f f*f
0 1 0.5000 0.2500 0.4225
1 0.2500 0.2500 0.2275
2 0.1250 0.1875 0.1444
Prob[2 or fewer claims] = 0.4225 + 0.2275 + 0.1444 = 0.7944.
Comment: One could instead use the Panjer Algorithm (Recursive Method).

3.4. E. For the Poisson with λ = 2: f(0) = e-2 = 0.1353, f(1) = 2f(0) = 0.2707,
f(2) = 2f(1)/2 = 0.2707, f(3) = 2f(2)/3 = 0.1804, f(4) = 2f(3)/4 = 0.0902.
Using convolutions:
n 0 1 2 3 4 Aggregate
Poisson 0.1353 0.2707 0.2707 0.1804 0.0902 Distribution
x f*0 f f*f f*f*f f*f*f*f
0 1 0.1353
1 0.3333 0.0902
2 0.3333 0.1111 0.1203
3 0.3333 0.2222 0.0370 0.1571
4 0.3333 0.1111 0.0123 0.1114
Prob[Aggregate = 4] = Prob[2 claims]Prob[2 claims sum to 4] +
Prob[3 claims]Prob[3 claims sum to 4] + Prob[4 claims]Prob[4 claims sum to 4] =
(0.2707)(0.3333) + (0.1804)(0.1111) + (0.0902)(0.0123) = 0.1114.
Comment: One could instead use the Panjer Algorithm (Recursive Method).
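A quick numerical check of the 0.1114; a minimal Python sketch (assuming numpy):
```python
import numpy as np
from math import exp, factorial

sev = np.array([0.0, 1/3, 1/3, 1/3])      # loss amount 1, 2, or 3, each with probability 1/3
lam = 2

agg = np.zeros(5)                          # aggregate densities at 0, 1, 2, 3, 4
conv = np.zeros(5)
conv[0] = 1.0
for n in range(5):                         # 5 or more losses cannot produce an aggregate of 4
    agg += exp(-lam) * lam**n / factorial(n) * conv
    conv = np.convolve(conv, sev)[:5]

print(agg[4])   # about 0.1114
```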

3.5. The possible sums of the numbers of claims for 2 accidents is:
0 1 2 3
0 0 1 2 3
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
with the corresponding probabilities:
0.3 0.4 0.2 0.1
0.3 9% 12% 6% 3%
0.4 12% 16% 8% 4%
0.2 6% 8% 4% 2%
0.1 3% 4% 2% 1%
f*f = 0@9%, 1@24%, 2@28%,3@22%, 4@12%, 5@4%, 6@1%.

The possible sums of the numbers of claims for 3 accidents is:


0 1 2 3
0 0 1 2 3
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
6 6 7 8 9
with the corresponding probabilities:
0.3 0.4 0.2 0.1
0.09 2.7% 3.6% 1.8% 0.9%
0.24 7.2% 9.6% 4.8% 2.4%
0.28 8.4% 11.2% 5.6% 2.8%
0.22 6.6% 8.8% 4.4% 2.2%
0.12 3.6% 4.8% 2.4% 1.2%
0.04 1.2% 1.6% 0.8% 0.4%
0.01 0.3% 0.4% 0.2% 0.1%
f*f*f = 0@2.7%, 1@10.8%, 2@19.8%,3@23.5%, 4@20.4%, 5@13.2%, 6@6.5%, 7@2.4%,
8@0.6, 9@0.1%.

n 0 1 2 3 Compound
0.5 0.2 0.2 0.1 Distribution
x f*0 f f*f f*f*f
0 1 0.30 0.09 0.027 0.5807
1 0.40 0.24 0.108 0.1388
2 0.20 0.28 0.198 0.1158
3 0.10 0.22 0.235 0.0875
4 0.12 0.204 0.0444
5 0.04 0.132 0.0212
6 0.01 0.065 0.0085
7 0.024 0.0024
8 0.006 0.0006
9 0.001 0.0001
sum 1 1 1 1 1
For example, (0.2)(0.1) + (0.2)(0.22) + (0.1)(0.235) = 0.0875.

3.6. f*f(4) = Prob[1st claim = 2]Prob[2nd claim = 2] = 0.6² = 0.36.
f*f(7) = Prob[1st claim = 2]Prob[2nd claim = 5] + Prob[1st claim = 5]Prob[2nd claim = 2] =
(2)(0.6)(0.3) = 0.36. f*f(10) = Prob[1st claim = 5]Prob[2nd claim = 5] = 0.3² = 0.09.
The aggregate distribution is Σ Prob(N = n) f*n .
Given that N ≤ 2, we need only calculate the first three terms of that sum.
n 0 1 2 Aggregate
Poisson 0.5488 0.3293 0.0988 Distribution
x f*0 f f*f
0 1 0.5488
1 0.0000
2 0.6 0.1976
3 0.0000
4 0.36 0.0356
5 0.3 0.0988
6 0.0000
7 0.36 0.0356
8 0.0000
9 0.0000
10 0.1 0.09 0.0418
Sum 0.9581
The probability that N ≤ 2 and the aggregate losses are less ≤ 7 is:
0.5488 + 0.1976 + 0.0356 + 0.0988 + 0.0356 = 0.9164.
The probability that N ≤ 2 is 0.5488 + 0.3293 + 0.0988 = 0.9769. Thus Prob[S ≤ $7000 | N ≤ 2]
= Prob[ S ≤ $7000 and N ≤ 2] / Prob[N ≤ 2] = 0.9164 / 0.9769 = 0.938.
Comment: If there are more than 3 claims, the aggregate losses are > 7. The chance of three claims
all of size 2 is (e-0.6 0.63 / 6)(0.63 ) = 0.0043. Thus the unconditional probability that S ≤ 7 is
0.9164 + 0.0043 = 0.9207.

3.7. A. Mean Frequency is 1.3. Mean severity is 4. Mean Aggregate is: (1.3)(4) = 5.2.
Prob[Agg > (4)(5.2) = 20.8] = Prob[Agg ≥ 30].
The aggregate is ≥ 30 if there are two claims of sizes: 10 and 20, 20 and 10, or 20 and 20.
Prob[Agg ≥ 30] = (0.4){(2)(0.2)(0.1) + 0.1²} = 2%.

Section 4, Generating Functions34

There are a number of different generating functions, with similar properties. On this exam, the
the Probability Generating Function (p.g.f.)35 is used for working with frequency distributions.
On this exam, the Moment Generating Function (m.g.f.) and the Probability Generating
Function are used for working with aggregate distributions. Other generating functions include: the
Characteristic Function, the Laplace Transform, and the Cumulant Generating Function.

Name                                Symbol     Formula

Probability Generating Function     PX(t)      E[t^X] = MX(ln(t))

Moment Generating Function          MX(t)      E[e^(tX)] = PX(e^t)

Characteristic Function             ϕX(t)      E[e^(itX)] = MX(it)

Laplace Transform                   LX(t)      E[e^(-tX)]

Cumulant Generating Function        ψX(t)      ln MX(t) = ln E[e^(tX)]

Moment Generating Functions:36

The moment generating function is defined as MX(t) = E[e^(tX)].

The moment generating function for a continuous loss distribution with support from 0 to ∞ is given
by:37

M(t) = E[e^(xt)] = ∫[0 to ∞] f(x) e^(xt) dx.

34
See Section 3.3 of Loss Models. Also see page 38 of Actuarial Mathematics, not on the Syllabus.
35
See “Mahlerʼs Guide to Frequency Distributions.”
36
Moment Generating Functions are used in the study of Aggregate Distributions and Continuous Time Ruin
Theory. Continuous Time Ruin Theory is not on the syllabus.
See either Chapter 13 of Actuarial Mathematics or Chapter 11 of the Third Edition of Loss Models.
37
In general the integral goes over the support of the probability distribution.
In the case of discrete distributions, one substitutes summation for integration.

For example, for the Exponential distribution:

M(t) = ∫[0 to ∞] f(x) e^(xt) dx = ∫[0 to ∞] (e^(-x/θ)/θ) e^(xt) dx = (1/θ) e^(x(t - 1/θ)) / (t - 1/θ), evaluated from x = 0 to x = ∞,
= 1 / (1 - θt), for t < 1/θ.

Exercise: What is the moment generating function for a uniform distribution on [3, 8]?
[Solution: M(0) = E[e^(0X)] = E[1] = 1.
M(t) = E[e^(xt)] = ∫[3 to 8] (1/5) e^(xt) dx = (1/5) e^(xt)/t, evaluated from x = 3 to x = 8,
= (e^(8t) - e^(3t)) / (5t), for t ≠ 0.]
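A quick numeric sanity check of this m.g.f.; a minimal Python sketch (assuming numpy; t = 0.1 is an arbitrary test value):
```python
import numpy as np

t = 0.1
x = np.linspace(3, 8, 200001)                         # fine grid on [3, 8]
numeric = np.exp(t * x).mean()                        # approximates E[e^(tX)] for X uniform on [3, 8]
formula = (np.exp(8 * t) - np.exp(3 * t)) / (5 * t)
print(numeric, formula)                               # both about 1.7514
```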

The Moment Generating Functions of severity distributions, when they exist, are given
in Appendix A of Loss Models. The Probability Generating Functions of frequency
distributions are given in Appendix B of Loss Models.
M(t) = P(et ) .

Table of Moment Generating Functions

Distribution38          Parameters     Moment Generating Function                 Support of M.G.F.

Uniform on [a, b]                      (e^(bt) - e^(at)) / {t(b - a)}             39

Normal                  µ, σ           exp[µt + σ²t²/2]

Exponential             θ              1 / (1 - θt)                               t < 1/θ

Gamma                   α, θ           (1 - θt)^(-α)                              t < 1/θ

Inverse Gaussian        µ, θ           exp[(θ/µ)(1 - √(1 - 2tµ²/θ))]              t < θ/(2µ²)

Bernoulli               q              qe^t + 1 - q

Binomial                q, m           (qe^t + 1 - q)^m

Poisson                 λ              exp[λ(e^t - 1)]

Geometric               β              1/{1 - β(e^t - 1)}                         e^t < (1+β)/β

Negative Binomial       r, β           {1 - β(e^t - 1)}^(-r)                      e^t < (1+β)/β
Exercise: Assume X is Normally Distributed with parameters µ = 2 and σ = 3. What is E[e^(tX)]?
[Solution: If X is Normally distributed with parameters µ = 2 and σ = 3, then tX is Normally distributed
with parameters µ = 2t and σ = 3t. ⇒ e^(tX) is LogNormally distributed with µ = 2t and σ = 3t.
E[e^(tX)] = mean of a LogNormal Distribution = exp[2t + (3t)²/2] = exp[2t + 4.5t²].]

X is Normal(µ, σ) ⇒ tX is Normal(tµ, tσ) ⇒ e^(tX) is LogNormal(tµ, tσ).

X is Normal(µ, σ) ⇒ MX(t) ≡ E[e^(xt)] = mean of LogNormal(tµ, tσ) = exp[µt + σ²t²/2].40

The Moment Generating Function of a Normal Distribution is: M(t) = exp[tµ + t²σ²/2].
38
As per Loss Models.
39
M(0) = 1.
40
The mean of a LogNormal Distribution is: exp[(first parameter) + (second parameter)2 /2].

Discrete Distributions:

For a discrete distribution, we substitute summation for integration. For example, for the Poisson
Distribution the m.g.f. is determined as follows:
M(t) = E[e^(xt)] = Σ_{x=0}^{∞} (e^(-λ) λ^x / x!) e^(tx) = e^(-λ) Σ_{x=0}^{∞} (λe^t)^x / x! = e^(-λ) exp[λe^t] = exp[λ(e^t - 1)].

Exercise: Severity is 300 with probability 60% and 700 with probability 40%.
What is the moment generating function for severity?
[Solution: M(t) = E[e^(xt)] = 0.6e^(300t) + 0.4e^(700t).]

Relation of Moment and Probability Generating Functions:

M(t) = E[e^(xt)] = E[(e^t)^x] = P(e^t). Thus one can write the Moment Generating Function in terms of the
Probability Generating Function, M(t) = P(e^t).41 For example, for the Poisson Distribution,
P(t) = exp[λ(t - 1)]. Therefore, M(t) = P(e^t) = exp[λ(e^t - 1)].
On the other hand, if one knows the Moment Generating Function, one can get the Probability
Generating Function as: P(t) = M(ln(t)).

Exercise: What is the Moment Generating Function of a Negative Binomial Distribution as per
Loss Models?
[Solution: As shown in Appendix B.2.1.4 of Loss Models, for the Negative Binomial Distribution:
P(t) = {1 - β(t - 1)}^(-r). Thus M(t) = P(e^t) = {1 - β(e^t - 1)}^(-r).
Comment: Instead, one could calculate E[e^(xt)] for a Geometric Distribution as 1/{1 - β(e^t - 1)}, and
since a Negative Binomial is a sum of r independent Geometrics, M(t) = {1 - β(e^t - 1)}^(-r).]

41
The Probability Generating Functions of frequency distributions are given in Appendix B of Loss Models. The
Moment Generating Functions of severity distributions, when they exist, are given in Appendix A of Loss Models.

Properties of Moment Generating Functions:

The Moment Generating Function has useful properties. For example, for X and Y independent
variables, the moment generating function of their sum is:
MX+Y(t) = E[e^((X+Y)t)] = E[e^(Xt) e^(Yt)] = E[e^(Xt)] E[e^(Yt)] = MX(t) MY(t).

The moment generating function of the sum of two independent variables is the product
of their moment generating functions:

M X+Y(t) = MX(t) MY(t).

Exercise: X and Y are each Exponential with mean 25. X and Y are independent.
What is the m.g.f. of their sum?
[Solution: X and Y each have m.g.f.: 1/(1 - 25t). Thus the m.g.f. of their sum is: 1/(1 - 25t)2 .
Comment: This is the m.g.f. of a Gamma Distribution with θ = 25 and α = 2.]

Exercise: X follows an Inverse Gaussian Distribution with µ = 10 and θ = 8.


Y follows an Inverse Gaussian Distribution with µ = 5 and θ = 2. X and Y are independent.
What is the m.g.f. of their sum?
[Solution: The m.g.f. of an Inverse Gaussian Distribution is:
M(t) = exp[(θ/µ){1 - √(1 - 2µ²t/θ)}]. Therefore, MX(t) = exp[0.8{1 - √(1 - 25t)}] and
MY(t) = exp[0.4{1 - √(1 - 25t)}]. MX+Y(t) = MX(t) MY(t) = exp[1.2{1 - √(1 - 25t)}].
Comment: This is the m.g.f. of another Inverse Gaussian with µ = 15 and θ = 18.]

Since the moment generating function of the sum of two independent variables is the product of their
moment generating functions, the Moment Generating Function converts convolution into
multiplication:
M f * g = Mf M g .

The Moment Generating Function for a sum of independent variables is the product of
the Moment Generating Functions of each of the variables. Of particular importance for
working with aggregate losses, the sum of n independent, identically distributed variables
has the Moment Generating Function taken to the power n.

The m.g.f. of f*n is the nth power of the m.g.f. of f.42


42
Using characteristic functions rather than Moment Generating Functions, this is the key idea behind the
Heckman-Meyers algorithm not on the Syllabus. The Robertson algorithm, not on the Syllabus, relies on the similar
properties of the Fast Fourier Transform. See Section 9.8 of the Third Edition of Loss Models.

For example, MX+X+X(t) = MX(t)³. Thus the Moment Generating Function for a sum of 3
independent, identical Exponential variables is M(t) = {1/(1 - tθ)}³, for t < 1/θ, the moment
generating function of a Gamma Distribution with α = 3.

Exercise: What is the m.g.f. for the sum of 5 independent Exponential Distributions, each with mean
17?
[Solution: M(t) = {1/(1 - 17t)}⁵.]

Adding a constant to a variable multiplies the m.g.f. by e to the power of that constant times t:
MX+b(t) = E[e^((x+b)t)] = e^(bt) E[e^(xt)] = e^(bt) MX(t).
Multiplying a variable by a constant gives an m.g.f. that is the original m.g.f. at t times that constant:
McX(t) = E[e^(cxt)] = MX(ct).

Exercise: There is uniform inflation of 4% between 2001 and 2002. What is the m.g.f. of the severity
distribution in 2002 in terms of that in 2001?
[Solution: y = 1.04x, therefore, M2002(t) = M2001(1.04t).]

For example, if losses in 2001 follow a Gamma Distribution with α = 2 and θ = 1000, then in 2001
M(t) = (1 - 1000t)-2. If there is uniform inflation of 4% between 2001 and 2002, then in 2002 the
m.g.f. is: {1 - 1000(1.04)t}-2 = (1 - 1040t)-2, which is that of a Gamma Distribution with α = 2 and
θ = 1040.

Exercise: What is the m.g.f. for the average of 5 independent Exponential Distributions, each with
mean 17?
[Solution: Their average is their sum multiplied by 1/5. MY/5(t) = MY(t/5).
Therefore, the m.g.f. of the average is the m.g.f. of the sum at t/5: {1/(1 - 17t/5)}5 .]

In general, the Moment Generating Function of the average of n independent identically distributed
variables is the nth power of the Moment Generating Function of t/n. For example, the average of n
independent Geometric Distribution each with parameter β, each with Moment Generating Function:
(1 - β(et - 1))-1, has m.g.f.: (1 - β(et/n - 1))-n.

In addition, the moment generating function determines the distribution, and vice-versa.
Therefore, one can take limits of a distribution by instead taking limits of the Moment Generating
Function.

Exercise: Use moment generating functions to take the limit of Negative Binomial Distributions, such
that rβ = 7 as β → 0.
[Solution: The moment generating function of a Negative Binomial is:
{1 - β(e^t - 1)}^(-r), which for r = 7/β is: {1 - β(e^t - 1)}^(-7/β).
ln[{1 - β(e^t - 1)}^(-7/β)] = -(7/β) ln[1 - β(e^t - 1)] ≅ -(7/β){-β(e^t - 1)} = 7(e^t - 1).
Thus the limit as β → 0 of the m.g.f. is exp[7(e^t - 1)], which is the m.g.f. of a Poisson with mean 7.
Thus the limit of these Negative Binomials is a Poisson with the same mean.]
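This limit can also be seen numerically by evaluating both m.g.f.s for a small β; a minimal Python sketch (assuming numpy; the values of t are arbitrary):
```python
import numpy as np

t = np.array([-0.5, 0.1, 0.5])
beta = 1e-6
r = 7 / beta                                        # so that r * beta = 7

nb_mgf = (1 - beta * (np.exp(t) - 1)) ** (-r)       # Negative Binomial m.g.f.
poisson_mgf = np.exp(7 * (np.exp(t) - 1))           # Poisson(7) m.g.f.
print(nb_mgf / poisson_mgf)                         # ratios very close to 1
```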

Moments and the Moment Generating Function:

The moments of the function can be obtained as the derivatives of the moment generating function
at zero, by reversing the order of integration and the taking of the derivative.

M′(s) = ∫ f(x) x e^(xs) dx.  M′(0) = ∫ f(x) x dx = E[X].

M′′(s) = ∫ f(x) x² e^(xs) dx.  M′′(0) = ∫ f(x) x² dx = E[X²].

M(0) = E[X^0] = 1
M′(0) = E[X]
M′′(0) = E[X²]
M′′′(0) = E[X³]
M^(n)(0) = E[X^n]

For example, for the Gamma Distribution: M(t) = (1 - tθ)^(-α).
M′(t) = θα(1 - tθ)^(-(α+1)).  M′(0) = αθ = mean.
M′′(t) = θ²α(α+1)(1 - tθ)^(-(α+2)).  M′′(0) = θ²α(α+1) = second moment.
M′′′(t) = θ³α(α+1)(α+2)(1 - tθ)^(-(α+3)).  M′′′(0) = θ³α(α+1)(α+2) = third moment.



Exercise: A distribution has m.g.f. M(t) = exp[11t + 27t²].
What are the mean and variance of this distribution?
[Solution: M′(t) = M(t)(11 + 54t). Mean = M′(0) = 11.
M′′(t) = M′(t)(11 + 54t) + 54M(t). 2nd moment = M′′(0) = (11)(11) + 54 = 175.
Variance = 175 - 11² = 54.]
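This differentiation is easy to delegate to a computer algebra system; a minimal sketch using sympy (assumed to be available):
```python
import sympy as sp

t = sp.symbols('t')
M = sp.exp(11*t + 27*t**2)

mean = sp.diff(M, t).subs(t, 0)          # M'(0) = 11
second = sp.diff(M, t, 2).subs(t, 0)     # M''(0) = 175
print(mean, second - mean**2)            # 11 54
```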

Moment Generating Function as a Power Series:

One way to remember the relationship between the m.g.f. and the moments is to expand the
exponential into a power series:

M(t) = E[e^(xt)] = E[Σ_{n=0}^{∞} (xt)^n / n!] = Σ_{n=0}^{∞} E[X^n] t^n / n! = Σ (nth moment) t^n / n!.

So the nth moment of the distribution is the term multiplying t^n/n! in the power series representation
of its m.g.f., M(t).

For example, the power series for 1/(1 - y) is: Σ_{n=0}^{∞} y^n,
while the m.g.f. of an Exponential Distribution is: M(t) = 1/(1 - θt). Therefore,
M(t) = 1/(1 - θt) = Σ_{n=0}^{∞} (θt)^n = Σ_{n=0}^{∞} n! θ^n (t^n / n!).

Therefore, the nth moment of an Exponential is n! θ^n.

When one differentiates the power series for M(t) n times, the first n terms vanish.
d^n(t^n/n!)/dt^n = 1, and the remaining terms all still have powers of t, which will vanish when we set
t = 0. Therefore:
M^(n)(0) = d^n/dt^n [Σ_{i=0}^{∞} E[X^i] t^i / i!] at t equal to zero = nth moment.

The Moments of a Negative Binomial Distribution:

As discussed previously, the moment generating function of a Negative Binomial Distribution is
M(t) = P(e^t) = 1/{1 - β(e^t - 1)}^r.

Exercise: Using its moment generating function, determine the first four moments of a
Negative Binomial Distribution.
[Solution: M′(t) = rβe^t / {1 - β(e^t - 1)}^(r+1).  First Moment = M′(0) = rβ.
M′′(t) = M′(t) + r(r+1)β²e^(2t) / {1 - β(e^t - 1)}^(r+2).  Second Moment = M′′(0) = rβ + r(r+1)β².
M′′′(t) = M′′(t) + 2r(r+1)β²e^(2t) / {1 - β(e^t - 1)}^(r+2) + r(r+1)(r+2)β³e^(3t) / {1 - β(e^t - 1)}^(r+3).
Third Moment = M′′′(0) = rβ + 3r(r+1)β² + r(r+1)(r+2)β³.
M′′′′(t) = M′′′(t) + 4r(r+1)β²e^(2t) / {1 - β(e^t - 1)}^(r+2) + 2r(r+1)(r+2)β³e^(3t) / {1 - β(e^t - 1)}^(r+3)
+ 3r(r+1)(r+2)β³e^(3t) / {1 - β(e^t - 1)}^(r+3) + r(r+1)(r+2)(r+3)β⁴e^(4t) / {1 - β(e^t - 1)}^(r+4).
Fourth Moment = M′′′′(0) = rβ + 3r(r+1)β² + r(r+1)(r+2)β³ + 4r(r+1)β² + 2r(r+1)(r+2)β³
+ 3r(r+1)(r+2)β³ + r(r+1)(r+2)(r+3)β⁴ = rβ + 7r(r+1)β² + 6r(r+1)(r+2)β³ + r(r+1)(r+2)(r+3)β⁴.]

Exercise: Determine the CV, skewness, and kurtosis of a Negative Binomial Distribution.
[Solution: Variance = rβ + r(r+1)β² - (rβ)² = rβ(1 + β).
Coefficient of Variation = √{rβ(1 + β)} / (rβ) = √{(1 + β)/(rβ)}.
Third Central Moment = E[X³] - 3E[X]E[X²] + 2E[X]³ =
rβ + 3r(r+1)β² + r(r+1)(r+2)β³ - 3(rβ){rβ + r(r+1)β²} + 2(rβ)³ =
r(β + 3β² + 2β³) = rβ(1 + β)(1 + 2β).
Skewness = rβ(1 + β)(1 + 2β) / {rβ(1 + β)}^1.5 = (1 + 2β) / √{rβ(1 + β)}.
Fourth Central Moment = E[X⁴] - 4E[X]E[X³] + 6E[X]²E[X²] - 3E[X]⁴ =
rβ + 7r(r+1)β² + 6r(r+1)(r+2)β³ + r(r+1)(r+2)(r+3)β⁴
- 4(rβ){rβ + 3r(r+1)β² + r(r+1)(r+2)β³} + 6(rβ)²{rβ + r(r+1)β²} - 3(rβ)⁴ =
rβ{1 + 7β + 12β² + 6β³ + 3rβ(1 + β)²}.
Kurtosis = rβ{1 + 7β + 12β² + 6β³ + 3rβ(1 + β)²} / {rβ(1 + β)}² = 3 + {6β² + 6β + 1} / {rβ(1 + β)}.]

Therefore, for the Negative Binomial Distribution:


Skewness = {3 Variance - 2 mean + 2(Variance - mean)2 /mean}/Variance1.5.43
43
See equation 6.7.8 in Risk Theory by Panjer and Willmot.
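Derivations like the one above are easy to get slightly wrong by hand; a minimal sympy sketch (assumed available) that reproduces the Negative Binomial skewness from its m.g.f.:
```python
import sympy as sp

t, r, b = sp.symbols('t r beta', positive=True)
M = (1 - b*(sp.exp(t) - 1))**(-r)        # Negative Binomial m.g.f.

m1, m2, m3 = [sp.diff(M, t, k).subs(t, 0) for k in (1, 2, 3)]
variance = sp.simplify(m2 - m1**2)                       # r*beta*(1 + beta)
third_central = sp.simplify(m3 - 3*m1*m2 + 2*m1**3)      # r*beta*(1 + beta)*(1 + 2*beta)
skewness = sp.simplify(third_central / variance**sp.Rational(3, 2))
print(skewness)   # equivalent to (1 + 2*beta)/sqrt(r*beta*(1 + beta))
```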

Calculating the Moment Generating Function of an Inverse Gaussian Distribution:


Exercise: What is the integral ∫[0 to ∞] x^(-3/2) exp[-(a²x + b²/x)] dx?
Hint: An Inverse Gaussian density integrates to unity from zero to infinity.
[Solution: The density of an Inverse Gaussian with parameters µ and θ is:
f(x) = √(θ/(2π)) x^(-1.5) exp[-θ(x/µ - 1)²/(2x)] = √(θ/(2π)) x^(-1.5) exp[-θx/(2µ²) + θ/µ - θ/(2x)].
Let a² = θ/(2µ²) and b² = θ/2; then θ = 2b² and µ = b/a.
Then f(x) = √(b²/π) x^(-1.5) exp[-a²x + 2ba - b²/x] = e^(2ba) (b/√π) x^(-1.5) exp[-a²x - b²/x].
Since this integrates to unity: (e^(2ba) b/√π) ∫[0 to ∞] x^(-3/2) exp[-(a²x + b²/x)] dx = 1.
Therefore ∫[0 to ∞] x^(-3/2) exp[-(a²x + b²/x)] dx = √π e^(-2ba) / b.
This is a special case of a Modified Bessel Function of the Third Kind, K_(-0.5). See for example,
Appendix C of Insurance Risk Models by Panjer & Willmot.]

Exercise: Calculate the Moment Generating Function for an Inverse Gaussian Distribution with
parameters µ and θ. Hint: Use the result of the previous exercise.
[Solution: The Moment Generating Function is the expected value of e^(zx).
M(z) = ∫ e^(zx) f(x) dx = ∫[0 to ∞] e^(zx) √(θ/(2π)) x^(-1.5) exp[-θx/(2µ²) + θ/µ - θ/(2x)] dx
= e^(θ/µ) √(θ/(2π)) ∫[0 to ∞] x^(-1.5) exp[-(θ/(2µ²) - z)x - θ/(2x)] dx = e^(θ/µ) √(θ/(2π)) {√π e^(-2ba) / b}
= e^(θ/µ) exp[-(θ/µ)√(1 - 2zµ²/θ)] = exp[(θ/µ)(1 - √(1 - 2zµ²/θ))],
where we have used the result of the previous exercise with a² = θ/(2µ²) - z and b² = θ/2.
The former requires that z < θ/(2µ²).
Note that ba = √(b²a²) = √{(θ/2)(θ/(2µ²) - z)} = (θ/(2µ))√(1 - 2zµ²/θ).]
Note that ba = b2 a2 = (θ / 2) (θ / 2µ2 - z) = (θ/ 2µ) 1 - 2zµ2 / θ . ]
2016-C-3, Aggregate Distributions §4 Generating Functions, HCM 10/21/15, Page 84

An Example, Calculating the Skewness of an Inverse Gaussian Distribution:

Let's see how the Moment Generating Function could be used to determine the skewness of an
Inverse Gaussian Distribution.
We can use the Moment Generating Function of the Inverse Gaussian to calculate its moments.44
M(t) = exp[(θ/µ){1 - √(1 - 2tµ²/θ)}]. M(0) = 1.
M′(t) = M(t) µ/√(1 - 2tµ²/θ). M′(0) = µ = mean.
M′′(t) = M′(t) µ/√(1 - 2tµ²/θ) + M(t)(µ³/θ)(1 - 2tµ²/θ)^(-3/2).
M′′(0) = µ² + µ³/θ = second moment. Variance = µ³/θ.
M′′′(t) = M′′(t) µ/√(1 - 2tµ²/θ) + 2M′(t)(µ³/θ)(1 - 2tµ²/θ)^(-3/2) + M(t)(3µ⁵/θ²)(1 - 2tµ²/θ)^(-5/2).
M′′′(0) = (µ² + µ³/θ)µ + 2µ(µ³/θ) + 3µ⁵/θ² = µ³ + 3(µ⁴/θ)(1 + µ/θ) = third moment.

Exercise: What is the coefficient of variation of an Inverse Gaussian Distribution with µ and θ?
[Solution: √Variance / Mean = (µ^1.5/θ^0.5)/µ = √(µ/θ).]

Exercise: What is the skewness of an Inverse Gaussian Distribution with parameters µ and θ?
[Solution: Third Central Moment = µ³ + 3(µ⁴/θ)(1 + µ/θ) - 3µ(µ² + µ³/θ) + 2µ³ = 3µ⁵/θ².
Skewness = Third Central Moment / Variance^1.5 = (3µ⁵/θ²)/(µ³/θ)^1.5 = 3√(µ/θ).]

Thus we see that the skewness of the Inverse Gaussian Distribution, 3√(µ/θ), is always three times
its coefficient of variation, √(µ/θ). In contrast, for the Gamma Distribution its skewness of 2/√α is
always twice its coefficient of variation of 1/√α.

Existence of Moment Generating Functions:

Moment Generating Functions only exist for distributions all of whose moments exist. Thus for
example, the Pareto does not have all of its moments, so that its Moment Generating Function does
not exist. If a distribution is short tailed enough for all of its moments to exist, then its moment
generating function may or may not exist. While the LogNormal Distribution has all of its moments
exist, its Moment Generating Function does not exist.45 The Moment Generating Function of the
Transformed Gamma, or its special case the Weibull, only exists if τ ≥ 1.46
44
One can instead use the cumulant generating function, ln M(z), to get the cumulants.
See for example, Kendall's Advanced Theory of Statistics, page 412.
45
The LogNormal is the heaviest-tailed distribution all of whose moments exist.
46
While if τ > 1, the m.g.f. of the Transformed Gamma exists, the m.g.f. is not a well known function.

The moments of the function can be obtained as the derivatives of the moment generating function
at zero. Thus if the Moment Generating Function exists (within an interval around zero) then so do all
the moments. However the converse is not true.

As discussed previously, the moment generating function, when it exists, can be written as a power
series in t, where E[X^n] is the nth moment about the origin of the distribution:

M(t) = Σ (n = 0 to ∞) E[X^n] t^n / n!.

In order for the moment generating function to converge in an interval around zero, the moments
E[X^n] must not grow too quickly as n gets large.

For example, the LogNormal Distribution has moments: E[X^n] = exp[nµ + 0.5n²σ²] =
exp[nµ] exp[0.5n²σ²]. Thus in this case E[X^n] t^n / n! = exp[nµ] exp[0.5n²σ²] t^n / n!.
Using Stirling's Formula, for large n, n! grows approximately as n^n e^(-n). Thus as n increases,
ln[E[X^n] t^n / n!] grows approximately as: nµ + 0.5n²σ² + n ln(t) + n - n ln(n) = n{0.5nσ² + µ + 1 + ln(t) - ln(n)}.
The 0.5nσ² term grows without bound, while ln(n) grows only slowly, so this expression approaches infinity
as n approaches infinity. Thus so does E[X^n] t^n / n!. Since the terms of the power series go to infinity, the sum does not
converge.47 Thus for the LogNormal Distribution the Moment Generating Function fails to exist.
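As an illustration, the following Python sketch (standard library only; the parameters µ = 0, σ = 1.5 and the argument t = 0.01 are arbitrary choices) evaluates the logarithm of the terms E[X^n] t^n / n! for a LogNormal and shows that they eventually increase without bound:

# Illustration: the would-be power-series terms for the MGF of a LogNormal blow up.
import math

mu, sigma, t = 0.0, 1.5, 0.01     # arbitrary LogNormal parameters and a small t > 0

def log_term(n):
    # log of E[X^n] t^n / n!, with E[X^n] = exp(n*mu + 0.5*n^2*sigma^2)
    return n * mu + 0.5 * n**2 * sigma**2 + n * math.log(t) - math.lgamma(n + 1)

for n in (5, 10, 20, 40, 80):
    print(n, log_term(n))          # increases without bound as n grows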

In general, the Moment Generating Function of a distribution exists if and only if the distribution
has a tail which is "exponentially bounded."48 A distribution is exponentially bounded if for some
K > 0, c > 0 and for all x: 1 - F(x) ≤ K e^(-cx). In other words, the survival function has to decline at
least exponentially.

For example, for the Weibull Distribution the survival function is exp(-cx^τ). For τ > 1 this survival
function eventually declines faster than e^(-x), and thus the Weibull is exponentially bounded.
For τ < 1 this survival function eventually declines more slowly than e^(-kx) for any k > 0, and thus the Weibull is not
exponentially bounded.49 Thus the Weibull for τ > 1 has a Moment Generating Function,
while for τ < 1 it does not.50
47
In fact, in order for the power series to converge the terms have to decline faster than 1/n.
48
See page 186 of Adventures in Stochastic Processes by Sidney L. Resnick.
49
For τ = 1 one has an Exponential Distribution which is exponentially bounded and its m.g.f. exists.
50
The Transformed Gamma has the same behavior as the Weibull; for τ > 1 the Moment Generating Function exists
and the distribution is lighter-tailed than τ < 1 for which the Moment Generating Function does not exist. For a
Transformed Gamma with τ = 1, one gets a Gamma, for which the m.g.f. exists.

Characteristic Function:

The Characteristic Function is defined as ϕX(z) = E[e^(izX)] = MX(iz) = PX(e^(iz)).51


The Characteristic Function has the advantage that it exists for all z.
However, Characteristic Functions involve complex variables:
ϕX(z) = E[e^(izX)] = E[cos(zX) + i sin(zX)] = E[cos(zX)] + i E[sin(zX)].
One can obtain most of the same useful results using either Moment Generating Functions,
Probability Generating Functions, or Characteristic Functions.

Cumulant Generating Function:

The Cumulant Generating Function is defined as the natural log of the Moment Generating Function.52
ψX(t) = ln MX(t) = ln E[etx]. The cumulants are then obtained from the derivatives of the Cumulant
Generating Function at zero. The first cumulant is the mean. ψʼ(0) = E[X].
The 2nd and 3rd cumulants are equal to the 2nd and 3rd central moments.53
Thus one can obtain the variance as ψʼʼ(0).
ψʼʼ(0) = Var[X].54
d2 (ln MX(t)) / dt2 | t =0 = Var[X]. d3 (ln MX(t)) / dt3 | t =0 = 3rd central moment of X.

Cumulants of independent variables add. Thus for X and Y independent the 2nd and 3rd central
moments add.55

Exercise: What is the cumulant generating function of an Inverse Gaussian Distribution?


[Solution: M(t) = exp[(θ/µ){1 - √(1 - 2µ²t/θ)}]. ψ(t) = ln M(t) = (θ/µ){1 - √(1 - 2µ²t/θ)}. ]

Exercise: Use the cumulant generating function to determine the variance of an Inverse Gaussian
Distribution.
[Solution: ψʼ(t) = (θ/µ)(µ²/θ)(1 - 2tµ²/θ)^(-0.5) = µ(1 - 2tµ²/θ)^(-0.5). Mean = ψʼ(0) = µ.

ψʼʼ(t) = µ(µ²/θ)(1 - 2tµ²/θ)^(-1.5). Variance = ψʼʼ(0) = µ³/θ. ]

51
See Definition 7.7 and Theorem 7.8 in Loss Models, not on the syllabus.
52
See Kendall's Advanced Theory of Statistics Volume 1, by Stuart and Ord
or Practical Risk Theory for Actuaries, by Daykin, Pentikainen and Pesonen.
53
The fourth and higher cumulants are not equal to the central moments.
54
See pages 387 and 403 of Actuarial Mathematics.
55
The 4th central moment and higher central moments do not add.

Exercise: Use the cumulant generating function to determine the skewness of an Inverse Gaussian
Distribution.
[Solution: ψʼʼʼ(t) = (µ³/θ)(3µ²/θ)(1 - 2tµ²/θ)^(-2.5). Third Central Moment = ψʼʼʼ(0) = 3µ⁵/θ².

Skewness = Third Central Moment / Variance^1.5 = (3µ⁵/θ²)/(µ³/θ)^1.5 = 3√(µ/θ). ]
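These cumulant results can also be confirmed numerically. Here is a Python sketch (standard library only; µ = 3 and θ = 30 are arbitrary) that approximates ψʼ(0) and ψʼʼ(0) by central finite differences of ψ(t) = ln M(t):

# Rough finite-difference check of the Inverse Gaussian cumulants.
import math

mu, theta = 3.0, 30.0
def psi(t):
    return (theta / mu) * (1.0 - math.sqrt(1.0 - 2.0 * t * mu**2 / theta))

h = 1e-4
mean     = (psi(h) - psi(-h)) / (2 * h)               # psi'(0)  = mu
variance = (psi(h) - 2 * psi(0.0) + psi(-h)) / h**2   # psi''(0) = mu^3/theta
print(mean, variance)                                  # approximately 3 and 0.9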

Aggregate Distributions:

Generating functions are useful for working with the distribution of aggregate losses, when frequency
and severity are independent.

Let Agg be Aggregate Losses, X be severity and N be frequency, then the Moment Generating
Function of the Aggregate Losses can be written in terms of the p.g.f. of the frequency and m.g.f. of
the severity:

MAgg(t) = E[exp[t·Agg]] = Σ E[exp[t·Agg] | N = n] Prob(N = n) = Σ {E[exp[tX1]] ... E[exp[tXn]]} Prob(N = n) =

Σ MX(t)^n Prob(N = n) = EN[MX(t)^N] = PN(MX(t)).

MAgg(t) = PN[MX(t)] = MN[ln(MX(t))].

Exercise: Frequency is given by a Poisson with mean 7. Frequency and severity are independent.
What is the Moment Generating Function for the aggregate losses?
[Solution: As shown in Appendix B.2.1.1 of Loss Models, PN(z) = exp[λ(z-1)] = exp[7(z-1)].
MAgg(t) = PN(MX(t)) = exp[7(MX(t) - 1)].]

In general, for any Compound Poisson distribution, MAgg(t) = exp[λ(MX(t) - 1)].

Exercise: Frequency is given by a Poisson with mean 7.


Severity is given by an Exponential with mean 1000. Frequency and severity are independent.
What is the Moment Generating Function for the aggregate losses?
[Solution: For the Exponential, MX(t) = 1/(1 - θt) = 1/(1 - 1000t), t < 0.001.

MAgg(t) = exp[λ(MX(t) - 1)] = exp[7{1/(1 - 1000t) - 1}] = exp[7000t/(1 - 1000t)], t < 0.001.]
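If you wish, this result can be checked by simulation. The Python sketch below (standard library only; t = 0.0002 and the number of simulations are arbitrary choices) estimates E[exp(t·Agg)] for this compound Poisson-Exponential model and compares it to exp[7000t/(1 - 1000t)]:

# Simulation check of the compound Poisson(7) / Exponential(1000) MGF.
import math, random

random.seed(1)
lam, theta, t = 7.0, 1000.0, 0.0002        # t must be below 1/theta = 0.001
n_sims = 200_000

def poisson(mean):
    # Poisson draw by multiplying uniforms (Knuth's method)
    k, p, target = 0, 1.0, math.exp(-mean)
    while p > target:
        p *= random.random()
        k += 1
    return k - 1

total = 0.0
for _ in range(n_sims):
    n = poisson(lam)
    agg = sum(random.expovariate(1.0 / theta) for _ in range(n))
    total += math.exp(t * agg)

print(total / n_sims, math.exp(7000 * t / (1 - 1000 * t)))   # should be close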

The p.g.f. of the Negative Binomial Distribution is: [1 - β(z-1)]-r. Thus for any Compound Negative
Binomial distribution, MAgg(t) = [1 - β(MX(t) - 1)]-r, for MX(t) < 1 + 1/β.

The probability generating function of the Aggregate Losses can be written in terms of the p.g.f. of
the frequency and p.g.f. of the severity:56

PAgg(t) = PN[PX(t)].

Exercise: What is the p.g.f of an Exponential distribution?


[Solution: The m.g.f. of the Exponential distribution is: 1/(1 - θt). In general P[t] = M[ln(t)].
Thus for an Exponential distribution P(t) = 1/(1 - θ ln[t]).]

Exercise: Frequency is given by a Poisson with mean 7.


Severity is given by an Exponential with mean 1000. Frequency and severity are independent.
What is the Probability Generating Function for the aggregate losses?
[Solution: The p.g.f. of the Exponential distribution is: P(t) = 1/(1 - 1000 ln(t)), while that for the Poisson
is: P(z) = exp[7(z-1)]. Thus the p.g.f. of aggregate losses is:
exp[7{1/(1 - 1000 ln(t)) - 1}] = exp[7000 ln(t)/(1 - 1000 ln(t))] = t^(7000/(1 - 1000 ln(t))).]

Recall that a compound frequency distribution is mathematically equivalent to an aggregate


distribution. Therefore, for a compound frequency distribution, PN(z) = P1 (P2 (z)).

Exercise: What is the p.g.f of a compound Geometric-Poisson frequency distribution?


[Solution: For the Geometric primary distribution: P1 (t) = 1/{1 - β(t - 1)}. For the Poisson secondary

distribution: P2 (t) = exp[λ(t - 1)]. PN(t) = P1 (P2 (t)) = 1/{1 - β(exp[λ(t - 1)] - 1)}. ]

Exercise: Frequency is a compound Geometric-Poisson distribution, with β = 3 and λ = 7.


Severity is Exponential with mean 10. Frequency and severity are independent.
What is the p.g.f. of aggregate losses?
[Solution: The p.g.f. for the frequency is: PN(t) = 1/(1 - 3(exp[7(t - 1)] - 1)).
The p.g.f. for the Exponential severity is: PX(t) = 1/(1 - 10 ln(t)).
PAgg(t) = PN(PX(t)) = 1/(1 - 3(exp[7{1/(1 - 10 ln(t)) - 1}] - 1)) =
1/{1 - 3(exp[70 ln(t)/(1 - 10 ln(t))] - 1)} = 1/(4 - 3t^(70/(1 - 10 ln(t)))).]

In general, for a compound frequency distribution and an independent severity:


PAgg(t) = PN[PX(t)] = P1 [P2 (PX(t))].

56
This is the same result as for compound frequency distributions; the mathematics are identical.
See “Mahlerʼs Guide to Frequency Distributions.”

The Laplace Transform for the aggregate distribution is: LA(z) = E[e^(-zA)] = PN[LX[z]].57

The Characteristic Function for the aggregate distribution is: ϕA(z) = E[e^(izA)] = PN[ϕX[z]].

Mixtures:

Exercise: Assume one has a two point mixture of distributions:


H(x) = pF(x) + (1-p)G(x). What is the Moment Generating Function of H?
[Solution: MH(t) = EH[ext] = pEF[ext] + (1-p)EG[ext] = pMF(t) + (1-p)MG(t).]

Thus the Moment Generating Function of a mixture is a mixture of the Moment


Generating Functions.58 In particular (1/4) + (3/4){1/(1 - 40t)} is the m.g.f. of a 2-point mixture of a
point mass at zero and an Exponential distribution with mean 40, with weights 1/4 and 3/4.

Exercise: What is the m.g.f. of a 60%-40% mixture of Exponentials with means of 3 and 7?
[Solution: 0.6/(1 - 3t) + 0.4/(1 - 7t).]

One can apply these same ideas to continuous mixtures. For example, assume the frequency of
each insured is Poisson with parameter λ, with the λ parameters varying across the portfolio via a
Gamma distribution with parameters α and θ.59 Then the Moment Generating Function of the
frequency distribution for the whole portfolio is the mixture of the individual Moment Generating
Functions.

For a given value of λ, the Poisson has an m.g.f. of exp[λ(e^t - 1)].

The Gamma density of λ is: f(λ) = λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α).

∫0∞ exp[λ(e^t - 1)] λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α) dλ = {θ^(-α)/Γ(α)} ∫0∞ λ^(α-1) exp[-λ(1 + 1/θ - e^t)] dλ =

{θ^(-α)/Γ(α)} (1 + 1/θ - e^t)^(-α) Γ(α) = (θ + 1 - θe^t)^(-α) = {1 - θ(e^t - 1)}^(-α).

This is the m.g.f. of a Negative Binomial Distribution, with r = α and β = θ. Therefore, the mixture of
Poissons via a Gamma, with parameters α and θ, is a Negative Binomial Distribution, with r = α and
β = θ.60
57
Not on the syllabus.
58
This applies equally well to n-point mixtures and continuous mixtures of distributions.
59
This is the well known Gamma-Poisson frequency process.
60
This same result was derived using Probability Generating Functions in “Mahlerʼs Guide to Frequency
Distributions.”
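This mixing result can also be checked by simulation. Here is a Python sketch (standard library only; α = 3 and θ = 2 are arbitrary mixing parameters) that draws λ from a Gamma, then a Poisson count given λ, and compares the simulated frequencies to the Negative Binomial probabilities with r = α and β = θ:

# Simulation check: Gamma-mixed Poisson counts versus the Negative Binomial pmf.
import math, random

random.seed(1)
alpha, theta = 3.0, 2.0
n_sims = 100_000

def poisson(mean):
    k, p, target = 0, 1.0, math.exp(-mean)
    while p > target:
        p *= random.random()
        k += 1
    return k - 1

counts = {}
for _ in range(n_sims):
    lam = random.gammavariate(alpha, theta)        # scale parameterization, mean alpha*theta
    n = poisson(lam)
    counts[n] = counts.get(n, 0) + 1

r, beta = alpha, theta
for n in range(6):
    nb = (math.gamma(r + n) / (math.gamma(r) * math.factorial(n))
          * (beta / (1 + beta))**n / (1 + beta)**r)
    print(n, counts.get(n, 0) / n_sims, round(nb, 4))   # empirical vs. Negative Binomial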

Policy Modifications:

Let F(x) be the ground up severity distribution.


Let PNumberLoss be the probability generating function of the number of losses.

Assume there is a deductible, d. Then the expected number of (non-zero) payments is less than
the expected number of losses.

The number of (non-zero) payments can be thought of as coming from a compound process.
First one generates a random number of losses. Then each loss has S(d) chance of being a non-
zero payment, independent of any other loss. This is mathematically equivalent to a compound
distribution with secondary distribution that is Bernoulli with q = S(d).61

The probability generating function of this Bernoulli is P(z) = 1 + S(d)(z - 1) = F(d) + S(d)z.
Therefore, the probability generating function of this compound situation is:
PNumberPayments(z) = PNumberLoss(F(d) + S(d)z).62

With a deductible, the severity distribution is altered.63


The per loss variable is zero with probability F(d), and GPerLoss(y) = F(y + d) for y > 0.

Therefore, MPerLoss(t) = E[e^(ty)] = F(d)e^(t·0) + ∫0∞ e^(ty) f(y + d) dy = F(d) + ∫0∞ e^(ty) f(y + d) dy.

The distribution of the (non-zero) payments has been truncated and shifted from below at d.
GPerPayment(y) = {F(y + d) - F(d)}/S(d) for y > 0.  gPerPayment(y) = f(y + d)/S(d) for y > 0.

Therefore, MPerPayment(t) = E[e^(ty)] = ∫0∞ e^(ty) f(y + d)/S(d) dy.

Therefore, MPerLoss(t) = F(d) + S(d) MPerPayment(t).64

As discussed previously, the aggregate distribution can be thought of either in terms of the per loss
variable or the per payment variable.

61
This same mathematical idea was used in proving thinning results in “Mahlerʼs Guide to Frequency Distributions.”
62
See Section 8.6 and Equation 9.30 in Loss Models.
63
See “Mahlerʼs Guide to Loss Distributions.”
64
See Equation 9.29 in Loss Models.

Therefore, MAggregate(t) = PNumberLoss(MPerLoss(t)), and


M Aggregate(t) = PNumberPayments(MPerPayment(t)).

PNumberPayments(MPerPayment(t)) = PNumberLoss[F(d) + S(d)MPerPayment(t)] =


PNumberLoss(MPerLoss(t)), confirming that these two versions of MAggregate(t) are equal.

One can compute the moment generating function after the effects of coverage modifications.

Exercise: Prior to the effects of a deductible, the sizes of loss follow an Exponential distribution with
mean 8000. For a deductible of 1000, determine the moment generating function for the size of the
non-zero payments by the insurer.

[Solution: MPerPayment(t) = E[e^(ty)] = ∫0∞ e^(ty) f(y + d)/S(d) dy =

∫0∞ e^(ty) {e^(-(y + 1000)/8000)/8000} / e^(-1000/8000) dy = ∫0∞ exp[-y(1/8000 - t)] dy / 8000 =

{1/(1/8000 - t)} / 8000 = 1/(1 - 8000t), for t < 1/8000.


Comment: Due to the memoryless property of the Exponential, the non-zero payments are also an
Exponential distribution with mean 8000.]

Exercise: Prior to the effects of a deductible, the sizes of loss follow an Exponential distribution with
mean 8000. For a deductible of 1000, determine the moment generating function of the payment
per loss variable.
[Solution: MPerLoss(t) = F(d) + S(d) MPerPayment(t) =
1 - e^(-1000/8000) + e^(-1000/8000)/(1 - 8000t), for t < 1/8000. ]
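As a numerical check of the relation MPerLoss(t) = F(d) + S(d) MPerPayment(t), the Python sketch below (standard library only; t = 0.00005 is an arbitrary value below 1/8000) simulates ground-up Exponential losses with mean 8000, applies the 1000 deductible, and compares the simulated E[exp(t × payment per loss)] to the closed form:

# Simulation check of the per-loss MGF for Exponential(8000) with a 1000 deductible.
import math, random

random.seed(1)
theta, d, t = 8000.0, 1000.0, 0.00005      # t must be below 1/theta
n_sims = 200_000

total = 0.0
for _ in range(n_sims):
    x = random.expovariate(1.0 / theta)    # ground-up loss
    payment = max(x - d, 0.0)              # payment per loss
    total += math.exp(t * payment)

formula = (1 - math.exp(-d / theta)) + math.exp(-d / theta) / (1 - theta * t)
print(total / n_sims, formula)             # the two values should be close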

Exercise: Prior to the effects of a maximum covered loss, the sizes of loss follow an Exponential
distribution with mean 8000. For a maximum covered loss of 25,000, determine the moment
generating function for the size of payments by the insurer.
[Solution: For the data censored from above at 25,000, there is a density of:
e^(-x/8000)/8000 for x < 25,000, and a point mass of probability of e^(-25000/8000) = e^(-3.125) at 25,000.

Therefore, M(t) = E[e^(xt)] = ∫0^25,000 {e^(-x/8000)/8000} e^(xt) dx + e^(-3.125) e^(25000t)

= [-e^(x(t - 1/8000))/(1 - 8000t)] evaluated at x = 25,000 and x = 0, plus e^(25000t - 3.125)

= (1 - e^(25000t - 3.125))/(1 - 8000t) + e^(25000t - 3.125) = {1 - 8000t e^(25,000(t - 1/8000))}/(1 - 8000t). ]
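The closed form can be verified by numerical integration. The following Python sketch (standard library only; t = 0.00005 chosen arbitrarily) adds a midpoint-rule approximation of the integral over (0, 25,000) to the point mass at 25,000 and compares the total to {1 - 8000t e^(25,000(t - 1/8000))}/(1 - 8000t):

# Numerical check of the MGF for Exponential(8000) severity censored at 25,000.
import math

theta, u, t = 8000.0, 25000.0, 0.00005
n_steps = 100_000
dx = u / n_steps

integral = 0.0
for i in range(n_steps):
    x = dx * (i + 0.5)                                   # midpoint rule
    integral += math.exp(t * x) * math.exp(-x / theta) / theta * dx
point_mass = math.exp(-u / theta) * math.exp(t * u)      # censored losses at 25,000

closed_form = (1 - theta * t * math.exp(u * (t - 1 / theta))) / (1 - theta * t)
print(integral + point_mass, closed_form)                # should agree closely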

Problems:

4.1 (1 point) Let f(2) = 0.3, f(5) = 0.6, f(11) = 0.1. Let M be the moment generating function of f.
What is M(0.4)?
A. Less than 11
B. At least 11 but less than 12
C. At least 12 but less than 13
D. At least 13 but less than 14
E. At least 14

4.2 (2 points) An aggregate loss distribution is defined by


Prob(N = n) = 2n / (n! e2 ), n = 0, 1, 2,... and f(x) = 62.5x2 e-5x, x > 0.
What is the Moment Generating Function of the distribution of aggregate losses?
A. exp[2{1/(1 - 5t)3 - 1}]
B. exp[2{1/(1 - 5t) - 1}]
C. exp[2{1/(1 - t/5)3 - 1}]
D. exp[2{1/(1 - t/5) - 1}]
E. None of A, B, C, or D.

4.3 (2 points) Frequency is given by a Binomial Distribution with m =10 and q = 0.3.
The size of losses are either 100 or 250, with probability 80% and 20% respectively.
What is the Moment Generating Function for Aggregate Losses at 0.01?
A. Less than 1600
B. At least 1600 but less than 1700
C. At least 1700 but less than 1800
D. At least 1800 but less than 1900
E. At least 1900

4.4 (2 points) Y follows a Gamma Distribution with α = 3 and θ = 100. Z = 40 + Y.


What is the Moment Generating Function of Z?
A. (1 - 140t)-3
B. (1 - 140e40t)-3
C. e40t (1 - 100t)-3
D. (1 - 100t e40t)-3
E. None of A, B, C, or D.

Use the following information for the next five questions:


Assume the m.g.f. of a distribution is: M(t) = exp[10{1 - √(1 - 0.6t)}].

4.5 (1 point) What is the mean of this distribution?


A. 2.8 B. 2.9 C. 3.0 D. 3.1 E. 3.2

4.6 (2 points) What is the variance of this distribution?


A. Less than 0.7
B. At least 0.7 but less than 0.8
C. At least 0.8 but less than 0.9
D. At least 0.9 but less than 1.0
E. At least 1.0

4.7 (3 points) What is the skewness of this distribution?


A. Less than 0.7
B. At least 0.7 but less than 0.8
C. At least 0.8 but less than 0.9
D. At least 0.9 but less than 1.0
E. At least 1.0

4.8 (1 point) After uniform inflation of 20%, what is the Moment Generating Function?
A. exp[12{1 - √(1 - 0.6t)}]
B. exp[12{1 - √(1 - 0.72t)}]
C. exp[10{1 - √(1 - 0.6t)}]
D. exp[10{1 - √(1 - 0.72t)}]
E. None of A, B, C, or D.

4.9 (1 point) X and Y each follow this distribution. X and Y are independent.
Z = X + Y. What is the Moment Generating Function of Z?
A. exp[10{1 - √(1 - 0.6t)}]
B. exp[40{1 - √(1 - 0.6t)}]
C. exp[10{1 - √(1 - 1.2t)}]
D. exp[40{1 - √(1 - 1.2t)}]
E. None of A, B, C, or D.

4.10 (3 points) The distribution of aggregate losses is compound Poisson with λ = 5.


The Moment Generating Function of Aggregate Losses is: M(t) = exp[5/(1-7t)3 - 5].
What is the second moment of the severity distribution?
A. Less than 550
B. At least 550 but less than 600
C. At least 600 but less than 650
D. At least 650 but less than 700
E. At least 700

4.11 (2 points) What is the integral:


∫0∞ x^(-3/2) exp[-(a²x + b²/x)] dx ?

Hint: make use of the fact that the density of an Inverse Gaussian Distribution integrates to unity from
zero to infinity.
A. √π e^(-ba) / b
B. √π e^(-2ba) / b
C. √π a e^(-ba) / b
D. √π a e^(-2ba) / b
E. None of A, B, C, or D.

4.12 (2 points) Calculate the Moment Generating Function for an Inverse Gaussian Distribution with
parameters µ and θ. Hint: Use the result of the previous problem.
A. exp[(θ/µ){1 - √(1 - tµ/θ)}], t ≤ θ/µ.

B. exp[(θ/µ){1 - √(1 - 2tµ/θ)}], t ≤ θ/(2µ).

C. exp[(θ/µ){1 - √(1 - tµ²/θ)}], t ≤ θ/µ².

D. exp[(θ/µ){1 - √(1 - 2tµ²/θ)}], t ≤ θ/(2µ²).


E. None of A, B, C, or D.

4.13 (2 points ) Frequency is given by a Negative Binomial Distribution with r = 3 and β = 1.2.
The size of losses are uniformly distributed on (8, 23).
What is the Moment Generating Function for Aggregate Losses?
A. t3 / {1.2t - 0.08(e23t - e8t)}3
B. t3 / {2.2t - 0.08(e23t - e8t)}3
C. t3 / {1.2t - (e23t - e8t)/75}3
D. t3 / {2.2t - (e23t - e8t)/75}3
E. None of A, B, C, or D.

4.14 (2 points) The probability of snowfall in any day in January is 20%. If it snows during a day, the
amount of snowfall in inches that day is Gamma distributed with α = 3 and θ = 1.7.
Each day is independent of the others.
What is the Moment Generating Function for the amount of snow during January?
A. 0.2 + 0.8(1 - 1.7t)-93
B. {0.2 + 0.8(1 - 1.7t)-3}31
C. 0.8 + 0.2(1 - 1.7t)-93
D. {0.8 + 0.2(1 - 1.7t)-3}31
E. None of A, B, C, or D.

4.15 (3 points) The number of people in a group arriving at


The Restaurant at the End of the Universe is Logarithmic with β = 3.
The number of groups arriving per hour is Poisson with mean 10.
Determine the distribution of the total number of people arriving in an hour.

4.16 (2 points) The characteristic function is defined as E[eitX].


What is the characteristic function of a Normal Distribution with mean µ and standard deviation σ?

4.17 (3 points) A collective risk model has the following properties:


(i) The frequency distribution has probability generating function: P(z) = (14 - 12z + 6z² - z³) / {7(2 - z)³}.

(ii) The severity distribution has moment generating function: M(t) = (3 - 20t + 100t²) / {3(1 - 10t)²}.

Calculate the probability of zero aggregate losses in a year.


(A) 29% (B) 31% (C) 33% (D) 35% (E) 37%

4.18 (CAS Part 2 Exam, 1965, Q. 37) (1.5 points) The random variable x has a distribution
specified by the Moment Generating Function M(t) = et (1 - t)-2
What is the Moment Generating Function of the random variable y, where y = 3x + 2?

4.19 (2, 5/83, Q.35) (1.5 points) Let X be a continuous random variable with density function
b e-bX for x > 0, where b > 0.
If M(t) is the moment-generating function of X, then M(-6b) is which of the following?
A. 1/7 B. 1/5 C. 1/(7b) D. 1/(5b) E. +∞

4.20 (2, 5/83, Q.37) (1.5 points) Let X have the probability density function
f(x) = (8/9)^x (1/9), for x = 0, 1, 2, . . .  What is the moment-generating function of X?
A. 1/(9 - 8e^t)  B. 9/(9 - 8e^t)  C. 1/(8e^t)  D. 9/(8e^t)  E. 9 - 8e^t

4.21 (2, 5/85, Q.13) (1.5 points) Let the random variable X have moment-generating function
M(t) = 1/(1 - t)², for t < 1. Find E(X³).

A. -24 B. 0 C. 1/4 D. 24 E. Cannot be determined from the information given.

4.22 (4B, 5/85, Q.47) (3 points) Given that M'(t) = r M(t) / [1- (1 - p)et], where M(t) represents the
moment generating function of a distribution. Which of the following represents, respectively, the
mean and variance of this distribution?
A. mean = r/p variance = r/p2
B. mean = r(1-p)/p variance = r/p2
C. mean = r/p variance = r(1-p)/p2
D. mean = r(1-p)/p variance = r(1-p)/p2
E. None of the above

4.23 (2, 5/88, Q.25) (1.5 points) Let the random variable X have moment generating function
M(t) = exp[3t + t2 ]. What is E(X2 )?
A. 1 B. 2 C. 3 D. 9 E. 11

4.24 (2, 5/90, Q.12 and Course 1 Sample Exam, Q.26) (1.7 points)
Let X be a random variable with moment-generating function M(t) = {(2 + e^t)/3}^9 for -∞ < t < ∞.
What is the variance of X?
A. 2 B. 3 C. 8 D. 9 E. 11

4.25 (5A, 5/95, Q.21) (1 point) Which of the following are true for the moment generating function
M S(t) for the aggregate claims distribution S = X1 + X2 +...+ XN?
1. If the Xiʼs are independent and identically distributed with m.g.f. MX(t), and the
number of claims, N, is fixed, then MS(t) = MX(t)N.
2. If the Xiʼs are independent and identically distributed with m.g.f. MX(t), and the
number of claims, N has m.g.f. MN(t), then MS(t) = MN[exp(MX(t))].
3. If the Xiʼs are independent and identically distributed with m.g.f. MX(t), and N is
Poisson distributed, then MS(t) = exp[λ(MX(t) - 1)].
A. 1 B. 3 C. 1, 2 D. 1, 3 E. 2, 3

4.26 (2, 2/96, Q.30) (1.7 points) Let X and Y be two independent random variables with moment
generating functions MX(t) = exp[t2 + 2t], MY(t) = exp[3t2 + t].
Determine the moment generating function of X + 2Y.
A. exp[t2 + 2t] + 2exp[3t2 + t] B. exp[t2 + 2t] + exp[12t2 + 2t]
C. exp[7t2 + 4t] D. 2exp[4t2 + 3t]
E. exp[13t2 + 4t]

4.27 (5A, 5/96, Q.22) (1 point) Let M(t) denote the moment generating function of a claim amount
distribution. The number of claims distribution is Poisson with a moment generating function
exp[λ(exp(t)-1)].
What is the moment generating function of the compound Poisson Distribution?
A. λ(M(t) -1) B. exp[λexp[M(t) -1]] C. λexp[M(t) -1]
D. exp[λ[M(t) -1]] E. None of A, B, C, D

4.28 (Course 151 Sample Exam #1, Q.9) (1.7 points) For S = X1 + X2 + ... + XN:
(i) X1 , X2 ... each has an exponential distribution with mean 1/β.
(ii) the random variables N, X1 , X2 ,... are mutually independent.
(iii) N has a Poisson distribution with mean 1.0.
(iv) MS(1.0) = 3.0.
Determine β.
(A) 1.9 (B) 2.0 (C) 2.1 (D) 2.2 (E) 2.3

4.29 (Course 151 Sample Exam #2, Q.15) (1.7 points)


Aggregate claims has a compound Poisson distribution with λ = ln(4) and individual claim amounts
probability function given by: f(x) = 2^(-x) / {x ln(2)}, x = 1, 2, 3, ....
Which of the following is true about the distribution of aggregate claims?
(A) Binomial with q = 1/2.
(B) Binomial with q = 1/4.
(C) Negative Binomial with r = 2 and β = 1.
(D) Negative Binomial with r = 4 and β = 1.
(E) Negative Binomial with r = 2 and β = 3.

4.30 (1, 5/00, Q.35) (1.9 points) A company insures homes in three cities, J, K, and L.
Since sufficient distance separates the cities, it is reasonable to assume that the losses occurring in
these cities are independent.
The moment generating functions for the loss distributions of the cities are:
M J(t) = (1 - 2t)-3. M K(t) = (1 - 2t)-2.5. M L (t) = (1 - 2t)-4.5.

Let X represent the combined losses from the three cities. Calculate E(X3 ).
(A) 1,320 (B) 2,082 (C) 5,760 (D) 8,000 (E) 10,560

4.31 (IOA 101, 9/00, Q.9) (3.75 points) The size of a claim, X, which arises under a certain type
of insurance contract, is to be modeled using a gamma random variable with parameters α
and θ (both > 0) such that the moment generating function of X is given by

M(t) = (1 - θt)−α, t < 1/θ.


By using the cumulant generating function of X, or otherwise, show that the coefficient of
skewness of the distribution of X is given by 2/√α.

4.32 (1, 11/00, Q.11) (1.9 points) An actuary determines that the claim size for a certain class of
accidents is a random variable, X, with moment generating function
MX(t) = 1/(1 - 2500t)⁴.
Determine the standard deviation of the claim size for this class of accidents.
(A) 1,340 (B) 5,000 (C) 8,660 (D) 10,000 (E) 11,180

4.33 (1, 11/00, Q.27) (1.9 points) Let X1 , X2 , X3 be a random sample from a discrete distribution
with probability function p(0) = 1/3 and p(1) = 2/3.
Determine the moment generating function, M(t), of Y = X1 X2 X3 .
(A) 19/27 + 8et/27
(B) 1 + 2et
(C) (1/3 + 2et/3)3
(D) 1/27 + 8e3t/27
(E) 1/3 + 2e3t/3

4.34 (IOA 101, 4/01, Q.5) (2.25 points) Show that the probability generating function for a
Binomial Distribution with parameters m and q is P(z) = (1 - q + qz)m.
Deduce the moment generating function.

4.35 (IOA 101, 4/01, Q.6) (2.25 points) Let X have a normal distribution with mean µ and
standard deviation σ, and let the ith cumulant of the distribution of X be denoted κi.

Given that the moment generating function of X is M(t) = exp[µt + σ2t2 /2],
determine the values of κ2, κ3, and κ4.

4.36 (IOA 101, 4/01, Q.7) (1.5 points) The number of policies (N) in a portfolio at any one time is
modeled as a Poisson random variable with mean 10.
The number of claims (Xi) arising on a policy is also modeled as a Poisson random variable with
mean 2, independently for each policy and independent of N.
Determine the moment generating function for the total number of claims, X1 + X2 + ... + XN,

arising for the portfolio of policies.

4.37 (1, 5/03, Q.39) (2.5 points) X and Y are independent random variables with common
moment generating function M(t) = exp[t2 /2]. Let W = X + Y and Z = Y - X.
Determine the joint moment generating function, M(t1 , t2 ), of W and Z.
(A) exp[2t1² + 2t2²]   (B) exp[(t1 - t2)²]   (C) exp[(t1 + t2)²]

(D) exp[2t1t2]   (E) exp[t1² + t2²]



Solutions to Problems:

4.1. D. M(t) = E[ext] = 0.3e2t + 0.6e5t + 0.1e11t.


M(.4) = 0.3e.8 + 0.6e2 + 0.1e4.4 = 0.6676 + 4.4334 + 8.1451 = 13.25.

4.2. C. Frequency is Poisson, with λ = 2 and p.g.f.: P(z) = exp(2(z-1)).


Severity is Gamma with α = 3 and θ = 1/5 and m.g.f.: 1/(1 - t/5)3 .
M A(t) = PN(MX(t)) = exp[2{1/(1-t/5)3 - 1}].

4.3. A. The p.g.f. of the Binomial Frequency is: P(z) = (1+.3(z-1))10. The m.g.f. for severity is:
M(t) = 0.8 e100t + 0.2e250t. MA(t) = PN(MX(t)) = {1 + (0.3)(0.8 e100t + 0.2e250t - 1)}10.

M A(0.01) = {1 + (0.3)( 0.8 e1 + 0.2e2.5 - 1)}10 = 1540.

4.4. C. The m.g.f. of Y is MY(t) = (1-100t)-3. E[ezt] = E[e(y+40)t] = E[eyt e40t] = e40t E[eyt].

Therefore, the m.g.f of Z is: MZ(t) = e40t MY(t) = e4 0 t (1-100t)- 3.

4.5. C. M(t) = exp[10{1 - √(1 - 0.6t)}].  M'(t) = M(t) 3/√(1 - 0.6t).  Mean = M'(0) = 3.

4.6. D. M(t) = exp[10{1 - √(1 - 0.6t)}].  M'(t) = M(t) 3/√(1 - 0.6t).  Mean = M'(0) = 3.
M''(t) = M'(t) 3/√(1 - 0.6t) + M(t) 0.9/(1 - 0.6t)^1.5.
Second moment = M''(0) = (3)(3) + 0.9 = 9.9.  Variance = 9.9 - 3² = 0.9.

4.7. D. M''(t) = M'(t) 3/√(1 - 0.6t) + M(t) 0.9/(1 - 0.6t)^1.5.

M'''(t) = M''(t) 3/√(1 - 0.6t) + 2M'(t)(0.9)(1 - 0.6t)^(-3/2) + M(t)(0.81)(1 - 0.6t)^(-5/2).  M''(0) = 9.9.
M'''(0) = (9.9)(3) + (2)(3)(0.9) + (1)(0.81) = 35.91 = third moment.  Third central moment =
35.91 - (3)(9.9)(3) + 2(3³) = 0.81.  Thus the skewness = 0.81/(0.9^1.5) = 0.949.
Comment: This is an Inverse Gaussian Distribution with µ = 3 and θ = 30, with mean 3,
variance: 3³/30 = 0.9, and coefficient of variation: √0.9 / 3 = 0.3162.
The skewness of an Inverse Gaussian is three times the CV:
(3)(0.3162) = 0.949 = 3√(3/30) = 3√(µ/θ).

4.8. D. In general McX(t) = E[e^(tcx)] = MX(ct). In this case we have multiplied x by 1.20, so the new
m.g.f. is exp[10{1 - √(1 - 0.6(1.2t))}] = exp[10{1 - √(1 - 0.72t)}].

Alternately, prior to inflation this is an Inverse Gaussian Distribution with µ = 3 and θ = 30.
Under uniform inflation, both parameters are multiplied by the inflation factor, so after inflation we
have µ = 3.6 and θ = 36. The m.g.f. of an Inverse Gaussian is:
exp[(θ/µ){1 - √(1 - 2µ²t/θ)}]. After inflation this is: exp[10{1 - √(1 - 0.72t)}].

4.9. E. MX+Y(t) = MX(t) MY(t) = exp[10{1 - √(1 - 0.6t)}] exp[10{1 - √(1 - 0.6t)}] =

exp[20{1 - √(1 - 0.6t)}].

Comment: This is the moment generating function of another Inverse Gaussian, but with µ = 6 and
θ = 120, rather than µ = 3 and θ = 30. In general, the sum of two independent identically distributed
Inverse Gaussian Distributions with parameters µ and θ, is another Inverse Gaussian Distribution,
but with parameters 2µ and 4θ.

4.10. B. For a Compound Poisson, MA(t) = exp[λ(MX(t)-1)]. Thus MX(t) = 1/(1-7t)3 .

This is the moment Generating Function of a Gamma Distribution, with parameters α = 3 and θ = 7.
Thus it has a mean of: (3)(7) = 21, a variance of: 3(72 ) = 147,
and second moment of: 147 + 212 = 588.
Alternately, MXʼ(t) = 21/(1-7t)⁴. MXʼʼ(t) = 588/(1-7t)⁵.
second moment of the severity = MXʼʼ(0) = 588.
Alternately, Mʼ(t) = 105M(t)(1-7t)-4. Mʼ(0) = 105 = mean of aggregate losses.
Mʼʼ(t) = 2940M(t)(1-7t)-5 + 105Mʼ(t)(1-7t)-4.
Mʼʼ(0) = 2940 + (105)(105) = 13965 = 2nd moment of the aggregate losses.
Variance of the aggregate losses = 13965 - 1052 = 2940.
For a compound Poisson, variance of the aggregate losses = λ (2nd moment of severity).
Therefore, 2nd moment of severity = 2940/5 = 588.
Comment: One could use the Cumulant Generating Function, which is defined as the natural log of
the Moment Generating Function. ψ(t) = ln M(t) = ln[exp(5/(1-7t)3 - 5))] =
5/(1-7t)3 - 5. ψʼ(t) = 105/(1-7t)4 . ψʼʼ(t) = 2940/(1-7t)5 . Variance = ψʼʼ(0) = 2940.

4.11. B. The density of an Inverse Gaussian with parameters µ and θ is:

f(x) = √(θ/(2π)) x^(-1.5) exp(-θ(x/µ - 1)²/(2x)) = √(θ/(2π)) x^(-1.5) exp(-θx/(2µ²) + θ/µ - θ/(2x)).

Let a² = θ/(2µ²) and b² = θ/2, then θ = 2b² and µ = b/a.

Then f(x) = √(b²/π) x^(-1.5) exp(-a²x + 2ba - b²/x) = (b/√π) e^(2ba) x^(-1.5) exp(-a²x - b²/x).

Since this density integrates to unity: (b/√π) e^(2ba) ∫0∞ x^(-1.5) exp(-a²x - b²/x) dx = 1. ⇒

∫0∞ x^(-1.5) exp(-a²x - b²/x) dx = √π e^(-2ba) / b.

Comment: This is a special case of a Modified Bessel Function of the Third Kind, K-0.5.
See for example, Appendix C of Insurance Risk Models by Panjer & Willmot.

4.12. D. The Moment Generating Function is the expected value of e^(tx).

M(t) = ∫0∞ e^(tx) f(x) dx = ∫0∞ e^(tx) √(θ/(2π)) x^(-1.5) exp[-θx/(2µ²) + θ/µ - θ/(2x)] dx =

e^(θ/µ) √(θ/(2π)) ∫0∞ x^(-1.5) exp[-{θ/(2µ²) - t}x - θ/(2x)] dx.

Provided t ≤ θ/(2µ²), let a² = θ/(2µ²) - t, and b² = θ/2.
Then the integral is of the type in the previous problem and has a value of: √π e^(-2ba) / b.

Therefore, M(t) = e^(θ/µ) √(θ/(2π)) {√π e^(-2ba) / b} =

e^(θ/µ) exp[-(θ/µ)√(1 - 2tµ²/θ)] = exp[(θ/µ){1 - √(1 - 2tµ²/θ)}].

We required that t ≤ θ/(2µ²), so that a² ≥ 0; M(t) only exists for t ≤ θ/(2µ²).

Comment: Note that ba = √(b²a²) = √((θ/2)(θ/(2µ²) - t)) = {θ/(2µ)}√(1 - 2tµ²/θ).

4.13. B. The p.g.f. of the Negative Binomial Frequency is:


P(z) = (1 - 1.2(z-1))-3 = (2.2 - 1.2z)-3.
The m.g.f. for the uniform severity is: M(t) = (exp(23t) - exp(8t))/ (15t).
M A (t) = PN(MX(t)) = {2.2 - 1.2((exp(23t) - exp(8t))/(15t) }-3 =

{2.2 - 0.08((exp(23t) - exp(8t))/t)}-3 = t3 /{2.2t - 0.08(e2 3 t - e8 t) }3 .

4.14. D. The frequency for a single day is Bernoulli, with q = 0.2 and p.g.f. P(z) = 0.8 + 0.2z.
The Gamma severity has m.g.f. M(t) = (1- 1.7t)-3. Thus the m.g.f of the aggregate losses for one
day is P(M(t)) = 0.8 +.2(1- 1.7t)-3. The m.g.f for 31 independent days is the 31st power of that for a
single day, {0.8 + 0.2(1 - 1.7t)- 3}3 1.
Alternately, the frequency for 31 days is Binomial with m = 31, q = 0.2, and p.g.f.
P(z) = (0.8 + 0.2z)31. Thus the m.g.f. for the aggregate losses is P(M(t)) = (.8 +.2(1- 1.7t)-3)31.

4.15. This is a compound distribution with primary distribution a Poisson and secondary distribution
a Logarithmic. Alternately, it is a Compound Poisson with severity Logarithmic.
PLogarithmic(z) = 1 - ln[1 - β(z - 1)] / ln[1 + β].

PPoisson(z) = exp[λ(z - 1)].

PAggregate(z) = PPoisson[PLogarithmic(z)] = PPoisson[1 - ln[1 - β(z - 1)]/ln[1 + β]] = exp[-λ ln[1 - β(z - 1)]/ln[1 + β]]

= {1 - β(z - 1)}^(-λ/ln(1+β)).


PNegativeBinomial(z) = {1 - β(z - 1)}-r.
Thus the aggregate distribution is Negative Binomial, with r = λ/ln(1+β) = 10/ln(4) = 7.213.
The aggregate number of people is Negative Binomial with r = 7.213 and β = 3.
Comment: Beyond what you should be asked on your exam.
See Example 7.5 in Loss Models, not on the syllabus,
or Section 6.8 in Insurance Risk Models by Panjer and Willmot, not on the syllabus.
The mean of the Logarithmic Distribution is 3/ln(4).
(Mean number of groups) (Average Size of Groups) = (10){3/ln(4)} = 30/ln(4) = {10/ln(4)}(3) =
Mean of the Negative Binomial.
The variance of the Logarithmic Distribution is: 3{4 - 3/ln(4)} / ln(4).
(Mean number of groups) (Variance of Size of Groups)
+ (Average Size of Groups)2 (Variance of Number of Groups) =
(10)3{4 - 3/ln(4)}/ln(4) + {3/ln(4)}2 (10) = 120/ln(4) - 90/{ln(4)}2 + 90/{ln(4)}2 = 120/ln(4) =
{10/ln(4)}(3)(4) = Variance of the Negative Binomial.

4.16. tX is Normal, with mean tµ and standard deviation tσ. If tX is Normal, then exp[tX] is LogNormal.

Thus E[e^(tX)] is the mean of a LogNormal with parameters tµ and tσ: exp[tµ + (tσ)²/2] = exp[tµ + t²σ²/2].
This is the moment generating function. The characteristic function is:
E[e^(itX)] = exp[(it)µ + (it)²σ²/2] = exp[itµ - t²σ²/2].
Comment: See Definition 7.7 in Loss Models, not on the syllabus.

4.17. C. MAgg(t) = Pfreq[Msev(t)]. Therefore, PAgg(z) = Pfreq[Msev(lnz)].


Density at 0 for the aggregate is: PAgg(0).
Now as z approaches zero, ln[z] approaches -∞.
As ln[z] approaches -∞, Msev(ln[z]) approaches: 100t²/{3(10t)²} = 1/3.

Thus PAgg(z) approaches Pfreq[1/3] = (14 - 12/3 + 6/9 - 1/27) / {7(2 - 1/3)³} = 0.328.

Alternately, we can rewrite the moment generating function of severity as: 1/3 + (2/3)/(1 - 10t)².
Thus the severity is a one thirds / two thirds weighting of a pointmass of probability at 0 and a
Gamma Distribution with α = 2 and θ = 100.
The probability that the severity distribution is zero is 1/3.
In general, Prob[Agg = 0] = Pfreq[Prob[ severity = 0]] = Pfreq[1/3] = 0.328.
Comment: In general, M[ln(t)] = E[exp[x ln(t)]] = E[tx] = P[t].
The fact that Prob[Agg = 0] = Pfreq[Prob[ severity = 0]] is the first step in the “Recursive Method”,
discussed in my subsequent section on the Panjer Algorithm.

4.18. MY(t) = E[exp[yt]] = E[exp[3xt + 2t]] = e2t E[exp[x 3t]] = e2t MX(3t)

= e2t e3t (1 - 3t)-2 = e5 t (1 - 3t)- 2.

4.19. A. An Exponential with θ = 1/b. M(t) = 1/(1 - θt) = 1/(1 - t/b). M(-6b) = 1/(1 + 6) = 1/7.

4.20. A. A Geometric Distribution with β = 8. P(z) = 1/{1 - β(z-1)} = 1/(9 - 8z).


M(t) = P(et) = 1/(9 - 8et ).

4.21. D. Mʼʼʼ(t) = 24/(1 - t)5 . E[X3 ] = Mʼʼʼ(0) = 24.


Alternately, this is a Gamma Distribution, with α = 2, θ = 1, and E[X3 ] = θ3α(α+1)(α+2) = 24.

4.22. C. The moment generating function is always unity at zero; M(t) = E[etX],
M(0) = E[1] = 1. The mean is Mʼ(0) = M(0)r/p = r/p. Mʼʼ(t) = d (r M(t) / [1-(1-p)et] / dt =
r{Mʼ(t)(1-(1-p)et) + M(t)(1-p)et} / {1-(1-p)et}2 .
The second moment is Mʼʼ(0) = r{pMʼ(0) + M(0)(1-p)} / p2 = r{r +(1-p)} / p2 .
Therefore, the variance = r{r +(1-p)} / p2 - (r/p)2 = r(1-p)/p2 .
Comment: This is the Moment Generating Function of the number of Bernoulli trials, each with
chance of success p, it takes until r successes. The derivation of the Negative Binomial Distribution
involves the number of failures rather than the number of trials. See “Mahlerʼs Guide to Frequency
Distributions.” Thus the variable here is:
r + (a Negative Binomial with parameters r and β = (1-p)/p).
This variable has mean: r + (r)(1-p)/p = r/p and variance: rβ(1+β) = r (1-p)/p2 .
Note that the m.g.f of this variable is:
M(t) = ert (m.g.f of a Negative Binomial) = ert (1 - β(et -1))-r = ert p r(1 - (1-p)(et))-r.
Mʼ(t) = rert p r(1 - (1-p)(et))-r + r(1-p)(et)ert p r(1 - (1-p)(et))-(r+1) =
r M(t) + r(1-p)et M(t) /[1- (1-p)et] = rM(t){ 1- (1-p)et + (1-p)et}/[1- (1-p)et] = rM(t)/[1- (1-p)et].

4.23. E. Mʼ(t) = (3 + 2t)M(t). Mʼʼ(t) = 2M(t) + (3 + 2t)2 M(t). E[X2 ] = Mʼʼ(0) = 2 + 9 = 11.
Comment: A Normal Distribution with mean 3 and variance 2. E[X2 ] = 32 + 2 = 11.

4.24. A. Mʼ(t) = 9et(2 + et)8 /39 . E[X] = Mʼ(0) = 3.


Mʼʼ(t) = 72et(2 + et)7 /39 + 9et(2 + et)8 /39 . E[X2 ] = Mʼʼ(0) = 8 + 3 = 11. Variance = 11 - 32 = 2.
Alternately, lnM(t) = 9 ln(2 + et) - 9 ln(3). d(lnM(t))/dt = 9et/(2 + et).
d2 (lnM(t))/dt2 = 9et/(2 + et) - 9e2t/(2 + et)2 . Var[X] = d2 (ln MX(t)) / dt2 | t =0 = 3 - 1 = 2.

Alternately, M(t) = {(2 + et)/3}9 ⇒ P(z) = {(2 + z)/3}9 = (1 + (z - 1)/3)9 .


This is the probability generating function of a Binomial Distribution with q = 1/3 and m = 9.
Variance = mq(1-q) = (9)(1/3)(1 - 1/3) = 2.

4.25. D. 1. True. 2. False. MS(t) = PN(MX(t)) = MN(ln(MX(t))).


3. True. For a Poisson distribution, PN(z) = exp(λ(z - 1)).

For a Compound Poisson, MS(t) = PN(MX(t)) = exp(λ(MX(t)-1)).



4.26. E. M 2Y(t) = MY(2t) = exp[12t2 + 2t]. MX+2Y(t) = MX(t)M2Y(t) = exp[13t2 + 4t].


Comment: X is Normal with mean 2 and variance 2.
Y is Normal with mean 1 and variance 6. 2Y is Normal with mean 2 and variance 24.
X + 2Y is Normal with mean 4 and variance 26. For a Normal, M(t) = exp[µt + σ2t2 /2].

4.27. D. For a Compound Poisson, MA(t) = MN(ln(MX(t))) = exp(λ(MX(t)-1)).

4.28. A. The moment generating function of aggregate losses can be written in terms of those of
the frequency and severity: MN(ln(MX(t)) = PN(MX(t)). For a Poisson Distribution, the probability

generating function is eλ(z-1). For an Exponential Distribution with mean 1/β, the moment generating
function is 1/(1- t/β) = β/(β-t). Therefore, the moment generating function of the aggregate losses is:
exp[λ(MX(t)-1)] = exp[λ(β/(β-t)-1)] = exp[λt/(β-t)]. In this case λ = 1, so MS(t) = exp[t/(β-t)].

We are given MS(1) = 3, so that 3 = exp[1/(β-1)]. Therefore, β = 1+ 1/ln(3) = 1.91.


Comment: In this question, S is used for Aggregate Losses, as is done in Loss Models.

4.29. C. The probability generating function of the aggregate distribution can be written in terms of
the p.g.f. of the frequency and severity: Paggregate(z) = Pfrequency(Pseverity(z)).
The frequency is Poisson, with p.g.f. P(z) = exp[λ(z-1)] = exp[ln(4)(z-1)].
The severity has p.g.f. of P(z) = E[zx] = (1/ln(2)){z/2 + (z/2)2 /2 + (z/2)3 /3 + (z/2)4 /4 + ...} =
(1/ln(2))(-ln(1 - z/2)) = (1/ln(2))ln(2/(2 - z)) = (1/ln(2))(ln(2) - ln(2 - z)) = 1 - ln(2 - z)/ln(2).
Paggregate(z) = exp[ln(4)(Pseverity(z) - 1)] = exp[ln(4){- ln(2 - z)/ln(2)}] = exp[-2 ln(2-z)] = (2-z)-2.

The p.g.f. of a Negative Binomial is [1 - β(z-1)]-r. Comparing probability generating functions, the
aggregate losses are a Negative Binomial with r = 2 and β = 1.
Comments: The severity (or secondary distribution in the compound frequency distribution) is a
Logarithmic distribution as per Appendix B of Loss Models, with β = 1.
Thus it has p.g.f. of P(z) = 1 - ln[1- β(z-1)]/ln(1+β) = 1 - ln(2-z)/ln(2).
This is a compound Poisson-Logarithmic distribution.
In general, a Compound Poisson-Logarithmic distribution with a parameters λ and β, is a Negative
Binomial distribution with parameters r = λ/ln(1+β) and β.
In this case r = ln(4)/ln(1+1) = 2ln(2)/ ln(2) = 2.
ln(1 - y) = -y - y2 /2 - y3 /3 - y4 /4 - ..., for |y| < 1, follows from taking a Taylor Series.

4.30. E. MX(t) = MJ(t) MK(t) ML (t) = (1 - 2t)-10.

M Xʼ(t) = 20 (1 - 2t)-11. MXʼʼ(t) = 440 (1 - 2t)-12. MXʼʼʼ(t) = 10560 (1 - 2t)-13.

E[X3 ] = MXʼʼʼ(0) = 10,560.

Alternately, the three distributions are each Gamma with θ = 2, and α = 3, 2.5, and 4.5.
Therefore, their sum is Gamma with θ = 2, and α = 3 + 2.5 + 4.5 = 10.

E[X3 ] = θ3 α(α+1)(α+2) = (8)(10)(11)(12) = 10,560.

4.31. ψ(t) = ln M(t) = -α ln (1 - θt). ψʼ(t) = αθ / (1 - θt). ψʼʼ(t) = αθ2 / (1 - θt)2 .


ψʼʼʼ(t) = 2αθ3 / (1 - θt)3 . Var[X] = ψʼʼ(0) = αθ2 .
Third Central Moment of X = ψʼʼʼ(0) = 2αθ³. Skewness = 2αθ³/(αθ²)^1.5 = 2/√α.
Comment: One could instead get the moments from the Appendix attached to the exam or use the
Moment Generating Function to get the moments. Then the Third Central Moment is:
E[X3 ] - 3 E[X] E[X2 ] + 2E[X]3 .

4.32. B. MX(t) = (1 - 2500t)-4. MXʼ(t) = 10,000 (1 - 2500t)-5. E[X] = MXʼ(0) = 10,000.

M Xʼʼ(t) = 125 million (1 - 2500t)-6. E[X2 ] = MXʼʼ(0) = 125 million.

Standard deviation = √(125 million - 10,000²) = 5000.


Alternately, ln MX(t) = -4ln(1 - 2500t). d ln MX(t)/ dt = 10000/(1 - 2500t).
d2 ln MX(t)/ dt2 = 25,000,000/(1- 2500t)2 .

Var[X] = d2 ln MX(0)/ dt2 = 25,000,000. Standard deviation = 5000.

Alternately, the distribution is Gamma with θ = 2500 and α = 4.

Variance = αθ2 = (4)(25002 ). Standard deviation = (2)(2500) = 5000.

4.33. A. Prob[Y = 1] = Prob[X1 = X2 = X3 = 1] = (2/3)3 = 8/27. Prob[Y = 0] = 1 - 8/27 = 19/27.

M Y(t) = E[eyt] = Prob[Y = 0] e0t + Prob[Y = 1] e1t = 19/27 + 8et /27.

4.34. P(z) = E[zn ] = Σ f(n) zn . For a Bernoulli, P(z) = f(0)z0 + f(1)z1 = 1 - q + qz.
The Binomial is the sum of m independent, identically distributed Bernoullis, and therefore has
P(z) = (1 - q + qz)m.
M(t) = P(et) = (1 - q + qet)m.

4.35. The cumulant generating function is: ψ(t) = ln M(t) = µt + σ2t2 /2.

ψʼ(t) = µ + σ2t. κ1 = ψʼ(0) = µ = mean.

ψʼʼ(t) = σ2. κ2 = ψʼʼ(0) = σ2 = variance.

ψʼʼʼ(t) = 0. κ3 = ψʼʼʼ(0) = 0 = third central moment.

ψʼʼʼʼ(t) = 0. κ4 = ψʼʼʼʼ(0) = 0.

Comment: κi is the coefficient of ti/i! in the cumulant generating function.

4.36. For the Poisson Distribution, P(z) = exp[λ(z - 1)]. ⇒ M(t) = exp[λ(et - 1)].
Pprimary(z) = exp[10(z - 1)]. Msecondary(t) = exp[2(et - 1)].
M compound(t) = Pprimary[Msecondary(t)] = exp[10 {exp[2(et - 1)] - 1}].
Alternately, Pcompound(z) = Pprimary[Psecondary(z)] = exp[10 {exp[2(z - 1)] - 1}].
Then let z = et, in order to get the moment generating function of the compound distribution.

4.37. E. The joint moment generating function of W and Z:


M(t1 , t2 ) ≡ E[exp[t1 w + t2 z]] = E[exp[t1 (x+y) + t2 (y-x)]] = E[exp[y(t1 + t2 ) + x(t1 - t2 )]] =

E[exp[y(t1 + t2 )]] E[exp[x(t1 - t2 )]] = MY(t1 + t2 ) MX(t1 - t2 ) = exp[(t1 + t2 )2 /2] exp[(t1 - t2 )2 /2] =

exp[t1 2 /2 + t2 2 /2 + t1 t2 ] exp[t1 2 /2 + t2 2 /2 - t1 t2 ] = exp[t1 2 + t2 2 ].


Comment: Beyond what you should be asked on your exam.
X and Y are two independent unit Normals, each with mean 0 and standard deviation 1.
E[W] = 0. Var[W] = Var[X] + Var[Y] = 1 + 1 = 2. E[Z] = 0. Var[Z] = Var[X] + Var[Y] = 1 + 1 = 2.
Cov[W, Z] = Cov[X + Y, Y - X] = -Var[X] + Var[Y] + Cov[X, Y] - Cov[X, Y] = -1 + 1 = 0.
Corr[W, Z] = 0. W and Z are bivariate Normal, with µW = 0, σW2 = 2, µZ = 0, σZ2 = 2, ρ = 0.

For a bivariate Normal, M(t1 , t2 ) = exp[µ1t1 + µ2t2 + σ12t1 2 /2 + σ22t2 2 /2 + ρσ1σ2t1 t2 ].


See for example, Introduction to Probability Models, by Ross.

Section 5, Moments of Aggregate Losses

If one is not given the frequency per exposure, but is rather just given the frequency for the whole
number of exposures, whatever they are for the particular situation, then
Losses = (Frequency) (Severity).

Thus Mean Aggregate Loss = (Mean Frequency) (Mean Severity).

Exercise: The number of claims is given by a Negative Binomial Distribution with


r = 4.3 and β = 3.1. The size of claims is given by a Pareto Distribution with α = 1.7 and
θ = 1400. What is the expected aggregate loss?
[Solution: The mean frequency is rβ = (4.3)(3.1) = 13.33. The mean severity is θ/(α-1) =
1400 / 0.7 = 2000. The expected aggregate loss is: (13.33)(2000) = 26,660.]

Since they depend on both the number of claims and the size of claims, aggregate losses have
more reasons to vary than do either frequency or severity individually. Random fluctuation occurs
when one rolls dice, spins spinners, picks balls from urns, etc. The observed result varies from time
period to time period due to random chance. This is also true for the aggregate losses observed for
a collection of insureds. The variance of the observed aggregate losses that occurs due to random
fluctuation is referred to as the process variance. That is what will be discussed here.65

Independent Frequency and Severity:

You are given the following:


• The number of claims for a single exposure period is given by a Binomial Distribution
with q = 0.3 and m = 2.
• The size of the claim will be either 50 with probability 80%, or 100 with probability 20%.
• Frequency and severity are independent.

65
The process variance is distinguished from the variance of the hypothetical means as discussed in “Mahlerʼs Guide
to Buhlmann Credibility.”

Exercise: Determine the variance of the aggregate losses.


[Solution: List the possibilities and compute the first two moments:
Situation Probability Aggregate Loss Square of Aggregate Loss
0 claims 49.00% 0 0
1 claim @ 50 33.60% 50 2500
1 claim @ 100 8.40% 100 10000
2 claims @ 50 each 5.76% 100 10000
2 claims: 1 @ 50 & 1 @ 100 2.88% 150 22500
2 claims @ 100 each 0.36% 200 40000
Overall 100.0% 36 3048

For example, the probability of 2 claims is: 0.32 = 9%. We divide this 9% among the possible
claim sizes: 50 and 50 @ (0.8)(0.8) = 64%, 50 and 100 @ (0.8)(0.2) = 16%,
100 and 50 @ (0.2)(0.8) = 16%, 100 and 100 @ (0.2)(0.2) = 4%.
(9%)(64%) = 5.76%, (9%)(16% + 16%) = 2.88%, (9%)(4%) = 0.36%.
One takes the weighted average over all the possibilities. The average Pure Premium is 36.
The second moment of the Pure Premium is 3048.
Therefore, the variance of the pure premium is: 3048 - 362 = 1752.]

In this case, since frequency and severity are independent one can make use of the following
formula:66

(Process) Variance of Aggregate Loss =


(Mean Freq.) (Variance of Severity) + (Mean Severity)2 (Variance of Freq.)

σAgg2 = µFreq σSev2 + µSev2 σFreq2.

Memorize this formula for the variance of the aggregate losses when frequency and severity are
independent! Note that each of the two terms has a mean and a variance, one from frequency and
one from severity. Each term is in dollars squared; that is one way to remember that the mean
severity (which is in dollars) enters as a square while that for mean frequency (which is not in dollars)
does not.

In the above example, the mean frequency is mq = 0.6 and the variance of the frequency is:
mq(1 - q) = (2)(0.3)(0.7) = 0.42. The average severity is 60 and the variance of the severity is:
(0.8)(50 - 60)2 + (0.2)(100 - 60)2 = 400. Thus the process variance of the aggregate losses is:
(0.6)(400) + (602 )(0.42) = 1752, which matches the result calculated previously.
66
See equation 9.9 in Loss Models. Note Loss Models uses S for aggregate losses rather than A, and N for
frequency rather than F. I have used X for severity in order to follow Loss Models. This formula can also be used to
compute the process variance of the pure premium, when frequency and severity are independent.
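Both the direct enumeration and the variance formula can be reproduced in a few lines of code. Here is a Python sketch (standard library only) for the Binomial(m = 2, q = 0.3) frequency and 50/100 severity example above:

# Enumeration check of the process variance of 1752, plus the independence formula.
from itertools import product

q = 0.3
sizes = {50: 0.8, 100: 0.2}
freq = {0: (1 - q)**2, 1: 2 * q * (1 - q), 2: q**2}

first = second = 0.0
for n, pn in freq.items():
    for claims in product(sizes, repeat=n):      # all combinations of n claim sizes
        p = pn
        for c in claims:
            p *= sizes[c]
        agg = sum(claims)
        first += p * agg
        second += p * agg**2
print(first, second - first**2)                  # 36 and 1752

mean_f, var_f = 2 * q, 2 * q * (1 - q)           # Binomial mean and variance
mean_s = 50 * 0.8 + 100 * 0.2                    # 60
var_s = 0.8 * (50 - mean_s)**2 + 0.2 * (100 - mean_s)**2   # 400
print(mean_f * var_s + mean_s**2 * var_f)        # also 1752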

One can rewrite the formula for the process variance of the aggregate losses in terms of
coefficients of variation by dividing both sides by the square of the mean aggregate loss:67
CVAgg2 = CVFreq2 + CVSev2 / µFreq.

In the example above, CVFreq2 = (2)(0.3)(0.7) / {(2)(0.3)}2 = 1.167, CVSev2 = 400 / 602 = 1.111,

and therefore CVAgg2 = 1.167 + 1.111/0.6 = 1.352 = 1752 / 362 .

Thus the square of the coefficient of variation of the aggregate losses is the sum of the CV2 for
frequency and the CV2 for severity divided by the mean frequency.

An Example of Dependent Frequency and Severity:

On both the exam and in practical applications, frequency and severity are usually independent.
However, here is an example in which frequency and severity are instead dependent.

Assume that you are given the following:


• The number of claims for a single exposure period will be either 0, 1, or 2:
Number of Claims Probability
0 60%
1 30%
2 10%
• If only one claim is incurred, the size of the claim will be 50, with probability 80%;
or 100, with probability 20%.
• If two claims are incurred, the size of each claim, independent of the other, will be 50,
with probability 50%; or 100, with probability 50%.
How would one determine the variance of the aggregate losses?

First list the aggregate losses and probability of each of the possible outcomes.
If there is no claim (60% chance) then the aggregate loss is zero.
If there is one claim, then the aggregate loss is either 50 with (30%)(80%) = 24% chance,
or 100 with (30%)(20%) = 6% chance.
If there are two claims then there are three possibilities. There is a (10%)(25%) = 2.5% chance that
there are two claims each of size 50 with an aggregate loss of 100. There is a (10%)(50%) = 5%
chance that there are two claims one of size 50 and one of size 100 with an aggregate loss of 150.
There is a (10%)(25%) = 2.5% chance that there are two claims each of size 100 with an aggregate
loss of 200.
Next, the first and second moments can be calculated by listing the aggregate losses for all the
possible outcomes and taking the weighted average using the probabilities as weights of either the
aggregate loss or its square.
67
The mean of the aggregate losses is the product of the mean frequency and the mean severity.

Situation Probability Aggregate Loss Square of Aggregate Loss


0 claims 60.0% 0 0
1 claim @ 50 24.0% 50 2500
1 claim @ 100 6.0% 100 10000
2 claims @ 50 each 2.5% 100 10000
2 claims: 1 @ 50 & 1 @ 100 5.0% 150 22500
2 claims @ 100 each 2.5% 200 40000
Overall 100.0% 33 3575

One takes the weighted average over all the possibilities. The average aggregate loss is 33.
The second moment of the aggregate losses is 3575.
Therefore, the variance of the aggregate losses is: 3575 - 332 = 2486.
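The same enumeration can be done in code; here is a Python sketch of the calculation above (no independence formula is used, since the severity depends on the number of claims):

# Enumeration of the dependent frequency/severity example.
outcomes = [          # (probability, aggregate loss)
    (0.60,           0),
    (0.30 * 0.80,   50),
    (0.30 * 0.20,  100),
    (0.10 * 0.25,  100),
    (0.10 * 0.50,  150),
    (0.10 * 0.25,  200),
]
mean = sum(p * a for p, a in outcomes)
second = sum(p * a * a for p, a in outcomes)
print(mean, second - mean**2)     # 33 and 2486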

Note that the frequency and severity are not independent in this case. Rather the severity
distribution depends on the number of claims. For example, the average severity if there is 1 claim
is 60, while the average severity if there are 2 claims is 75.

In general, one can calculate the variance of the aggregate losses in the above manner from the
second and first moments. The variance is: the second moment - (first moment)2 .
The first and second moments can be calculated by listing the aggregate losses for all the possible
outcomes and taking the weighted average, applying the probabilities as weights to either the
aggregate loss or its square. In continuous cases, this will involve taking integrals, rather than sums.

Policies of Different Types:

Let us assume we have a portfolio consisting of two types of policies:

Number Mean Aggregate Variance of Aggregate


Type of Policies Loss per Policy Loss per Policy
A 10 6 3
B 20 9 4

Assuming the results of each policy are independent, then the mean aggregate loss for the portfolio
is: (10)(6) + (20)(9) = 240.
The variance of aggregate loss for the portfolio is: (10)(3) + (20)(4) = 110.68

For independent policies, the means and variances of the aggregate losses for each policy add.
The sum of the aggregate losses from two independent policies has the sum of the means and
variances of the aggregate losses for each policy.
68
Since we are given the variance of aggregate losses, there is no need to compute the variance of aggregate
losses from the mean frequency, variance of frequency, mean severity, and variance of severity.

Exercise: Compare the coefficient of variation of aggregate losses in the above example to that if
one had instead 100 policies of Type A and 200 policies of type B.
[Solution: For the original example, CV = √110 / 240 = 0.0437.
For the new example, CV = √1100 / 2400 = 0.0138.
Comment: Note that as we have more policies, all other things being equal, the coefficient of
variation goes down.]

Exercise: For each of the two cases in the previous exercise, using the Normal Approximation
estimate the probability that the aggregate losses will be at least 5% more than their mean.
[Solution: For the original example, Prob[Agg. > 252] ≅ 1 - Φ[(252 - 240)/√110] = 1 - Φ[1.144] = 12.6%.
For the new example, Prob[Agg. > 2520] ≅ 1 - Φ[(2520 - 2400)/√1100] = 1 - Φ[3.618] = 0.015%.]
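Here is a Python sketch (standard library only) reproducing the Normal Approximation figures in this solution for both portfolio sizes:

# Normal approximation for the probability of exceeding the mean by 5%.
import math

def normal_sf(x):                      # survival function of a standard normal
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for n_a, n_b in [(10, 20), (100, 200)]:
    mean = n_a * 6 + n_b * 9           # policy-level means of 6 and 9
    var = n_a * 3 + n_b * 4            # policy-level variances of 3 and 4
    z = 0.05 * mean / math.sqrt(var)
    print(mean, var, normal_sf(z))     # about 12.6% and 0.015%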

For a larger portfolio, all else being equal, there is less chance of an extreme outcome in a given year
measured as a percentage of the mean.

Derivation of the Formula for the Process Variance of the Aggregate Losses:

The above formula for the process variance of the aggregate losses for independent frequency and
severity is a special case of a general formula:69
Var(Y) = EZ[VARY(Y | Z)] + VARZ(EY[Y | Z]), where Z and Y are any random variables.

Letting Y be the aggregate losses, A, and Z be the number of claims, N, in the above formula
gives:

Var(Agg) = EN[VARA(A | N)] + VARN(EA[A | N]) = EN[NσX2 ] + VARN(µXN) =

EN[N]σX2 + µX2 VARN(N) = µFreq σX2 + µX2 σFreq2 .

Where we have used the assumption that the frequency and severity are independent and the
facts:
• For a fixed number of claims N, the variance of the aggregate losses is the variance of
the sum of N independent identically distributed variables each with variance σX2 .

(Since frequency and severity are assumed independent, σX2 is the same for each

value of N.) Such variances add so that VARA(A | N) = NσX2 .


• For a fixed number of claims N, for frequency and severity independent the expected
value of the aggregate losses is N times the mean severity: EA[A | N] = µXN.
• Since with respect to N the variance of the severity acts as a constant :
EN[NσX2 ] = σX2 EN[N] = µFreq σX2 .
• Since with respect to N the mean of the severity acts as a constant :
VARN(µXN) = µX2 VARN(N) = µX2 σFreq2 .

Letʼs apply this derivation to the previous example. You were given the following:
• For a given risk, the number of claims is given by a Binomial Distribution with
q = 0.3 and m = 2.
• The size of the claim will be either 50 with probability 80%, or 100 with probability 20%.
• frequency and severity are independent.

There are only three possible values of N: N=0, N=1, or N=2. If N = 0 then A = 0. If N = 1 then
either A = 50 with 80% chance or A = 100 with 20% chance. If N = 2 then A = 100 with 64%
chance, A = 150 with with 32% chance or A = 200 with 4% chance.
69
As discussed in “Mahlerʼs Guide to Buhlmann Credibility”:
Total Variance = Expected Value of the Process Variance + Variance of the Hypothetical Means.

We then get :
N    Probability    Mean of A Given N    Square of Mean of A Given N    2nd Moment of A Given N    Variance of A Given N
0 49% 0 0 0 0
1 42% 60 3600 4000 400
2 9% 120 14400 15200 800
Mean 36 2808 240

For example, given two claims, the second moment of the aggregate losses is:
(64%)(1002 ) + (32%)(1502 ) + (4%)(2002 ) = 15,200. Thus given two claims the variance of the
aggregate losses is: 15,200 - 1202 = 800.

Thus EN[VARA(A | N)] = 240, and VARN(EA[A | N]) = 2808 - 362 = 1512. Thus the variance of the
aggregate losses is EN[VARA(A | N)] + VARN(EA[A | N]) = 240 + 1512 = 1752, which matches
the result calculated above. The (total) process variance of the aggregate losses has been split into
two pieces. The first piece calculated as 240, is the expected value over the possible numbers of
claims of the process variance of the aggregate losses for fixed N.
The second piece calculated as 1512, is the variance over the possible numbers of the claims of the
mean aggregate loss for fixed N.
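The decomposition can be verified directly; here is a Python sketch (standard library only) computing the two pieces for this example:

# Check of Var(Agg) = E_N[Var(Agg | N)] + Var_N(E[Agg | N]) for the example above.
q = 0.3
freq = {0: (1 - q)**2, 1: 2 * q * (1 - q), 2: q**2}
mean_sev, var_sev = 60.0, 400.0

epv = sum(p * n * var_sev for n, p in freq.items())                  # expected value of the process variance
mean_of_means = sum(p * n * mean_sev for n, p in freq.items())
vhm = sum(p * (n * mean_sev)**2 for n, p in freq.items()) - mean_of_means**2   # variance of the means
print(epv, vhm, epv + vhm)    # 240, 1512, 1752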

Poisson Frequency:

Assume you are given the following:


• For a given risk, the number of claims for a year is Poisson with mean 7.
• The size of the claim will be 50 with probability 80%, or 100 with probability 20%.
• frequency and severity are independent.

Exercise: Determine the variance of the aggregate losses for this risk.
[Solution: µFreq = σFreq2 = 7. µSev = 60. σSev2 = 400.

σAgg2 = µFreq σSev2 + µSev2 σFreq2 = (7)(400) + (602 )(7) = 28,000.]

In the case of a Poisson Frequency with independent frequency and severity the formula for the
process variance of the aggregate losses simplifies. Since µFreq = σFreq2 :

σAgg2 = µFreq σSev2 + µSev2 σFreq2 = µFreq(σSev2 + µSev2 ) = µFreq (2nd moment of severity).

The variance of a Compound Poisson is: λ (2nd moment of severity).



In the example above, the second moment of the severity is: (0.8)(502 ) + (0.2)(1002 ) = 4000.
Thus σAgg2 = λ (2nd moment of the severity) = (7)(4000) = 28,000.
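A quick Python check of the Poisson shortcut against the general formula (a sketch; names are mine):

lam, mu_x, ex2_x = 7, 60, 0.8 * 50**2 + 0.2 * 100**2   # mean severity 60, E[X^2] = 4000
print(lam * (ex2_x - mu_x**2) + mu_x**2 * lam)         # general formula: 28000
print(lam * ex2_x)                                     # Poisson shortcut, lambda times E[X^2]: 28000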

As a final example, assume you are given the following:


• For a given risk, the number of claims for a year is Poisson with mean 3645.
• The severity distribution is LogNormal, with parameters µ = 5 and σ = 1.5.
• Frequency and severity are independent.

Exercise: Determine the variance of the aggregate losses for this risk.
[Solution: The second moment of the severity = exp(2µ + 2σ2 ) = exp(14.5) = 1,982,759.

Thus σAgg2 = λ(2nd moment of the severity) = (3645)(1,982,759) = 7.22716 x 109 .]

Formula in Terms of Moments of Severity:

It may sometimes be useful to rewrite the variance of the aggregate loss in terms of the first and
second moments of the severity:

σAgg2 = µFreqσSev2 + µSev2 σFreq2 = µFreq(E[Sev2 ] - E[Sev]2 ) + E[Sev]2 σFreq2


= µFreqE[Sev2 ] + E[Sev]2 (σFreq2 - µFreq).

For a Poisson frequency distribution the final term is zero, σAgg2 = λE[Sev2 ].

For a Negative Binomial frequency distribution, σAgg2 = rβE[Sev2 ] + E[Sev]2 rβ2.

For the Binomial frequency distribution, σAgg2 = mqE[Sev2 ] - E[Sev]2 mq2 .

Normal Approximation:

For frequency and severity independent, for large numbers of expected claims, the observed
aggregate losses are approximately normally distributed. The more skewed the severity
distribution, the higher the expected frequency has to be for the Normal Approximation to produce
worthwhile results.

For example, continuing the example above, the mean Poisson frequency is 3645,
and the mean severity is: exp[µ + 0.5σ2] = exp(6.125) = 457.14.
Thus the mean aggregate losses are: (3645)(457.14) = 1,666,292.

One could ask what is the chance that the observed aggregate losses are between 1.4997 million
and 1.8329 million. Since the variance of the aggregate losses is 7.22716 x 10^9, the standard
deviation of the aggregate losses is 85,013.
Thus the probability of the observed aggregate losses being within ±10% of 1.6663 million is
approximately: Φ[(1.8329 million - 1.6663 million)/85,013] - Φ[(1.4997 million - 1.6663 million)/85,013] =
Φ[1.96] - Φ[-1.96] = 0.975 - (1 - 0.975) = 95%.
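A short Python sketch (assuming scipy is available; the names are merely illustrative) reproducing this Normal Approximation calculation:

from math import exp, sqrt
from scipy.stats import norm

lam, mu, sigma = 3645, 5.0, 1.5
mean_agg = lam * exp(mu + sigma**2 / 2)          # about 1.666 million
sd_agg = sqrt(lam * exp(2*mu + 2*sigma**2))      # compound Poisson: sqrt(lambda * E[X^2]), about 85,013

prob = (norm.cdf(1.1 * mean_agg, mean_agg, sd_agg)
        - norm.cdf(0.9 * mean_agg, mean_agg, sd_agg))
print(mean_agg, sd_agg, prob)                    # probability of being within +/-10%: about 0.95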

LogNormal Approximation:

When one has other than a large number of expected claims, the distribution of aggregate losses
typically has a significantly positive skewness. Therefore, it makes sense to approximate the
aggregate losses with a distribution that also has a positive skewness.70 Loss Models illustrates
how to use a LogNormal Distribution to approximate the Aggregate Distribution.71 One applies the
method of moments to fit a LogNormal Distribution with the same mean and variance as the
Aggregate Losses.

Exercise: For a given risk, the number of claims for a year is Negative Binomial with
β = 3.2 and r = 14. The severity distribution is Pareto with parameters α = 2.5 and θ = 10.
Frequency and severity are independent.
Determine the mean and variance of the aggregate losses.
[Solution: The mean frequency is: (3.2)(14) = 44.8. The variance of frequency is:
(3.2)(1 + 3.2)(14) = 188.16. The mean severity is 10/(2.5 - 1) = 6.667.
The second moment of severity is: 2θ2 / {(α-1)(α-2)} = 200 / {(0.5)(1.5)} = 266.67.
The variance of the severity is: 266.67 - 6.6672 = 222.22.
Thus the mean of the aggregate losses is: (44.8)(6.667) = 298.7.
The variance of the aggregate losses is: (44.8)(222.22) + (6.6672 )(188.16) = 18,319.]

70 The Normal Distribution being symmetric has zero skewness.
71 See Example 9.4 of Loss Models. Actuarial Mathematics, at pages 388-389 not on the Syllabus, demonstrates
how to use a “translated Gamma Distribution.” “Approximations of the Aggregate Loss Distribution,” by Papush,
Patrik, and Podgaits, CAS Forum Winter 2001, recommends that if one uses a 2 parameter distribution, one use the
Gamma Distribution. Loss Models mentions that one could match more than the first two moments by using
distributions with more than two parameters.

Exercise: Fit a LogNormal Distribution to the aggregate losses in the previous exercise, by matching
the first two moments.
[Solution: mean = 298.7. second moment = 18,319 + 298.72 = 107,541.

Matching the mean of the LogNormal and the data: exp(µ + 0.5 σ2) = 298.7.

Matching the second moment of the LogNormal and the data: exp(2µ + 2σ2) = 107,541.
Divide the second equation by the square of the first equation:
exp(2µ + 2σ2) / exp(2µ + σ2) = exp(σ2) = 1.205.

⇒ σ = √0.1867 = 0.432. ⇒ µ = ln(298.7) - σ2/2 = 5.606.
Comment: The Method of Moments is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”]

Then one can use the approximating LogNormal Distribution to answer questions about the
aggregate losses.

Exercise: Using the LogNormal approximation, estimate the probability that the aggregate losses
are less than 500?
[Solution: Φ[(ln(500) - 5.606)/0.432] = Φ[1.41] = 0.9207.]

Exercise: Using the LogNormal approximation, estimate the probability that the aggregate losses
are between 200 and 500?
[Solution: Φ[(ln(500)-5.606)/0.432] - Φ[(ln(200)-5.606)/0.432] = Φ[1.41] - Φ[-0.71] =
0.9207 - 0.2389 = 0.6818.]
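The method-of-moments fit and the two probabilities can be checked with a few lines of Python (a sketch, assuming scipy; names are mine):

from math import log, sqrt
from scipy.stats import norm

mean_agg, var_agg = 298.7, 18319.0               # from the previous exercise
second_moment = var_agg + mean_agg**2            # 107,541

sigma2 = log(second_moment / mean_agg**2)        # exp(sigma^2) = E[A^2]/E[A]^2
sigma = sqrt(sigma2)                             # about 0.432
mu = log(mean_agg) - sigma2 / 2                  # about 5.606

print(norm.cdf((log(500) - mu) / sigma))                                      # P[A < 500], about 0.92
print(norm.cdf((log(500) - mu) / sigma) - norm.cdf((log(200) - mu) / sigma))  # about 0.68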

Higher Moments of the Aggregate Losses:

When frequency and severity are independent, just as one can write the variance or coefficient of
variation of the aggregate loss distribution in terms of quantities involving frequency and severity,
one can write higher moments in this manner. For example, the third central moment of the aggregate
losses can be written as:72

third central moment of the aggregate losses =


(mean frequency)(3rd central moment of severity) +
3(variance of frequency)(mean severity)(variance of severity) +
(mean severity)3 (3rd central moment of frequency).

Note that each term is in dollars cubed.


72 See Equation 9.9 in Loss Models. Also, see either Actuarial Mathematics or Practical Risk Theory for Actuaries, by
Daykin, et. al. As shown in the latter, one can derive this formula via the cumulant generating function.

This formula can be written in terms of skewnesses as follows:
E[(Agg - E[Agg])3] = µFreq σX3 γX + 3 σFreq2 µX σX2 + σFreq3 γFreq µX3.
Therefore, the skewness of the aggregate losses is:
γAgg = {µFreq σX3 γX + 3 σFreq2 µX σX2 + σFreq3 γFreq µX3} / σAgg3.
This can also be written in terms of coefficients of variation rather than variances:
γAgg = {(CVX3 γX / µFreq2) + (3 CVFreq2 CVX2 / µFreq) + CVFreq3 γFreq} / CVAgg3.

Exercise: If the frequency is Negative Binomial with r = 27 and β = 7/3, then what are the mean,
coefficient of variation and skewness?
[Solution: µFreq = rβ = 63, CVFreq = √{(1 + β)/(βr)} = 0.2300, and γFreq = (1+2β)/√{(1+β)βr} = 0.391.]

Exercise: If the severity is given by a Pareto Distribution with α = 4 and θ = 3, then what are the
mean, coefficient of variation and skewness?
[Solution: E[X] = θ/(α-1) = 1. E[X2] = 2θ2/{(α-1)(α-2)} = 3. E[X3] = 6θ3/{(α-1)(α-2)(α-3)} = 27.
µX = E[X] = 1. Var[X] = E[X2] - E[X]2 = 2. CVX = √2/1 = 1.414.
3rd central moment = E[X3] - 3E[X]E[X2] + 2E[X]3 = 20. γX = 20/2^1.5 = 7.071.]

Exercise: The frequency is Negative Binomial with r = 27 and β = 7/3.


The severity is Pareto with α = 4 and θ = 3. Frequency and severity are independent.
What are the mean, coefficient of variation and skewness of the aggregate losses?
[Solution: µAgg = µFreqµX = 63, CVAgg2 = CVFreq2 + CVX2/µFreq = 0.0846, and therefore CVAgg = 0.291.
γAgg = {(CVX3 γX / µFreq2) + (3 CVFreq2 CVX2 / µFreq) + CVFreq3 γFreq} / CVAgg3 =
{(1.414^3)(7.071)/63^2 + 3(1.414^2)(0.23^2)/63 + (0.23^3)(0.391)} / 0.291^3 = 0.0148/0.0246 = 0.60.
Note that the variance of the aggregate losses in this case is:
σAgg2 = µFreq σX2 + µX2 σFreq2 = (63)(2) + (1)(210) = 336.
Comment: Actuarial Mathematics in Table 12.5.1, not on the Syllabus, has the following formula for
the third central moment of a Compound Negative Binomial:
r{βE[X3] + 3β2E[X]E[X2] + 2β3E[X]3}. In this case, this formula gives a third central moment of:
(27){(7/3)(27) + (3)(7/3)2(1)(3) + (2)(7/3)3(1^3)} = 3710.
Thus the skewness is: 3710/336^1.5 = 0.60.]
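Here is a small Python sketch (names are mine) that assembles the skewness of this aggregate distribution from the frequency and severity moments:

# Negative Binomial (r = 27, beta = 7/3) frequency; Pareto (alpha = 4, theta = 3) severity.
r, beta = 27, 7/3
alpha, theta = 4, 3

mu_f = r * beta                                                  # 63
var_f = r * beta * (1 + beta)                                    # 210
skew_f = (1 + 2*beta) / (r * beta * (1 + beta))**0.5             # about 0.391

ex1 = theta / (alpha - 1)                                        # 1
ex2 = 2 * theta**2 / ((alpha - 1) * (alpha - 2))                 # 3
ex3 = 6 * theta**3 / ((alpha - 1) * (alpha - 2) * (alpha - 3))   # 27
var_x = ex2 - ex1**2                                             # 2
third_x = ex3 - 3 * ex1 * ex2 + 2 * ex1**3                       # third central moment = 20

var_agg = mu_f * var_x + ex1**2 * var_f                          # 336
third_agg = (mu_f * third_x                                      # mu_f * sigma_x^3 * gamma_x
             + 3 * var_f * ex1 * var_x
             + var_f**1.5 * skew_f * ex1**3)                     # 3710
print(third_agg / var_agg**1.5)                                  # skewness, about 0.60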

As the expected number of claims increases, the skewness of the aggregate losses → 0, making
the Normal Approximation better. As discussed above, when the skewness of the aggregate
losses is significant, one can approximate the aggregate losses via a LogNormal.73
For the Poisson Distribution the mean = λ, the variance = λ, and the skewness is 1/√λ.74
Therefore, for a Poisson Frequency, σFreq3 γFreq = λ^1.5/√λ = λ, and the third central moment of the
aggregate losses is: E[(Agg - E[Agg])3] = µFreq σX3 γX + 3 σFreq2 µX σX2 + σFreq3 γFreq µX3 =
λ (third central moment of severity) + 3λ µXσX2 + λ µX3 =
λ {E[X3] - 3µXE[X2] + 2µX3 + 3µX(E[X2] - µX2) + µX3} = λ (third moment of the severity).

The Third Central Moment of a Compound Poisson Distribution is:


(mean frequency) (third moment of the severity).

For a Poisson Frequency, the variance of the aggregate losses is: λ(2nd moment of severity).
Therefore, skewness of a compound Poisson =
(third moment of the severity) / {√λ (2nd moment of severity)^1.5}.

Exercise: Frequency is Poisson with mean 3.1. Severity is discrete with: P[X=100] = 2/3,
P[X=500] = 1/6, and P[X=1000] = 1/6. Frequency and Severity are independent.
What is the skewness of the distribution of aggregate losses?
[Solution: The second moment of the severity is:
(2/3)(1002 ) + (1/6)(5002 ) + (1/6)(10002 ) = 215,000. The third moment of the severity is:
(2/3)(1003 ) + (1/6)(5003 ) + (1/6)(10003 ) = 188,166,667.
The skewness of a compound Poisson =
(third moment of the severity) / {√λ (2nd moment of severity)^1.5} =
188,166,667 / {√3.1 (215,000)^1.5} = 1.072.
Comment: Since the skewness is a dimensionless quantity that does not depend on the scale, we
would have gotten the same answer if we had instead worked with a severity distribution with all of
the amounts divided by 100: P[X=1] = 2/3, P[X=5] = 1/6, and P[X=10] = 1/6.]
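The same answer in a few lines of Python (a sketch; names are mine):

lam = 3.1
sizes, probs = [100, 500, 1000], [2/3, 1/6, 1/6]
ex2 = sum(p * x**2 for x, p in zip(sizes, probs))     # 215,000
ex3 = sum(p * x**3 for x, p in zip(sizes, probs))     # 188,166,667
print(ex3 / (lam**0.5 * ex2**1.5))                    # skewness of the compound Poisson, about 1.07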

The Kurtosis of a Compound Poisson Distribution is: 3 + E[X4] / {λ E[X2]2}.

73 As discussed in Actuarial Mathematics, when the skewness of the aggregate losses is significant, one can
approximate with a translated Gamma Distribution rather than a Normal Distribution.
74 See “Mahlerʼs Guide to Frequency Distributions.”

Per Claim Deductibles:75

If frequency and severity are independent, then the aggregate losses depend on the number of
losses greater than the deductible amount and the size of the losses truncated and shifted by the
deductible amount. If frequency is Poisson with parameter λ, then the losses larger than d are also
Poisson, but with parameter: S(d) λ.76

Exercise: Frequency is Poisson with λ = 10.


If 19.75% of losses are large, what is the frequency distribution of large losses?
[Solution: It is Poisson with λ = (10)(19.75%) = 1.975.]

If frequency is Negative Binomial with parameters β and r, then the losses larger than d are also
Negative Binomial, but with parameters: S(d)β and r.77

Exercise: Frequency is Negative Binomial with r = 2.4 and β = 1.1.


If 77.88% of losses are large, what is the frequency distribution of large losses?
[Solution: It is Negative Binomial with r = 2.4 and β = (1.1)(0.7788) = 0.8567.]

One can then look at the non-zero payments by the insurer. Their sizes are distributed as the original
distribution truncated and shifted by d.78 The mean of the non-zero payments = the mean of the
severity distribution truncated and shifted by d: {E[X] - E[X ∧ d]} / S(d).

Exercise: For a Pareto with α = 4 and θ = 1000, compute the mean of the non-zero payments given
a deductible of 500.
[Solution: For the Pareto E[X] = θ/(α-1) = 1000/3 = 333.33. The limited expected value is
E[X ∧ 500] = {θ/(α−1)} {1 − (θ/(θ+500))^(α−1)} = 234.57. S(500) = 1/(1 + 500/1000)^4 = 0.1975.
(E[X] - E[X ∧ 500])/S(500) = (333.33 - 234.57)/0.1975 = 500.
Alternately, the mean of the data truncated and shifted at 500 is the mean excess loss at 500.
For the Pareto, e(x) = (x+θ)/(α-1). e(500) = (500+1000)/(4-1) = 500.
Alternately, the distribution truncated and shifted (from below) at 500 is
G(x) = {F(x+500) - F(500)}/S(500) = {(1 + 500/1000)^-4 - (1 + (x+500)/1000)^-4}/(1 + 500/1000)^-4 =
1 - (1 + x/1500)^-4. This is a Pareto with α = 4 and θ = 1500, and mean 1500/(4-1) = 500.]
75 A per claim deductible operates on each loss individually. This should be distinguished from an aggregate
deductible which applies to the aggregate losses, as discussed below in the section on stop loss premiums.
76 Where S(d) is the survival function of the severity distribution prior to the impact of the deductible.
77 See “Mahlerʼs Guide to Frequency Distributions.”
78 See “Mahlerʼs Guide to Loss Distributions.”

Thus a Pareto truncated and shifted at d, is another Pareto with parameters α and
θ+d. Therefore the above Pareto distribution truncated and shifted at 500, with parameters 4 and
1000 + 500 = 1500, has a variance of (15002 )(4)/{(4-2)(4-1)2 } = 500,000.

The Exponential distribution has a similar nice property. An Exponential Distribution truncated and
shifted at d, is another Exponential with the same mean. Thus if one has an Exponential with
θ = 2000, and one truncates and shifts at 500 (or any other value) one gets another exponential with
θ = 2000. Thus the mean of the severity truncated and shifted is 2000, and the variance is:
20002 = 4,000,000

For any severity distribution, given a deductible of d, the variance of the non-zero payments,
i.e. the variance of the severity distribution truncated and shifted by d, is:79
{E[X2] - E[(X ∧ d)2] - 2d{E[X] - E[X ∧ d]}}/S(d) - {{E[X] - E[X ∧ d]}/S(d)}2.

Exercise: For a Pareto with α = 4 and θ = 1000, use the above formula to compute the variance of
the non-zero payments given a deductible of 500.
[Solution: E[X2] = (2)(10002)/{(4-1)(4-2)} = 333,333.
E[(X ∧ 500)2] = E[X2] {1 - (1 + 500/θ)^(1−α) [1 + (α-1)500/θ]} = 86,420.
E[X ∧ 500] = 234.57. E[X] = θ/(α-1) = 1000/3 = 333.33. S(500) = 0.1975.
{E[X2] - E[(X ∧ d)2] - 2d{E[X] - E[X ∧ d]}}/S(d) - {{E[X] - E[X ∧ d]}/S(d)}2 =
{333,333 - 86,420 - (2)(500)(333.33 - 234.57)}/0.1975 - 5002 = 500,000.
Comment: I have used a formula from Section 32 of “Mahlerʼs Guide to Loss Distributions” for the
second limited moment of a Pareto, which is not given in Loss Models.]

We can then combine the frequency and severity after the effects of the per claim deductible, in
order to work with the aggregate losses.

Exercise: Frequency is Poisson with λ = 10. Severity is Pareto with α = 4 and θ = 1000.
Severity and frequency are independent. There is a per claim deductible of 500.
What are the mean and variance of the aggregate losses excess of the deductible?
[Solution: The frequency of non-zero payments is Poisson with mean (0.1975)(10) = 1.975.
The severity of non-zero payments is Pareto with α = 4 and θ = 1500.
The mean aggregate loss is (1.975)(1500/3) = 987.5.
The variance of this compound Poisson is:
(1.975)(2nd moment of Pareto) = (1.975)(2)(15002)/{(4-1)(4-2)} = 1,481,250.]
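A short Python sketch of this calculation (names are mine); it agrees with the exercise up to the rounding of S(500):

# Poisson(10) frequency, Pareto(alpha = 4, theta = 1000) severity, per-claim deductible d = 500.
lam, alpha, theta, d = 10, 4, 1000, 500

s_d = (1 + d / theta) ** -alpha                            # S(500), about 0.1975
lam_nz = lam * s_d                                         # thinned Poisson mean of non-zero payments

theta_nz = theta + d                                       # truncated and shifted Pareto: same alpha, theta + d
mean_nz = theta_nz / (alpha - 1)                           # 500
ex2_nz = 2 * theta_nz**2 / ((alpha - 1) * (alpha - 2))     # 750,000

print(lam_nz * mean_nz)                                    # mean aggregate excess losses, about 988
print(lam_nz * ex2_nz)                                     # compound Poisson variance, about 1.48 million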
79
See “Mahlerʼs Guide to Loss Distributions.”

Exercise: Frequency is Negative Binomial with r = 2.4 and β = 1.1. Severity is Exponential with
θ = 2000. Severity and frequency are independent. There is a per claim deductible of 500.
What are the mean and variance of the aggregate losses excess of the deductible?
[Solution: S(500) = e-500/2000 = 0.7788. The frequency of non-zero payments is: Negative
Binomial with r = 2.4 and β = (1.1)(0.7788) = 0.8567. The mean frequency of non-zero payments
is: (2.4)(0.8567) = 2.056. The variance of the number of non-zero payments is:
(2.4)(0.8567)(1.8567) = 3.818. The severity of non-zero payments is Exponential with θ = 2000.
The mean non-zero payment is 2000.
The variance of size of non-zero payments is 20002 = 4 million.
The mean of the aggregate losses excess of the deductible is: (2.056)(2000) = 4112.
The variance of the aggregate losses excess of the deductible is:
(2.056)(4 million) + (3.818)(20002 ) = 23.5 million.
Comment: See Course 3 Sample Exam, Q.20. ]
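And the corresponding sketch for this Negative Binomial / Exponential example (names are mine):

from math import exp

# Negative Binomial (r = 2.4, beta = 1.1) frequency, Exponential(theta = 2000) severity, deductible 500.
r, beta, theta, d = 2.4, 1.1, 2000, 500

s_d = exp(-d / theta)                    # 0.7788
beta_nz = beta * s_d                     # thinned Negative Binomial: same r, beta * S(d)
mean_n = r * beta_nz                     # about 2.056
var_n = r * beta_nz * (1 + beta_nz)      # about 3.82

mean_y, var_y = theta, theta**2          # memoryless: non-zero payments are Exponential(2000)
print(mean_n * mean_y)                            # mean excess aggregate losses, about 4112
print(mean_n * var_y + mean_y**2 * var_n)         # variance, about 23.5 million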

Maximum Covered Losses:80

Assume the severity follows a LogNormal Distribution with parameters µ = 8 and σ = 2. Assume

frequency is Poisson with λ = 100. The mean severity is: exp(µ + σ2/2) = e10 = 22026.47, and the

mean aggregate losses are: 2,202,647. The second moment of the severity is: exp(2µ + 2σ2) =
e24 = 26.49 billion. Thus the variance of the aggregate losses is: (100)(26.49 billion) = 2649 billion.

Exercise: If there is a $250,000 maximum covered loss, what are the mean and variance of the
aggregate losses paid by the insurer?
[Solution: E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x{1 - Φ[(lnx − µ)/σ]}.
E[X ∧ 250,000] = e10 Φ[(ln(250,000) - 8 - 4 )/2] + (250,000){1 - Φ[(ln(250,000) − 8)/2]} =
(22026)Φ[.2146] + (250,000)(1−Φ[2.2146]) =
(22026)(0.5850) + (250,000)(1- 0.9866) = 16235.
E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) − (µ+ 2σ2)} / σ] + x2 {1- Φ[{ln(x) − µ} / σ] }.

E[(X ∧ 250,000)2 ] = exp(24)Φ[-1.7854] + 62.5 billion{1-Φ[2.2146]} =


(26.49 billion) (0.03710) + (62.5 billion) (1- 0.9866) = 1.820 billion.
The frequency is unaffected by the maximum covered loss.
Thus the mean aggregate losses are (100)(16,235) = 1.62 million.
The variance of the aggregate losses is: (100)(1.820 billion) = 182.0 billion.]
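The LogNormal limited moments, and hence the aggregate mean and variance, can be verified with the following Python sketch (assuming scipy; names are mine). The small differences from the exercise come from its rounded Normal table values.

from math import exp, log
from scipy.stats import norm

lam, mu, sigma, u = 100, 8.0, 2.0, 250_000

def lognormal_limited(mu, sigma, x):
    """First and second limited moments of a LogNormal at limit x."""
    z = (log(x) - mu) / sigma
    e1 = exp(mu + sigma**2 / 2) * norm.cdf(z - sigma) + x * (1 - norm.cdf(z))
    e2 = exp(2*mu + 2*sigma**2) * norm.cdf(z - 2*sigma) + x**2 * (1 - norm.cdf(z))
    return e1, e2

e1, e2 = lognormal_limited(mu, sigma, u)
print(e1, e2)               # about 16,230 and 1.82 billion
print(lam * e1, lam * e2)   # aggregate mean about 1.62 million; variance about 182 billion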

80 See “Mahlerʼs Guide to Loss Distributions.”

We note how the maximum covered loss reduces both the mean and variance of the aggregate
losses. By cutting off the effect of the heavy tail of the severity, the maximum covered loss has a
significant impact on the variance of the aggregate losses.

In general one can do similar calculations for any severity distribution and frequency distribution.
The moments of the severity are calculated using the limited expected moments, as shown in
Appendix A of Loss Models, while the frequency is unaffected by the maximum covered loss.

Maximum Covered Losses and Deductibles:

If one has both a maximum covered loss u and a per claim deductible d, then the severity is the
layer of loss between d and u, while the frequency is the same as that in the presence of just the
deductible. The first moment of the nonzero payments is:
{E[X ∧ u] - E[X ∧ d]} / S(d), while the second moment of the nonzero payments is:
{E[(X ∧ u)2] - E[(X ∧ d)2] - 2d{E[X ∧ u] - E[X ∧ d]}}/S(d).81

Exercise: If in the previous exercise there were a per claim deductible of $50,000 and a maximum
covered loss of $250,000, what would be the mean and variance of the aggregate losses paid by
the insurer?
[Solution: E[X ∧ 50,000] = 10,078. E[X ∧ 250,000] = 16,235.
E[(X ∧ 250,000)2 ] = 1.820 billion. E[(X ∧ 50,000)2 ] = 0.323 billion.
S(50,000) = 1 - Φ[(ln(50000) - 8)/2] = 1 - Φ(1.410) = 1 - 0.9207 = 0.0793.
The first moment of the nonzero payments is: (16235 - 10078)/0.0793 = 77,642.
The second moment of the nonzero payments is:
(1.820 billion - 0.323 billion - (2)(50000)(16235 - 10078))/0.0793 = 11.11 billion.
The frequency of nonzero payments is Poisson with λ = (100)(0.0793) = 7.93.
Thus the mean aggregate losses are: (7.93)(77642) = 616 thousand.
The variance of the aggregate losses is: (7.93)(11.11 billion) = 88.1 billion.]
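Again as a sketch (assuming scipy; names are mine), the layer moments and the resulting aggregate mean and variance:

from math import exp, log
from scipy.stats import norm

lam, mu, sigma, d, u = 100, 8.0, 2.0, 50_000, 250_000

def lognormal_limited(mu, sigma, x):
    """First and second limited moments of a LogNormal at limit x."""
    z = (log(x) - mu) / sigma
    e1 = exp(mu + sigma**2 / 2) * norm.cdf(z - sigma) + x * (1 - norm.cdf(z))
    e2 = exp(2*mu + 2*sigma**2) * norm.cdf(z - 2*sigma) + x**2 * (1 - norm.cdf(z))
    return e1, e2

e1d, e2d = lognormal_limited(mu, sigma, d)
e1u, e2u = lognormal_limited(mu, sigma, u)
s_d = 1 - norm.cdf((log(d) - mu) / sigma)             # about 0.0793

m1 = (e1u - e1d) / s_d                                # mean non-zero payment, about 77,600
m2 = (e2u - e2d - 2 * d * (e1u - e1d)) / s_d          # second moment, about 11.1 billion
lam_nz = lam * s_d                                    # Poisson mean of non-zero payments, about 7.93
print(lam_nz * m1, lam_nz * m2)                       # about 616,000 and 88 billion (up to rounding)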

We can see how such calculations involving aggregate losses in the presence of both a deductible
and a maximum covered loss can quickly become too time consuming for exam conditions.

81 See “Mahlerʼs Guide to Loss Distributions.”

Compound Frequency Distributions:82

For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is
Poisson with mean 1.3.

In addition, assume that the number of passengers dropped off at the hotel by each taxicab is
Binomial with q = 0.4 and m = 5.

The number of passengers dropped off by each taxicab is independent of the number of taxicabs
that arrive and is independent of the number of passengers dropped off by any other taxicab.

Then the aggregate number of passengers dropped off per minute at the Heartbreak Hotel is an
example of a compound frequency distribution.

Compound distributions are mathematically equivalent to aggregate distributions,


with a discrete severity distribution.

Poisson ⇔ Frequency.

Binomial ⇔ Severity.

Thus although compound distributions are not on the syllabus, on your exam, one could describe
the above situation as a collective risk model with Poisson frequency and Binomial Severity.

Aggregate Distribution ⇔ Compound Frequency Distribution
Frequency ⇔ Primary (# of cabs)
Severity ⇔ Secondary (# of passengers per cab)

σC2 = µf σs2 + µs2 σf2, where f ⇔ frequency or first (primary), and s ⇔ severity or secondary.
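For the taxicab example, the compound variance is therefore (a minimal Python check; names are mine):

mu_f = var_f = 1.3                       # Poisson(1.3) cabs per minute (primary)
m, q = 5, 0.4
mu_s, var_s = m * q, m * q * (1 - q)     # Binomial passengers per cab (secondary): mean 2.0, variance 1.2

print(mu_f * mu_s)                       # mean passengers per minute: 2.6
print(mu_f * var_s + mu_s**2 * var_f)    # variance of passengers per minute: 6.76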

82 Discussed more extensively in “Mahlerʼs Guide to Frequency Distributions.”

Process Covariances:83

Assume a claims process in which frequency and severity are independent of each other, and the
claim sizes are mutually independent random variables with a common distribution.84 Then let each
claim be divided into two pieces in a well-defined manner not dependent on the number of claims.
For convenience, we refer to these two pieces as primary and excess.85

Let: Tp = Total Primary Losses Te = Total Excess Losses


Xp = Primary Severity Xe = Excess Severity
N = Frequency

Then as will be proved subsequently:


COV[Tp , Te ] = E[N] COV[Xp , Xe ] + VAR[N] E[Xp ] E[Xe ].86

Exercise: Assume severity follows an Exponential Distribution with θ = 10,000.


The first 5000 of each claim is considered primary losses. Xp = Primary Severity = X ∧ 5000.
Excess of 5000 is considered excess losses. Xe = Excess Severity = (X - 5000)+.
Determine the covariance of Xp and Xe .

Hint: ∫ x e^(-x/θ)/θ dx = -x e^(-x/θ) - θ e^(-x/θ). ∫ x2 e^(-x/θ)/θ dx = -x2 e^(-x/θ) - 2xθ e^(-x/θ) - 2θ2 e^(-x/θ).

[Solution: E[Xp] = E[X ∧ 5000] = (10,000)(1 - e^(-5000/10,000)) = 3935.
E[Xe] = 10,000 - 3935 = 6065.
E[Xp Xe] = ∫ from 0 to 5000 of (x)(0) e^(-x/10,000)/10,000 dx + ∫ from 5000 to ∞ of x(x - 5000) e^(-x/10,000)/10,000 dx
= ∫ from 5000 to ∞ of x2 e^(-x/10,000)/10,000 dx - 5000 ∫ from 5000 to ∞ of x e^(-x/10,000)/10,000 dx
= {(50002) + (2)(5000)(10,000) + (2)(10,0002)} e^(-0.5) - (5000){5000 + 10,000} e^(-0.5) = 151.633 million.
Cov[Xp , Xe ] = E[Xp Xe ] - E[Xp ]E[Xe ] = 151.633 million - (3935)(6065) = 127.767 million.]

83 Beyond what you will be asked on your exam.
84 In other words, assume the usual collective risk model.
85 In a single-split experience rating plan, the first $5000 of each claim might be primary and then anything over
$5000 would contribute to the excess losses.
86 This is a generalization of the formula we had for the process variance of aggregate losses.
See Appendix A of Howard Mahlerʼs discussion of Glenn Meyersʼ “An Analysis of Experience Rating”, PCAS 1987.

Exercise: Assume severity follows an Exponential Distribution with θ = 10,000.


The first 5000 of each claim is primary losses. Excess of 5000 is excess losses.
Frequency is Negative Binomial with r = 3 and β = 0.4.
Determine the covariance of the total primary and total excess losses.
[Solution: COV[Tp, Te] = E[N] COV[Xp, Xe] + VAR[N] E[Xp] E[Xe] =
(3)(0.4)(127.767 million) + (3)(0.4)(1.4)(3935)(6065) = 193.415 million.]
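A one-line check of this covariance formula, using the quantities computed in the two exercises above (a sketch; names are mine):

e_n, var_n = 3 * 0.4, 3 * 0.4 * 1.4      # mean and variance of the Negative Binomial(r = 3, beta = 0.4)
cov_xp_xe = 127.767e6                    # covariance of primary and excess severities, from the prior exercise
e_xp, e_xe = 3935, 6065

print(e_n * cov_xp_xe + var_n * e_xp * e_xe)    # COV[Tp, Te], about 193.4 million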

Proof of the Result for Process Covariances:

The total primary losses Tp is the sum of the individual primary portions of claims Xp (i), where
i runs from 1 to N, the number of claims. Similarly, Te is a sum of Xe (i).
Since N is a random variable, both frequency and severity contribute to the covariance of Tp and Te .
To compute the covariance of Tp and Te, begin by calculating E[Tp Te | N=n].
Fix the number of claims n and find E[{ΣXp(i)} {ΣXe(i)}], where each sum runs from i = 1 to n.

Expanding the product yields n2 terms of the form Xp (i) Xe (j).

From the definition of covariance, when i = j the expected value of the term is:
E[Xp (i) Xe (i)] = COV[Xp (i), Xe (i)] + E[Xp (i)] E[Xe (i)].
Otherwise, for i ≠ j, X(i) and X(j) are independent and E[Xp (i) Xe (j)] = E[Xp (i)] E[Xe (j)].

Thus E[{ΣXp(i)} {ΣXe(i)}] = n COV[Xp, Xe] + n2 E[Xp] E[Xe], with each sum running from i = 1 to n.

Now, by general considerations of conditional expectations: E[Tp Te ] = EN[ E[Tp Te | N=n] ].

Thus, taking the expected value of the above equation with respect to N gives:
E[Tp Te ] = E[N] COV[Xp , Xe ] + E[N2 ] E[Xp ] E[Xe ]

= E[N] COV[Xp , Xe ] + {Var[N] + E[N]2 } E[Xp ] E[Xe ].

COV[Tp , Te ] = E[Tp Te ] - E[Tp ] E[Te ]


= E[N] COV[Xp , Xe ] + {Var[N] + E[N]2 } E[Xp ] E[Xe ] - E[N] E[Xp ] E[N] E[Xe ]
= E[N] COV[Xp , Xe ] + VAR[N] E[Xp ] E[Xe ].

Problems:

5.1 (1 point) You are given the following:


• mean frequency = 13
• variance of the frequency = 37
• mean severity = 300
• variance of the severity = 200,000
• frequency and severity are independent
What is the variance of the aggregate losses?
A. Less than 5 million
B. At least 5 million but less than 6 million
C. At least 6 million but less than 7 million
D. At least 7 million but less than 8 million
E. At least 8 million

5.2 (2 points) A six-sided die is used to determine whether or not there is a claim. Each side of the
die is marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. Two sides
are marked with a 0 and four sides with a 1. In addition, there is a spinner representing claim
severity. The spinner has three areas marked 2, 5 and 14. The probabilities for each claim size are:
Claim Size Probability
2 20%
5 50%
14 30%
The die is rolled and if a claim occurs, the spinner is spun.
What is the variance for a single trial of this risk process?
A. Less than 24
B. At least 24 but less than 25
C. At least 25 but less than 26
D. At least 26 but less than 27
E. At least 27

5.3 (2 points) You are given the following:


• Number of claims for an insured follows a Poisson distribution with mean 0.25.
• The amount of a single claim has a uniform distribution on [0, 5000]
• Number of claims and claim severity are independent.
Determine the variance of the aggregate losses for this insured.
A. Less than 2.1 million
B. At least 2.1 million but less than 2.2 million
C. At least 2.2 million but less than 2.3 million
D. At least 2.3 million but less than 2.4 million
E. At least 2.4 million

5.4 (3 points) You are given the following:


• For a given risk, the number of claims for a single exposure period will be 1,
with probability 4/5; or 2, with probability 1/5.
• If only one claim is incurred, the size of the claim will be 50, with probability 3/4;
or 200, with probability 1/4.
• If two claims are incurred, the size of each claim, independent of the other, will be 50,
with probability 60%; or 150, with probability 40%.
Determine the variance of the aggregate losses for this risk.
A. Less than 4,000
B. At least 4,000, but less than 4,500
C. At least 4,500, but less than 5,000
D. At least 5,000, but less than 5,500
E. At least 5,500

5.5 (2 points) A large urn has many balls of three different kinds in the following proportions:
Type of Ball: Proportion
Red 70%
Green $50 20%
Green $200 10%
The risk process is as follows:
1. Set the aggregate losses equal to zero.
2. Draw a ball from the urn.
3. If the ball is Red then Exit, otherwise continue to step 4.
4. If the ball is Green add the amount shown to the aggregate losses and return to step 2.
Determine the process variance of the aggregate losses for a single trial of this risk process.
A. Less than 9,000
B. At least 9,000 but less than 10,000
C. At least 10,000 but less than 11,000
D. At least 11,000 but less than 12,000
E. At least 12,000

5.6 (3 points) Assume there are 3 types of risks. Whether or not there is a claim is determined by
whether a six-sided die comes up with a zero or a one, with a one indicating a claim. If a claim occurs
then its size is determined by a spinner.
Type Number of die faces with a 1 rather than a 0 Claim Size Spinner
I 2 $100 70%, $200 30%
II 3 $100 50%, $200 50%
III 4 $100 30%, $200 70%
Determine the variance of aggregate annual losses for a portfolio of 300 risks, consisting of 100 risks
of each type.
A. 1.9 million B. 2.0 million C. 2.1 million D. 2.2 million E. 2.3 million

Use the following information for the next 5 questions:

• Number of claims follows a Poisson distribution with mean of 5.


• Claim severity is independent of the number of claims and has the following
probability density function: f(x) = 3.5 x^-4.5, x > 1.

5.7 (2 points) Determine the variance of the aggregate losses.


A. Less than 11.0
B. At least 11.0 but less than 11.3
C. At least 11.3 but less than 11.6
D. At least 11.6 but less than 11.9
E. At least 11.9

5.8 (2 points) Using the Normal Approximation, estimate the probability that the aggregate losses
will exceed 11.
A. 10% B. 12% C. 14% D. 16% E. 18%

5.9 (2 points) Approximating with a LogNormal Distribution, estimate the probability that the
aggregate losses will exceed 11.
A. Less than 12%
B. At least 12%, but less than 14%
C. At least 14%, but less than 16%
D. At least 16%, but less than 18%
E. At least 18%

5.10 (2 points) Determine the skewness of the aggregate losses.


A. Less than 0.4
B. At least 0.4 but less than 0.6
C. At least 0.6 but less than 0.8
D. At least 0.8 but less than 1.0
E. At least 1.0

5.11 (2 points) Determine the variance of the aggregate losses, if there is a maximum covered loss
of 5.
A. Less than 11.0
B. At least 11.0 but less than 11.3
C. At least 11.3 but less than 11.6
D. At least 11.6 but less than 11.9
E. At least 11.9

Use the following information for the next two questions:


• Number of claims follows a Poisson distribution with mean µ.
• The amount of a single claim has an exponential distribution given by:
f(x) = e-x/θ / θ , x > 0, θ > 0
• Number of claims and claim severity distributions are independent.

5.12 (2 points ) Determine the variance of the aggregate losses.


A. µθ B. µθ2 C. 2µθ D. 2µθ2 E. None of A, B, C, or D.

5.13 (2 points) Determine the skewness of the aggregate losses.
A. 1/√(2µ) B. θ/√(2µ) C. 3/√(2µ) D. 3θ/√(2µ) E. None of A, B, C, or D.

5.14 (1 point) You are given the following:


• The frequency distribution follows the Poisson process with mean 3.

• The second moment about the origin for the severity distribution is 200.

• Frequency and Severity are independent.


What is the variance of the aggregate losses?
A. 400 B. 450 C. 500 D. 550 E. 600

5.15 (3 points) The number of accidents any particular automobile has during a year is Poisson with
mean 0.03. The damage to an automobile due any single accident is uniformly distributed over the
interval from 0 to 3000. Using the Normal Approximation, what is the minimum number of
independent automobiles that must be insured so that the probability that the aggregate annual
losses exceed 160% of expected is at most 5%?
A. 295 B. 305 C. 315 D. 325 E. 335

5.16 (2 points) You are given the following:


• For baseball player Don, the number of official at bats in a season is Poisson with λ = 600.
• For Don the probabilities of the following types of hits per official at bat are:
Single 22%, Double 4%, Triple 1%, and Home Run 5%.
• Donʼs contract provides him incentives of: $2000 per single, $4000 per double,
$6000 per triple and $8000 per home run ($2000 per base.)
What is the chance that next year Don will earn at most $700,000 from his incentives?
Use the Normal Approximation.
A. 84% B. 86% C. 88% D. 90% E. 92%

Use the following information for the next five questions:


There are three types of risks. For each type of risk, the frequency and severity are independent.
Type Frequency Distribution Severity Distribution
Ι Binomial: m = 8, q = 0.4 Pareto: α = 4, θ = 1000
ΙΙ Poisson: λ = 3 LogNormal: µ = 7, σ = 0.5
ΙΙΙ Negative Binomial: r = 3, β = 2 Gamma: α = 3, θ = 200

5.17 ( 2 points) For a risk of Type Ι, what is the variance of the aggregate losses?
A. Less than 0.5 million
B. At least 0.5 million but less than 0.6 million
C. At least 0.6 million but less than 0.7 million
D. At least 0.7 million but less than 0.8 million
E. At least 0.8 million

5.18 ( 2 points) For a risk of Type ΙΙ, what is the variance of the aggregate losses?
A. Less than 5.7 million
B. At least 5.7 million but less than 5.8 million
C. At least 5.8 million but less than 5.9 million
D. At least 5.9 million but less than 6.0 million
E. At least 6.0 million

5.19 ( 2 points) For a risk of Type ΙΙΙ, what is the variance of the aggregate losses?
A. Less than 7.0 million
B. At least 7.0 million but less than 7.5 million
C. At least 7.5 million but less than 8.0 million
D. At least 8.0 million but less than 8.5 million
E. At least 8.5 million

5.20 ( 2 points) Assume one has a portfolio made up of 55 risks of Type Ι, 35 risks of Type ΙΙ, and
10 risks of Type ΙΙΙ. Each risk in the portfolio is independent of all the others.
For this portfolio, what is the variance of the aggregate losses?
A. 310 million B. 320 million C. 330 million D. 340 million E. 350 million

5.21 (5 points) For a risk of Type ΙΙΙ, what is the skewness of the aggregate losses?
A. Less than 1.0
B. At least 1.0 but less than 1.1
C. At least 1.1 but less than 1.2
D. At least 1.2 but less than 1.3
E. At least 1.3

Use the following information for the next 6 questions:


• The severity distribution is an Exponential distribution with θ = 5000, prior to the impact of
any deductible or maximum covered loss.
• The number of losses follows a Poisson distribution with λ = 2.4, prior to the impact of any
deductible.
• Frequency and severity are independent.

5.22 (1 point) What are the mean aggregate losses excess of a 1000 per claim deductible?
A. 9,600 B. 9,800 C. 10,000 D. 10,200 E. 10,400

5.23 (2 points) What is the standard deviation of the aggregate losses excess of a 1000 per claim
deductible?
A. 8,700 B. 9,000 C. 9,300 D. 9,600 E. 9,900

5.24 (1 point) What are the mean aggregate losses if there is a 10,000 maximum covered loss and
no deductible?
A. Less than 9,800
B. At least 9,800, but less than 10,000
C. At least 10,000, but less than 10,200
D. At least 10,200, but less than 10,400
E. At least 10,400

5.25 (2 points) What is the standard deviation of the aggregate losses if there is a 10,000 maximum
covered loss?
A. Less than 6,000
B. At least 6,000, but less than 7,000
C. At least 7,000, but less than 8,000
D. At least 8,000, but less than 9,000
E. At least 9,000

5.26 (2 points) What are the mean aggregate losses if there is both a 1000 per claim deductible
and a 10,000 maximum covered loss?
A. 7,800 B. 8,000 C. 8,200 D. 8,400 E. 8,600

5.27 (3 points) What is the standard deviation of the aggregate losses if there is both a 1000 per
claim deductible and a 10,000 maximum covered loss?
A. Less than 6,500
B. At least 6,500, but less than 7,000
C. At least 7,000, but less than 7,500
D. At least 7,500, but less than 8,000
E. At least 8,000

Use the following size of loss data from ABC Insurance for the next 6 questions:
Range     Number of Losses
0-1       60
1-3       30
3-5       20
5-10      10
Total     120
Assume a uniform distribution of loss sizes within each interval.
In addition there are 5 losses of size greater than 10: 12, 15, 17, 20, 30.

5.28 (2 points) Calculate the mean.


A. 2.3 B. 2.5 C. 2.7 D. 2.9 E. 3.1

5.29 (3 points) Calculate the variance.


A. less than 15.5
B. at least 15.5 but less than 16.0
C. at least 16.0 but less than 16.5
D. at least 16.5 but less than 17.0
E. at least 17.0

5.30 (2 points) Calculate e(7).


A. less than 6.0
B. at least 6.0 but less than 6.5
C. at least 6.5 but less than 7.0
D. at least 7.0 but less than 7.5
E. at least 7.5

5.31 (2 points) The annual number of losses for ABC Insurance is Poisson with mean 40.
What is the coefficient of variation of its aggregate annual losses?
(A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7

5.32 (3 points) The annual number of losses for ABC Insurance is Poisson with mean 40.
ABC Insurance buys reinsurance for the layer 10 excess of 5 (the layer from 5 to 15).
How much does the reinsurer expect to pay per year due to losses by ABC Insurance?
(A) 19 (B) 20 (C) 21 (D) 22 (E) 23

5.33 (3 points) The annual number of losses for ABC Insurance is Poisson with mean 40.
ABC Insurance buys reinsurance for the layer 10 excess of 5. What is the coefficient of variation of
the annual payment by the reinsurer due to losses by ABC Insurance?
(A) 0.6 (B) 0.7 (C) 0.8 (D) 0.9 (E) 1.0

5.34 (3 points) You are given the following:


• The severity distribution is a Pareto distribution with α = 3.2 and θ = 20,000, prior to the
impact of any deductible.
• The number of losses follows a Negative Binomial with r = 4.1 and β = 2.8, prior to the
impact of any deductible.
• Frequency and severity are independent.
• There is a 50,000 per claim deductible.
What is the chance that the aggregate losses excess of the deductible are greater than 15,000?
Use the Normal Approximation.
A. Less than 20%
B. At least 20%, but less than 25%
C. At least 25%, but less than 30%
D. At least 30%, but less than 35%
E. At least 35%

Use the following information for the next two questions:


• The claim frequency for each policy is Poisson.
• The expected mean frequencies differ across the portfolio of policies.
• The mean frequencies are Gamma Distributed across the portfolio with α = 5 and θ = 0.4.
• Claim severity has a mean of 20 and a variance of 300.
• Claim frequency and severity are independent.

5.35 (3 points) If an insurer has sold 200 independent policies, determine the probability that the
aggregate loss for the portfolio will exceed 110% of the expected loss.
Use the Normal Approximation.
A. 7.5% B. 8.5% C. 9.5% D. 10.5% E. 11.5%

5.36 (2 points) Determine the minimum number of independent policies that would have to be sold
so that the probability that the aggregate loss for the portfolio will exceed 110% of the expected
loss does not exceed 1%. Use the Normal Approximation.
A. 400 B. 500 C. 600 D. 700 E. 800

5.37 (2 points) Frequency has mean 10 and variance 20. Severity has mean 1000 and variance
200,000. Severities are independent of each other and of the number of claims.
Let σ be the standard deviation of the aggregate losses.
Let σʼ be the standard deviation of the aggregate losses, given that 8 claims have occurred.
Calculate σ/σʼ.
(A) 2.9 (B) 3.1 (C) 3.3 (D) 3.5 (E) 3.7

5.38 (3 points) Use the following information:


• An insurer issues a policy that pays for hospital stays.
• Each hospital stay results in room charges and other charges.
• Total room charges for a hospital stay have mean 5000 and standard deviation 8000.
• Total other charges for a hospital stay have mean 2000 and standard deviation 3000.
• The correlation between total room charges and total other charges for a hospital stay is 0.6.
• The insurer reimburses 100% for room charges and 75% for other charges.
• The number of annual admission to the hospital is Binomial with m = 4 and q = 0.1.
Determine the standard deviation of the insurer's annual aggregate payments for this policy.
(A) 5500 (B) 6000 (C) 6500 (D) 7000 (E) 7500

5.39 (3 points) Use the following information:


• An insurer issues a policy that pays for loss plus loss adjustment expense.
• Losses follow a Gamma Distribution with α = 4 and θ = 1000.
• Loss Adjustment Expenses follow a Gamma Distribution with α = 3 and θ = 200.
• The correlation between loss and loss adjustment expense is 0.8.
• The number of annual claims is Poisson with λ = 0.6.
Determine the standard deviation of the insurer's annual aggregate payments for this policy.
(A) 3800 (B) 3900 (C) 4000 (D) 4100 (E) 4200

5.40 (2 points) For aggregate claims A, you are given:



(i) fA(x) = Σ p*n(x) 3^n e^-3 / n!, where the sum runs from n = 0 to ∞.

(ii) x p(x)
1 0.5
2 0.3
3 0.2
Determine Var[A].
(A) 7.5 (B) 8.5 (C) 9.5 (D) 10.5 (E) 11.5

5.41 (3 points) Aggregate Losses have a mean of 100 and a variance of 90,000.
Approximating the aggregate distribution by a LogNormal Distribution, estimate the probability that
the aggregate losses are greater than 2000.
(A) 0.1% (B) 0.2% (C) 0.3% (D) 0.4% (E) 0.5%

5.42 (3 points) Use the following information:


• The number of claims follows a Poisson distribution with a mean of 7.
• Claim severity has a Pareto Distribution with α = 3 and θ = 100.
• Frequency and severity are independent.
Approximating the aggregate losses by a LogNormal Distribution, estimate the probability that the
aggregate losses will exceed 1000.
A. 1% B. 2% C. 3% D. 4% E. 5%

5.43 (2 points) The number of losses is Poisson with mean λ.


The ground up distribution of size of loss is Exponential with mean θ.
Frequency and severity are independent.
Let B be the variance of aggregate payments if there is a deductible b.
Let C be the variance of aggregate payments if there is a deductible c > b.
Determine the ratio of C/B.
When is this ratio less than one, equal to one, and greater than one?

5.44 (2 points) Frequency is Poisson with λ = 3.


The size of loss distribution is Exponential with θ = 400.
Frequency and severity are independent.
There is an ordinary deductible of 500.
Calculate the variance of the aggregate payments excess of the deductible.
A. Less than 250,000
B. At least 250,000, but less than 260,000
C. At least 260,000, but less than 270,000
D. At least 270,000, but less than 280,000
E. 280,000 or more

5.45 (3 points) The number of losses is Poisson with mean λ.


The ground up distribution of size of loss is Pareto with parameters α > 2, and θ.
Frequency and severity are independent.
Let B be the variance of aggregate payments if there is a deductible b.
Let C be the variance of aggregate payments if there is a deductible c > b.
Determine the ratio of C/B.
When is this ratio less than one, equal to one, and greater than one?

Use the following information for the next six questions:


The distribution of aggregate losses has a mean of 20 and a variance of 100.

5.46 (1 point) Approximate the distribution of aggregate losses by a Normal Distribution with mean
and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.47 (3 points) Approximate the distribution of aggregate losses by a LogNormal Distribution with
mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
A. 1.5% B. 2.0% C. 2.5% D. 3.0% E. 3.5%

5.48 (3 points) Approximate the distribution of aggregate losses by a Gamma Distribution with
mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
Hint: Γ[n; λ] = 1 - Σ e^-λ λ^i / i!, where the sum runs from i = 0 to n-1.

A. Less than 2.0%


B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.49 (4 points) Approximate the distribution of aggregate losses by an Inverse Gaussian


Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
Use the following approximation for x > 3: 1 - Φ[x] ≅ {exp[-x2/2]/√(2π)} (1/x - 1/x^3 + 3/x^5 - 15/x^7).

A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.50 (4 points) If Y follows a Poisson Distribution with parameter λ, then for c > 0, cY follows an
“Over-dispersed Poisson” Distribution with parameters c and λ.
Approximate the distribution of aggregate losses by an Over-dispersed Poisson Distribution with
mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.51 (4 points) Approximate the distribution of aggregate losses by an Inverse Gamma


Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
Hint: Γ[n; λ] = 1 - Σ e^-λ λ^i / i!, where the sum runs from i = 0 to n-1.

A. Less than 2.0%


B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.52 (2 points) The frequency for each insurance policy is Poisson with mean 2.
The cost per loss has mean 5 and standard deviation 12.
The number of losses and their sizes are all mutually independent.
Determine the minimum number of independent policies that would have to be sold so that the
probability that the aggregate loss for the portfolio will exceed 115% of the expected loss does not
exceed 2.5%. Use the Normal Approximation.
(A) 500 (B) 600 (C) 700 (D) 800 (E) 900

5.53 (3 points) Frequency is Negative Binomial with r = 4 and β = 3.


The size of loss distribution is Exponential with θ = 1700.
Frequency and severity are independent.
There is an ordinary deductible of 1000.
Calculate the variance of the aggregate payments excess of the deductible.
A. 40 million B. 50 million C. 60 million D. 70 million E. 80 million

Use the following information for the next two questions:

An insurance company sold policies as follows:


Number of Policy Probability of
Policies Maximum Claim Per Policy
10,000 25 3%
15,000 50 5%
You are given:
(i) The claim amount for each policy is uniformly distributed between 0 and the policy
maximum.
(ii) The probability of more than one claim per policy is 0.
(iii) Claim occurrences are independent.

5.54 (2 points) What is the variance of aggregate losses?


A. Less than 650,000
B. At least 650,000, but less than 700,000
C. At least 700,000, but less than 750,000
D. At least 750,000, but less than 800,000
E. 800,000 or more

5.55 (1 point) What is the probability that aggregate losses are greater than 24,000?
Use the Normal Approximation.
A. Less than 4%
B. At least 4%, but less than 5%
C. At least 5%, but less than 6%
D. At least 6%, but less than 7%
E. At least 7%

5.56 (3 points) The number of Property Damage Liability claims is Poisson with mean λ.
The size of Property Damage Liability claims has mean 10 and standard deviation 15.
The number of Bodily Injury Liability claims is Poisson with mean λ/3.
The size of Bodily Injury Liability claims has mean 24 and standard deviation 60.
Let P = the 90th percentile of the aggregate distribution of Property Damage Liability.
Let B = the 90th percentile of the aggregate distribution of Bodily Injury Liability.
B/P = 1.061.
Using the Normal Approximation, determine λ.
A. 60 B. 70 C. 80 D. 90 E. 100

5.57 (5 points) Use the following information:


• You are given five years of observed aggregate losses:
Year Aggregate Loss ($ million)
2006 31
2007 38
2008 36
2009 41
2010 41
• Frequency is Poisson with mean 3000.
• Severity follows a Pareto Distribution.
• Frequency and severity are independent.
• Inflation is 4% per year.
Using the method of moments to fit the aggregate distribution to the data,
estimate the probability that an individual loss will be of size greater than $20,000 in 2012.
A. Less than 5%
B. At least 5%, but less than 10%
C. At least 10%, but less than 15%
D. At least 15%, but less than 20%
E. At least 20%

5.58 (3 points) Let S be the aggregate loss and N be the number of claims.
Given the following information, determine the variance of S.
N Probability E[S | N] E[S2 | N]
0 20% 0 0
1 40% 100 50,000
2 30% 250 150,000
3 10% 400 300,000
A. 60,000 B. 65,000 C. 70,000 D. 75,000 E. 80,000

5.59 (3 points) Frequency is Binomial with m = 5 and q = 0.4.


Severity is LogNormal with µ = 6 and σ = 0.3.
Frequency and severity are independent.
Using the Normal Approximation, estimate the probability that the aggregate losses are greater than
150% of their mean.
A. 20% B. 25% C. 30% D. 35% E. 40%

5.60 (3 points) Use the following information:


• Annual claim occurrences follow a Zero-Modified Negative Binomial Distribution
with p0^M = 40%, r = 2 and β = 0.4.

• Each claim amount follows a Gamma Distribution with α = 3 and θ = 500.


• Claim occurrences and amounts are independent.
Determine the variance of aggregate annual losses.
A. 3.2 million B. 3.4 million C. 3.6 million D. 3.8 million E. 4.0 million

5.61 (2 points) You are given six years of aggregate losses:


111, 106, 98, 120, 107, 113.
Use the sample variance together with the Normal Approximation, in order to estimate the
probability that the aggregate losses next year are less than 100.
A. 9% B. 10% C. 11% D. 12% E. 13%

5.62 (3 points) The number of losses per year has a Poisson distribution with a mean of 0.35.
There are three types of claims:
Type of Claim Mean Frequency Mean Severity Coefficient of Variation of Severity
I 0.20 100 5
II 0.10 200 4
III 0.05 300 3
The number of claims of one type is independent of the number of claims of the other types.
Determine the variance of the distribution of annual aggregate losses.
(A) 150,000 (B) 165,000 (C) 180,000 (D) 195,000 (E) 210,000

5.63 (2 points) The distribution of the number of claims is:


n f(n)
1 40%
2 30%
3 20%
4 10%
The natural logarithm of the sizes of claims are Normal distributed with mean 6 and variance 0.7.
Determine the variance of the distribution of annual aggregate losses.
(A) 1.0 million (B) 1.1 million (C) 1.2 million (D) 1.3 million (E) 1.4 million

5.64 (2 points) X has the following distribution:


Prob[X = 0] = 20%, Prob[X = 1] = 30%, Prob[X = 2] = 50%.
Y is the sum of X independent Normal random variables, each with mean 3 and variance 5.
What is the variance of Y?
A. 8 B. 9 C. 10 D. 11 E. 12

5.65 (4 points) For liability insurance, the number of accidents per year is Poisson with mean 10%.
The number of claimants per accident follows a zero-truncated Binomial Distribution with
m = 4 and q = 0.2.
The size of each claim follows a Gamma Distribution with α = 3 and θ = 10,000.
Determine the coefficient of variation of the aggregate annual losses.
A. 3.2 B. 3.4 C. 3.6 D. 3.8 E. 4.0

5.66 (5 points) The Spring & Sommers Company has 2000 employees.
Spring & Sommers provides a generous disability program for its employees.
A disabled employee is paid 2/3 of his or her weekly salary.
The company self-insures the first 5 weeks of any disability and has an insurance policy that will
cover any disability payments beyond 5 weeks.
Occurrences of disability among employees are independent of one another.
Assume that an employee can suffer at most one disability per year.
Disabilities have duration:
1 week 30%
2 weeks 20%
3 weeks 10%
4 weeks 10%
5 or more weeks 30%
There are two types of employees:
Type Number of Employees Weekly Salary Annual Probability of a Disability
1 1500 600 5%
2 500 900 8%
Determine the coefficient of variation of the distribution of total annual payments Spring & Sommers
pays its employees for disabilities, excluding any amounts paid by the insurance policy.
(A) 0.10 (B) 0.15 (C) 0.20 (D) 0.25 (E) 0.30

5.67 (3 points) Use the following information:


• Annual claim occurrences follow a Zero-Modified Poisson Distribution with p0^M = 25% and λ = 0.1.

• Each claim amount follows a LogNormal Distribution with µ = 8 and σ = 0.6.


• Claim occurrences and amounts are independent.
Determine the variance of aggregate annual losses.
A. 6.0 million B. 6.5 million C. 7.0 million D. 7.5 million E. 8.0 million

Use the following information for the next two questions:


• The number of claims for each policy is Negative Binomial.
• The claim amount for each policy is Gamma distributed.
• For a policy, the number of claims and their amounts are independent.
• Different policies are independent of each other.
• An insurance company sold 1300 policies as follows:
Number of Policies Negative Binomial Gamma Distribution
800 r = 0.4 and β = 0.1 α = 3 and θ = 50
500 r = 0.3 and β = 0.2 α = 4 and θ = 40

5.68 (3 points) What is the variance of aggregate losses?


A. Less than 1.6 million
B. At least 1.6 million, but less than 1.8 million
C. At least 1.8 million, but less than 2.0 million
D. At least 2.0 million, but less than 2.2 million
E. 2.2 million or more

5.69 (2 points) What is the probability that aggregate losses are between 8000 and 9000?
Use the Normal Approximation.
A.12% B. 14% C. 16% D. 18% E. 20%

5.70 (3 points) The distribution of the number of claims is:


p 0 = 0.3, p1 = 0.3, p2 = 0.2, and p3 = 0.2.
The severity distribution is Inverse Gamma with α = 4 and θ = 60.
Determine the second moment of the distribution of annual aggregate losses.
(A) 1200 (B) 1400 (C) 1600 (D) 1800 (E) 2000

5.71 (3 points) An insurance policy covers 12 oil tankers all ten years old and of the same size.
• The number of claims for each tanker is independent of the number of claims for any other tanker.
• Each tanker has an annual Poisson frequency with mean λ.
• Over all oil tankers of age ten of this size, λ follows an Exponential Distribution with mean 0.146.
• Severity is discrete in millions of dollars: f(20) = 1/3. f(40) = 1/6. f(60) = 1/2.
• Frequency and severity are independent.
Using the Normal Approximation, estimate the probability that the annual losses for this policy are
greater than $100 million.
A. 20% B. 25% C. 30% D. 35% E. 40%

5.72 (4 points) You are given the following for a collective risk model:
• The frequency distribution is a member of the (a, b, 0) class.
• Severity is uniform from 0 to ω.
• The mean aggregate loss is 108.
• The variance of aggregate loss is 3024.
You are given the following for a second collective risk model:
• The frequency distribution is the same of that of the first model.
• Severity is Exponential.
• The mean aggregate loss is 648.
• The variance of aggregate loss is 186,624.
Determine the probability of two losses.
(A) 0.16 (B) 0.18 (C) 0.20 (D) 0.22 (E) 0.24

5.73 (3 points) Frequency is Poisson with λ = 10.


The size of loss distribution is Pareto with α = 4 and θ = 2500.
Frequency and severity are independent.
There is an ordinary deductible of 1000.
Calculate the variance of the aggregate payments excess of the deductible.
A. Less than 9 million
B. At least 9 million, but less than 10 million
C. At least 10 million, but less than 11 million
D. At least 11 million, but less than 12 million
E. 12 million or more

5.74 (2 points) Use the following information:


(i) N = 0 with probability 70%, N = 1 with probability 20%, and N = 2 with probability 10%.
(ii) S is the sum of n independent, identically distributed Normal variables with µ = 20 and σ = 13.
Determine the variance of S.
(A) 220 (B) 240 (C) 260 (D) 280 (E) 300

5.75 (4, 5/85, Q.35) (2 points) Suppose x is a claim-size variable which is gamma distributed with
probability density function: f(x) = a^r x^(r-1) e^(-ax) / Γ(r),
where a and r are > 0 and x > 0, mean = r/a, variance = r/a².
Let T = total losses = Σ_{i=1 to N} xi, where N is a positive integer.

Assume the number of claims is independent of their amounts.


If E[N] = λ, Var[N] = 2λ, which of the following is the variance of T?
A. λr(2r + λ)/a²  B. 2λ + r/a²  C. 2λr/a²  D. λr(2r + 1)/a²
E. None of the above

5.76 (4, 5/87, Q.44) (1 point) Let Xi be independent, identically distributed claim-size variables
which are gamma-distributed, with parameters α and θ.
Let T = total losses = Σ_{i=1 to N} Xi, where N is a positive integer.

Assume the number of claims is independent of their amounts.


If E[N] = m, Var[N] = 3m, which of the following is the variance of T?
A. 3m + αθ²  B. mα(3α + 1)θ²  C. mα(3α + m)θ²  D. 3mαθ²
E. None of A, B, C, or D.

5.77 (Course 151 Sample Exam #1, Q.10) (1.7 points)


For aggregate claims S, you are given:
(i) fS(x) = Σ_{n=0 to ∞} p*n(x) C(n+2, n) (0.6)³ (0.4)^n, where C(n+2, n) is the binomial coefficient.

(ii) x p(x)
1 0.3
2 0.6
3 0.1
Determine Var[S].
(A) 7.5 (B) 8.5 (C) 9.5 (D) 10.5 (E) 11.5
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 148

5.78 (Course 151 Sample Exam #1, Q.18) (2.5 points)


The policies of a building insurance company are classified according to the location of the building
insured:
Number of
Claim Policies Claim
Region Amount in Region Probability
A 20 300 0.01
B 10 500 0.02
C 5 600 0.03
D 15 500 0.02
E 18 100 0.01
There is at most one claim per policy and if there is a claim it is for the stated amount.
Using the normal approximation, relative security loadings are computed for each region such that
the probability that the total claims for the region do not exceed the premiums collected from policies
in that region is 0.95.
The relative security loading is defined as: (premiums / expected losses) - 1.
Which region pays the largest relative security loading?
(A) A (B) B (C) C (D) D (E) E

5.79 (Course 151 Sample Exam #2, Q.13) (1.7 points) For aggregate claims
N
S= ∑ Xi , you are given:
i=1

(i) Xi has distribution


x p(x)
1 p
2 1-p
(ii) Λ is a Poisson random variable with parameter 1/p
(iii) given Λ = λ, N is Poisson with parameter λ
(iv) the number of claims and claim amounts are mutually independent
(v) Var(S) = 19/2.
Determine p.
(A) 1/6 (B) 1/5 (C) 1/4 (D) 1/3 (E) 1/2
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 149

5.80 (Course 151 Sample Exam #2, Q.14) (1.7 points)


For an insured portfolio, you are given:
(i) the number of claims has a Geometric distribution with β = 1/3.
(ii) individual claim amounts can take values of 3, 4 or 5 with equal probability.
(iii) the number of claims and claim amounts are independent.
(iv) the premium charged equals expected aggregate claims plus the variance of
aggregate claims.
Determine the exact probability that aggregate claims exceeds the premium.
(A) 0.01 (B) 0.03 (C) 0.05 (D) 0.07 (E) 0.09

5.81 (Course 151 Sample Exam #2, Q.16) (1.7 points)


Let S be the aggregate claims for a collection of insurance policies.
You are given:
The size of claims has mean E[X] and second moment E[X2 ].
G is the premium with relative security loading η, (premiums / expected losses) - 1.
S has a compound Poisson distribution with parameter λ.
R = S/G (the loss ratio).
Which of the following is an expression for Var(R)?
(A) {E[X²]/E[X]} {1/(1+η)}
(B) {E[X²]/E[X]²} {1/(λ(1+η))}
(C) {E[X²]²/E[X]²} {1/(λ(1+η))}
(D) {E[X²]/E[X]²} {1/(λ(1+η)²)}
(E) {E[X²]²/E[X]²} {1/(λ(1+η)²)}

5.82 (Course 151 Sample Exam #3, Q.8) (1.7 points)


An insurer has a portfolio of 40 independent policies. For each policy you are given:
• The probability of a claim is 1/8 and there is at most one claim per policy.
• The benefit amount given that there is a claim has an Inverse Gaussian distribution
with µ = 400 and θ = 8000.
Using the Normal approximation, determine the probability that the total claims for the portfolio are
greater than 2900.
(A) 0.03 (B) 0.06 (C) 0.09 (D) 0.12 (E) 0.15
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 150

5.83 (Course 151 Sample Exam #3, Q.10) (1.7 points)


An insurance company is selling policies to individuals with independent future lifetimes and identical
mortality profiles. For each individual, the probability of death by all causes is 0.10 and the
probability of death due to accident is 0.01. Each insurance policy pays the following benefits:
10 for accidental death
1 for non-accidental death
The company wishes to to have at least a 95% probability that premiums with a relative security
loading of 0.20 are adequate to cover claims.
The relative security loading is: (premiums / expected losses) - 1.
Using the normal approximation, determine the minimum number of policies that must be sold.
(A) 1793 (B) 1975 (C) 2043 (D) 2545 (E) 2804

5.84 (5A, 11/94, Q.20) (1 point) The probability of a loss in a given period is 0.01.
The probability of more than one loss in a given period is 0. Given that a loss occurs, the damage
is assumed to be uniformly distributed over the interval from 0 to 10,000.
What is the variance of the aggregate loss experience within the given time period?
A. Less than 200,000
B. At least 200,000, but less than 250,000
C. At least 250,000, but less than 300,000
D. At least 300,000, but less than 350,000
E. Greater than or equal to 350,000

5.85 (5A, 5/95, Q.36) (2 points) Suppose S is a compound Poisson distribution of aggregate
claims with a mean number of claims = 3 for a collection of insurance policies over a single premium
period. The first and second moments of the individual claim amount distribution are 100 and
15,000 respectively.
The aggregate premium was determined by applying a relative security loading,
(premiums / expected losses) - 1, of 0.1 to the expected aggregate claim amount and by ignoring
expenses. Determine the mean and variance of the loss ratio.

5.86 (5A, 5/96, Q.36) The XYZ Insurance Company insures 500 risks. For each risk, there is a
10% probability of having a claim, but no more than one claim is possible.
The individual claim amount distribution is given by f(x) = 0.001exp(-x/1000), for x > 0.
Assume that the risks are independent.
a. (1.5 points) What is the expectation and standard deviation of
S = X1 + X2 + ... + X500 where Xi is the loss on insured unit i?
b. (1 point) Assuming no expenses, using the Normal Approximation, estimate the premium
per risk necessary so that there is a 95% chance that the premiums are sufficient
to pay the resulting claims.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 151

5.87 (5A, 5/97, Q.39) (2 points) For a one-year term life insurance policy, suppose the insurer
agrees to pay a fixed amount if the insured dies.
You are given the following information regarding the binomial claim distribution for this policy:
E[x] = 30 Var[x] = 29,100.
Calculate the amount of the death payment and the probability that the insured will die within the
next year.

5.88 (5A, 5/98, Q.37) (2 points) For a collection of homeowners policies, assume:
i) S represents the aggregate claim amount for the entire collection of policies.
ii) G is the aggregate premium collected.
iii) G = 1.2 E(S)
iv) The distribution for the number of claims is Poisson with λ = 5.
v) The claim amounts are identically distributed random variables that are uniform over the
interval (0,10).
vi) The number of claims and the claim amounts are mutually independent.
Find the variance of the loss ratio, S/G.

5.89 (5A, 11/98, Q.23) (1 point) The distribution of aggregate claims, S, is compound Poisson
with λ = 3. Individual claim amounts are distributed as follows:
x p(x)
1 0.40
2 0.20
3 0.40
Which of the following is the closest to the normal approximation of Pr[S > 9]?
A. 8% B. 11% C. 14% D. 17% E. 20%

5.90 (5A, 5/99, Q.24) (1 point) You are given the following information concerning the claim
severity, X, and the annual aggregate amount of claims, S:
E[X] = 50,000. Var[X] = 500,000,000. Var[S] = 30,000,000.
Assume that the claim sizes (X1 , X2 , ...) are identically distributed random variables and the number
of claims sizes are mutually independent.
Assume that the number of claims (N) follows a Poisson distribution.
What is the likelihood that there will be at least one claim next year?
A. Less than 5%
B. At least 5%, but less than 50%
C. At least 50%, but less than 95%
D. At least 95%
E. Cannot be determined from the above information.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 152

5.91 (5A, 5/99, Q.38) (2.5 points) For a particular line of business, the aggregate claim amount S
follows a compound Poisson distribution. The aggregate number of claims N, has a mean of 350.
The dollar amount of each individual claim, xi, i = 1,..., N is uniformly distributed over the interval from
0 to 1000. Assume that N and the Xi are mutually independent random variables.
Using the Normal Approximation, calculate the probability that S > 180,000.

5.92 (5A, 11/99, Q.38) (3 points) Use the following information:


• An insurer has a portfolio of 14,000 insured properties as shown below.
Property Value Number of Properties
$20,000 3000
$35,000 4000
$60,000 5000
$75,000 2000
• The annual probability of a claim for each of the insured properties is 0.04.
• Each property is independent of the others.
• Assume only total losses are possible.
• In order to reduce risk, the insurer buys reinsurance with a retention of $30,000 on each
property. (For example, in the case of a loss of $75,000, the insurer would pay
$30,000, while the reinsurer would pay $45,000.)
• The annual reinsurance premium is set at 125% of the expected excess annual claims.
Calculate the probability that the total cost (retained claims plus reinsurance cost) of insuring the
properties will exceed $28,650,000 in any year.
Use the Normal Approximation.

5.93 (Course 3 Sample Exam, Q.20) You are given:


• An insuredʼs claim severity distribution is described by an exponential distribution:
F(x) = 1 - e-x/1000.
• The insuredʼs number of claims is described by a negative binomial distribution with:
β = 2 and r = 2.
• A 500 per claim deductible is in effect.
Calculate the standard deviation of the aggregate losses in excess of the deductible.
A. Less than 2000
B. At least 2000 but less than 3000
C. At least 3000 but less than 4000
D. At least 4000 but less than 5000
E. At least 5000
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 153

5.94 (Course 3 Sample Exam, Q.25)


For aggregate losses S = X1 + X2 + . . . + XN, you are given:
• N has a Poisson distribution with mean 500.
• X1 , X2 , ... have mean 100 and variance 100.
• N, X1 , X2 , ... are mutually independent.
You are also given:
• For a portfolio of insurance policies, the loss ratio is the ratio of the aggregate losses
to aggregate premiums collected.
• The premium collected is 1.1 times the expected aggregate losses.
Using the normal approximation to the compound Poisson distribution, calculate the probability that
the loss ratio exceeds 0.95.

5.95 (IOA 101, 4/00, Q.2) (2.25 points) Insurance policies providing car insurance are such that the
sizes of claims are normally distributed with mean 1,870 and standard deviation 610. In one month
50 claims are made. Assuming that claims are independent, calculate the probability that the total of
the claim sizes is more than 100,000.

5.96 (3, 5/00, Q.16) (2.5 points) You are given:


Standard
Mean Deviation
Number of Claims 8 3
Individual Losses 10,000 3,937
Using the normal approximation, determine the probability that the aggregate loss will exceed
150% of the expected loss.
(A) Φ(1.25) (B) Φ(1.5) (C) 1 - Φ(1.25) (D) 1 - Φ(1.5) (E) 1.5 Φ(1)

5.97 (3, 5/00, Q.19) (2.5 points) An insurance company sold 300 fire insurance policies as follows:
Number of Policy Probability of
Policies Maximum Claim Per Policy
100 400 0.05
200 300 0.06
You are given:
(i) The claim amount for each policy is uniformly distributed between 0 and the policy maximum.
(ii) The probability of more than one claim per policy is 0.
(iii) Claim occurrences are independent.
Calculate the variance of the aggregate claims.
(A) 150,000 (B) 300,000 (C) 450,000 (D) 600,000 (E) 750,000
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 154

5.98 (3, 11/00, Q.8 & 2009 Sample Q.113) (2.5 points)
The number of claims, N, made on an insurance portfolio follows the following distribution:
n Pr(N=n)
0 0.7
2 0.2
3 0.1
If a claim occurs, the benefit is 0 or 10 with probability 0.8 and 0.2, respectively.
The number of claims and the benefit for each claim are independent. Calculate the probability that
aggregate benefits will exceed expected benefits by more than 2 standard deviations.
(A) 0.02 (B) 0.05 (C) 0.07 (D) 0.09 (E) 0.12

5.99 (3, 11/00, Q.32 & 2009 Sample Q.118) (2.5 points) For an individual over 65:
(i) The number of pharmacy claims is a Poisson random variable with mean 25.
(ii) The amount of each pharmacy claim is uniformly distributed between 5 and 95.
(iii) The amounts of the claims and the number of claims are mutually independent.
Determine the probability that aggregate claims for this individual will exceed 2000 using the normal
approximation.
(A) 1 - Φ(1.33) (B) 1 - Φ(1.66) (C) 1 - Φ(2.33) (D) 1 - Φ(2.66) (E) 1 - Φ(3.33)

5.100 (3, 5/01, Q.29 & 2009 Sample Q.110) (2.5 points)
You are the producer of a television quiz show that gives cash prizes.
The number of prizes, N, and prize amounts, X, have the following distributions:
n Pr(N = n) x Pr (X=x)
1 0.8 0 0.2
2 0.2 100 0.7
1000 0.1
Your budget for prizes equals the expected prizes plus the standard deviation of prizes.
Calculate your budget.
(A) 306 (B) 316 (C) 416 (D) 510 (E) 518
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 155

5.101 (3, 11/01, Q.7 & 2009 Sample Q.98) (2.5 points) You own a fancy light bulb factory.
Your workforce is a bit clumsy – they keep dropping boxes of light bulbs. The boxes have varying
numbers of light bulbs in them, and when dropped, the entire box is destroyed.
You are given:
Expected number of boxes dropped per month: 50
Variance of the number of boxes dropped per month: 100
Expected value per box: 200
Variance of the value per box: 400
You pay your employees a bonus if the value of light bulbs destroyed in a month is less than
8000.
Assuming independence and using the normal approximation, calculate the probability that you will
pay your employees a bonus next month.
(A) 0.16 (B) 0.19 (C) 0.23 (D) 0.27 (E) 0.31

5.102 (3, 11/02, Q.6 & 2009 Sample Q.91) (2.5 points) The number of auto vandalism claims
reported per month at Sunny Daze Insurance Company (SDIC) has mean 110 and variance 750.
Individual losses have mean 1101 and standard deviation 70.
The number of claims and the amounts of individual losses are independent.
Using the normal approximation, calculate the probability that SDICʼs aggregate auto
vandalism losses reported for a month will be less than 100,000.
(A) 0.24 (B) 0.31 (C) 0.36 (D) 0.39 (E) 0.49

5.103 (CAS3, 11/03, Q.24) (2.5 points) Zoom Buy Tire Store, a nationwide chain of retail tire
stores, sells 2,000,000 tires per year of various sizes and models.
Zoom Buy offers the following road hazard warranty:
"If a tire sold by us is irreparably damaged in the first year after purchase, we'll replace it free,
regardless of the cause."
The average annual cost of honoring this warranty is $10,000,000, with a standard deviation of
$40,000.
Individual claim counts follow a binomial distribution, and the average cost to replace a tire is $100.
All tires are equally likely to fail in the first year, and tire failures are independent.
Calculate the standard deviation of the replacement cost per tire.
A. Less than $60
B. At least $60, but less than $65
C. At least $65, but less than $70
D. At least $70, but less than $75
E. At least $75
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 156

5.104 (CAS3, 11/03, Q.25) (2.5 points) Daily claim counts are modeled by the negative binomial
distribution with mean 8 and variance 15. Severities have mean 100 and variance 40,000.
Severities are independent of each other and of the number of claims.
Let σ be the standard deviation of a day's aggregate losses.
On a certain day, 13 claims occurred, but you have no knowledge of their severities.
Let σʼ be the standard deviation of that day's aggregate losses, given that 13 claims occurred.
Calculate σ/σʼ - 1.
A. Less than -7.5%
B. At least -7.5%, but less than 0
C. 0
D. More than 0, but less than 7.5%
E. At least 7.5%

5.105 (SOA3, 11/03, Q.4 & 2009 Sample Q.85) (2.5 points)
Computer maintenance costs for a department are modeled as follows:
(i) The distribution of the number of maintenance calls each machine will need in a year
is Poisson with mean 3.
(ii) The cost for a maintenance call has mean 80 and standard deviation 200.
(iii) The number of maintenance calls and the costs of the maintenance calls are all
mutually independent.
The department must buy a maintenance contract to cover repairs if there is at least a 10%
probability that aggregate maintenance costs in a given year will exceed 120% of the expected
costs. Using the normal approximation for the distribution of the aggregate maintenance costs,
calculate the minimum number of computers needed to avoid purchasing a maintenance contract.
(A) 80 (B) 90 (C) 100 (D) 110 (E) 120

5.106 (SOA3, 11/03, Q.33 & 2009 Sample Q.88) (2.5 points) A towing company provides all
towing services to members of the City Automobile Club. You are given:
(i) Towing Distance Towing Cost Frequency
0-9.99 miles 80 50%
10-29.99 miles 100 40%
30+ miles 160 10%
(ii) The automobile owner must pay 10% of the cost and the remainder is paid by the City
Automobile Club.
(iii) The number of towings has a Poisson distribution with mean of 1000 per year.
(iv) The number of towings and the costs of individual towings are all mutually independent.
Using the normal approximation for the distribution of aggregate towing costs, calculate the
probability that the City Automobile Club pays more than 90,000 in any given year.
(A) 3% (B) 10% (C) 50% (D) 90% (E) 97%
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 157

5.107 (CAS3, 5/04, Q.19) (2.5 points) A company has a machine that occasionally breaks down.
An insurer offers a warranty for this machine. The number of breakdowns and their costs are
independent.
The number of breakdowns each year is given by the following distribution:
# of breakdowns Probability
0 50%
1 20%
2 20%
3 10%
The cost of each breakdown is given by the following distribution:
Cost Probability
1,000 50%
2,000 10%
3,000 10%
5,000 30%
To reduce costs, the insurer imposes a per claim deductible of 1,000.
Compute the standard deviation of the insurer's losses for this year.
A. 1,359 B. 2,280 C. 2,919 D. 3,092 E. 3,434

5.108 (CAS3, 5/04, Q.22) (2.5 points) An actuary determines that claim counts follow a negative
binomial distribution with unknown β and r. It is also determined that individual claim amounts are
independent and identically distributed with mean 700 and variance 1,300.
Aggregate losses have mean 48,000 and variance 80 million.
Calculate the values for β and r.
A. β = 1.20, r = 57.19
B. β = 1.38, r = 49.75
C. β = 2.38, r = 28.83
D. β = 1,663.81, r = 0.04
E. β = 1,664.81, r = 0.04
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 158

5.109 (CAS3, 5/04, Q.38) (2.5 points)


You are asked to price a Workers' Compensation policy for a large employer.
The employer wants to buy a policy from your company with an aggregate limit of 150% of total
expected loss. You know the distribution for aggregate claims is Lognormal.
You are also provided with the following:
Mean Standard Deviation
Number of claims 50 12
Amount of individual loss 4,500 3,000
Calculate the probability that the aggregate loss will exceed the aggregate limit.
A. Less than 3.5%
B. At least 3.5%, but less than 4.5%
C. At least 4.5%, but less than 5.5%
D. At least 5.5%, but less than 6.5%
E. At least 6.5%

5.110 (CAS3, 5/04, Q.39) (2.5 points) PQR Re provides reinsurance to Telecom Insurance
Company. PQR agrees to pay Telecom for all losses resulting from "events", subject to a $500
per event deductible.
For providing this coverage, PQR receives a premium of $250.
Use a Poisson distribution with mean equal to 0.15 for the frequency of events.
Event severity is from the following distribution:
Loss Probability
250 0.10
500 0.25
750 0.30
1,000 0.25
1,250 0.05
1,500 0.05
• i = 0%
Using the normal approximation to PQR's annual aggregate losses on this contract, what is the
probability that PQR will payout more than it receives?
A. Less than 12%
B. At least 12%, but less than 13%
C. At least 13%, but less than 14%
D. At least 14%, but less than 15%
E. 15% or more
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 159

5.111 (CAS3, 11/04, Q.31) (2.5 points)


The mean annual number of claims is 103 for a group of 10,000 insureds.
The individual losses have an observed mean and standard deviation of 6,382 and 1,781,
respectively. The standard deviation of the aggregate claims is 22,874.
Calculate the standard deviation for the annual number of claims.
A. 1.47 B. 2.17 C. 4.72 D. 21.73 E. 47.23

5.112 (CAS3, 11/04, Q.32) (2.5 points)


An insurance policy provides full coverage for the aggregate losses of the Widget Factory.
The number of claims for the Widget Factory follows a negative binomial distribution with mean 25
and coefficient of variation 1.2. The severity distribution is given by a lognormal distribution with
mean 10,000 and coefficient of variation 3.
To control losses, the insurer proposes that the Widget Factory pay 20% of the cost of each loss.
Calculate the reduction in the 95th percentile of the normal approximation of the insurer's loss.
A. Less than 5%
B. At least 5%, but less than 15%
C. At least 15%, but less than 25%
D. At least 25%, but less than 35%
E. At least 35%

5.113 (SOA3, 11/04, Q.15 & 2009 Sample Q.125) (2.5 points) Two types of insurance claims
are made to an insurance company. For each type, the number of claims follows a Poisson
distribution and the amount of each claim is uniformly distributed as follows:
Type of Claim Poisson Parameter λ for Number of Claims Range of Each Claim Amount
I 12 (0, 1)
II 4 (0, 5)
The numbers of claims of the two types are independent and the claim amounts and claim
numbers are independent.
Calculate the normal approximation to the probability that the total of claim amounts exceeds 18.
(A) 0.37 (B) 0.39 (C) 0.41 (D) 0.43 (E) 0.45
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 160

5.114 (CAS3, 5/05, Q.8) (2.5 points) An insurance company increases the per claim deductible of
all automobile policies from $300 to $500.
The mean payment and standard deviation of claim severity are shown below.
Deductible Mean Payment Standard Deviation
$300 1,000 256
$500 1,500 678
The claims frequency is Poisson distributed both before and after the change of deductible.
The probability of no claim increases by 30%, and the probability of having exactly one claim
decreases by 10%.
Calculate the percentage increase in the variance of the aggregate claims.
A. Less than 30%
B. At least 30%, but less than 50%
C. At least 50%, but less than 70%
D. At least 70%, but less than 90%
E. 90% or more

5.115 (CAS3, 5/05, Q.9) (2.5 points) Annual losses for the New Widget Factory can be modeled
using a Poisson frequency model with mean of 100 and an exponential severity model with mean of
$10,000. An insurance company agrees to provide coverage for that portion of any individual loss
that exceeds $25,000.
Calculate the standard deviation of the insurer's annual aggregate claim payments.
A. Less than $36,000
B. At least $36,000, but less than $37,000
C. At least $37,000, but less than $38,000
D. At least $38,000, but less than $39,000
E. $39,000 or more

5.116 (CAS3, 5/05, Q.40) (2.5 points)


An insurance company has two independent portfolios.
In Portfolio A, claims occur with a Poisson frequency of 2 per week and severities are distributed as
a Pareto with mean 1,000 and standard deviation 2,000.
In Portfolio B, claims occur with a Poisson frequency of 1 per week and severities are distributed as
a log-normal with mean 2,000 and standard deviation 4,000.
Determine the standard deviation of the combined losses for the next week.
A. Less than 5,500
B. At least 5,500, but less than 5,600
C. At least 5,600, but less than 5,700
D. At least 5,700, but less than 5,800
E. 5,800 or more
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 161

5.117 (SOA M, 5/05, Q.17 & 2009 Sample Q.164) (2.5 points)
For a collective risk model the number of losses has a Poisson distribution with λ = 20.
The common distribution of the individual losses has the following characteristics:
(i) E[X] = 70
(ii) E[X ∧ 30] = 25
(iii) Pr(X > 30) = 0.75
(iv) E[X2 | X > 30] = 9000
An insurance covers aggregate losses subject to an ordinary deductible of 30 per loss.
Calculate the variance of the aggregate payments of the insurance.
(A) 54,000 (B) 67,500 (C) 81,000 (D) 94,500 (E) 108,000
Note: This past exam question has been rewritten.

5.118 (SOA M, 5/05, Q.31 & 2009 Sample Q.167) (2.5 points)
The repair costs for boats in a marina have the following characteristics:
Boat Number of Probability that Mean of repair Variance of repair
type boats repair is needed cost given a repair cost given a repair
Power boats 100 0.3 300 10,000
Sailboats 300 0.1 1000 400,000
Luxury yachts 50 0.6 5000 2,000,000
At most one repair is required per boat each year.
The marina budgets an amount, Y, equal to the aggregate mean repair costs plus the standard
deviation of the aggregate repair costs.
Calculate Y.
(A) 200,000 (B) 210,000 (C) 220,000 (D) 230,000 (E) 240,000

5.119 (SOA M, 5/05, Q.40 & 2009 Sample Q.171) (2.5 points) For aggregate losses, S:
(i) The number of losses has a negative binomial distribution with mean 3 and variance 3.6.
(ii) The common distribution of the independent individual loss amounts is uniform
from 0 to 20.
Calculate the 95th percentile of the distribution of S as approximated by the normal distribution.
(A) 61 (B) 63 (C) 65 (D) 67 (E) 69
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 162

5.120 (CAS3, 11/05, Q.30) (2.5 points) On January 1, 2005, Dreamland Insurance sold 10,000
insurance policies that pay $100 for each day 2005 that a policyholder is in the hospital.
The following assumptions were used in pricing the policies:
• The probability that a given policyholder will be hospitalized during the year is 0.05.
No policyholder will be hospitalized more than one time during the year.
• If a policyholder is hospitalized, the number of days spent in the hospital follows a
lognormal distribution with µ = 1.039 and σ = 0.833.
Using the normal approximation, calculate the premium per policy such that there is a 90%
probability that total premiums will exceed total losses.
A. Less than 21.20
B. At least 21.20, but less than 21.50
C. At least 21.50, but less than 21.80
D. At least 21.80, but less than 22.10
E. At least 22.10

5.121 (CAS3, 11/05, Q.34) (2.5 points) Claim frequency follows a Poisson process with rate of
10 per year. Claim severity is exponentially distributed with mean 2,000.
The method of moments is used to estimate the parameters of a lognormal distribution for the
aggregate losses. Using the lognormal approximation, calculate the probability that annual
aggregate losses exceed 105% of expected annual losses.
A. Less than 34.5%
B. At least 34.5%, but less than 35.5%
C. At least 35.5%, but less than 36.5%
D. At least 36.5%, but less than 37.5%
E. At least 37.5%
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 163

5.122 (SOA M, 11/05, Q.34 & 2009 Sample Q.210) (2.5 points) Each life within a group
medical expense policy has loss amounts which follow a compound Poisson process with λ = 0.16.
Given a loss, the probability that it is for Disease 1 is 1/16.
Loss amount distributions have the following parameters:
Mean per loss Standard Deviation per loss
Disease 1 5 50
Other diseases 10 20
Premiums for a group of 100 independent lives are set at a level such that the probability
(using the normal approximation to the distribution for aggregate losses) that aggregate
losses for the group will exceed aggregate premiums for the group is 0.24.
A vaccine which will eliminate Disease 1 and costs 0.15 per person has been discovered.
Define:
A = the aggregate premium assuming that no one obtains the vaccine, and
B = the aggregate premium assuming that everyone obtains the vaccine and the cost of the
vaccine is a covered loss.
Calculate A/B.
(A) 0.94 (B) 0.97 (C) 1.00 (D) 1.03 (E) 1.06

5.123 (SOA M, 11/05, Q.38 & 2009 Sample Q.212) (2.5 points) For an insurance:
(i) The number of losses per year has a Poisson distribution with λ = 10.
(ii) Loss amounts are uniformly distributed on (0, 10).
(iii) Loss amounts and the number of losses are mutually independent.
(iv) There is an ordinary deductible of 4 per loss.
Calculate the variance of aggregate payments in a year.
(A) 36 (B) 48 (C) 72 (D) 96 (E) 120

5.124 (2 points) In the previous question, SOA M, 11/05, Q. 38,


calculate the variance of the amount paid by the insurance company for one claim,
including the possibility that the amount paid is 0.

5.125 (1 point) In SOA M, 11/05, Q. 38, calculate the variance of YP.

5.126 (SOA M, 11/05, Q.40) (2.5 points) Lucky Tom deposits the coins he finds on the way to
work according to a Poisson process with a mean of 22 deposits per month.
5% of the time, Tom deposits coins worth a total of 10.
15% of the time, Tom deposits coins worth a total of 5.
80% of the time, Tom deposits coins worth a total of 1.
The amounts deposited are independent, and are independent of the number of deposits.
Calculate the variance in the total of the monthly deposits.
(A) 180 (B) 210 (C) 240 (D) 270 (E) 300
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 164

5.127 (CAS3, 11/06, Q.29) (2.5 points)


Frequency of losses follows a binomial distribution with parameters m = 1,000 and q = 0.3.
Severity follows a Pareto distribution with parameters α = 3 and θ = 500.
Calculate the standard deviation of the aggregate losses.
A. Less than 7,000
B. At least 7,000, but less than 7,500
C. At least 7,500, but less than 8,000
D. At least 8,000, but less than 8,500
E. At least 8,500

5.128 (SOA M, 11/06, Q.21 & 2009 Sample Q.282) (2.5 points)
Aggregate losses are modeled as follows:
(i) The number of losses has a Poisson distribution with λ = 3.
(ii) The amount of each loss has a Burr (Burr Type XII, Singh-Maddala) distribution with α = 3, θ = 2,
and γ = 1.
(iii) The number of losses and the amounts of the losses are mutually independent.
Calculate the variance of aggregate losses.
(A) 12 (B) 14 (C) 16 (D) 18 (E) 20

5.129 (SOA M, 11/06, Q.32 & 2009 Sample Q.287) (2.5 points)
For an aggregate loss distribution S:
(i) The number of claims has a negative binomial distribution with r = 16 and β = 6.
(ii) The claim amounts are uniformly distributed on the interval (0, 8).
(iii) The number of claims and claim amounts are mutually independent.
Using the normal approximation for aggregate losses, calculate the premium such that the
probability that aggregate losses will exceed the premium is 5%.

5.130 (4, 5/07, Q.17) (2.5 points) You are given:


(i) Aggregate losses follow a compound model.
(ii) The claim count random variable has mean 100 and standard deviation 25.
(iii) The single-loss random variable has mean 20,000 and standard deviation 5000.
Determine the normal approximation to the probability that aggregate claims exceed 150% of
expected costs.
(A) 0.023 (B) 0.056 (C) 0.079 (D) 0.092 (E) 0.159
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 165

Solutions to Problems:

5.1. B. σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq = (13)(200,000) + (300²)(37) = 5,930,000.
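For students who like to verify such arithmetic numerically, here is a minimal Python sketch of the formula above (my own check, not part of the original solution; the function name is an arbitrary choice):

# Variance of aggregate losses when frequency and severity are independent:
# Var[Agg] = E[N] Var[X] + E[X]^2 Var[N].
def aggregate_variance(mean_freq, var_freq, mean_sev, var_sev):
    return mean_freq * var_sev + mean_sev**2 * var_freq

print(aggregate_variance(13, 37, 300, 200_000))  # 5,930,000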

5.2. C. Frequency is Bernoulli with q = 2/3, with mean = 2/3 and variance = (2/3)(1/3) = 2/9.
Mean severity = 7.1, variance of severity = 72.1 - 7.1² = 21.69.
Thus σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq = (2/3)(21.69) + (7.1²)(2/9) = 25.66.
For the severity the mean and the variance are computed as follows:
Probability    Size of Claim    Square of Size of Claim
20%            2                4
50%            5                25
30%            14               196
Mean           7.1              72.1

5.3. A. Since the frequency and severity are independent, the variance of the aggregate losses =
(mean frequency)(variance of severity) + (mean severity)²(variance of frequency)
= 0.25 {(variance of severity) + (mean severity)²} = 0.25 (2nd moment of the severity)
= (0.25 / 5000) ∫_0^5000 x² dx = (0.25 / 5000)(5000³/3) = 2,083,333.

5.4. E. The average aggregate loss is 106. The second moment of the aggregate losses is
16940. Therefore, the variance = 16940 - 106² = 5704.
Situation Probability Aggregate Loss Square of Aggregate Loss
1 claim @ 50 60.0% 50 2500
1 claim @ 200 20.0% 200 40000
2 claims @ 50 each 7.2% 100 10000
2 claims: 1 @ 50 & 1 @ 150 9.6% 200 40000
2 claims @ 150 each 3.2% 300 90000
Overall 100.0% 106 16940
For example, the chance of 2 claims with one of size 50 and one of size 150 is the chance of having
two claims times the chance given two claims that one will be 50 and the other 150:
(.2){(2)(0.6)(0.4)} = 9.6%. In that case the aggregate loss is 50 + 150 = 200.
One takes the weighted average over all the possibilities.
Comment: Note that the frequency and severity are not independent.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 166

5.5. A. Think of a green ball as a claim and a red ball as ending the process.
The number of green balls we pick prior to getting a red ball is the number of claims.
Severity and frequency are independent.
Frequency is Geometric with: β = (1 - 0.7) / 0.7 = 0.4286.
The mean frequency is β = 0.4286, variance is β(1+β) = 0.6122.
Given we have a claim, in other words a green ball, the mean severity is:
{(20%)(50) + (10%)(200)} / (20% + 10%) = 100.
The variance of the severity = (2/3)(50 - 100)² + (1/3)(200 - 100)² = 5000.
σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq = (0.4286)(5000) + (100²)(0.6122) = 8265.
Comment: The chance of 0 claims is 0.7 = 1/(1+β), the chance of 1 claim is (0.3)(0.7),
the chance of 2 claims is (0.3²)(0.7), etc.

5.6. A. Use for each type the formula: σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq.
For example for Type I: Bernoulli claim frequency with q = 1/3. Thus, µFreq = 1/3, σ²Freq = (1/3)(2/3).
µSev = 130, σ²Sev = (70%)(30²) + (30%)(70²) = 2100.
Type I variance of aggregate = (1/3)(2100) + (2/9)(130²) = 4456.


A Priori Process
Type Chance of Mean Variance Mean Variance Variance
Risk Risk Freq. of Freq. Severity Severity of Agg.
I 0.333 0.333 0.222 130 2100 4456
II 0.333 0.500 0.250 150 2500 6875
III 0.333 0.667 0.222 170 2100 7822
Variance for the portfolio is: (100)(4456) + (100)(6875) + (100)(7822) = 1,915,300.

5.7. D. Since the frequency and severity are independent, the variance of the aggregate losses =
(mean frequency)(variance of severity) + (mean severity)²(variance of frequency)
= 5 {(variance of severity) + (mean severity)²} = 5 (2nd moment of the severity).
Second moment of the severity is: ∫_1^∞ x² (3.5 x^-4.5) dx = 2.333.


Therefore variance of aggregate losses = (5)(2.333) = 11.67.
Comment: The Severity is a Single Parameter Pareto Distribution.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 167


5.8. B. The mean severity is: ∫_1^∞ x (3.5 x^-4.5) dx = 1.4.
Thus the mean aggregate loss is (5)(1.4) = 7. From the solution to the prior question, the variance of
the aggregate losses is: 11.667. Thus the standard deviation of the aggregate losses is
√11.667 = 3.416. To apply the Normal Approximation we subtract the mean and divide by the
standard deviation. The probability that the total losses will exceed 11 is approximately:
1 - Φ[(11 - 7)/3.416] = 1 - Φ(1.17) = 12.1%.

5.9. A. Mean = exp(µ + σ²/2) = 7. 2nd moment = exp(2µ + 2σ²) = 11.67 + 7² = 60.67.
Dividing the 2nd equation by the square of the 1st:
exp(2µ + 2σ²) / exp(2µ + σ²) = 60.67 / 7². ⇒ exp(σ²) = 1.238. ⇒ σ = √ln(1.238) = 0.4621.
µ = ln(7) - σ²/2 = 1.839. 1 - Φ((ln(11) - 1.839)/0.4621) = 1 - Φ(1.21) = 11.31%.


Comment: Below shown as dots is the aggregate distribution approximated via simulation of
10,000 years, the Normal Approximation shown as the dotted line, and the LogNormal
Approximation shown as the solid line:
[Graph of the three densities, for aggregate losses from 0 to 20.]
Here is a similar graph of the righthand tail:
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 168

[Graph of the righthand tails of the same distributions, for aggregate losses from 16 to 24.]
As shown above, the Normal Distribution (dashed) has a lighter righthand tail than the LogNormal
Distribution (solid), with the aggregate distribution (dots) somewhere in between.
For example, S(20) = 0.6% for the LogNormal, while S(20) = 0.007% for the Normal.
For the simulation, S(20) = 0.205%, less than the LogNormal, but more than the Normal.

5.10. D. The severity is a Single Parameter Pareto with α = 3.5 and θ = 1.


It has second moment of: (3.5)(1²)/(3.5 - 2) = 2.333, and third moment of: (3.5)(1³)/(3.5 - 3) = 7.
The Third Central Moment of a Compound Poisson Distribution is:
(mean frequency)(third moment of the severity) = (5)(7) = 35.
Variance of the aggregate losses is: λ(2nd moment of severity) = (5)(2.333) = 11.67.
Therefore, skewness = 35/(11.67)^1.5 = 0.878.
Alternately, skewness of a compound Poisson =
(third moment of the severity) / {√λ (2nd moment of severity)^1.5} = 7/{√5 (2.333^1.5)} = 0.878.
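As a quick numerical check (mine, not from the text) of the compound Poisson skewness shortcut just quoted, using the Single Parameter Pareto raw moments E[X^k] = α/(α - k) with θ = 1:

# Compound Poisson skewness = E[X^3] / { sqrt(lambda) E[X^2]^1.5 }.
alpha, lam = 3.5, 5.0
ex2 = alpha / (alpha - 2)      # 2.333
ex3 = alpha / (alpha - 3)      # 7.0
print(ex3 / (lam**0.5 * ex2**1.5))   # approximately 0.878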

5.11. B. Since the frequency and severity are independent, and frequency is Poisson with mean 5,
the process variance of the aggregate losses = (5)(2nd moment of the severity).
The severity distribution is Single Parameter Pareto. F(x) = 1 - x^-3.5, prior to the effects of the
maximum covered loss. The 2nd moment of the severity after the maximum covered loss is:
∫_1^5 x² (3.5 x^-4.5) dx + (5²)S(5) = -(3.5/1.5) x^-1.5 ]_1^5 + (25)(5^-3.5) = 2.125 + 0.089 = 2.214.

Therefore, the variance of the aggregate losses = (5)(2.214) = 11.07.


Comment: For a Single Parameter Pareto Distribution, E[(X ∧ x)²] = αθ²/(α - 2) - 2θ^α/{(α - 2)x^(α-2)} =
(3.5)(1²)/(3.5 - 2) - (2)(1^3.5)/{(3.5 - 2) 5^(3.5-2)} = 2.333 - 0.119 = 2.214.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 169

5.12. D. σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq = µ(θ²) + (θ²)µ = 2µθ².

5.13. C. The third moment of the Exponential severity is 6θ³. The Third Central Moment of a
Compound Poisson Distribution is: (mean frequency)(third moment of the severity) = 6µθ³.
From the solution to the previous question, the variance of the aggregate losses is 2µθ².
Therefore, the skewness of the aggregate losses is: 6µθ³ / {2µθ²}^1.5 = 3/√(2µ).

5.14. E. For a Poisson frequency with severity and frequency independent, the process variance
of the aggregate losses = µFreq (2nd moment of the severity) = (3)(200) = 600.

5.15. E. The mean aggregate loss for N automobiles is: N(0.03)(3000/2) = 45N.
Second moment of the uniform distribution from 0 to 3000 is:
variance + square of the mean = 3000²/12 + 1500² = 3,000,000.
Variance of aggregate loss for N automobiles is: N(0.03)(3,000,000) = 90,000N.
Prob[aggregate ≤ 160% of expected] = Φ[(0.6)(45N)/√(90,000N)] = Φ[0.09√N].
We want this probability to be at least 95% ⇒ 0.09√N ≥ 1.645 ⇒ N ≥ 334.1.


Comment: Similar to SOA3, 11/03, Q.4.

5.16. C. This is a compound Poisson. In units of bases, the mean severity is:
(1)(0.22) + (2)(0.04) + (3)(0.01) + (4)(0.05) = 0.530.
(This is Donʼs expected slugging percentage.)
The second moment of the severity is: (1)(0.22) +(4)(0.04) + (9)(0.01) + (16)(0.05) = 1.27.
Thus the variance of the aggregate losses is (600)(1.27) = 762.
The mean of the aggregate losses is: (600)(.530) = 318 bases. The chance that Don will have at
most $700,000 in incentives, is the chance that he has no more than 350 total bases:
Φ((350.5 - 318)/√762) = Φ(1.177) = 88.0%.

5.17. E. For the Binomial frequency: mean = mq = 3.2, variance = mq(1-q) = 1.92.
For the Pareto severity: mean = θ/(α-1) = 333.333, second moment = 2θ²/{(α-1)(α-2)} =
333,333, variance = 333,333 - 333.333² = 222,222.
Since the frequency and severity are independent:
σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq = (3.2)(222,222) + (333.333²)(1.92) = 924,443.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 170

5.18. D. The 2nd moment of a LogNormal Distribution is:
exp(2µ + 2σ²) = exp[2(7) + 2(0.5²)] = exp(14.5) = 1,982,759.
Since the frequency is Poisson and the frequency and severity are independent:
σ²Agg = (mean frequency)(2nd moment of the severity) = (3)(1,982,759) = 5,948,278.

5.19. B. For the Negative Binomial frequency: mean = rβ = 6, variance = rβ(1+β) = 18.
For the Gamma severity: mean = αθ = 600, variance = αθ² = 120,000.
Since the frequency and severity are independent:
σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq = (6)(120,000) + (600²)(18) = 7,200,000.

5.20. C. The variances of independent risks add.


(55)(924,443) + (35)(5,948,278) + (10)(7,200,000) = 331 million.

5.21. D. For the Negative Binomial, mean = rβ = 6, variance = rβ(1+β) = 18,
skewness = (1 + 2β)/√(rβ(1+β)) = 5/√18 = 1.1785.
For the Gamma severity: mean = αθ = 600, variance = αθ² = 120,000,
skewness = 2/√α = 1.1547.
From a previous solution, σ²Agg = 7,200,000. Since the frequency and severity are independent:
γAgg = {µFreq σX³ γX + 3 σFreq² µX σX² + σFreq³ γFreq µX³} / σAgg³ =
{(6)(120,000^1.5)(1.1547) + (3)(18)(600)(120,000) + (18^1.5)(1.1785)(600³)} / 7,200,000^1.5 =
1.222.
Comment: Well beyond what you should be asked on your exam!
The skewness is a dimensionless quantity, which does not depend on the scale. Therefore, we
would have gotten the same answer for the skewness if we had set the scale parameter of the
Gamma, θ = 1, including in the calculation of σA2 .
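A short Python sketch (my own check, not part of the original solution) reproduces the 1.222; the r, β, α, θ values in the comments are those implied by the means and variances quoted above:

# Third central moment of the aggregate: E[N] mu3(X) + 3 Var[N] E[X] Var[X] + mu3(N) E[X]^3,
# where mu3 denotes a third central moment, so mu3 = skewness * variance^1.5.
mean_n, var_n, skew_n = 6.0, 18.0, 1.1785         # Negative Binomial frequency (r = 3, beta = 2)
mean_x, var_x, skew_x = 600.0, 120_000.0, 1.1547  # Gamma severity (alpha = 3, theta = 200)

var_agg = mean_n * var_x + mean_x**2 * var_n      # 7,200,000
mu3_agg = (mean_n * skew_x * var_x**1.5
           + 3 * var_n * mean_x * var_x
           + skew_n * var_n**1.5 * mean_x**3)
print(mu3_agg / var_agg**1.5)                     # approximately 1.22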

5.22. B. S(1000) = e-1000/5000 = 0.8187. The frequency of non-zero payments is Poisson with
mean: (2.4)(0.8187) = 1.965. The severity distribution truncated and shifted at 1000 is also an
exponential with mean 5000. The mean aggregate losses excess of the deductible is
(1.965)(5000) = 9825.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 171

5.23. E. The frequency of non-zero payments is Poisson with λ = 1.965. The severity of non-zero
payments is an Exponential distribution with θ = 5000, with second moment 2(5000²).
Thus the variance of the aggregate losses excess of the deductible is: (1.965)(2)(5000²).
The standard deviation is: 5000√3.93 = 9912.

5.24. D. For the Exponential: E[X ∧ x] = θ(1-e-x/θ).


E[X ∧ 10,000] = (5000)(1 - e-10000/5000) = 4323.
Thus the mean aggregate losses are: (2.4)(4323) = 10,376.

5.25. D. For the Exponential: E[(X ∧ x)²] = 2θ²Γ[3; x/θ] + x² e^(-x/θ). E[(X ∧ 10,000)²] =
2(5000²)Γ[3; 2] + (100 million)(e^-2) = (50 million)(0.3233) + 13.52 million = 29.7 million.
Thus the variance of the aggregate losses is: (2.4)(29.7 million) = 71.3 million,
for a standard deviation of 8443.
Comment: Using Theorem A.1 in Loss Models: Γ[3; 2] = 1 - e^-2 {1 + 2 + 2²/2} = 0.3233.

5.26. C. S(1000) = e-1000/5000 = 0.8187. The frequency of non-zero payments is Poisson with
mean (2.4)(0.8187) = 1.965. The severity distribution truncated and shifted at 1000 is also an
Exponential with mean 5000. The maximum covered loss reduces the maximum payment to:
10,000 - 1,000 = 9,000. For the Exponential: E[X ∧ x] = θ(1-e-x/θ). Thus, the average non-zero
payment is: E[X ∧ 9,000] = (5000)(1 - e-9000/5000) = 4174. Alternately, the average non-zero
payment is: (E[X ∧ 10,000] - E[X ∧ 1,000])/S(1000) = (4323 - 906)/0.8187 = 4174.
Thus the mean aggregate losses are: (1.965)(4174) = 8202.

5.27. C. The frequency of non-zero payments is Poisson with λ = 1.965. The severity of
non-zero payments is an Exponential distribution with θ = 5000, censored at
10,000 - 1000 = 9000, with second moment E[(X ∧ 9,000)²] =
2(5000²)Γ[3; 9000/5000] + (9000²)e^(-9000/5000) = (50 million)(0.2694) + 13.39 million =
26.86 million. Thus the variance of the aggregate losses is: (1.965)(26.86 million) =
52.78 million, for a standard deviation of 7265.
Comment: Using Theorem A.1 in Loss Models: Γ[3; 1.8] = 1 - e^-1.8 {1 + 1.8 + 1.8²/2} = 0.2694.
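A brief Python check (mine, not part of the original solution) of the limited-moment step, using the Theorem A.1 identity quoted in the Comment:

# Gamma(3; y) = 1 - e^(-y) (1 + y + y^2/2), so
# E[(X ^ u)^2] = 2 theta^2 Gamma(3; u/theta) + u^2 e^(-u/theta) for the Exponential.
from math import exp

theta, u = 5000.0, 9000.0
y = u / theta
gamma3 = 1 - exp(-y) * (1 + y + y * y / 2)
limited_second_moment = 2 * theta**2 * gamma3 + u**2 * exp(-y)
print(limited_second_moment)             # about 26.86 million
print(1.965 * limited_second_moment)     # variance of the aggregate payments, about 52.8 million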
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 172

5.28. C. For each interval [a,b], the first moment is (a+b)/2.


Lower Upper Number First Contribution
Endpoint Endpoint of Losses Moment to the Mean
0 1 60 0.50 30.00
1 3 30 2.00 60.00
3 5 20 4.00 80.00
5 10 10 7.50 75.00
245
Mean = ((60)(0.5) + (30)(2) + (20)(4) + (10)(7.5) + 12 + 15 + 17 + 20 + 30)/125 = 2.71.

5.29. D. For each interval [a,b], the second moment is: (b³ - a³)/{3(b - a)}.
Lower Upper Number Second Contribution
Endpoint Endpoint of Losses Moment to 2nd Moment
0 1 60 0.33 20.00
1 3 30 4.33 130.00
3 5 20 16.33 326.67
5 10 10 58.33 583.33
1,060
We add to these contributions, those of each of the large losses; 2nd moment =
{(60)(0.33) + (30)(4.33) + (20)(16.33) + (10)(58.33) + 12² + 15² + 17² + 20² + 30²}/125 =
24.14. Variance = 24.14 - 2.71² = 16.8.

5.30. B. In the interval 5 to 10, 3/5 of the losses are assumed to be of size greater than 7.
There are (3/5)(10) = 6 such losses of average size (7 + 10)/2 = 8.5.
Thus they contribute (6)(8.5 - 7) = 9 to the layer excess of 7.
The 5 large losses contribute: 5 + 8 + 10 + 13 + 23 = 59. e(7) = (9 + 59)/(6 + 5) = 6.2.

5.31. A. Mean aggregate loss is: (40)(2.71) = 108.4.
Variance of aggregate loss is: (40)(2nd moment of severity) = (40)(24.14) = 965.5.
CV of aggregate loss is: √965.5 / 108.4 = 0.29.

5.32. E. In the interval 5 to 10, there are 10 loses of average size 7.5.
Thus they contribute (10)(7.5 - 5) = 25 to the layer from 5 to 15.
The 5 individual large losses contribute: (12 - 5) + (15 - 5) + 10 + 10 + 10 = 47.
The payment per loss is: (25 + 47)/125 = 0.576.
For 40 losses the reinsurer expects to pay: (40)(0.576) = 23.0.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 173

5.33. A. The contributions to this layer from the losses in interval [5, 10], are uniform on
[0, 5]; the second moment is: (5³ - 0³)/{3(5 - 0)} = 8.333. The second moment of the ceded losses
is: {(10)(8.333) + (12 - 5)² + (15 - 5)² + 10² + 10² + 10²}/125 = 4.259.
Variance of aggregate ceded losses = (4.259)(40) = 170.4.
CV of aggregate ceded losses = √170.4 / 23.0 = 0.57.

5.34. E. S(50,000) = {20000/(20000 + 50000)}^3.2 = (2/7)^3.2 = 0.01815.
Therefore, the frequency of non-zero payments is:
Negative Binomial with r = 4.1 and β = (2.8)(0.01815) = 0.05082.
The mean frequency is: (4.1)(0.05082) = 0.2084.
The variance of the frequency is: (4.1)(0.05082)(1.0508) = 0.2190.
Truncating and shifting from below produces another Pareto; the severity of non-zero payments is
also Pareto with α = 3.2 and θ = 20000 + 50000 = 70000. This Pareto has mean 70000/2.2 =
31,818 and variance (3.2)(70000²)/{(1.2)(2.2²)} = 2700 million.
Thus the variance of the aggregate losses excess of the deductible is:
(0.2084)(2700 million) + (0.2190)(31818²) = 784.3 million. The standard deviation is: 28.0
thousand. The mean of the aggregate losses excess of the deductible is: (0.2084)(31818) = 6631.
Thus the chance that the aggregate losses excess of the deductible are greater than 15,000 is
approximately: 1 - Φ[(15,000 - 6631)/28,000] = 1 - Φ[0.30] = 38.2%.
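Here is a short Python sketch (my own restatement of the steps above, with my own variable names) that retraces the thinning of the frequency and the truncated-and-shifted Pareto moments:

from math import sqrt

alpha, theta, d = 3.2, 20_000.0, 50_000.0   # Pareto severity and the deductible
r, beta = 4.1, 2.8                          # Negative Binomial frequency

s_d = (theta / (theta + d))**alpha          # survival at the deductible, about 0.01815
beta_nz = beta * s_d                        # thinned Negative Binomial for non-zero payments
mean_n, var_n = r * beta_nz, r * beta_nz * (1 + beta_nz)

# Non-zero payments are Pareto with the same alpha and theta + d.
mean_x = (theta + d) / (alpha - 1)
var_x = alpha * (theta + d)**2 / ((alpha - 1)**2 * (alpha - 2))

mean_agg = mean_n * mean_x
sd_agg = sqrt(mean_n * var_x + mean_x**2 * var_n)
print(mean_agg, sd_agg, (15_000 - mean_agg) / sd_agg)   # about 6630, 28,000, 0.30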

5.35. B. We are mixing Poisson frequencies via a Gamma, therefore frequency for the portfolio is a
Negative Binomial with r = α = 5 and β = θ = 0.4, per policy,
with mean: (5)(0.4) = 2, and variance: (5)(0.4)(1.4) = 2.8.
The mean loss per policy is: (2)(20) = 40.
The variance of the loss per policy is: (2)(300) + (20²)(2.8) = 1720.
For 200 independent policies, Mean Aggregate Loss = (200)(40) = 8000.
⇒ 110% of mean aggregate loss is 8800.
Variance of Aggregate Loss = (200)(1720) = 344,000.
Prob(Aggregate Loss > 1.1 × mean) = Prob(Aggregate Loss > 8800) ≅
1 - Φ[(8800 - 8000)/√344,000] = 1 - Φ(1.36) = 8.7%.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 174

5.36. C. For N independent policies, Mean Aggregate Loss = 40N, and
Variance of Aggregate Loss = 1720N.
Prob(Aggregate Loss > 1.1 × mean) ≅ 1 - Φ[(0.1)(40N)/√(1720N)] = 1 - Φ(0.09645√N).
We want this probability to be at most 1% ⇒ 0.09645√N ≥ 2.326 ⇒ N ≥ 582.


Comment: Similar to SOA3, 11/03 Q.4.
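The final search for N can also be done by brute force; a minimal Python sketch of that search (mine, not part of the original solution):

# Smallest N with Prob[Aggregate > 110% of its mean] at most 1%, under the normal approximation.
from math import sqrt
from statistics import NormalDist

mean_per_policy, var_per_policy = 40.0, 1720.0
n = 1
while 1 - NormalDist().cdf(0.1 * mean_per_policy * n / sqrt(var_per_policy * n)) > 0.01:
    n += 1
print(n)   # 582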

5.37. E. µFreq = 10. σ²Freq = 20. µSev = 1000. σ²Sev = 200,000.
Variance of the aggregate: µFreq σ²Sev + µ²Sev σ²Freq = (10)(200,000) + (1000²)(20) = 22,000,000.
⇒ σ = 4690. Now if we know that there have been 8 claims, then the aggregate is the sum of 8
independent, identically distributed severities. ⇒ Var[Aggregate] = 8 Var[Severity] =
(8)(200,000) = 1,600,000. ⇒ σʼ = √1,600,000 = 1265. σ/σʼ = 4690/1265 = 3.7.


Comment: Similar to CAS3, 11/03, Q.25.

5.38. D. Mean severity is: 5000 + (2000)(0.75) = 6500.
Let X = room charges, Y = other charges, Z = payment. Z = X + 0.75Y.
Var[Z] = Var[X + 0.75Y] = Var[X] + 0.75² Var[Y] + (2)(0.75)Cov[X, Y] =
8000² + (0.75²)(3000²) + (2)(0.75){(0.6)(8000)(3000)} = 90.66 million.
Variance of Aggregate = (Mean Freq.)(Variance of Sev.) + (Mean Severity)²(Var. of Freq.)
= (0.4)(90.66 million) + (6500²)(0.36) = 51.47 million.
Standard Deviation of Aggregate = √(51.47 million) = 7174.

5.39. C. Mean loss is: (4)(1000) = 4000. Variance of loss is: (4)(1000²) = 4 million.
Mean loss adjustment expense is: (3)(200) = 600.
Variance of loss adjustment expense is: (3)(200²) = 0.12 million.
Var[Loss + LAE] = Var[Loss] + Var[LAE] + 2Cov[Loss, LAE] =
4 million + 0.12 million + (2)(0.8)√{(4 million)(0.12 million)} = 5.2285 million.
Variance of Aggregate = (Mean Freq.)(Variance of Sev.) + (Mean Severity)²(Var. of Freq.)
= (0.6)(5.2285 million) + (4600²)(0.6) = 15.83 million.
Standard Deviation of Aggregate = √(15.83 million) = 3979.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 175

5.40. D. One has to recognize this as a compound Poisson, with p(x) the severity, and frequency
3^n e^-3/n!. Frequency is Poisson with λ = 3.
The second moment of the severity is: (0.5)(1²) + (0.3)(2²) + (0.2)(3²) = 3.5.
The variance of aggregate losses is: (3)(3.5) = 10.5.
Comment: Similar to Course 151 Sample Exam #1, Q.10.

5.41. C. Matching the mean of the LogNormal and the aggregate distribution:
exp(µ + 0.5σ²) = 100.
Matching the second moments: exp(2µ + 2σ²) = 90,000 + 100² = 100,000.
Divide the second equation by the square of the first equation:
exp(2µ + 2σ²)/exp(2µ + σ²) = exp(σ²) = 10.
⇒ σ = √ln(10) = 1.517. ⇒ µ = ln(100) - σ²/2 = 3.455.
Prob[agg. > 2000] ≅ 1 - F(2000) = 1 - Φ[(ln(2000) - 3.455)/1.517] = 1 - Φ[2.73] = 0.0032.
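A minimal Python sketch of this moment matching (my own check, usable as a template for similar LogNormal approximation questions):

# Fit a LogNormal to a given aggregate mean and variance, then approximate the tail.
from math import log, sqrt
from statistics import NormalDist

mean_agg, var_agg = 100.0, 90_000.0
second_moment = var_agg + mean_agg**2

sigma = sqrt(log(second_moment / mean_agg**2))   # since exp(sigma^2) = E[S^2]/E[S]^2
mu = log(mean_agg) - sigma**2 / 2
print(sigma, mu)                                              # about 1.517 and 3.455
print(1 - NormalDist().cdf((log(2000) - mu) / sigma))         # about 0.0032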

5.42. C. The severity has: mean = θ/(α-1) = 50, and second moment = 2θ²/{(α-1)(α-2)} = 10,000.
The mean aggregate loss is: (7)(50) = 350.
Since the frequency and severity are independent, and frequency is Poisson, the variance of the
aggregate losses = (mean frequency)(2nd moment of the severity) = (7)(10,000) = 70,000.
For the LogNormal Distribution the mean is exp[µ + 0.5σ²], while the second moment is
exp[2µ + 2σ²]. Matching the first 2 moments of the aggregate losses to that of the LogNormal
Distribution: exp[µ + 0.5σ²] = 350 and exp[2µ + 2σ²] = 70,000 + 350² = 192,500. We can solve
by dividing the square of the 1st equation into the 2nd equation: exp[σ²] = 192,500 / 350² = 1.571.
Thus σ = 0.672 and thus µ = 5.632.
Therefore the probability that the total losses will exceed 1000 is approximately:
1 - Φ[(ln(1000) - 5.632) / 0.672] = 1 - Φ[1.90] = 2.9%.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 176

5.43. Due to the memoryless property of the Exponential, the payments excess of a deductible
follow the same Exponential Distribution as the ground up losses.
Thus the second moment of (non-zero) payments is 2θ².
The number of (non-zero) payments with a deductible b is Poisson with mean: λS(b) = λe^(-b/θ).
Therefore, with deductible b, B = variance of aggregate payments = λe^(-b/θ) 2θ².
With deductible c, C = variance of aggregate payments = λe^(-c/θ) 2θ².
C/B = e^((b-c)/θ). Since c > b, this ratio is less than one.


Comment: In the case of an Exponential severity, the variance of aggregate payments decreases
as the deductible increases.
Similar to CAS3, 5/05, Q.9.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 177

5.44. D. Only the non-zero claims contribute to the aggregate losses!
Due to the memoryless property of the Exponential, the payments excess of a deductible follow
the same Exponential Distribution as the ground up losses.
Thus the second moment of (non-zero) payments is: (2)(400²) = 320,000.
The number of (non-zero) payments is Poisson with mean: 3e^(-500/400) = 0.85951.
Therefore, variance of aggregate payments = (0.85951)(320,000) = 275,045.
Alternately, for the Exponential Distribution, E[X] = θ = 400, and E[X²] = 2θ² = 320,000.
For the Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)).
E[X ∧ 500] = 400(1 - e^(-500/400)) = 285.40.
For the Exponential, E[(X ∧ x)^n] = n! θ^n Γ(n+1; x/θ) + x^n e^(-x/θ).
E[(X ∧ 500)²] = (2)(400²)Γ(3; 500/400) + 500² e^(-500/400).
According to Theorem A.1 in Loss Models, for integral α, the incomplete Gamma function
Γ(α; y) is 1 minus the first α densities of a Poisson Distribution with mean y.
Γ(3; y) = 1 - e^-y (1 + y + y²/2). Γ(3; 1.25) = 1 - e^-1.25 (1 + 1.25 + 1.25²/2) = 0.13153.
Therefore, E[(X ∧ 500)²] = (320,000)(0.13153) + 250,000e^-1.25 = 113,716.
The first moment of the layer from 500 to ∞ is: E[X] - E[X ∧ 500] = 400 - 285.40 = 114.60.
Second moment of the layer from 500 to ∞ is: E[X²] - E[(X ∧ 500)²] - (2)(500)(E[X] - E[X ∧ 500]) =
320,000 - 113,716 - (1000)(114.60) = 91,684. The number of losses is Poisson with mean 3.
Thus the variance of the aggregate payments excess of the deductible is: (3)(91,684) = 275,052.
Alternately, one can work directly with the integrals, using integration by parts.
The second moment of the layer from 500 to ∞ is: ∫_500^∞ (x - 500)² e^(-x/400)/400 dx =
∫_500^∞ x² e^(-x/400)/400 dx - 2.5 ∫_500^∞ x e^(-x/400) dx + 625 ∫_500^∞ e^(-x/400) dx
= {-x² e^(-x/400) - 800x e^(-x/400) - 320,000 e^(-x/400)} evaluated from x = 500 to x = ∞
+ {1000x e^(-x/400) + 400,000 e^(-x/400)} evaluated from x = 500 to x = ∞ + 250,000 e^-1.25 =
e^-1.25 {250,000 + 400,000 + 320,000 - 500,000 - 400,000 + 250,000} = 91,682.


The variance of the aggregate payments excess of the deductible is: (3)(91,682) = 275,046.
Comment: Similar to CAS3, 5/05, Q.9.
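A short Python sketch (mine, not part of the original solution) cross-checking the memoryless shortcut against a crude numerical integration of the layer's second moment:

# Shortcut: Var[Agg payments] = lambda e^(-d/theta) 2 theta^2.
from math import exp

lam, theta, d = 3.0, 400.0, 500.0
shortcut = lam * exp(-d / theta) * 2 * theta**2       # about 275,045

step, upper = 0.1, 8_000.0                            # midpoint-rule integration grid
second_moment_layer = 0.0
x = d
while x < upper:
    mid = x + step / 2
    second_moment_layer += (mid - d)**2 * exp(-mid / theta) / theta * step
    x += step
print(shortcut, lam * second_moment_layer)            # both about 275,000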
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 178

5.45. The payments excess of a deductible follow a Pareto Distribution with parameters α and
θ + d. Thus the second moment of (non-zero) payments is 2(θ + d)²/{(α - 1)(α - 2)}.
The number of (non-zero) payments with a deductible b is Poisson with mean:
λS(b) = λ{θ/(θ + b)}^α.
Therefore, with deductible b, B = λ{θ/(θ + b)}^α 2(θ + b)²/{(α - 1)(α - 2)}.
With deductible c, C = variance of aggregate payments = λ{θ/(θ + c)}^α 2(θ + c)²/{(α - 1)(α - 2)}.
C/B = {(θ + b)/(θ + c)}^(α-2). Since α > 2 and c > b, this ratio is less than one.
Comment: As α approaches 2, the ratio C/B approaches one. For α ≤ 2, the second moment of
the Pareto does not exist, and neither does the variance of aggregate payments.
Here the variance of the aggregate payments decreases as the deductible increases.
In CAS3, 5/05, Q.8, the variance of aggregate payments increases as the deductible increases.

5.46. A. S(42) ≅ 1 - Φ[(42 - 20)/10] = 1 - Φ[2.2] = 1 - 0.9861 = 1.39%.

5.47. E. exp[µ + σ²/2] = 20. exp[2µ + 2σ²] = 100 + 20² = 500. ⇒ exp[σ²] = 500/20² = 1.25.
⇒ σ = √ln(1.25) = 0.4724. ⇒ µ = 2.8842.
S(42) ≅ 1 - Φ[(ln(42) - 2.8842)/0.4724] = 1 - Φ[1.81] = 3.51%.

5.48. D. αθ = 20. αθ² = 100. ⇒ θ = 5. ⇒ α = 4.
S(42) ≅ 1 - Γ[4; 42/5] = 1 - Γ[4; 8.4] = e^-8.4 (1 + 8.4 + 8.4²/2 + 8.4³/6) = 3.23%.
Comment: An example of the method of moments.
2016-C-3, Aggregate Distributions §5 Moments, HCM 10/21/15, Page 179

5.49. E. µ = 20. µ³/θ = 100. ⇒ θ = 80.
S(42) ≅ 1 - Φ[(42/20 - 1)√(80/42)] - exp[(2)(80)/20] Φ[-(42/20 + 1)√(80/42)] =
1 - Φ[1.52] - e⁸ Φ[-4.278].
Φ[-4.278] = 1 - Φ[4.278] ≅ {exp[-4.278²/2]/√(2π)} (1/4.278 - 1/4.278³ + 3/4.278⁵ - 15/4.278⁷) =
9.423 x 10⁻⁶.
S(42) ≅ 1 - 0.9357 - (2981)(9.423 x 10⁻⁶) = 0.0643 - 0.0281 = 3.62%.
Comment: An example of the method of moments.

5.50. B. The mean of c times a Poisson is cλ. The variance of c times a Poisson is c²λ.
cλ = 20. c²λ = 100. ⇒ c = 5. ⇒ λ = 4. 5N > 42 ⇔ N > 42/5 = 8.4.
S(42) ≅ 1 - e^-4 (1 + 4 + 4²/2 + 4³/6 + 4⁴/4! + 4⁵/5! + 4⁶/6! + 4⁷/7! + 4⁸/8!) = 2.14%.


Comment: Well beyond what you are likely to be asked on your exam! Since Var[cX]/E[cX] =
c²Var[X]/(cE[X]) = cVar[X]/E[X], for c > 1 the Over-dispersed Poisson Distribution has a variance
greater than its mean. See for example “A Primer on the Exponential Family of Distributions”,
by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program.

5.51. D. θ / (α - 1) = 20. θ2 / {(α - 1)(α - 2)} = 100 + 202 = 500. ⇒ (α - 1)/(α - 2) = 1.25

⇒ α = 6. ⇒ θ = 100. S(42) ≅ Γ[6; 100/42] = Γ[6; 2.381] =


1 - e-2.381(1 + 2.381 + 2.3812 /2 + 2.3813 /6 + 2.3814 /24 + 2.3815 /120) = 3.45%.
Comment: Beyond what you are likely to be asked on your exam. An example of the method of
moments. Which distribution is used to approximate the Aggregate Distribution can make a very
significant difference!
From lightest to heaviest righthand tail, the approximating distributions are:
Normal, Over-dispersed Poisson, Gamma, Inverse Gaussian, LogNormal, Inverse Gamma.
Here is a table of the inverse of the survival functions for various sizes:
1/S(x)
Distribution 40 50 60 70 80 90 100
Normal 43.96 740.8 31,574 3.5e+6 1.0e+9 7.8e+11 1.5e+15
Over-Disp. Pois. 46.81 352.1 3653 50,171 882,744 1.9e+7 5.2e+8
Gamma 23.60 96.75 436.3 2109 10,736 59,947 312,137
Inv. Gaussian 21.87 68.21 212.4 657.3 2019 6152 18,623
LogNormal 22.60 67.63 192.0 515.9 1315 3194 7423
Inv. Gamma 23.80 60.37 136.9 283.4 544.0 981.9 1683

5.52. B. Mean aggregate per policy: (2)(5) = 10.


Variance of aggregate per policy: λ(2nd moment of severity) = (2)(122 + 52 ) = 338.
For N policies, mean is: 10N, and the variance is: 338N. 115% of the mean is: 11.5N.
Prob[Aggregate > 115% of mean] ≅ 1 - Φ[(11.5N - 10N)/√(338N)] = 1 - Φ[0.08159√N].
This probability ≤ 2.5%. ⇔ Φ[0.08159√N] ≥ 97.5%.
Φ[1.960] = 0.975. ⇒ We want 0.08159√N ≥ 1.960. ⇒ N ≥ 577.1.
Comment: Similar to SOA3, 11/03, Q.4.
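The final solve for N can be checked with a few lines of Python (a sketch using the per-policy moments computed above):

```python
from scipy.stats import norm

mean_per, var_per = 10.0, 338.0        # per-policy aggregate mean and variance from the solution
z = norm.ppf(0.975)                    # 1.960
# need 0.15 * mean_per * N / sqrt(var_per * N) >= z
N = (z * var_per**0.5 / (0.15 * mean_per))**2
print(N)                               # about 577.1, so at least 578 policies
```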

5.53. D. Due to the memoryless property of the Exponential, the payments excess of a
deductible follow the same Exponential Distribution as the ground up losses.
The size of payments has mean 1700, and variance 17002 = 2.89 million.
For the original Exponential, S(1000) = exp[-1000/1700] = 0.5553.
Thus the number of (non-zero) payments is Negative Binomial with r = 4, and
β = (0.5553)(3) = 1.666.
The number of payments has mean: (4)(1.666) = 6.664, and variance: (4)(1.666)(2.666) = 17.766.
Therefore, the variance of aggregate payments is:
(6.664)(2.89 million) + (17002 )(17.766) = 70.6 million.
Comment: Similar to Course 3 Sample Exam, Q.20.

5.54. B. & 5.55. A. Mean aggregate = (10000)(.03)(12.5) + (15000)(.05)(25) = 22,500.


Policy Type one has a mean severity of 12.5 and a variance of the severity of
(25 - 0)2 /12 = 52.083. Policy Type one has a mean frequency of 0.03 and a variance of the
frequency of (0.03)(0.97) = 0.0291. Thus, a single policy of type one has a variance of aggregate
losses of: (0.03)(52.083) + (12.52 )(0.0291) = 6.109.
Policy Type two has a mean severity of 25 and a variance of the severity of
(50 - 0)2 /12 = 208.333. Policy Type two has a mean frequency of 0.05 and a variance of the
frequency of (0.05)(0.95) = 0.0475. Thus, a single policy of type two has a variance of aggregate
losses of: (0.05)(208.333) + (252 )(0.0475) = 40.104.
Therefore, the variance of the aggregate losses of 10000 independent policies of type one and
15000 policies of type two is: (10000)(6.109) + (15000)(40.104) = 662,650.
Standard Deviation of aggregate losses is: 814.037.
Prob[Aggregate > 24,000] ≅ 1 - Φ[(24,000 - 22,500)/814.037] = 1 - Φ[1.84] = 3.3%.
Comment: Similar to 3, 5/00, Q.19.

5.56. C. The aggregate distribution of Property Damage Liability has mean 10λ, and variance

λ(15² + 10²) = 325λ. Φ[1.282] = 90%. Therefore, P ≅ 10λ + 1.282√(325λ) = 10λ + 23.11√λ.
The aggregate distribution of Bodily Injury Liability has mean 24λ/3 = 8λ, and variance
(λ/3)(24² + 60²) = 1392λ. Therefore, B ≅ 8λ + 1.282√(1392λ) = 8λ + 47.83√λ.
B/P = {8λ + 47.83√λ} / {10λ + 23.11√λ} = 1.061.
⇒ 8√λ + 47.83 = 10.61√λ + 24.52. ⇒ √λ = 8.93. ⇒ λ = 79.7.

5.57. D. First inflate all of the aggregate losses to the 2012 level:
(1.046 ) (31,000,000) = 39,224,890.
(1.045 ) (38,000,000) = 46,232,811.
(1.044 ) (36,000,000) = 42,114,908.
(1.043 ) (41,000,000) = 46,119,424.
(1.042 ) (41,000,000) = 44,345,600.
Next we calculate the mean and the second moment of the inflated losses:
Mean = 43.6075 million.
Second Moment = 1908.65 x 1012.
The mean of the aggregate distribution is: λ (first moment of severity) = 3000 θ/(α - 1).
The variance of the aggregate distribution is:
λ (second moment of severity) = 3000 (2θ²) / {(α - 1)(α - 2)}.
Matching the theoretical and empirical moments:
43.6075 million = 3000 θ/(α - 1). ⇒ θ = 14,539(α - 1).
1908.65 x 10^12 - (43.6075 million)² = 3000 (2θ²) / {(α - 1)(α - 2)}. ⇒ θ² = 1173 million (α - 1)(α - 2).
Dividing the second equation by the square of the first: 1 = 5.549(α - 2)/(α - 1). ⇒ α = 2.220.

⇒ θ = 17,738. ⇒ S(20,000) = {17,738/(17,738 + 20,000)}2.220 = 18.7%.
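A short Python sketch of this method of moments fit; the factor of 1500 below comes from dividing the variance equation by the square of the mean equation, with λ = 3000:

```python
import numpy as np

# inputs from the solution: five years of aggregate losses, 4% annual trend, Poisson lambda = 3000
losses = np.array([31e6, 38e6, 36e6, 41e6, 41e6])
x = losses * 1.04 ** np.array([6, 5, 4, 3, 2])

mean, var = x.mean(), x.var()              # empirical (biased) variance, as used above
ratio = 1500 * var / mean**2               # equals (alpha - 1)/(alpha - 2) after matching moments
alpha = (2 * ratio - 1) / (ratio - 1)
theta = mean * (alpha - 1) / 3000
print(alpha, theta)                        # about 2.22 and 17,700
print((theta / (theta + 20000))**alpha)    # S(20,000), about 18.7%
```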



5.58. C. For example, VARS(S | N = 3) = 300,000 - 160,000 = 140,000.


EN[VARS(S | N)] =
(20%)(0) + (40%)(40,000) + (30%)(87,500) + (10%)(140,000) = 56,250.
N Probability Mean of S Square of Mean Second Moment of Var of S
Given N of S Given N of S Given N Given N
0 20% 0 0 0 0
1 40% 100 10,000 50,000 40,000
2 30% 250 62,500 150,000 87,500
3 10% 400 160,000 300,000 140,000
Mean 155 38,750 56,250
VARN(ES[S | N]) = 38,750 - 1552 = 14,725.
Thus the variance of the aggregate losses is:
EN[VARS(S | N)] + VARN(ES[S | N]) = 56,250 + 14,725 = 70,975.
Comment: We have not assumed that frequency and severity are independent.
The mathematics here is similar to that for the EPV and VHM, used in Buhlmann Credibility.
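The same conditional-variance (EPV plus VHM) computation can be reproduced in a few lines of Python, using the values from the table above:

```python
import numpy as np

prob = np.array([0.20, 0.40, 0.30, 0.10])                  # P(N = 0, 1, 2, 3)
mean_s = np.array([0.0, 100, 250, 400])                    # E[S | N]
second_s = np.array([0.0, 50000, 150000, 300000])          # E[S^2 | N]

epv = (prob * (second_s - mean_s**2)).sum()                # E_N[Var(S | N)] = 56,250
vhm = (prob * mean_s**2).sum() - (prob * mean_s).sum()**2  # Var_N[E(S | N)] = 14,725
print(epv + vhm)                                           # 70,975
```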

5.59. A. The Binomial has a mean of: (5)(0.4) = 2, and a variance of: (5)(0.4)(0.6) = 1.2.
The LogNormal distribution has a mean of: exp[6 + 0.32 /2] = 422, a second moment of:
exp[(2)(6) + (2)(0.32 )] = 194,853, and variance of: 194,853 - 4222 = 16,769.
The aggregate losses have a mean of: (2)(422) = 844.
The aggregate losses have a variance of: (2)(16,769) + (4222 )(1.2) = 247,239.

Prob[Aggregate > (1.5)(844)] ≅ 1 - Φ[(0.5)(844)/√247,239] = 1 - Φ[0.85] = 19.77%.

5.60. B. The density at zero for the non-modified Negative Binomial is: 1/1.42 = 0.5102.
The mean of the zero-modified Negative Binomial is: (1 - 0.4)(0.8) / (1 - 0.5102) = 0.9800.
The second moment of the zero-modified Negative Binomial is:
(1 - 0.4){(2)(0.4)(1.4) + 0.82 } / (1 - 0.5102) = 2.1560.
Thus the variance of the zero-modified Negative Binomial is: 2.1560 - 0.98002 = 1.1956.
The mean of the Gamma is: (3)(500) = 1500.
The variance of the Gamma is: (3)(5002 ) = 750,000.
Thus the variance of the annual aggregate loss is:
(0.9800)(750,000) + (15002 )(1.1956) = 3,425,100.

5.61. C. The sample mean is 109.167.
The sample variance is 54.967.
Prob[Aggregate < 100] = Φ[(100 - 109.167)/√54.967] = Φ[-1.24] = 1 - 0.8925 = 10.75%.

5.62. B. By thinning, each type of claim is Poisson.


For each type, variance of aggregate is: λ (second moment of severity) = λ (mean2 ) (1 + CV2 ).
Variance of Type I: (0.20) (1002 ) (1 + 52 ) = 52,000.
Variance of Type II: (0.10) (2002 ) (1 + 42 ) = 68,000.
Variance of Type III: (0.05) (3002 ) (1 + 32 ) = 45,000.
The variance of the distribution of annual aggregate losses is:
52,000 + 68,000 + 45,000 = 165,000.
Alternately, severity is a mixture, with weights: 20/35, 10/35, and 5/35.
The second moment of the mixture is the mixture of the second moments:
(4/7) (1002 ) (1 + 52 ) + (2/7) (2002 ) (1 + 42 ) + (1/7) (3002 ) (1 + 32 ) = 471,429.
The variance of the distribution of annual aggregate losses is: (0.35)(471,429) = 165,000.

5.63. A. Severity is LogNormal with µ = 6 and σ2 = 0.7.


Mean severity is: exp[6 + 0.7/2] = 572.5.
Second moment of severity is: exp[(2)(6) + (2)(0.7)] = 660,003.
Variance of severity is: 660,003 - 572.52 = 332,247.
Mean frequency is: (40%)(1) + (30%)(2) + (20%)(3) + (10%)(4) = 2.
Second Moment of frequency is: (40%)(12 ) + (30%)(22 ) + (20%)(32 ) + (10%)(42 ) = 5.
Variance of frequency is: 5 - 22 = 1.
The variance of the distribution of annual aggregate losses is:
(2)(332,247) + (572.52 )(1) = 992,250.

5.64. E. X is the discrete frequency, severity is Normal; Y is the aggregate loss.


E[X] = 1.3. E[X2 ] = 2.3. Var[X] = 2.3 - 1.32 = 0.61.
Var[Y] = (1.3) (5) + (32 ) (0.61) = 11.99.
Alternately, this is a mixture:
with probability 20% Y is 0,
with probability 30% Y is Normal with mean 3 and variance 5,
with probability 50% Y is Normal with mean 6 and variance 10.
Thus E[Y] = (0.2)(0) + (0.3)(3) + (0.5)(6) = 3.9.
E[Y2 ] = (0.2)(0) + (0.3)(5 + 32 ) + (0.5)(10 + 62 ) = 27.2.
Var[Y] = 27.2 - 3.92 = 11.99.

5.65. D. The Gamma has a mean of 30,000, and a variance of: (3)(10,0002 ) = 300 million.
The mean of the zero-truncated Binomial is: (4)(0.2) / (1 - 0.84 ) = 1.355.
Thus the mean number of claimants is: (0.1)(1.355) = 0.1355.
Thus the mean annual aggregate loss is: (0.1355)(30,000) = 4065.
The second moment of the non-truncated Binomial is: (4)(0.2)(0.8) + {(4)(0.2)}2 = 1.28.
The second moment of the zero-truncated Binomial is: 1.28 / (1 - 0.84 ) = 2.168.
The annual number of claimants follows a compound Poisson zero-truncated Binomial Distribution,
in other words as if there were a Poisson Frequency and a zero-truncated Binomial severity.
Thus the variance of the number of claimants is: (0.1)(2.168) = 0.2168.
Thus the variance of the annual aggregate loss is:
(0.1355)(300 million) + (30,0002 )(0.2168) = 235.77 million.
CV of aggregate loss is: √(235.77 million) / 4065 = 3.78.

5.66. A. For the portion paid by Spring & Sommers, the mean disability is:
(0.3)(1) + (0.2)(2) + (0.1)(3) + (0.1)(4) + (0.3)(5) = 2.9 weeks.
Second moment is: (0.3)(12 ) + (0.2)(22 ) + (0.1)(32 ) + (0.1)(42 ) + (0.3)(52 ) = 11.1.
Variance is: 11.1 - 2.92 = 2.69.
The number of disabilities from Type 1 are Binomial with m = 1500 and q = 5%.
The mean severity is: (2/3)(600)(2.9) = 1160.
The variance of severity is: (4002 ) (2.69) = 430,400.
For Type 1, the mean aggregate is: (5%)(1500) (1160) = 87,000.
The variance aggregate is: (5%)(1500)(430,400) + (11602 )(1500)(0.05)(0.95) = 128,154,000.
The number of disabilities from Type 2 are Binomial with m = 500 and q = 8%.
The mean severity is: (2/3)(900)(2.9) = 1740.
The variance of severity is: (6002 ) (2.69) = 968,400.
For Type 2, the mean aggregate is: (8%)(500) (1740) = 69,600.
The variance aggregate is: (8%)(500)(968,400) + (17402 )(500)(0.08)(0.92) = 150,151,680.
The total mean aggregate is: 87,000 + 69,600 = 156,600.
The variance of total aggregate is: 128,154,000 + 150,151,680 = 278,305,680.
The coefficient of variation of the distribution of total annual payments is:
√278,305,680 / 156,600 = 0.1065.

5.67. D. The mean of the zero-modified Poisson is: (1 - 0.25)(0.1) / (1 - e-0.1) = 0.7881.
The second moment of the zero-modified Poisson is: (1 - 0.25)(0.1 + 0.12 ) / (1 - e-0.1) = 0.8669.
Thus the variance of the zero-modified Poisson is: 0.8669 - 0.78812 = 0.2458.
The mean of the LogNormal is: exp[8 + 0.62 /2] = 3569.
The second moment of the LogNormal is: exp[(2)(8) + (2)(0.62 )] = 18,255,921.
Thus the variance of the LogNormal is: 18,255,921 - 35692 = 5,518,160.
Thus the variance of the annual aggregate loss is:
(0.7881)(5,518,160) + (35692 )(0.2458) = 7,479,804.

5.68. D. For Policy Type 1:


mean frequency is: (0.4)(0.1) = 0.04. variance of frequency is: (0.4)(0.1)(1.1) = 0.044.
mean severity is: (3)(50) = 150. variance of severity is: (3)(502 ) = 7500.
Thus, a single policy of type one has a variance of aggregate losses of:
(0.04)(7500) + (1502 )(0.044) = 1290.
For Policy Type 2:
mean frequency is: (0.3)(0.2) = 0.06. variance of frequency is: (0.3)(0.2)(1.2) = 0.072.
mean severity is: (4)(40) = 160. variance of severity is: (4)(402 ) = 6400.
Thus, a single policy of type two has a variance of aggregate losses of:
(0.06)(6400) + (1602 )(0.072) = 2227.2.
Therefore, the variance of the aggregate losses of 800 independent policies of type one and 500
policies of type two is: (800)(1290) + (500)(2227.2) = 2,145,600.

5.69. E. Mean aggregate = (800)(0.04)(150) + (500)(0.06)(160) = 9600.


From the previous solution, the variance of the aggregate losses is 2,145,600.
The probability that aggregate losses are between 8000 and 9000 is:
Φ[(9000 - 9600)/√2,145,600] - Φ[(8000 - 9600)/√2,145,600] = Φ[-0.41] - Φ[-1.09] = 34.09% - 13.79% = 20.3%.

5.70. B. The mean frequency is: (0.3)(0) + (0.3)(1) + (0.2)(2) + (0.2)(3) = 1.3.
The second moment of the frequency is: (0.3)(02 ) + (0.3)(12 ) + (0.2)(22 ) + (0.2)(32 ) = 2.9.
The variance of the frequency is: 2.9 - 1.32 = 1.21.
The mean severity is: 60 / (4 - 1) = 20.
The second moment of the severity is: 602 / {(4 - 1)(4 - 2)} = 600.
The variance of the severity is: 600 - 202 = 200.
Thus the mean aggregate is: (1.3)(20) = 26.
The variance of the aggregate is: (1.3)(200) + (202 )(1.21) = 744.
Thus the second moment of the aggregate is: 744 + 262 = 1420.

5.71. D. Frequency is an Exponential-Poisson, with mixed distribution a Geometric with β = 0.146.


For the sum of 12 tankers we get the sum of 12 independent Geometrics:
a Negative Binomial with r = 12 and β = 0.146.
Mean severity is: 20/3 + 40/6 + 60/2 = 43.333.
Second moment of severity is: 202 /3 + 402 /6 + 602 /2 = 2200.
Variance of severity is: 2200 - 43.3332 = 322.25.
Thus the mean aggregate loss for 12 tankers is: (12)(0.146)(43.333) = 75.919.
The variance of the aggregate loss for 12 tankers is:
(12)(0.146)(322.25) + (43.3332 )(12)(0.146)(1.146) = 4334.7.
Prob[Aggregate > 100] = 1 - Φ[(100 - 75.919)/√4334.7] = 1 - Φ[0.366] = 35.7%.
Comment: The frequency model was based on “Casualty Rate Prediction for Oil Tankers,” by
Douglas McKenzie, CAS Forum, Summer 1993. McKenzie found that the average frequency
increases with age, but for a given age varies between tankers. The severity distribution is made up
by me, but is based on the fact that for oil tankers about half of the claims are for total losses.

5.72. A. Let N be frequency and X be severity.


For the first model we have:
108 = E[N] E[X] = E[N] ω/2.

3024 = E[N] Var[X] + E[X]2 Var[N] = E[N] ω2 /12 + (ω/2)2 Var[N] = ω2 (E[N]/12 + Var[N]/4).
Letting θ be the mean of the Exponential, for the second model we have:
648 = E[N] E[X] = E[N] θ.

186,624 = E[N] Var[X] + E[X]2 Var[N] = E[N] θ2 + θ2 Var[N] = θ2 (E[N] + Var[N]).

Dividing the two equations for the mean aggregates: 2θ/ω = 648/108 = 6. ⇒ θ = 3ω.

⇒ 186,624 = 9 ω2 (E[N] + Var[N]). ⇒ 20,736 = ω2 (E[N] + Var[N]).


Also, 3024 = ω2 (E[N]/12 + Var[N]/4). ⇒ 12,096 = ω2 (E[N]/3 + Var[N]).

Subtracting these two equations: 8640 = (2/3) ω2 E[N]. ⇒ 12,960 = ω2 E[N].

However, 108 = E[N] ω/2. ⇒ 216 = E[N] ω.


Dividing these two equations: ω = 12,960 / 216 = 60.
⇒ E[N] = 216 / 60 = 3.6. ⇒ 20,736 = 602 (3.6 + Var[N]). ⇒ Var[N] = 2.16.
Since frequency is a member of the (a, b, 0) class, and E[N] > Var[N], we must have a Binomial.
mq = 3.6, and mq(1-q) = 2.16. ⇒ q = 0.4 and m = 9.
Thus, the probability of two losses is: 36 (0.42 ) (0.67 ) = 16.1%.

5.73. C. For the Pareto, S(1000) = {2500 / (1000 + 2500)}^4 = 0.26031.

The number of (non-zero) payments is Poisson with mean: (10)(0.26031) = 2.6031.


The payments excess of a deductible follow a Pareto Distribution with the same alpha,
and new theta equal to the original theta plus the deductible amount.
The payments excess of a deductible follow a Pareto Distribution with α = 4 and θ = 3500.
Thus the second moment of (non-zero) payments is: (2)(3500²) / {(4 - 1)(4 - 2)} = 4,083,333.

Therefore, variance of aggregate payments = (2.6031)(4,083,333) = 10,629,325.
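A compact Python restatement of this calculation, using the values above (a sketch, not part of the original solution):

```python
alpha, theta, lam, d = 4, 2500, 10, 1000                      # values from this problem
s_d = (theta / (theta + d))**alpha                            # probability a loss exceeds the deductible
m2_excess = 2 * (theta + d)**2 / ((alpha - 1) * (alpha - 2))  # 2nd moment of the shifted Pareto
print(lam * s_d * m2_excess)                                  # about 10,629,000
```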

5.74. B. E[N] = (0)(0.7) + (1)(0.2) + (2)(0.1) = 0.4.


E[N2 ] = (02 )(0.7) + (12 )(0.2) + (22 )(0.1) = 0.6.
Var[N ] = 0.6 - 0.42 = 0.44.
Var[S] = (0.4)(132 ) + (202 )(0.44) = 243.6.

5.75. D. For frequency independent of severity, the process variance of the aggregate losses is
given by: (Mean Freq.)(Variance of Severity) + (Mean Severity)2 (Variance of Freq.)
= λ(r/a2) + (r/a)2(2λ) = λr(2r + 1) / a2 .

5.76. B. Let X be the claim sizes, then VAR[T] = E[N]VAR[X] + E[X]2VAR[N] =


m(αθ2) + (αθ)2(3m) = m α (3α + 1)θ2.

5.77. E. One has to recognize this as a compound Negative Binomial, with p(x) the severity, and
frequency density: (n+2 choose n) (0.6)³ (0.4)^n.

Frequency is Negative Binomial with r = 3 and β/(1+β) = 0.4, so that β = 2/3.


The mean frequency is: rβ = 2, and the variance of the frequency is: rβ(1+β) = 10/3.
The mean severity is: (0.3)(1) + (0.6)(2) + (0.1)(3) = 1.8.
The second moment of the severity is: (0.3)(12 ) + (0.6)(22 ) + (0.1)(32 ) = 3.6.
Thus the variance of the severity is: 3.6 - 1.82 = 0.36.
The variance of aggregate losses is: (0.36)(2) + (1.82 )(10/3) = 11.52.
Comment: Where p is the density of the severity, f is the density of the frequency, and frequency
and severity are independent, then the density of aggregate losses is: Σ[n = 0 to ∞] p*ⁿ(x) f(n).
You should recognize that this is the convolution form of writing an aggregate distribution.
In the frequency density, you have a geometric decline factor of 0.4. So 0.4 looks like β/(1+β) in a
Geometric Distribution. However, we also have the binomial coefficients in front, which is one way of
writing a Negative Binomial: (n+2 choose n) = (n+2 choose 2) = (3)(4)...(n+2) / n! ⇔ r(r+1)...(r+k-1) / k!.
This is the form of the Negative Binomial density in Loss Models, with r = 3.
There are only a few frequency distributions in the Appendix, so when you see something like this,
there are only a few choices to try to match things up.
It is more common for them to just say frequency is Negative Binomial or whatever.

5.78. E. In each case the premium is the mean aggregate loss plus 1.645 standard deviations,
since Φ(1.645) = 0.95.
Thus the relative security loading is 1.645 standard deviations/ mean.
Let A be the fixed amount of the claim, let p be the probability of a claim, and N be the number of
policies. Since we are told that each policy has either zero or one claim, the number of claims for N
policies is Binomial with parameters p and N.
Therefore, the mean aggregate losses is: NpA.
The variance of aggregate losses is: (N(p)(1-p))A2 .
Thus the relative security loading is: (1.645)√{N p (1-p) A²} / (NpA) = 1.645 √{(1-p) / (Np)}.
So the largest relative security loading corresponds to the largest value of (1-p)/(Np).
As shown below this occurs for region E.
Region N p (1-p)/(Np) relative security loading
A 300 0.01 0.330 0.94
B 500 0.02 0.098 0.51
C 600 0.03 0.054 0.38
D 500 0.02 0.098 0.51
E 100 0.01 0.990 1.64

5.79. E. This is a mixed Poisson-Poisson frequency. For a given value of λ, the first moment of a
Poisson frequency is λ. Thus for the mixed frequency, the first moment is E[λ] = 1/p.

For a given value of λ, the second moment of a Poisson frequency is λ + λ2.

Thus for the mixed frequency, the second moment is: E[λ + λ2] = E[λ] + E[ λ2] =
1/p + (second moment of a Poisson with mean 1/p) = 1/p + (1/p + 1/p2 ) = 2/p + 1/p2 .
Thus the variance of the mixed frequency distribution is: 2/p + 1/p2 - 1/p2 = 2/p.
The mean severity is: p + (2)(1-p) = 2 - p.
The second moment of the severity is: p + (4)(1-p) = 4 - 3p.
Thus the variance of the severity is: 4 - 3p - (2-p)2 = p - p2 .
Variance of aggregate losses is:
(variance of severity)(mean frequency) + (mean severity)2 (variance of frequency) =
(p-p2 )(1/p) + (2-p)2 (2/p) = p - 7 + 8/p. Setting this equal to the given 19/2, p - 7 + 8/p = 19/2.
Therefore, 2p2 - 33p + 16 = 0. p = (33 ± √(33² - (4)(2)(16)))/4 = (33 ± 31)/4 = 1/2 or 16.
However, in order to have a legitimate severity distribution, we must have 0 ≤ p ≤ 1.
Therefore, p = 1/2.

5.80. B. The mean frequency is β = 1/3. The mean severity is 4. Thus the mean aggregate loss is
(1/3)(4) = 4/3. The second moment of the severity is (9 + 16 + 25)/3 = 50/3.
Thus the variance of the severity is: 50/3 - 42 = 2/3.
The variance of the frequency is: β(1+β) = (1/3)(4/3) = 4/9.
Thus the variance of aggregate losses is: (1/3)(2/3) + (4/9)(42 ) = 66/9.
Thus the premium is: 4/3 + 66/9 = 78/9 = 8.667.
The aggregate losses do not exceed the premiums if: there are 0 claims, there is 1 claim, and
sometimes when there are 2 claims.
The probability of 0 claims is: 1/(1+β) = 0.75. The probability of 1 claim is: β/(1+β)2 = 0.1875.

The probability of 2 claims is: β2/(1+β)3 = 0.046875. If there are two claims, the aggregate losses
are < 8.667 if the claims are of sizes: 3,3; 3,4; 3,5; 4,3; 4,4; 5,3.
This is 6 out of 9 equally likely possibilities, when there are two claims.
Therefore, the probability that the aggregate losses exceed the premiums is:
1 - {0.75 + 0.1875 + (0.046875)(6/9)} = 1 - 0.96875 = 0.03125.
Comment: In spreadsheet form, the probability that the aggregate losses do not exceed the
premiums is calculated as 0.96875:
A B C D
Number of Frequency Probability that Column B
Claims Distribution Aggregate Losses ≤ Premiums times Column C
0 0.75000 1.00000 0.75000
1 0.18750 1.00000 0.18750
2 0.04688 0.66667 0.03125
3 0.01172 0.00000 0.00000
0.96875
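The enumeration in the spreadsheet above can also be done by brute force in Python (a sketch, assuming the Geometric frequency with β = 1/3 and the three equally likely claim sizes 3, 4, 5):

```python
from itertools import product

beta, premium = 1/3, 78/9
sizes = [3, 4, 5]                                  # equally likely claim sizes

def geom(n):                                       # Geometric density, Loss Models parameterization
    return beta**n / (1 + beta)**(n + 1)

p_ok = 0.0
for n in range(4):                                 # with 3 or more claims the premium is always exceeded
    outcomes = list(product(sizes, repeat=n))
    p_ok += geom(n) * sum(sum(c) <= premium for c in outcomes) / len(outcomes)
print(1 - p_ok)                                    # 0.03125
```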

5.81. D. G = E[S](1+η) = λE[X](1+η). Var[S] = λE[X2 ].


Var[R] = Var[S/G] = Var[S]/G2 = λE[X2 ]/(λE[X](1+η))2 = E[X2 ]/{λE[X]2 (1+η)2 }.
Comment: The premiums G does not vary, so we can treat G like a constant; G comes out of the
variance as a square. S is a compound Poisson, so its variance is the mean frequency times the
second moment of the severity.

5.82. E. Frequency is Binomial, with mean = (40)(1/8) = 5 and variance = (40)(1/8)(7/8) = 35/8.
The mean severity is µ = 400. The variance of the severity is: µ3/θ = 4003 /8000 = 8000.
Thus the mean aggregate loss is: (5)(400) = 2000 and the variance of aggregate losses is:
(4002 )(35/8) + (5)(8000) = 740,000. Thus the probability that the total dollars of claims for the
portfolio are greater than 2900 is approximately:
1 - Φ[(2900 - 2000)/√740,000] = 1 - Φ[1.05] = 1 - 0.852 = 0.147.

5.83. B. Mean severity is: (0.9)(1) + (0.1)(10) = 1.9.


Second moment of the severity is: (0.9)(12 ) + (0.1)(102 ) = 10.9.
Variance of the severity is: 10.9 - 1.92 = 7.29.
Mean frequency is 0.10. Variance of frequency is: (0.10)(.90) = 0.09.
Mean aggregate loss is: N (0.1)(1.9) = 0.19N.
Variance of aggregate losses is: N {(0.10)(7.29) + (0.09)(1.92 )} = 1.0539N.
A 95% probability corresponds to 1.645 standard deviations greater than the mean,
since Φ(1.645) = 0.95.
Thus, safety loading = 0.2(mean aggregate loss) = 1.645 (standard deviations). Thus,
(0.2)(0.19N) = 1.645√(1.0539N). Solving, N = 1.645² (1.0539)/0.038² = 1975 policies.
Comment: If one knows classical credibility, one can do this problem as follows.
P = 95%, but since one performs only a one-sided test in this case y = 1.645.
k = ±20%. The CV2 of the severity is: 7.29/1.92 = 2.019.
The standard for full credibility is: (y/k)2 (σF2/µF + CVS2 ) = (1.645/.2)2 (.09/.1 + 2.019) =
(67.65)(2.919) = 197.5 claims. This corresponds to 197.5/.1 = 1975 policies.

5.84. D. Mean Freq = 0.01. Variance of freq. = (0.01)(0.99) = 0.0099.


Mean Severity = 5000. Variance of severity = (10000-0)2 /12 = 8,333,333.
Variance of Aggregate Losses = (.01)(8333333) + (.0099)(50002 ) = 330,833.

5.85. The mean of the aggregate losses = (3)(100) = 300.


Since the frequency is Poisson, the variance of aggregate losses =
(mean frequency)(second moment of the severity) = (3)(15000) = 45,000.
Premiums = (300)(1.1) = 330. Mean Loss Ratio = 300/330 = 0.91.
Var(Loss Ratio) = Var(Loss/Premium) = Var(Loss)/3302 = 45,000/3302 = 0.41.

5.86. a. E[S] = (# risks)(mean frequency)(mean severity) = (500)(0.1)(1000) = 50,000.


Var[S] = (# risks){(mean frequency)(var. of sev.) + (mean severity)2 (var. freq.)} =
(500){(0.1)(10002 ) + (10002 )(0.1)(0.9)} = 95,000,000. StdDev[S] = √95,000,000 = 9746.
b. So that there is 95% chance that the premiums are sufficient to pay the resulting claims, the
aggregate premiums = mean + 1.645(StdDev) = 50000 + (1.645)(9746) = 66,033.
Premium per risk = 66033/500 = 132.

5.87. Let the death benefit be b and the probability of death be q.


Then 30 = E[x] = bq and 29100 = Var[x] = q(1-q)b2 .
Thus 29100/302 = (1-q)/q. ⇒ q = 0.03. ⇒ b = 1000.

5.88. Premium = 1.2E[S] = 1.2(5)(0 +10)/2 = 30.


Var[S] = (mean freq)(second moment of severity) = (5)(102 /3) = 166.67.
Var[Loss Ratio] = Var[S/30] = Var[S]/900 = 166.67/900 = 0.185.
Alternately, G = 1.2 E[N]E[X], Var[S] = E[N]Var[X] + E[X]2 Var[N].
Var[S/G] = Var[S]/G2 = {E[N]Var[X] + E[X]2 Var[N]}/ {1.2 E[N]E[X]}2 =
{Var[X]/ (E[N]E[X]2 ) + Var[N]/E[N]2 } / 1.44 = {(100/12)/((5)(25)) +(5)/(25)} / 1.44 = 0.185.

5.89. D. Mean aggregate loss = (3) {(0.4)(1) + (0.2)(2) + (0.4)(3)} = 6.


For a compound Poisson, variance of aggregate losses =
(mean frequency)(second moment of severity) = (3) {(0.4)(12 ) + (0.2)(22 ) + (0.4)(32 )} = 14.4.
Since the severity is discrete, one should use the continuity correction.
Pr[S > 9] ≅ 1 - Φ[(9.5 - 6)/√14.4] = 1 - Φ(0.92) = 1 - 0.8212 = 17.88%.

5.90. A. Since frequency is Poisson, Var[S] = (mean frequency)(second moment of the severity).
30,000,000 = λ (500,000,000 + 50,0002 ). λ = 1/100.

Prob(N ≥ 1) = 1 - Prob(N = 0) = 1 - e−λ = 1 - e-0.01 = 0.995%.

5.91. Mean Aggregate Loss = (350)(500) = 175,000. Since frequency is Poisson,


Variance of Aggregate Loss = (350)(2nd moment of severity) = (350)(10002 /3) =
116.67 million.
Prob (S > 180,000) ≅ 1 - Φ[(180,000 - 175,000)/√(116.67 million)] = 1 - Φ[0.46] = 32.3%.
Comment: The second moment of the uniform distribution (a , b) is: (b3 - a3 ) / {3(b-a)}.

5.92. The expected excess annual claims are:


(0)(0.04)(3000) + (5000)(0.04)(4000) + (30000)(0.04)(5000) + (45000)(0.04)(2000) =
10.4 million.
Therefore, the reinsurance cost is: (125%)(10.4 million) = 13 million.
The expected retained annual claims are: (20000)(0.04)(3000) + (30000)(0.04)(4000) +
(30000)(0.04)(5000) + (30000)(0.04)(2000) = 15.6 million.
The variance of the retained annual claims are:
(200002 )(0.04)(0.96)(3000) + (300002 )(0.04)(0.96)(4000) + (300002 )(0.04)(.96)(5000) +
(300002 )(0.04)(.96)(2000) = 4.2624 x 1011.
The total cost (retained claims plus reinsurance cost) of insuring the properties has mean:
15.6 million + 13 million = 28.6 million, and variance 4.2624 x 1011.
Probability that the total cost exceeds $28,650,000
≅ 1 - Φ[(28.65 million - 28.6 million) / √(4.2624 x 10^11)] = 1 - Φ[0.08] = 46.8%.
Comment: The insurerʼs cost for reinsurance does not depend on the insurerʼs actual losses in a
year; rather it is fixed and has a variance of zero.

5.93. B. S(500) = e-500/1000 = 0.6065. The frequency distribution of losses of size greater than
500 is also a Negative Binomial Distribution, but with β = (0.6065)(2) = 1.2131 and r = 2.
Therefore, the frequency of non-zero payments has mean: (2)(1.2131) = 2.4262 and
variance: (2)(1.2131)(1+1.2131) = 5.369. When one truncates and shifts an Exponential
Distribution, one gets the same distribution, due to the memoryless property of the Exponential.
Therefore, the severity distribution of payments on losses of size greater than 500 is also an
Exponential Distribution with θ = 1000. The aggregate losses excess of the deductible, which are
the sum of the non-zero payments, have a variance of:
(mean freq.)(var. of sev.) + (mean sev.)²(var. of freq) = (2.4262)(10002 ) + (10002 )(5.369) =
(7.796)(10002 ). Thus the standard deviation of total payments is: (1000)√7.796 = 2792.
Comment: The mean of the aggregate losses excess of the deductible is:
(2.4262)(1000) = 2426.

5.94. The expected aggregate losses are (500)(100) = 50,000. Thus the premium is:
(1.1)(50,000) = 55,000. If the loss ratio exceeds 0.95 then the aggregate losses exceed
(.95)(55,000) = 52250. The variance of the aggregate losses is: 500(100+1002 ) = 5,050,000.
Thus the chance that the losses exceed 52,250 is about:
1 - Φ[(52,250 - 50,000)/√5,050,000] = 1 - Φ(1.00) = 1 - 0.8413 = 0.159.
Comment: For a Compound Poisson Distribution, the variance of the aggregate losses =
(mean frequency)(2nd moment of the severity).

5.95. Mean aggregate is: (50)(1870) = 93,500.


Variance of aggregate is: (50)(6102 ) = 18,605,000.
Prob[Agg > 100,000] ≅ 1 - Φ[(100,000 - 93,500)/√18,605,000] = 1 - Φ[1.51] = 0.0655.
Comment: We know how many claims there were, and therefore the variance of frequency is 0.

5.96. C. The mean aggregate losses are: (8)(10,000) = 80,000.


σagg2 = µfreq σsev2 + µsev2 σfreq2 = (8)(39372 ) + (10000)2 (32 ) = 1,023,999,752.
The probability that the aggregate losses will exceed (1.5)(80,000) = 120,000 is approximately:
1 - Φ[(120,000 - 80,000)/√1,023,999,752] = 1 - Φ(1.25).
Comment: Short and easy. 1 - Φ(1.25) = 10.6%.

5.97. D. Policy Type one has a mean severity of 200 and a variance of the severity of
(400-0)2 /12 = 13,333.
Policy Type one has a mean frequency of 0.05 and a variance of the frequency of (0.05)(0.95) =
0.0475.
Thus, a single policy of type one has a variance of aggregate losses of:
(0.05)(13,333) + (2002 )(0.0475) = 2567.
Policy Type two has a mean severity of 150 and a variance of the severity of (300-0)2 /12 = 7500.
Policy Type two has a mean frequency of 0.06 and a variance of the frequency of (0.06)(0.94) =
0.0564.
Thus, a single policy of type two has a variance of aggregate losses of:
(0.06)(7500) + (1502 )(0.0564) = 1719.
Therefore, the variance of the aggregate losses of 100 independent policies of type one and 200
policies of type two is: (100)(2567) + (200)(1719) = 600,500.
Comment: Frequency is Bernoulli. Severity is uniform.

5.98. E. Mean frequency = (0)(0.7) + (2)(0.2) + (3)(0.1) = 0.7.


Second moment of the frequency = (02 )(0.7) + (22 )(0.2) + (32 )(0.1) = 1.7.
Variance of the frequency = 1.7 - 0.72 = 1.21. Mean severity = (0)(0.8) + (10)(0.2) = 2.
Second moment of the severity = (02 )(0.8) + (102 )(0.2) = 20.
Variance of the severity = 20 - 22 = 16. Mean aggregate loss = (0.7)(2) = 1.4.
Variance of the aggregate losses = (0.7)(16) + (22 )(1.21) = 16.04.
Mean + 2 standard deviations = 1.4 + 2√16.04 = 9.41.
The aggregate benefits are greater than 9.41 if and only if there is at least one non-zero claim.
The probability of no non-zero claims is: 0.7 + (0.2)(0.82 ) + (0.1)(0.83 ) = 0.8792.
Thus the probability of at least one non-zero claim is: 1 - 0.8792 = 0.1208.
Comment: If one were to inappropriately use the Normal Approximation, the probability that
aggregate benefits will exceed expected benefits by more than 2 standard deviations is:
1 - Φ(2) = 1 - 0.9772 = 0.023. The fact that the magic phrase "use the Normal Approximation" did
not appear in this question, might make one think. One usually relies on the Normal Approximation
when the expected number of claims is large. In this case one has very few expected claims.
Therefore, one should not rush to use the Normal Approximation.

5.99. D. For the uniform distribution on [5, 95], E[X] = (5+95)/2 = 50,
Var[X] = (95 - 5)2 /12 = 675. E[X2 ] = 675 + 502 = 3175. Therefore, the aggregate claims have
mean of: (25)(50) = 1250 and variance of: (25)(3175) = 79,375.
Thus, Prob(aggregate claims > 2000) ≅ 1 - Φ[(2000 - 1250)/√79,375] = 1 - Φ(2.662).

5.100. E. Mean frequency is: (.8)(1) + (.2)(2) = 1.2.


Variance of the frequency is: (0.8)(12 ) + (0.2)(22 ) - 1.22 = 0.16.
Mean severity is: (0.2)(0) + (0.7)(100) + (0.1)(1000) = 170.
Second moment of the severity is: (0.2)(02 ) + (0.7)(1002 ) + (0.1)(10002 ) = 107,000.
Variance of the severity is: 107000 - 1702 = 78,100. Mean aggregate loss: (1.2)(170) = 204.
Variance of aggregate loss is: (1.2)(78,100) + (1702 )(.16) = 98,344.
mean plus the standard deviation = 204 + √98,344 = 518.
Comment: The frequency is 1 plus a Bernoulli Distribution with q = 0.2.
Therefore, it has mean: 1 + 0.2 = 1.2, and variance: (0.2)(1 - 0.2) = 0.16.

5.101. A. mean aggregate loss = (50)(200) = 10,000.


Variance of aggregate loss = (50)(400) + (2002 )(100) = 4,020,000.
Prob[aggregate < 8000] ≅ Φ[(8000 - 10000)/√4,020,000] = Φ[-1.00] = 15.9%.

5.102. A. Mean of aggregate = (110)(1101) = 121,110.


Variance of aggregate = (110)(702 ) + (11012 )(750) = 909,689,750.
Prob[aggregate < 100,000] ≅ Φ[(100000 - 121,110)/√909,689,750] =
Φ[-0.70] = 1 - 0.7580 = 0.2420.

5.103. E. The average number of tires repaired per year is: $10,000,000/$100 = 100,000.
There are 2,000,000 tires sold per year, so q = 100,000/2,000,000 = 0.05.
µfreq = mq = (2000000)(0.05) = 100,000. σfreq2 = mq(1-q) = (2000000)(0.05)(0.95) = 95,000.

µsev = 100. Variance of the aggregate is: 400002 = µfreq σsev2 + µsev2 σfreq2.

⇒ 1,600,000,000 = 100,000σsev2 + (1002 )(95,000). ⇒ σsev2 = 6500. ⇒ σsev = 80.62.

5.104. B. µfreq = 8. σfreq2 = 15. µsev = 100. σsev2 = 40,000.

σagg2 = µfreq σsev2 + µsev2 σfreq2 = (8)(40,000) + (1002 )(15) = 470,000. ⇒ σ = 685.57.
Now if we know that there have been 13 claims, then the aggregate is the sum of 13 independent,
identically distributed severities. ⇒ Var[Aggregate] = 13 Var[Severity] =

(13)(40,000) = 520,000. ⇒ σʼ = √520,000 = 721.11. σ/σʼ - 1 = 685.57/721.11 - 1 = -4.93%.


Alternately, if we know that there have been 13 claims, µfreq = 13, σfreq2 = 0,
and the variance of the aggregate is: (13)(40,000) + (1002 )(0) = 520,000. Proceed as before.

5.105. C. Mean aggregate per computer: (3)(80) = 240.


Variance of aggregate per computer: λ(2nd moment of severity) = (3)(2002 + 802 ) = 139,200.
For N computers, mean is: 240N, and the variance is: 139,200N. 120% of the mean is: 288N.
Prob[Aggregate > 120% of mean] ≅ 1 - Φ[(288N - 240N)/√(139,200N)] = 1 - Φ[0.12865√N].
This probability < 10%. ⇔ Φ[0.12865√N] > 90%.
Φ[1.282] = 0.90. ⇒ We want 0.12865√N > 1.282. ⇒ N > 99.3.
Alternately, for classical credibility, we might want a probability of 90% of being within ± 20%.
However, here we are only interested in avoiding +20%, a one-sided rather than two-sided test.
For 10% probability in one tail, for the Standard Normal Distribution, y = 1.282.
n0 = (1.282/0.2)2 = 41.088. Severity has a CV of: 200/80 = 2.5.

Number of claims needed for full credibility of aggregate losses: (1 + 2.52 )(41.088) = 297.89.
However, the number of computers corresponds to the number of exposures.
Thus we need to divide by the mean frequency of 3: 297.89/3 = 99.3.

5.106. B. Mean severity is: (50%)(80) + (40%)(100) + (10%)(160) = 96.


Second Moment of severity is: (50%)(802 ) + (40%)(1002 ) + (10%)(1602 ) = 9760.
Mean Aggregate is: (1000)(96) = 96,000. Variance of Aggregate is: (1000)(9760) = 9,760,000.
Prob[Club pays > 90,000] = Prob[Aggregate > 100,000] ≅

1 - Φ[(100000 - 96,000)/√9,760,000] = 1 - Φ[1.28] = 10.0%.


Comment: One could instead work with the payments, which are 90% of the losses.

5.107. B. Frequency has mean of 0.9, second moment of 1.9, and variance of 1.09.
Severity reduced by the 1000 deductible has mean of:
(50%)(0) + (10%)(1000) + (10%)(2000) + (30%)(4000) = 1500,
second moment of: (50%)(02 ) + (10%)(10002 ) + (10%)(20002 ) + (30%)(40002 ) = 5.3 million,
and variance of: 5.3 million - 15002 = 3.05 million.
σAgg2 = (0.9)(3.05 million) + (15002 )(1.09) = 5,197,500. σAgg = 2280.
Alternately, for the original severity distribution:
E[X] = (50%)(1000) + (10%)(2000) + (10%)(3000) + (30%)(5000) = 2500.
E[X ∧ 1000] = (50%)(1000) + (10%)(1000) + (10%)(1000) + (30%)(1000) = 1000.
E[X2 ] = (50%)(10002 ) + (10%)(20002 ) + (10%)(30002 ) + (30%)(50002 ) = 9.3 million.
E[(X ∧ 1000)2 ] = (50%)(10002 ) + (10%)(10002 ) + (10%)(10002 ) + (30%)(10002 ) = 1 million.
First moment of the layer from 1000 to ∞ is:
E[X] - E[X ∧ 1000] = 2500 - 1000 = 1500.
Second moment of the layer from 1000 to ∞ is:
E[(X ∧ ∞)2 ] - E[(X ∧ 1000)2 ] - (2)(1000){E[X ∧ ∞] - E[X ∧ 1000]} =
E[X2 ] - E[(X ∧ 1000)2 ] - (2000){E[X] - E[X ∧ 1000]} =
9.3 million - 1 million - (2000)(2500 - 1000) = 5.3 million.
Proceed as before.

5.108. B. Frequency has mean rβ and variance rβ(1+β).

Mean Aggregate = rβ700. ⇒ 48000 = rβ700. ⇒ rβ = 68.571.

Variance of Aggregate = rβ1300 + 7002 rβ(1+β) = 491300rβ + 490000rβ2.

⇒ 80,000,000 = 491300rβ + 490000rβ2 = (491300)(68.571) + (490000)(68.571)β.


⇒ β = 1.378. ⇒ r = 49.76.

5.109. B. Mean of Aggregate: (50)(4500) = 225,000.


Variance of Aggregate: (50)(30002 ) + (45002 )(122 ) = 3366 million.
Set Mean of Lognormal equal to that of the aggregate: exp[µ + σ2/2] = 225,000.
Set Second Moment of Lognormal equal to that of the aggregate:
exp[2µ + 2σ2] = 225,0002 + 3366 million = 53,991 million.

Divide the second equation by the square of the first: exp[σ2] = 1.0665.

⇒ σ = 0.2537. ⇒ µ = 12.292.
S((1.5)(225000)) = S(337500) = 1 - Φ[(ln(337500) - 12.292)/0.2537] = 1 - Φ[1.72] = 4.27%.
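A Python sketch of fitting the LogNormal by moments and reading off the tail probability, using the moments from this solution (frequency mean 50 and standard deviation 12, severity mean 4500 and standard deviation 3000):

```python
from math import log, sqrt
from scipy.stats import norm

mean_agg = 50 * 4500                              # 225,000
var_agg = 50 * 3000**2 + 4500**2 * 12**2          # 3366 million
second = var_agg + mean_agg**2

sigma2 = log(second / mean_agg**2)                # exp(sigma^2) = E[S^2] / E[S]^2
mu = log(mean_agg) - sigma2 / 2
print(1 - norm.cdf((log(1.5 * mean_agg) - mu) / sqrt(sigma2)))   # about 4.3%
```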

5.110. A. After the deductible, the severity has a mean of:


(0.35)(0) + (0.3)(250) + (0.25)(500) + (0.05)(750) + (0.05)(1000) = 287.5.
After the deductible, the severity has a second moment of
(0.35)(02 ) + (0.3)(2502 ) + (0.25)(5002 ) + (0.05)(7502 ) + (0.05)(10002 ) = 159,375.
Average Aggregate: (0.15)(287.5) = 43.125.
Variance of Aggregate: (0.15)(159,375) = 23,906.
Prob[Aggregate > 250] ≅ 1 - Φ[(250 - 43.125)/√23,906] = 1 - Φ[1.34] = 9.01%.
Alternately, the severity distribution is in units of 250.
Therefore, one could apply the continuity correction; 375 is halfway between 250 and 500.
Prob[Aggregate > 250] ≅ 1 - Φ[(375 - 43.125)/√23,906] = 1 - Φ[2.15] = 1.58%.

5.111. B. σAgg2 = µFreq σX2 + µX2 σFreq2 . 228742 = (103)(17812 ) + 62822 σF2 . ⇒ σF = 2.197.
Comment: Sometimes the answer given by the exam committee, in this case 2.17, will not match
the exact answer, 2.20 in this case. Very annoying! In this case, you might have checked your work
once, but then put down B as the best choice and move on.

5.112. C. With a coinsurance factor of 80%, each payment is each loss times 0.8.
When we multiply a variable by a constant, the mean and standard deviation are each multiplied by
that constant. The 95th percentile of the normal approximation is:
mean + (1.645)(standard deviation). Thus it is also multiplied by 0.8. The reduction is 20%.
Alternately, before the reduction, µAgg = (25)(10,000) = 250,000, and

σAgg2 = µFreq σX2 + µX2 σFreq2 = (25){(3)(10000)}2 + (10000)2 {(1.2)(25)}2 = 112,500 million.
mean + (1.645)(standard deviation) = 250,000 + (1.645)(335,410) = 801,750.
Paying 80% of each loss, multiplies the severity by 0.8. The mean of the lognormal is multiplied by
0.8 and its coefficient of variation is unaffected. µAgg = (25)(8000) = 200,000.

σAgg2 = µFreq σX2 + µX2 σFreq2 = (25){(3)(8000)}2 + (8000)2 {(1.2)(25)}2 = 72,000 million.
mean + (1.645)(standard deviation) = 200,000 + (1.645)(268,328) = 641,400.
Reduction in the estimated 95th percentile: 1 - 641,400/801,750 = 20%.

5.113. A. For Type I, mean: (12)(1/2) = 6, variance: 12(12 )/3 = 4.


For Type II, mean: (4)(2.5) = 10, variance: 4(52 )/3 = 33.33.
Overall mean = (12)(1/2) + (4)(2.5) = 6 + 10 = 16.
Overall variance = 4 + 33.33 = 37.33.
Prob[aggregate > 18] ≅ 1 - Φ((18 - 16)/√37.33) = 1 - Φ(0.33) = 1 - 0.6293 = 0.3707.
Comment: For a Poisson frequency, variance of aggregate = λ(second moment of severity).
The two types of claims are independent, so their variances add.

5.114. D. Let the mean frequencies be b before and c after.


The probability of no claim increases by 30%. ⇒ 1.3e-b = e-c.

The probability of having one claim decreases by 10%. ⇒ 0.9be-b = ce-c.


Dividing: 0.9/1.3 = c/b.
The expected aggregate before is: (10002 + 2562 )b = 1,065,536b.
The expected aggregate after is: (1,5002 + 6782 )c = 2,709,684c.
The ratio of “after” over “before” is: 2.543c/b = 2.543(0.9/1.3) = 1.760. ⇔ 76.0% increase.

5.115. E. Due to the memoryless property of the Exponential, the payments excess of a
deductible follow the same Exponential Distribution as the ground up losses.
Thus the second moment of (non-zero) payments is: 2(10,0002 ) = 200 million.
The number of (non-zero) payments is Poisson with mean: 100e-25,000/10,000 = 8.2085.
Therefore, variance of aggregate payments = (8.2085)(200 million) = 1641.7 million.
Standard deviation of aggregate payments = √(1641.7 million) = 40,518.

5.116. A. Var[A] = (2)(10002 + 20002 ) = 10 million.


Var[B] = (1)(20002 + 40002 ) = 20 million.
The variances of two independent portfolios add.
Var[A] + Var[B] = 10 million + 20 million = 30 million.
Standard deviation of the combined losses is: √(30 million) = 5,477.

5.117. B. E[X² | X > 30] = ∫[30, ∞] x² f(x) dx / S(30). ⇒ ∫[30, ∞] x² f(x) dx = S(30) E[X² | X > 30] =
(0.75)(9000) = 6750.
∫[30, ∞] x f(x) dx = ∫[0, ∞] x f(x) dx - ∫[0, 30] x f(x) dx = E[X] - {E[X ∧ 30] - 30S(30)} = 70 - {25 - (30)(0.75)} = 67.5.
∫[30, ∞] f(x) dx = S(30) = 0.75.


With a deductible of 30 per loss, the second moment of the payment per loss is:
∫[30, ∞] (x - 30)² f(x) dx = ∫[30, ∞] x² f(x) dx - 60 ∫[30, ∞] x f(x) dx + 900 ∫[30, ∞] f(x) dx
= 6750 - (60)(67.5) + (900)(0.75) = 3375.
Since frequency is Poisson, the variance of the aggregate payments is:
λ(second moment of the payment per loss) = (20)(3375) = 67,500.
Alternately, e(30) = (E[X] - E[X ∧ 30])/S(30) = (70 - 25)/0.75 = 60.
(X - 30)2 = X2 - 60X + 900 = X2 - 60(X - 30) - 900. ⇒
E[(X - 30)2 | X > 30] = E[X2 - 60(X - 30) - 900 | X > 30] =
E[X2 | X > 30] - 60 E[X - 30 | X > 30] - E[900 | X > 30] = 9000 - 60 e(30) - 900 =
9000 - (60)(60) - 900 = 4500.
The number of losses of size greater than 30 is Poisson with mean: (0.75)(20) = 15.
The variance of the aggregate payments is:
(number of nonzero payments)(second moment of nonzero payments) = (15)(4500) = 67,500.
Comment: Difficult.
In the original exam question, “number of losses, X,” should have read “number of losses, N,”

5.118. B. Mean = (100)(0.3)(300) + (300)(0.1)(1000) + (50)(0.6)(5000) = 189,000.


Variance per power boat: (0.3)(10,000) + (0.3)(0.7)(3002 ) = 21,900.
Variance per sailboat: (0.1)(400,000) + (0.1)(0.9)(10002 ) = 130,000.
Variance per luxury yacht: (0.6)(2,000,000) + (0.6)(.4)(50002 ) = 7,200,000.
Total Variance: (100)(21,900) + (300)(130,000) + (50)(7,200,000) = 401,190,000.
Mean + standard deviation = 189,000 + √401,190,000 = 209,030.
Comment: Assume that the repair costs for one boat are independent of those of any other boat.

5.119. C. Mean = (3)(10) = 30. Variance = (3)(202 /12) + (102 )(3.6) = 460.
Φ(1.645) = 0.95. The 95th percentile is approximately: 30 + 1.645√460 = 65.3.

5.120. C. The primary distribution is Binomial with m = 10000 and q = 0.05, with
mean: (10000)(0.05) = 500, and variance: (10000)(0.05)(1 - 0.05) = 475.
The second distribution is LogNormal with mean: exp[1.039 + 0.8332 /2] = 3.9986,
second moment: exp[2(1.039) + 2(0.8332 )] = 32.0013,
and variance: 32.0013 - 3.99862 = 16.012.
The mean number of days is: (500)(3.9986) = 1999.3.
The variance of the number of days is: (500)(16.012) + (3.99862 )(475) = 15,601.
The 90th percentile of the Standard Normal Distribution is 1.282.
Thus the 90th percentile of the aggregate number of days is approximately:
1999.3 + 1.282√15,601 = 2159.43.
This corresponds to losses of: ($100)(2159.43) = $215,943.
Since this was for 10,000 policies, this corresponds to a premium per policy of $21.59.
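A Python sketch of the same computation, with the moment formulas written out; the $100 per day cost and the 90th percentile criterion are as in the solution above:

```python
from math import exp, sqrt
from scipy.stats import norm

m, q = 10000, 0.05                        # Binomial frequency
mu, sigma = 1.039, 0.833                  # LogNormal number of days per claim
mean_n, var_n = m * q, m * q * (1 - q)
mean_x = exp(mu + sigma**2 / 2)
var_x = exp(2 * mu + 2 * sigma**2) - mean_x**2

mean_s = mean_n * mean_x
var_s = mean_n * var_x + mean_x**2 * var_n
days_90 = mean_s + norm.ppf(0.90) * sqrt(var_s)
print(100 * days_90 / m)                  # premium per policy, about $21.6
```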

5.121. D. The mean aggregate is: (10)(2000) = 20,000.


Variance of aggregate = λ(second moment of severity) = (10)(2)(20002 ) = 80,000,000.
Match the first and second moments of a LogNormal to that of the aggregate:
exp[µ + σ2/2] = 20,000.

exp[2µ + 2σ2] = 80,000,000 + 20,0002 = 480,000,000.


Divide the second equation by the square of the first equation:
exp[2µ + 2σ2]/exp[2µ + σ2] = exp[σ2] = 1.2. ⇒ σ = 0.427. ⇒ µ = 9.812.
105% of the expected annual loss is: (1.05)(20000) = 21,000.
For the approximating LogNormal, S(21,000) = 1 - Φ[(ln(21000) - 9.812)/0.427]
= 1 - Φ[0.33] = 37.07%.
Comment: We have fit a LogNormal to the aggregate losses via the method of moments.

5.122. C. The mixed severity has mean: (1/16)(5) + (15/16)(10) = 9.6875.


The mixed severity has second moment: (1/16)(502 + 52 ) + (15/16)(202 + 102 ) = 626.56.
Thus without the vaccine, for 100 lives the mean of the compound Poisson Process is:
(100)(0.16)(9.6875) = 155.0, and the variance is: (100)(.16)(626.56) = 10025.
Φ[0.71] = 0.7611 ≅ 1 - 0.24. Therefore, we set the aggregate premium for 100 individuals as:

155.0 + (0.71)√10,025 = 226.1.


With the vaccine, the cost for 100 individuals has mean: (100)(0.15) + (100)(0.16)(10)(15/16) =
165, and variance: (100)(0.16)(15/16)(202 + 102 ) = 7500.
Therefore, with the vaccine we set the aggregate premium for 100 individuals as:
165.0 + (0.71)√7500 = 226.5.
A/B = 226.1/226.5 = 0.998.
Alternately, one can thin the original process into two independent Poisson Processes,
that for Disease 1 with λ = 0.16/16 = 0.01, and that for other diseases with λ = (0.16)15/16 = 0.15.
The first process has mean: (0.01)(5) = 0.05, and variance: (0.01)(502 + 52 ) = 25.25.
The second process has mean: (0.15)(10) = 1.5, and variance: (0.15)(202 + 102 ) = 75.
Without the vaccine, for 100 lives, the aggregate loss has mean: (100)(0.05 + 1.5) = 155, and
variance: (100)(25.25 + 75) = 10025.
With the vaccine, for 100 lives, the aggregate cost has mean: (100)(0.15 + 1.5) = 165,
and variance: (100)(0 + 75) = 7500. Proceed as before.
Comment: The use of the vaccine increases the mean cost, but decreases the variance.
This could result in either an increase or decrease in aggregate premiums, depending on the criterion
used to set premiums, as well as the number of insured lives.
For example, assume instead that the premiums for a group of 100 independent lives are set at a
level such that the probability that aggregate losses for the group will exceed aggregate premiums
for the group is 5%. Then A = 155.0 + (1.645)√10,025 = 319.7,
B = 165.0 + (1.645)√7500 = 307.5, and A/B = 1.04.

5.123. C. The non-zero payments are Poisson with mean: (0.6)(10) = 6.


The size of the non-zero payments is uniform from 0 to 6, with mean 3 and variance: 62 /12 = 3.
Variance of aggregate payments is: λ(second moment of severity) = (6)(3 + 32 ) = 72.
Alternately, the payments per loss are a mixture of 40% zero and 60% uniform from 0 to 6.
Therefore, the size of the payments per loss has second moment:
(40%)(0) + (60%)(3 + 32 ) = 7.2.
Variance of aggregate payments is: λ(second moment of severity) = (10)(7.2) = 72.

5.124. The payments per loss are a mixture of 40% zero and 60% uniform from 0 to 6.
The uniform from 0 to 6 has mean 3, variance 62 /12 = 3, and thus second moment: 3 + 32 = 12.
Thus, the size of the payments per loss has second moment: (40%)(0) + (60%)(12) = 7.2.
Mean is: (60%)(3) = 1.8. Variance is: 7.2 - 1.82 = 3.96.

5.125. The size of the non-zero payments is uniform from 0 to 6, with variance: 62 /12 = 3.

5.126. B. 2nd moment of the amount distribution is: (5%)(102 ) + (15%)(52 ) + (80%)(12 ) = 9.55.
Variance of the compound Poisson Process is: (22)(9.55) = 210.1.

5.127. D. The Binomial frequency has mean: (1000)(.3) = 300, and variance: (1000)(.3)(.7) = 210.
The Pareto severity has a mean of: 500/(3 -1) = 250,
second moment: (2)(5002 )/{(3 -1)(3 -2)} = 250,000, and variance: 250,000 - 2502 = 187,500.
Variance of Aggregate is: (300)(187,500) + (2502 )(210) = 69,375,000.
Standard deviation of the aggregate losses is: √69,375,000 = 8329.

5.128. A. E[X2 ] = θ2 Γ[1 + 2/γ] Γ[α - 2/γ]/Γ[α] = 22 Γ[1 + 2/1] Γ[3 - 2/1]/Γ[3] = (4)Γ[3]Γ[1]/Γ[3] = 4.
Since frequency is Poisson, the variance of aggregate is:
λ(second moment of severity) = (3)(4) = 12.
Comment: A Burr Distribution with γ = 1, is a Pareto Distribution.

E[X2 ] = 2θ2 / {(α-1)(α-2)} = 2(22 ) / {(3-1)(3-2)} = 4.

5.129. D. Negative Binomial has mean: rβ = 96, and variance: rβ(1 + β) = 672.
Uniform severity has mean: 4, and variance 82 /12 = 16/3.
Mean of Aggregate is: (96)(4) = 384.
Variance of Aggregate is: (96)(16/3) + (42 )(672) = 11,264.
Φ[1.645] = 95%. Premium is: 384 + 1.645√11,264 = 559.

5.130. A. Mean of the aggregate is: (100)(20,000) = 2,000,000.


Variance of the aggregate is: (100)(50002 ) + (20,000)2 (252 ) = 2.525 × 1011.

Prob[Aggregate > (1.5)(2,000,000)] ≅ 1 - Φ[1,000,000/√(2.525 x 10^11)] = 1 - Φ[2.0] = 2.3%.



Section 6, Individual Risk Model87

In the individual risk model, the aggregate loss is the sum of the losses from different independent
policies.

Throughout this section we will assume that each policy has at most one claim per year.
Thus frequency is Bernoulli for each policy, with the q parameters varying between policies.

Mean and Variance of the Aggregate:

Often, the claim size will be a fixed amount, bi, for policy i.

In this case, the mean aggregate is the sum of the policy means: ∑ qi bi .
The variance of the aggregate is the sum of the policy variances: ∑ (1 - qi) qi bi2 . 88

Exercise: There are 300 policies insuring 300 independent lives.


For the first 100 policies, the probability of death this year is 2% and the death benefit is 5.
For the second 100 policies, the probability of death this year is 4% and the death benefit is 20.
For the third 100 policies, the probability of death this year is 6% and the death benefit is 10.
Determine the mean and variance of Aggregate Losses.
[Solution: Mean = ∑ ni qi bi = (100)(2%)(5) + (100)(4%)(20) + (100)(6%)(10) = 150.
Variance = ∑ ni (1 - qi) qi bi2 =
(100)(0.98)(0.02)(52 ) + (100)(0.96)(0.04)(202 ) + (100)(0.94)(0.06)(102 ) = 2149.]
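The same portfolio moments in a couple of lines of Python (a sketch using the numbers in this exercise):

```python
groups = [(100, 0.02, 5), (100, 0.04, 20), (100, 0.06, 10)]   # (number of lives, q, death benefit)
mean = sum(n * q * b for n, q, b in groups)                   # 150
var = sum(n * q * (1 - q) * b**2 for n, q, b in groups)       # 2149
print(mean, var)
```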

If the severity is not constant, assume the severity distribution for the policy i has mean µi and

variance σi2. In that case, the mean aggregate is the sum of the policy means: ∑ qi µ i .
The variance of the aggregate is the sum of the policy variances: ∑ {qi σi 2 + (1- qi) qi µ i2} . 89

For example, assume that for a particular policy the benefit for ordinary death is 5 and for accidental
death is 10, and that 30% of deaths are accidental.90 Then for this policy:
µ = (70%)(5) + (30%)(10) = 6.5, and σ2 = (70%)(52 ) + (30%)(102 ) - 6.52 = 5.25.
87 See Sections 9.8.1 and 9.8.2 in Loss Models.
88 Applying the usual formula for the variance of the aggregate, where the variance of a Bernoulli frequency is q(1-q) and for a given policy severity is fixed.
89 Applying the usual formula for the variance of the aggregate, where a Bernoulli frequency has mean q and variance q(1-q).
90 See Example 9.15 in Loss Models.

Parametric Approximation:

As discussed before, one could approximate the Aggregate Distribution by a Normal Distribution,
a LogNormal Distribution, or some other parametric distribution.

Exercise: For the situation in the previous exercise, what is the probability that the aggregate loss will
exceed 170? Use the Normal Approximation.
[Solution: Prob[A > 170] ≅ 1 - Φ[(170 - 150)/√2149] = 1 - Φ[0.43] = 33.36%.
Comment: We usually do not use the continuity correction when working with aggregate
distributions. Here since everything is in units of 5, a more accurate approximation would be:
1 - Φ[(172.5 - 150)/√2149] = 1 - Φ[0.49] = 31.21%.]

Direct Calculation of the Aggregate Distribution:

When there are only 3 policies, it is not hard to calculate the aggregate distribution directly.
Policy Probability of Death Death Benefit
1 2% 5
2 4% 20
3 6% 10

For the first policy, there is a 98% chance of an aggregate of 0 and a 2% chance of an aggregate of
5. For the second policy there is a 96% chance of 0 and a 4% chance of 20.

Adding the first two policies, the combined aggregate has:91


(98%)(96%) = 94.08% @0, (2%)(96%) = 1.92% @ 5, (98%)(4%) = 3.92% @ 20, and
(2%)(4%) = 0.08% at 25.

Exercise: Add in the third policy, in order to calculate the aggregate for all three policies.
[Solution: The third policy has a 94% chance of 0 and a 6% chance of 10.
The combined aggregate has: (94.08%)(94%) = 88.4352% @ 0,
(1.92%)(94%) = 1.8048% @ 5, (94.08%)(6%) = 5.6448% @ 10, (1.92%)(6%) = 0.1152% @ 15,
(3.92%)(94%) = 3.6848% @ 20, (0.08%)(94%) = 0.0752% @ 25,
(3.92%)(6%) = 0.2352% @ 30, (0.08%)(6%) = 0.0048% @ 35.]

91 This is the same as convoluting the two aggregate distributions.

One can use this distribution to calculate the mean and variance of aggregate losses:
Aggregate Probability First Moment Second Moment
0 88.4352% 0.0000 0.0000
5 1.8048% 0.0902 0.4512
10 5.6448% 0.5645 5.6448
15 0.1152% 0.0173 0.2592
20 3.6848% 0.7370 14.7392
25 0.0752% 0.0188 0.4700
30 0.2352% 0.0706 2.1168
35 0.0048% 0.0017 0.0588
Sum 1 1.5000 23.7400
The mean is 1.5 and the variance is: 23.74 - 1.52 = 21.49, matching the previous results.

Exercise: What is the probability that the aggregate will exceed 17?
[Solution: Prob[Agg > 17] = 3.6848% + 0.0752% + 0.2352% + 0.0048% = 4%.
Comment: Agg > 17 if and only if there is claim from the second policy.
The exact 4% differs significantly from the result using the Normal Approximation of 0.04%!
The Normal Approximation is poor when the total number of claims expected is only 0.12.]
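The convolution used above is easy to carry out in a few lines of Python (a sketch for these three policies):

```python
policies = [(0.02, 5), (0.04, 20), (0.06, 10)]     # (probability of death, death benefit)

agg = {0: 1.0}                                     # start with a certain aggregate of 0
for q, b in policies:                              # convolve in one policy at a time
    new = {}
    for s, p in agg.items():
        new[s] = new.get(s, 0.0) + p * (1 - q)     # no claim from this policy
        new[s + b] = new.get(s + b, 0.0) + p * q   # claim of size b
    agg = new

print(sorted(agg.items()))                         # matches the table above
print(sum(p for s, p in agg.items() if s > 17))    # Prob[Agg > 17] = 0.04
```

The same loop extends directly to any number of policies, which is how the individual risk model aggregate distribution is typically evaluated exactly.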

Problems:

6.1 (2 points) An insurer provides life insurance for the following group of independent lives:
Number Death Probability
of Lives Benefit of Death
2000 1 0.05
3000 5 0.04
4000 10 0.02
Using the Normal Approximation, what is the probability that the aggregate losses exceed 110% of
their mean?
(A) 6.5% (B) 7.0% (C) 7.5% (D) 8.0% (E) 8.5%

Use the following information for the next two questions:


An insurer writes two classes of policies with the following distributions of losses per policy:
Class Mean Variance
1 10 20
2 15 40

6.2 (1 point) The insurer will write 10 independent policies, 5 of class one and 5 of class 2.
What is variance of the aggregate losses?
A. 300 B. 320 C. 340 D. 360 E. 380

6.3 (2 points) The insurer will write 10 independent policies.


The number of these policies that are class one is Binomial with m = 10 and q = 0.5.
What is variance of the aggregate losses?
A. 300 B. 320 C. 340 D. 360 E. 380

6.4 (2 points) An insurer provides life insurance for the following group of 4 independent lives:
Life Death Benefit Probability of Death
A 10 0.03
B 25 0.06
C 50 0.01
D 100 0.02
What is the coefficient of variation of the aggregate losses?
A. 2.9 B. 3.1 C. 3.3 D. 3.5 E. 3.7

Use the following information for the next two questions:


An insurer provides life insurance for the following group of 400 independent lives:
Number of Lives Death Benefit Probability of Death
100 10 0.03
100 25 0.06
100 50 0.01
100 100 0.02

6.5 (2 points) What is the coefficient of variation of the aggregate losses?


A. 0.3 B. 0.4 C. 0.5 D. 0.6 E. 0.7

6.6 (2 points) Using the Normal Approximation, what is the probability that the aggregate losses
are less than 300?
A. 20% B. 21% C. 22% D. 23% E. 24%

6.7 (2 points) An insurer provides life insurance for the following group of 4 independent lives:
Death Probability
Life Benefit of Death
1 10 0.04
2 10 0.03
3 20 0.02
4 30 0.05
What is the probability that the aggregate losses are more than 40?
A. less than 0.09%
B. at least 0.09% but less than 0.10%
C. at least 0.10% but less than 0.11%
D. at least 0.11% but less than 0.12%
E. at least 0.12%

6.8 (Course 151 Sample Exam #1, Q.22) (2.5 points) An insurer provides one year term life
insurance to a group. The benefit is 100 if death is due to accident and 10 otherwise.
The characteristics of the group are:
Probability Probability
Number of Death of Death
Gender of Lives (all causes) (accidental causes)
Female 100 0.004 0.0004
Male 200 0.006 0.0012
The aggregate claims distribution is approximated using a compound Poisson distribution which
equates the expected number of claims. The premium charged equals the mean plus 10% of the
standard deviation of this compound Poisson distribution.
Determine the relative security loading, (premiums / expected losses) - 1.
(A) 0.10 (B) 0.13 (C) 0.16 (D) 0.19 (E) 0.22

6.9 (Course 151 Sample Exam #2, Q.20) (1.7 points)


An insurer provides life insurance for the following group of independent lives:
Number of Death Probability
Lives Benefit of Death
100 α 0.02
200 2α 0.03
Let S be the total claims. Let w be the variance of the compound Poisson distribution which
approximates the distribution of S by equating the expected number of claims.
Determine the maximum value of α such that w ≤ 2500.
(A) 6.2 (B) 8.0 (C) 9.8 (D) 11.6 (E) 13.4

6.10 (Course 151 Sample Exam #2, Q.23) (2.5 points)


An insurer has the following portfolio of policies:
Benefit Number Probability
Class Amount of Policies of a Claim
1 1 400 0.02
2 10 100 0.02
There is at most one claim per policy.
The insurer reinsures the amount in excess of R (R >1) per policy.
The reinsurer has a reinsurance loading of 0.25.
The insurer wants to minimize the probability, as determined by the normal approximation,
that retained claims plus cost of reinsurance exceeds 34. Determine R.
(A) 1.5 (B) 2.0 (C) 2.5 (D) 3.0 (E) 3.5
2016-C-3, Aggregate Distributions §6 Individual Risk Model, HCM 10/21/15, Page 210

6.11 (Course 151 Sample Exam #3, Q.9) (1.7 points)


An insurance company has a portfolio of two classes of insureds:
Probability Number of Relative Security
Class Benefit of a Claim Insureds Loading
I 5 0.20 N 0.10
II 10 0.10 2N 0.05
The relative security loading is defined as: (premiums / expected losses) - 1.
Assume all claims are independent. The total of the premiums equals the 95th percentile of the
normal distribution that approximates the distribution of total claims. Determine N.
(A) 1488 (B) 1538 (C) 1588 (D) 1638 (E) 1688

6.12 (5A, 5/94, Q.37) (2 points)


An insurance company has two classes of insureds as follows:
Number of Probability Claim
Class Insureds of 1 Claim Amount
1 200 0.05 2000
2 300 0.01 1500
There is at most one claim per insured and each insured has only one size of claim.
The insurer wishes to collect an amount equal to the 95th percentile of the distribution of total claims,
where each individual's share is to be proportional to the expected claim amount.
Calculate the relative security loading, (premiums / expected losses) - 1,
using the Normal Approximation.

6.13 (5A, 11/97, Q.39) (2 points) A life insurance company issues 1-year term life contracts for
benefit amounts of $100 and $200 to individuals with probabilities of death of 0.03 or 0.09.
The following table gives the number of individuals in each of the four classes.
Class Probability Benefit Number
1 0.03 100 50
2 0.03 200 40
3 0.09 100 60
4 0.09 200 50
The company wants to collect from this population an amount equal to the 95th percentile of the
distribution of total claims, and it wants each individual's share of this amount to be proportional to the
individual's expected claim. Using the Normal Approximation,
calculate the required relative security loading, (premiums / expected losses) - 1.
2016-C-3, Aggregate Distributions §6 Individual Risk Model, HCM 10/21/15, Page 211

6.14 (Course 1 Sample Exam, Q.15) (1.9 points) An insurance company issues insurance
contracts to two classes of independent lives, as shown below.
Class Probability of Death Benefit Amount Number in Class
A 0.01 200 500
B 0.05 100 300
The company wants to collect an amount, in total, equal to the 95th percentile of the distribution of
total claims.
The company will collect an amount from each life insured that is proportional to that lifeʼs expected
claim. That is, the amount for life j with expected claim E[Xj] would be kE[Xj].
Using the Normal Approximation, calculate k.
A. 1.30 B. 1.32 C. 1.34 D. 1.36 E. 1.38
2016-C-3, Aggregate Distributions §6 Individual Risk Model, HCM 10/21/15, Page 212

Solutions to Problems:

6.1. C. Mean = ∑ ni qi bi = (2000)(0.05)(1) + (3000)(0.04)(5) + (4000)(0.02)(10) = 1500.


Variance = ∑ ni (1 - qi) qi bi² =
(2000)(0.95)(0.05)(1²) + (3000)(0.96)(0.04)(5²) + (4000)(0.98)(0.02)(10²) = 10,815.
Prob[A > (1.1)(1500)] ≅ 1 - Φ[150/√10,815] = 1 - Φ[1.44] = 1 - 0.9251 = 7.49%.

6.2. A. (5)(20) + (5)(40) = 300.


Comment: Since we are given the variance of aggregate losses, there is no need to compute the
variance of aggregate losses from the mean frequency, variance of frequency, mean severity, and
variance of severity.

6.3. D. Using analysis of variance, let n be the number of policies of class 1:


Var[Agg] = En [Var[Agg | n]] + Varn [E[Agg | n]]
= En [(n)(20) + (10-n)(40)] + Varn [(n)(10) + (10-n)(15)] =
En [400 - 20n] + Varn [150 - 5n] = 400 - 20En [n] + 25Varn [n] =
400 - (20)(10)(0.5) + (25)(10)(0.5)(1 - 0.5) = 362.5.
Comment: Similar to Exercise 9.66 in Loss Models.

6.4. E. Mean = ∑ qi bi = (0.03)(10) + (0.06)(25) + (0.01)(50) + (0.02)(100) = 4.3.


Variance = ∑ (1 - qi) qi bi² =
(0.97)(0.03)(10²) + (0.94)(0.06)(25²) + (0.99)(0.01)(50²) + (0.98)(0.02)(100²) = 258.91.
CV = √258.91 / 4.3 = 3.74.

6.5. B. Mean = ∑ ni qi bi = 100{(0.03)(10) + (0.06)(25) + (0.01)(50) + (0.02)(100)} = 430.


Variance = ∑ ni (1 - qi) qi bi² =
100{(0.97)(0.03)(10²) + (0.94)(0.06)(25²) + (0.99)(0.01)(50²) + (0.98)(0.02)(100²)} = 25,891.
CV = √25,891 / 430 = 0.374.
Comment: With 100 policies of each type, the coefficient of variation is 1/10 of what it would have
been with only one policy of each type as in the previous question.
As the number of policies increases, the mean goes up as N and the variance goes up as N.
Thus, the standard deviation goes up as square root of N. Thus, the coefficient of variation, the
standard deviation divided by the mean, goes down as square root of N.
2016-C-3, Aggregate Distributions §6 Individual Risk Model, HCM 10/21/15, Page 213

6.6. B. From the previous solution, Mean = 430 and Variance = 25,891.
Prob[Aggregate < 300] ≅ Φ[(300 - 430)/√25,891] = Φ[-0.81] = 1 - 0.7910 = 20.9%.

6.7. C. Prob[Agg > 40] = Prob[Agg = 50] + Prob[Agg = 60] + Prob[Agg = 70] =
Prob[lives 3 and 4 die] + Prob[lives 1, 2, and 4 die] + Prob[lives 1, 3, and 4 die] +
Prob[lives 2, 3, and 4 die] + Prob[lives 1, 2, 3, and 4 die] =
(0.96)(0.97)(0.02)(0.05) + (0.04)(0.03)(0.98)(0.05) + (0.04)(0.97)(0.02)(0.05) +
(0.96)(0.03)(0.02)(0.05) + (0.03)(0.04)(0.02)(0.05) = 0.00106.

6.8. B. The expected number of fatal accidents is: (100)(0.0004) + (200)(0.0012) = 0.28.
The expected number of deaths (all causes) is: (100)(0.004) + (200)(0.006) = 1.6.
So the expected number of deaths from other than accidents is: 1.6 - 0.28 = 1.32.
Therefore, the mean severity is: {(1.32)(10) + (0.28)(100)} / 1.6 = 25.75.
The second moment of the severity is: {(1.32)(10²) + (0.28)(100²)} / 1.6 = 1832.5.
The mean aggregate loss is: (1.6)(25.75) = 41.2.
If this were a compound Poisson Distribution, then the variance would be:
(mean frequency)(second moment of the severity) = (1.6)(1832.5) = 2932.
The standard deviation is: √2932 = 54.14.
Thus the premium is: 41.2 + (10%)(54.14) = 46.614.
The relative security loading is: premium /(expected loss) - 1 = 46.614/41.2 - 1 = 13.1%.

6.9. C. Expected number of claims = (100)(0.02) + (200)(0.03) = 2 + 6 = 8.


Therefore, 2/8 of the claims are expected to have death benefit α, while 6/8 of the claims are
expected to have death benefit of 2α.

The second moment of the severity is: (2/8)(α²) + (6/8)(2α)² = 13α²/4.

Thus the variance of aggregate losses is: (8)(13α²/4) = 26α².

Setting 2500 = 26α². ⇒ α = 9.8.


2016-C-3, Aggregate Distributions §6 Individual Risk Model, HCM 10/21/15, Page 214

6.10. B. Expected number of claims = (400)(.02) + (100)(.02) = 8 + 2 = 10.


For a retention of 10 > R > 1, the expected losses retained are: (8)(1) + (2)(R) = 8 + 2R.
Variance of retained losses = (1²)(400)(0.02)(0.98) + (R²)(100)(0.02)(0.98) = 7.84 + 1.96R².
The expected ceded losses are 2(10 - R) = 20 - 2R. Thus the cost of reinsurance is:
1.25(20 - 2R) = 25 - 2.5R. Thus expected retained losses plus reinsurance costs are:
8 + 2R + 25 - 2.5R = 33 - 0.5R. The variance of the retained losses plus reinsurance costs is that
of the retained losses. Therefore, the probability that the retained losses plus reinsurance costs
exceed 34 is approximately: 1 - Φ[(34 - (33 - 0.5R))/√(7.84 + 1.96R²)].
This probability is minimized by maximizing
(34 - (33 - 0.5R))/√(7.84 + 1.96R²) = (1 + 0.5R)/√(7.84 + 1.96R²).
Setting the derivative with respect to R equal to zero:
0 = {(0.5)√(7.84 + 1.96R²) - (1 + 0.5R)(1.96R)/√(7.84 + 1.96R²)} / (7.84 + 1.96R²).
⇒ (0.5)(7.84 + 1.96R²) = (1 + 0.5R)(1.96R). ⇒ 3.92 + 0.98R² = 1.96R + 0.98R². ⇒ R = 2.
Comment: The graph of (1 + 0.5R)/√(7.84 + 1.96R²), for 1 < R < 10, is as follows:
[Graph omitted; the vertical axis runs from about 0.44 to 0.50.]
The graph of the approximate probability that the retained losses plus reinsurance costs exceed 34,
1 - Φ[(34 - (33 - 0.5R))/√(7.84 + 1.96R²)], for 1 < R < 10, is as follows:
[Graph omitted; the vertical axis runs from about 0.31 to 0.335.]
This probability is minimized for R = 2. However, this probability is insensitive to R, so in this
case this may not be a very practical criterion for selecting the “best” R.
2016-C-3, Aggregate Distributions §6 Individual Risk Model, HCM 10/21/15, Page 215

6.11. A. The 95th percentile of the Normal Distribution implies that the premium =
expected aggregate loss + 1.645(standard deviations).
The expected aggregate loss is: (5)(.2)N + (10)(.1)(2N) = 3N.
The variance of aggregate losses is: (5²)(0.2)(0.8)N + (10²)(0.1)(0.9)(2N) = 22N.
The premiums = (1.1)(5)(0.2)N + (1.05)(10)(0.1)(2N) = 3.2N.
Setting the premiums equal to expected aggregate loss + 1.645(standard deviations):
3.2N = 3N + 1.645√(22N). Solving, N = 22(1.645/0.2)² = 1488.

6.12. The mean loss is: (.05)(2000)(200) + (.01)(1500)(300) = 24,500.


The variance of aggregate losses is: (0.05)(0.95)(2000²)(200) + (0.01)(0.99)(1500²)(300)
= 44,682,500. The 95th percentile of aggregate losses is approximately:
24,500 + 1.645√44,682,500 = 24,500 + 10,996.
The relative security loading is: 10996/24500 = 45%.

6.13. The mean loss is: (0.03)(100)(50) + (0.03)(200)(40) + (0.09)(100)(60) + (0.09)(200)(50) =


1830. The variance of aggregate losses is:
(0.03)(0.97)(100²)(50) + (0.03)(0.97)(200²)(40) + (0.09)(0.91)(100²)(60) +
(0.09)(0.91)(200²)(50) = 274,050.
The 95th percentile of aggregate losses is approximately: 1830 + 1.645√274,050 = 2691.
The security loading is: 2691 - 1830 = 861.
The relative security loading is: 861/1830 = 47%.

6.14. E. The mean aggregate is: (0.01)(500)(200) + (0.05)(300)(100) = 2,500.


The variance of the aggregate is: (0.01)(0.99)(500)(200²) + (0.05)(0.95)(300)(100²) = 340,500.
Using the Normal Approximation, the 95th percentile of the aggregate is:
2500 + 1.645√340,500 = 3460.
k = 3460/2500 = 1.384.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 216

Section 7, Recursive Method / Panjer Algorithm

As discussed previously, the same mathematics apply to aggregate distributions (independent


frequency and severity) and compound frequency distributions.

While one could calculate the density function by brute force, tools have been developed to make it
easier to work with aggregate distributions when the primary distribution has certain forms. The
Panjer Algorithm, referred to in Loss Models as the “recursive method”, is one such technique.92
The first step is to calculate the density at zero of the aggregate or compound distribution.

Density of Aggregate Distribution at Zero:

For an aggregate distribution one can calculate the probability of zero claims as follows from first
principles. For example, assume frequency follows a Poisson Distribution, with λ = 1.3, and severity
has a 60% chance of being zero.

Exercise: What is the density at 0 of the aggregate distribution?


[Solution: There are a number of ways one can have zero aggregate.
One can either have zero claims or one can have n claims, each with zero severity.
Assuming there were n claims, the chance of each of them having zero severity is 0.6^n.
The chance of having zero claims is the density of the Poisson distribution at 0, e^-1.3.
Thus the chance of zero aggregate is:
e^-1.3 + (1.3)e^-1.3 (0.6) + (1.3²/2!)e^-1.3 (0.6²) + (1.3³/3!)e^-1.3 (0.6³) + (1.3⁴/4!)e^-1.3 (0.6⁴) + ...

= e^-1.3 Σ_{n=0}^{∞} {(1.3)(0.6)}^n / n! = e^-1.3 exp[(1.3)(0.6)] = exp[-(1.3)(1 - 0.6)] = e^-0.52 = 0.5945.

Comment: I have used the fact that: Σ_{n=0}^{∞} x^n/n! = e^x.]

In this exercise, we have computed that there is a 59.45% chance that there is zero aggregate.

Instead one can use the following formula, the first step of the Panjer algorithm: c(0) = Pp (s(0)),
where c is the compound or aggregate density,
s is the density of the severity or secondary distribution,
and Pp is the probability generating function of the frequency or primary distribution.

92
Loss Models points out that the number of computations increases as n², O(n²), rather than n³, O(n³), as for
direct calculation using convolutions.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 217

Exercise: Apply this formula to determine the density at zero of the aggregate distribution.
[Solution: For the Poisson, P(z) = exp[λ(z - 1)] = exp[1.3(z - 1)].
c(0) = Pp(s(0)) = Pp(0.6) = exp[1.3(0.6 - 1)] = e^-0.52 = 0.5945205.]

Derivation of the Formula for the Density of Aggregate Distribution at Zero:

Let the frequency or primary distribution be p, the severity or secondary distribution be s, and let c
be the aggregate or compound distribution.

The probability of zero aggregate is:

c(0) = p(0) + p(1)s(0) + p(2)s(0)² + p(3)s(0)³ + ...

c(0) = Σ_{n=0}^{∞} p(n) s(0)^n.

We note that by the definition of the Probability Generating Function, the right-hand side of the
above equation is the Probability Generating Function of the primary distribution at s(0).93
Therefore, the density of the compound distribution at zero is:94

c(0) = Pp (s(0)) = P.G.F. of primary distribution at (density of secondary distribution at zero).

Formulas for the Panjer Algorithm (recursive method):

Let the frequency or primary distribution be p, the severity or secondary distribution be s, and
c be the aggregate or compound distribution. If the primary distribution p is a member of the
(a,b,0) class95, then one can use the Panjer Algorithm (recursive method) in order to iteratively
compute the compound density:96
c(0) = Pp(s(0)).    c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j).

93
P(z) = E[z^n] = Σ p(n) z^n.
94
See Theorem 7.3 in Loss Models.
This is the source of the values given in Table D.1 in Appendix D of Loss Models.
95
f(x+1)/f(x) = a + b/(x+1), which holds for the Binomial, Poisson, and Negative Binomial Distributions.
96
Formula 9.21 in Loss Models.
Note that if primary distribution is a member of the (a, b, 1) class, then as discussed in the next section, there is a
modification of this algorithm which applies.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 218

Aggregate Distribution Example:

In order to apply the Panjer Algorithm one must have a discrete severity distribution. Thus either the
original severity distribution must be discrete or one must approximate a continuous severity
distribution with a discrete one.97 We will assume for simplicity that the discrete distribution has
support on the nonnegative integers. If not, we can just change units to make it more convenient.

Exercise: Assume the only possible sizes of loss are 0, $1000, $2000, $3000, etc.
How could one change the scale so that the support is the nonnegative integers?
[Solution: One puts everything in units of thousands of dollars instead of dollars. If f(2000) is the
original density at 2000, then it is equal to s(2), the new density at 2. The new densities still sum to
unity and the aggregate distribution will be in units of $1000.]

Let the frequency distribution be p,98 the discrete severity distribution be s,99 and let c be the
aggregate loss distribution.100

Exercise: Let severity have density: s(0) = 60%, s(1) = 10%, s(2) = 25%, and s(3) = 5%.
Frequency is Poisson with λ = 1.3.
Use the Panjer Algorithm to calculate the density at 3 of the aggregate distribution.
[Solution: For the Poisson a = 0 and b = λ = 1.3.
c(0) = Pp(s(0)) = Pp(0.6) = exp[1.3(0.6 - 1)] = e^-0.52 = 0.5945205.

c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = (1.3/x) Σ_{j=1}^{x} j s(j) c(x - j).

c(1) = (1.3/1) (1) s(1) c(1-1) = (1.3/1) {(1)(0.1)(0.5945205)} = 0.0772877.


c(2) = (1.3/2){(1)s(1)c(1) + (2)s(2)c(0)} = (1.3/2) {(1)(0.1)(0.0772877) +(2)(.25)(0.5945205)}
= 0.1982429.
c(3) = (1.3/3) {(1)(0.1)(0.1982429) + (2)(0.25)(0.0772877) + (3)(0.05)(0.5945205)} =
0.0639800.]

By continuing iteratively in this manner, one could calculate the density for any value.101
The Panjer algorithm reduces the amount of work needed a great deal while providing exact results,
provided one retains enough significant digits in the intermediate calculations.
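
The recursion is easy to program. Below is a minimal Python sketch (not part of the original text) for the example above: Poisson frequency with λ = 1.3, so a = 0 and b = 1.3, and the discrete severity 60%, 10%, 25%, 5% at 0, 1, 2, 3.

# Sketch of the Panjer algorithm (recursive method) for an (a, b, 0) primary distribution.
import math

lam = 1.3
s = [0.60, 0.10, 0.25, 0.05]            # discrete severity density on 0, 1, 2, 3
a, b = 0.0, lam                          # for the Poisson, a = 0 and b = lambda

def sev(j):                              # severity density, zero beyond the support
    return s[j] if j < len(s) else 0.0

c = [math.exp(lam * (s[0] - 1.0))]       # c(0) = P(s(0)) = exp[lambda(s(0) - 1)] = 0.5945205
for x in range(1, 11):
    total = sum((a + b * j / x) * sev(j) * c[x - j] for j in range(1, x + 1))
    c.append(total / (1.0 - a * s[0]))

print(c[:4])    # 0.5945205, 0.0772877, 0.1982429, 0.0639800, as in the exercise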
97
There are a number of ways of performing such an approximation, as discussed in a subsequent section.
98
In general, p is the primary distribution, which for this application of the Panjer algorithm is the frequency
distribution.
99
In general, s is the secondary distribution, which for this application of the Panjer algorithm is the discrete severity
distribution.
100
In general c is the compound distribution, which for this application of the Panjer algorithm is the aggregate loss distribution.
101
In this case, the aggregate distribution out to 10 is: 0.594521, 0.0772877, 0.198243, 0.06398, 0.0380616,
0.0170385, 0.00657186, 0.00276448, 0.000994199, 0.000356408, 0.000123164.
The chance of the aggregate losses being greater than 10 is: 0.0000587055.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 219

Exercise: In the previous exercise, compare the exact probability that the aggregate distribution is
less than or equal to 3 with the Normal Approximation using the continuity correction.
[Solution: Prob[Agg ≤ 3] = 0.5945205 + 0.0772877 + 0.1982429 + 0.0639800 = 0.9340311.
The mean severity is: (60%)(0) + (10%)(1) + (25%)(2) + (5%)(3) = 0.75.
The second moment of severity is: (60%)(0²) + (10%)(1²) + (25%)(2²) + (5%)(3²) = 1.55.
Mean aggregate is: (1.3)(0.75) = 0.975. Variance of aggregate is: (1.3)(1.55) = 2.015.
Prob[Agg ≤ 3] ≅ Φ[(3.5 - 0.975)/√2.015] = Φ[1.78] = 0.9625.
Comment: The Normal Approximation would be better if the expected frequency were larger.
Commonly one does apply the continuity correction when working on aggregate losses.
Here the severity is either 0, 1, 2, or 3, so using the continuity correction makes sense.]

If the severity density is zero at 0, then c(0) = Pp (s(0)) = Pp (0).

Now Pp(0) = lim_{z→0} Pp(z) = lim_{z→0} E[z^N] = lim_{z→0} {p(0) + Σ_{n=1}^{∞} p(n) z^n} = p(0).
Therefore, if s(0) = 0, the probability of zero aggregate losses = c(0) = p(0) = the probability of no
claims. If s(0) > 0, then there is an additional probability of zero aggregate losses, due to the
contribution of situations with claims of size zero.

Thinning the Frequency Distribution:

The Panjer Algorithm directly handles situations in which there is a positive chance of a zero severity.
In contrast, if one tried to apply convolution to such a situation, one would need to calculate a lot of
convolutions, since one can get zero aggregate even if one has many claims.
One can get around this difficulty by thinning the frequency or primary distribution.102

One can apply this thinning technique when the frequency is any member of the (a, b, 0) class.
If the original frequency is Poisson, then the non-zero claims are also Poisson with mean {1 - s(0)} λ.
If the original frequency is Binomial, then the non-zero claims are also Binomial with parameters m
and {1 - s(0)} q. If the original frequency is Negative Binomial, then the non-zero claims are also
Negative Binomial with parameters r and {1 - s(0)} β.

As in the previous example, let frequency be Poisson with λ = 1.3, and the severity have density:
s(0) = 60%, s(1) = 10%, s(2) = 25%, and s(3) = 5%.

Exercise: What is the distribution of the number of claims with non-zero severity?
[Solution: Poisson with λ = (1.3)(40%) = 0.52.]

102
Thinning is discussed in “Mahlerʼs Guide to Frequency Distributions.”
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 220

Exercise: If the severity distribution is truncated to remove the zeros, what is the resulting
distribution?
[Solution: s(1) = 10%/40% = 25%, s(2) = 25%/40% = 62.5%, and s(3) = 5%/40% = 12.5%.]

Only the claims with non-zero severity contribute to the aggregate.


Therefore, we can compute the aggregate distribution by using the thinned frequency, Poisson with
λ = (1.3)(40%) = 0.52, and the severity distribution truncated to remove the zero claims.

Exercise: Use convolutions to calculate the aggregate distribution up to 3.


[Solution: p(3) = e^-0.52 (0.52³)/6 = 0.01393. (s*s)[3] = (2)(0.25)(0.625) = 0.3125.
(0.30915)(0.12500) + (0.08038)(0.31250) + (0.01393)(0.01562) = 0.06398.
n 0 1 2 3 Aggregate
Poisson 0.59452 0.30915 0.08038 0.01393 Density
x s*0 s s*s s*s*s
0 1 0.594521
1 0.25000 0.077288
2 0.62500 0.06250 0.198243
3 0.12500 0.31250 0.01562 0.063980
Comment: Matching the result obtained previously using the Panjer Algorithm.]
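
The thinning approach is also easy to program when only a distribution function value is needed. Below is a minimal Python sketch (not part of the original text), using the thinned Poisson with λ = 0.52 and the severity truncated to remove the zeros.

# Sketch: Prob[Aggregate <= 3] via thinning; with 4 or more non-zero claims the aggregate exceeds 3.
import math

lam = 1.3 * 0.40                          # thinned Poisson mean, 0.52
s = {1: 0.25, 2: 0.625, 3: 0.125}         # severity truncated to remove the zeros

def conv(d1, d2, cap=3):                  # convolution, dropping totals above the cap
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            if x1 + x2 <= cap:
                out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

prob_le_3 = 0.0
powers = {0: 1.0}                         # n-fold convolution of the severity, starting at n = 0
for n in range(0, 4):
    poisson_n = math.exp(-lam) * lam**n / math.factorial(n)
    prob_le_3 += poisson_n * sum(powers.values())
    powers = conv(powers, s)

print(prob_le_3)                          # 0.934032, matching the result above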

Thus the probability that the aggregate loss is less than or equal to 3 is:
0.594521 + 0.077288 + 0.198243 + 0.063980 = 0.934032.
One could instead apply general organized reasoning to obtain the same result as follows.

Using Organized Reasoning:103

Frequency is Poisson with λ = 0.52. The discrete severity is: 25% @ 1, 62.5% @ 2, 12.5% @ 3.
If there are zero claims, then aggregate ≤ 3. Probability is e^-0.52 = 0.59452.
If there is one claim, then since it is of size at most 3, aggregate ≤ 3.
Probability of this is: 0.52 e^-0.52 = 0.30915.
If there are two claims, then in order for aggregate ≤ 3 we must have: 1 and 1, 1 and 2, or 2 and 1.
The sum of the probability of these combinations of two claims is:
25%² + (2)(25%)(62.5%) = 0.375.
Thus if there are two claims, probability aggregate ≤ 3 is: (0.375)(0.52²/2) e^-0.52 = 0.03014.
If there are three claims, then in order for aggregate ≤ 3 each claim must be of size 1.
Probability is: (25%³)(0.52³/6) e^-0.52 = 0.00022.
Thus, Prob[Agg ≤ 3] = 0.59452 + 0.30915 + 0.03014 + 0.00022 = 0.93403,
matching the previous result.
103
Somewhat similar to what is done in Example 9.9 in Loss Models.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 221

This method of organized reasoning can be helpful if one is asked for the distribution function of the
aggregate at some value and do not need the individual densities.104

Compound Distribution Example:

The mathematics are the same in order to apply the Panjer Algorithm to the compound case.
For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is
Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel
by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by
each taxicab is independent of the number of taxicabs that arrive and is independent of the number
of passengers dropped off by any other taxicab. Then the aggregate number of passengers
dropped off per minute at the Heartbreak Hotel is a compound Poisson-Binomial distribution, with
parameters: λ = 1.3, q = 0.4, m = 5.

Exercise: Use the Panjer Algorithm to calculate the density at 3 for this example.
[Solution: The densities of the secondary Binomial Distribution are:
j s(j) j s(j)
0 0.07776 3 0.2304
1 0.2592 4 0.0768
2 0.3456 5 0.01024
For the primary Poisson a = 0, b = λ = 1.3, and P(z) = exp[λ(z - 1)] = exp[1.3(z - 1)].
c(0) = Pp(s(0)) = Pp(0.07776) = exp[1.3(0.07776 - 1)] = 0.301522.

c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = (1.3/x) Σ_{j=1}^{x} j s(j) c(x - j).

c(1) = (1.3/1) (1) s(1) c(1-1) = (1.3/1) {(1)(0.2592)(0.301522)} = 0.101601.


c(2) = (1.3/2) {(1)(0.2592)(0.101601) + (2)(0.3456)(0.301522)} = 0.152586.
c(3) =
(1.3/3) {(1)(0.2592)(0.152586) + (2)(0.3456)(0.101601) + (3)(0.2304)(0.301522)} = 0.137882.]

By continuing iteratively in this manner, one could calculate the density for any value. The Panjer
algorithm reduces the amount of work needed a great deal while providing exact results, provided
one retains enough significant digits in the intermediate calculations.105
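
The same recursion applies here, with the Binomial densities as the secondary distribution. Below is a minimal Python sketch (not part of the original text) for λ = 1.3, m = 5, q = 0.4.

# Sketch: compound Poisson-Binomial densities via the Panjer algorithm.
import math

lam, m, q = 1.3, 5, 0.4
s = [math.comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)]   # secondary densities
a, b = 0.0, lam

c = [math.exp(lam * (s[0] - 1.0))]        # c(0) = P(s(0)) = 0.301522
for x in range(1, 21):
    tot = sum((a + b * j / x) * (s[j] if j <= m else 0.0) * c[x - j] for j in range(1, x + 1))
    c.append(tot / (1.0 - a * s[0]))

print(c[:4])    # 0.301522, 0.101601, 0.152586, 0.137882, as in the exercise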

104
See 4, 5/07, Q.8 and 3, 11/02, Q.36 (2009 Sample Q.95).
105
Here are the densities for the compound Poisson-Binomial distribution, with parameters λ = 1.3, q = 0.4, m = 5,
calculated using the Panjer Algorithm, from 0 to 20: 0.301522, 0.101601, 0.152586, 0.137882, 0.0988196,
0.070989, 0.0507183, 0.0335563, 0.0211638, 0.0130872, 0.0078559, 0.00456369, 0.00258682,
0.00143589, 0.000779816, 0.000414857, 0.000216723, 0.000111302, 0.0000562232, 0.0000279619,
0.0000137058.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 222

Preliminaries to the Proof of the Panjer Algorithm:

In order to prove the Panjer Algorithm/ Recursive Method, we will use two results for convolutions.
First, s*n(x) = Prob(sum of n losses is x) =
Σ_{i=0}^{∞} Prob(first loss is i) Prob(sum of n - 1 losses is x - i) = Σ_{i=0}^{∞} s(i) s*(n-1)(x - i).

In other words, s*n = s * s*(n-1). In this case, since the severity density is assumed to have support
equal to the nonnegative integers, s*(n-1)(x - i) is 0 for i > x, so the terms for i > x drop out of the
summation:

s*n(x) = Σ_{i=0}^{x} s(i) s*(n-1)(x - i).

Second, assume we have n independent, identically distributed losses, each with distribution s.
Assume we know their sum is x > 0; then by symmetry the conditional expected value of any of
these losses is x/n.

x/n = E[L1 | L1 + L2 + ... + Ln = x] = Σ_{i=0}^{∞} i Prob[L1 = i | L1 + L2 + ... + Ln = x]

= Σ_{i=0}^{∞} i Prob[L1 = i and L1 + L2 + ... + Ln = x] / Prob[L1 + L2 + ... + Ln = x]

= Σ_{i=0}^{∞} i Prob[L1 = i and L2 + ... + Ln = x - i] / s*n(x)

= {1/s*n(x)} Σ_{i=0}^{∞} i s(i) s*(n-1)(x - i) = {1/s*n(x)} Σ_{i=1}^{x} i s(i) s*(n-1)(x - i).106

Therefore, s*n(x) = (n/x) Σ_{i=1}^{x} i s(i) s*(n-1)(x - i).

106
Note that the term for i = 0 drops out. Since for use in the Panjer Algorithm the severity density is assumed to
have support equal to the nonnegative integers, s*n-1(x-i) is 0 for i > x,
so the terms for i > x drop out of the summation.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 223

Exercise: Verify the above relationship for n = 2 and x = 5.


[Solution: s*2 (5) = probability two losses sum to 5 =
s(0)s(5) + s(1)s(4) + s(2)s(3) + s(3)s(2) + s(4)s(1) + s(5)s(0) =
2{s(0)s(5) + s(1)s(4) + s(2)s(3)}.
(n/x) Σ_{i=1}^{x} i s(i) s*(n-1)(x - i) = (2/5){s(1)s(4) + 2s(2)s(3) + 3s(3)s(2) + 4s(4)s(1) + 5s(5)s(0)} =

2{s(0)s(5) + s(1)s(4) + s(2)s(3)}.]
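
The identity can also be checked numerically for any discrete severity on the nonnegative integers; below is a brief Python sketch (not part of the original text), using the severity from the earlier example as an arbitrary test case.

# Sketch: check s*n(x) = (n/x) * sum over i of i s(i) s*(n-1)(x - i), for n = 2 and x = 5.
def convolve(d1, d2):
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

s = {0: 0.60, 1: 0.10, 2: 0.25, 3: 0.05}
s2 = convolve(s, s)                                   # s*2
n, x = 2, 5
lhs = s2.get(x, 0.0)
# for n = 2, the (n-1)-fold convolution s*(n-1) is just s itself
rhs = (n / x) * sum(i * s.get(i, 0.0) * s.get(x - i, 0.0) for i in range(1, x + 1))
print(lhs, rhs)                                       # both 0.025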

Proof of the Panjer Algorithm:

Recall that for a member of the (a, b, 0) class of frequency distributions:


f(n+1) / f(n) = a + {b / (n+1)}. Thus, f(n) = f(n-1){a + b /n}, for n > 0.

For the compound distribution to take on a value x > 0, there must be one or more losses.
As discussed previously, one can write the compound distribution in terms of convolutions:
c(x) = Σ_{n=1}^{∞} f(n) s*n(x) = a Σ_{n=1}^{∞} f(n-1) s*n(x) + b Σ_{n=1}^{∞} f(n-1) s*n(x)/n

= a Σ_{n=1}^{∞} f(n-1) Σ_{i=0}^{x} s(i) s*(n-1)(x - i) + b Σ_{n=1}^{∞} f(n-1) (n/x) Σ_{i=1}^{x} i s(i) s*(n-1)(x - i) / n

= a Σ_{i=0}^{x} s(i) Σ_{n=1}^{∞} f(n-1) s*(n-1)(x - i) + (b/x) Σ_{i=1}^{x} i s(i) Σ_{n=1}^{∞} f(n-1) s*(n-1)(x - i)

= a Σ_{i=0}^{x} s(i) c(x - i) + (b/x) Σ_{i=1}^{x} i s(i) c(x - i)

= a s(0) c(x) + Σ_{i=1}^{x} (a + bi/x) s(i) c(x - i).

Taking the first term to the left-hand side of the equation and solving for c(x):

c(x) = {1/(1 - a s(0))} Σ_{i=1}^{x} (a + ib/x) s(i) c(x - i).
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 224

Problems:

Use the following information for the next 11 questions:


• One has a compound Geometric distribution with β = 2.1.
• The discrete severity distribution is as follows:
0 25%
1 35%
2 20%
3 15%
4 5%

7.1 (1 point) What is the mean aggregate loss?


A. less than 3.0
B. at least 3.0 but less than 3.5
C. at least 3.5 but less than 4.0
D. at least 4.0 but less than 4.5
E. at least 4.5

7.2 (2 points) What is the variance of the aggregate losses?


A. less than 13
B. at least 13 but less than 14
C. at least 14 but less than 15
D. at least 15 but less than 16
E. at least 16

7.3 (1 point) What is the probability that the aggregate losses are zero?
A. less than 0.37
B. at least 0.37 but less than 0.38
C. at least 0.38 but less than 0.39
D. at least 0.39 but less than 0.40
E. at least 0.40

7.4 (2 points) What is the probability that the aggregate losses are one?
A. less than 12%
B. at least 12% but less than 13%
C. at least 13% but less than 14%
D. at least 14% but less than 15%
E. at least 15%
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 225

7.5 (2 points) What is the probability that the aggregate losses are two?
A. less than 8%
B. at least 8% but less than 9%
C. at least 9% but less than 10%
D. at least 10% but less than 11%
E. at least 11%

7.6 (2 points) What is the probability that the aggregate losses are three?
A. less than 9.2%
B. at least 9.2% but less than 9.3%
C. at least 9.3% but less than 9.4%
D. at least 9.4% but less than 9.5%
E. at least 9.5%

7.7 (2 points) What is the probability that the aggregate losses are four?
A. less than 7.2%
B. at least 7.2% but less than 7.3%
C. at least 7.3% but less than 7.4%
D. at least 7.4% but less than 7.5%
E. at least 7.5%

7.8 (2 points) What is the probability that the aggregate losses are five?
A. less than 4.7%
B. at least 4.7% but less than 4.8%
C. at least 4.8% but less than 4.9%
D. at least 4.9% but less than 5.0%
E. at least 5.0%

7.9 (2 points) What is the probability that the aggregate losses are greater than 5?
Use the Normal Approximation.
A. less than 10%
B. at least 10% but less than 15%
C. at least 15% but less than 20%
D. at least 20% but less than 25%
E. at least 25%
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 226

7.10 (2 points) Approximate the distribution of aggregate losses via a LogNormal Distribution, and
estimate the probability that the aggregate losses are greater than 5.
A. less than 10%
B. at least 10% but less than 15%
C. at least 15% but less than 20%
D. at least 20% but less than 25%
E. at least 25%

7.11 (2 points) What is the 70th percentile of the distribution of aggregate losses?
A. 2 B. 3 C. 4 D. 5 E. 6

Use the following information for the next 6 questions:


• Frequency follows a Binomial Distribution with m = 10 and q = 0.3.
• The discrete severity distribution is as follows:
0 20%
1 50%
2 20%
3 10%
• Frequency and Severity are independent.

7.12 (1 point) What is the probability that the aggregate losses are zero?
A. less than 4%
B. at least 4% but less than 5%
C. at least 5% but less than 6%
D. at least 6% but less than 7%
E. at least 7%

7.13 (2 points) What is the probability that the aggregate losses are one?
A. less than 13%
B. at least 13% but less than 14%
C. at least 14% but less than 15%
D. at least 15% but less than 16%
E. at least 16%

7.14 (2 points) What is the probability that the aggregate losses are two?
A. less than 16%
B. at least 16% but less than 17%
C. at least 17% but less than 18%
D. at least 18% but less than 19%
E. at least 19%
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 227

7.15 (2 points) What is the probability that the aggregate losses are three?
A. less than 17%
B. at least 17% but less than 18%
C. at least 18% but less than 19%
D. at least 19% but less than 20%
E. at least 20%

7.16 (2 points) What is the probability that the aggregate losses are four?
A. less than 14%
B. at least 14% but less than 15%
C. at least 15% but less than 16%
D. at least 16% but less than 17%
E. at least 17%

7.17 (2 points) What is the probability that the aggregate losses are five?
A. less than 11%
B. at least 11% but less than 12%
C. at least 12% but less than 13%
D. at least 13% but less than 14%
E. at least 14%

Use the following information for the next 2 questions:


The number of snowstorms each winter in Springfield is Negative Binomial with r = 5 and
β = 3. The probability that a given snowstorm will close Springfield Elementary School for at least
one day is 30%, independent of any other snowstorm.

7.18 (2 points) What is the probability that Springfield Elementary School will not be closed due to
snow next winter?
A. 2% B. 3% C. 4% D. 5% E. 6%

7.19 (2 points) What is the probability that Springfield Elementary School will be closed by exactly
one snowstorm next winter?
A. 10% B. 12% C. 14% D. 16% E. 18%
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 228

7.20 (2 points) The frequency distribution is a member of the (a, b , 0) class, with
a = 0.75 and b = 3.75. The discrete severity distribution is: 0, 1, 2, 3 or 4 with probabilities of: 15%,
30%, 40%, 10% and 5%, respectively. The probability of the aggregate losses being
6, 7, 8 and 9 are: 0.0695986, 0.0875199, 0.107404, and 0.127617, respectively.
What is the probability of the aggregate losses being 10?
A. less than 14.6%
B. at least 14.6% but less than 14.7%
C. at least 14.7% but less than 14.8%
D. at least 14.8% but less than 14.9%
E. at least 14.9%

Use the following information for the next 2 questions:


The number of hurricanes that form in the Atlantic Ocean each year is Poisson with λ = 11.
The probability that a given such hurricane will hit the continental United States is 15%, independent
of any other hurricane.

7.21 (2 points) What is the probability that no hurricanes hit the continental United States next year?
A. 11% B. 13% C. 15% D. 17% E. 19%

7.22 (2 points) What is the probability that exactly one hurricane will hit the continental United States
next year?
A. 30% B. 32% C. 34% D. 36% E. 38%

Use the following information for the next 6 questions:


One has a compound Geometric-Poisson distribution with parameters β = 1.7 and λ = 3.1.

7.23 (1 point) What is the density function at zero?


A. less than 0.36
B. at least 0.36 but less than 0.37
C. at least 0.37 but less than 0.38
D. at least 0.38 but less than 0.39
E. at least 0.39

7.24 (2 points) What is the density function at one?


A. less than 3.3%
B. at least 3.3% but less than 3.4%
C. at least 3.4% but less than 3.5%
D. at least 3.5% but less than 3.6%
E. at least 3.6%
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 229

7.25 (2 points) What is the density function at two?


A. less than 5.4%
B. at least 5.4% but less than 5.5%
C. at least 5.5% but less than 5.6%
D. at least 5.6% but less than 5.7%
E. at least 5.7%

7.26 (2 points) What is the density function at three?


A. less than 6.2%
B. at least 6.2% but less than 6.3%
C. at least 6.3% but less than 6.4%
D. at least 6.4% but less than 6.5%
E. at least 6.5%

7.27 (2 points) What is the density function at four?


A. less than 6.0%
B. at least 6.0% but less than 6.1%
C. at least 6.1% but less than 6.2%
D. at least 6.2% but less than 6.3%
E. at least 6.3%

7.28 (2 points) What is the median?


A. 2 B. 3 C. 4 D. 5 E. 6

7.29 (2 points) The frequency distribution is a member of the (a, b , 0) class,


with a = -0.42857 and b = 4.71429.
The discrete severity distribution is: 0, 1, 2, or 3 with probabilities of:
20%, 50%, 20% and 10% respectively.
The probability of the aggregate losses being 10, 11 and 12 are:
0.00792610, 0.00364884, and 0.00157109, respectively.
What is the probability of the aggregate losses being 13?
A. less than 0.03%
B. at least 0.03% but less than 0.04%
C. at least 0.04% but less than 0.05%
D. at least 0.05% but less than 0.06%
E. at least 0.06%
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 230

7.30 (3 points) Frequency is given by a Poisson-Binomial compound frequency distribution, as per


Loss Models, with parameters λ = 1.2, m = 4, and q = 0.1.
(Frequency is Poisson with λ = 1.2, and severity is Binomial with m = 4 and q = 0.1.)
What is the density function at 1?
A. less than 0.20
B. at least 0.20 but less than 0.21
C. at least 0.21 but less than 0.22
D. at least 0.22 but less than 0.23
E. at least 0.23

7.31 (4 points) Assume that S has a compound Poisson distribution with λ = 2 and individual claim
amounts that are 20, 30, and 50 with probabilities of 0.5, 0.3 and 0.2, respectively.
Calculate Prob[S > 75].
A. 30% B. 32% C. 34% D. 36% E. 38%

7.32 (8 points) The number of crises per week faced by the superhero Underdog follows a
Negative Binomial Distribution with r = 0.3 and β = 4. The number of super energy pills he requires
per crisis is distributed as follows: 50% of the time it is 1, 30% of the time it is 2, and 20% of the time
it is 3. What is the minimum number of super energy pills Underdog needs at the beginning of a
week to be 99% certain he will not run out during the week?
Use a computer to help you perform the calculations.

7.33 (5 points) Frequency is Binomial with m = 6 and q = 0.2.


Severity follows a Zero-Truncated Poisson with λ = 3.
Severity and frequency are independent.
(a) (3 points) Determine the exact probability that the aggregate is at most 3.
(b) (2 points) Using the Normal Approximation with continuity correction, estimate the probability
that the aggregate is at most 3.

7.34 (5A, 11/95, Q.36) (2 points) Suppose that the aggregate loss S has a compound Poisson
distribution with expected number of claims equal to 3 and the following claim amount distribution:
individual claim amounts can be 1, 2 or 3 with probabilities of 0.6, 0.3, and 0.1, respectively.
Calculate the probability that S = 2.

7.35 (5A, 5/98, Q.36) (2.5 points) Assume that S has a compound Poisson distribution with
λ = 0.6 and individual claim amounts that are 1, 2, and 3 with probabilities of 0.25, 0.35 and 0.40,
respectively. Calculate Prob[S = 1], Prob[S= 2] and Prob[S=3].
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 231

7.36 (Course 151 Sample Exam #3, Q.12) (1.7 points) You are given:
(i) S has a compound Poisson distribution with λ = 2.
(ii) individual claim amounts, x, are distributed as follows:
x p(x)
1 0.4
2 0.6
Determine fS(4).
(A) 0.05 (B) 0.07 (C) 0.10 (D) 0.15 (E) 0.21

Use the following information for the next two questions:


The frequency distribution of the number of losses in a year is geometric-Poisson with geometric
primary parameter β = 3 and Poisson secondary parameter λ = 0.5.
(In other words, Geometric frequency and Poisson severity.)

7.37 (Course 3 Sample Exam, Q.41)


Calculate the probability that the total number of losses in a year is at least 4.

7.38 (Course 3 Sample Exam, Q.42) If individual losses are all exactly 100, determine the
expected aggregate losses in excess of 400.

7.39 (3, 11/02, Q.36 & 2009 Sample Q.95) (2.5 points)
The number of claims in a period has a geometric distribution with mean 4.
The amount of each claim X follows P(X = x) = 0.25, x = 1, 2, 3, 4.
The number of claims and the claim amounts are independent.
S is the aggregate claim amount in the period. Calculate Fs(3).
(A) 0.27 (B) 0.29 (C) 0.31 (D) 0.33 (E) 0.35

7.40 (3 points) The number of claims in a period has a geometric distribution with mean 5.
The amount of each claim X follows P(X = x) = 0.2, x = 0, 1, 2, 3, 4.
The number of claims and the claim amounts are independent.
S is the aggregate claim amount in the period. Calculate Fs(3).
(A) 0.27 (B) 0.29 (C) 0.31 (D) 0.33 (E) 0.35
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 232

7.41 (CAS3, 5/04, Q.40) (2.5 points) XYZ Re provides reinsurance to Bigskew Insurance
Company. XYZ agrees to pay Bigskew for all losses resulting from “events”, subject to:
• a $500 deductible per event and
• a $100 annual aggregate deductible
For providing this coverage, XYZ receives a premium of $150.
Use a Poisson distribution with mean equal to 0.15 for the frequency of events.
Event severity is from the following distribution:
Loss Probability
250 0.10
500 0.25
800 0.30
1,000 0.25
1,250 0.05
1,500 0.05
• i = 0%
What is the actual probability that XYZ will payout more than it receives?
A. 8.9% B. 9.0% C. 9.1% D. 9.2% E. 9.3%

7.42 (4, 5/07, Q.8) (2.5 points)


Annual aggregate losses for a dental policy follow the compound Poisson distribution with λ = 3.
The distribution of individual losses is:
Loss Probability
1 0.4
2 0.3
3 0.2
4 0.1
Calculate the probability that aggregate losses in one year do not exceed 3.
(A) Less than 0.20
(B) At least 0.20, but less than 0.40
(C) At least 0.40, but less than 0.60
(D) At least 0.60, but less than 0.80
(E) At least 0.80
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 233

Solutions to Problems:

7.1. A. The mean severity is: (1)(0.35) + (2)(0.2) + (3)(0.15) +(4)(0.05) = 1.4.
The mean aggregate losses = (2.1)(1.4) = 2.94.

7.2. D. The second moment of the severity is:


(1²)(0.35) + (2²)(0.2) + (3²)(0.15) + (4²)(0.05) = 3.3.
Thus the variance of the severity is: 3.3 - 1.4² = 1.34. The mean frequency is 2.1.
The variance of the frequency is: (2.1)(1 + 2.1) = 6.51.
The variance of the aggregate losses is: (2.1)(1.34) + (1.4²)(6.51) = 15.57.

7.3. C. The p.g.f. of the Geometric Distribution is: P(z) = {1 - 2.1(z - 1)}^-1.

c(0) = P(s(0)) = P(0.25) = 1/{1 - (2.1)(0.25 - 1)} = 0.38835.

Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575.


The only way to get an aggregate of 0 is to have no non-zero losses: 1/2.575 = 0.38835.
Comment: In the alternative solution, we are trying to determine the number of losses of size other
than zero. If one has one or more such loss, then the aggregate losses are positive.
If one has zero such losses, then the aggregate losses are zero.
We can have any number of losses of size zero without affecting the aggregate losses.

7.4. A. For the Geometric Distribution: a = β/(1+β) = 2.1/3.1 = 0.67742 and b = 0.


1/(1 - a s(0)) = 1/{1 - (0.67742)(0.25)} = 1.20388.
Use the Panjer Algorithm:

c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = 1.20388 Σ_{j=1}^{x} 0.67742 s(j) c(x - j)

= 0.81553 Σ_{j=1}^{x} s(j) c(x - j).

c(1) = 0.81553 s(1) c(0) = (0.81553)(0.35)(0.38835) = 0.11085.


Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575,
and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4.
The only way to get an aggregate of 1 is to have one non-zero loss of size 1:
(1.575/2.575²)(7/15) = 0.11085.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 234

7.5. C. Use the Panjer Algorithm, c(2) = 0.81553 {s(1) c(1) + s(2)c(0)} =
(0.81553) {(0.35)(0.11085) + (0.20)(0.38835)} = 0.09498.
Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575,
and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4.
Ways to get an aggregate of 2:
One non-zero loss of size 2: (1.575/2.575²)(4/15) = 0.06334.
Two non-zero losses, each of size 1: (1.575²/2.575³)(7/15)² = 0.03164.
Total probability: 0.06334 + 0.03164 = 0.09498.

7.6. B. Use the Panjer Algorithm, c(3) = 0.81553 {s(1) c(2) + s(2)c(1) +s(3)c(0)} =
(.81553) {(0.35)(0.09498) + (0.20)(0.11085) + (0.15)(0.38835)} = 0.09270.
Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575,
and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4.
Ways to get an aggregate of 3:
One non-zero loss of size 3: (1.575/2.575²)(3/15) = 0.04751.
Two non-zero losses, one of size 1 and one of size 2 in either order:
(1.575²/2.575³)(2)(7/15)(4/15) = 0.03616.
Three non-zero losses, each of size 1: (1.575³/2.575⁴)(7/15)³ = 0.00903.
Total probability: 0.04751 + 0.03616 + 0.00903 = 0.09270.

7.7. A. c(4) = 0.81553 {s(1) c(3) + s(2)c(2) +s(3)c(1) + s(4)c(0)} =


(0.81553){(0.35)(0.09270) + (0.20)(0.09498) + (0.15)(0.11085) + (0.05)(0.38835)} = 0.07135.
Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575,
and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4.
Ways to get an aggregate of 4:
One non-zero loss of size 4: (1.575/2.575²)(1/15) = 0.01584.
Two non-zero losses, one of size 1 and one of size 3 in either order:
(1.575²/2.575³)(2)(7/15)(3/15) = 0.02712.
Two non-zero losses, each of size 2:
(1.575²/2.575³)(4/15)² = 0.01033.
Three non-zero losses, two of size 1 and one of size 2 in any order:
(1.575³/2.575⁴)(3)(7/15)²(4/15) = 0.01548.
Four non-zero losses, each of size 1: (1.575⁴/2.575⁵)(7/15)⁴ = 0.00258.
Total probability: 0.01584 + 0.02712 + 0.01033 + 0.01548 + 0.00258 = 0.07135.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 235

7.8. E. c(5) = 0.81553 {s(1) c(4) + s(2)c(3) +s(3)c(2) + s(4)c(1) + s(5)c(0)} =


(.81553) {(0.35)(0.07135) + (0.20)(0.09270) + (0.15)(.09498) + (0.05)(.11085) + (0)(.38835)} =
0.05162.
Comment: The aggregate distribution from 0 to 20 is:
0.38835, 0.110849, 0.094983, 0.0926988, 0.0713479, 0.0516245, 0.0415858, 0.0327984,
0.0253694, 0.0197833, 0.0154927, 0.0120898, 0.00943242, 0.00736622, 0.00575178,
0.0044901, 0.00350553, 0.00273696, 0.00213682, 0.00166827, 0.00130247.

7.9. E. From previous solutions, the mean and variance of the aggregate losses are: 2.94 and
15.57. Thus the probability that the aggregate losses are greater than 5 is approximately:
1 - Φ[(5.5 - 2.94)/√15.57] = 1 - Φ[0.65] = 25.8%.
Comment: Based on the previous solutions, the exact answer is 1 - 0.80985 = 19.0%.

7.10. B. From previous solutions, the mean and variance of the aggregate losses are: 2.94 and
15.57. The mean of a LogNormal is exp(µ + 0.5σ²). The second moment of a LogNormal is
exp(2µ + 2σ²). Therefore set: exp(µ + 0.5σ²) = 2.94 and exp(2µ + 2σ²) = 15.57 + 2.94².
1 + 15.57/2.94² = exp(2µ + 2σ²) / exp(2µ + σ²) = exp(σ²).
σ = √ln(2.8013) = 1.015. µ = ln[2.94 / exp[(0.5)(1.015²)]] = 0.5634. Since the aggregate losses
are discrete, we apply a “continuity correction”; more than 5 corresponds to 5.5.
The probability that the aggregate losses are greater than 5 is approximately:
1 - Φ[(ln(5.5) - 0.5634)/1.015] = 1 - Φ[1.12] = 13.1%.
Comment: Based on the previous solutions, the exact answer is 1 - 0.80985 = 19.0%.
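
Below is a minimal Python sketch (not part of the original text) of this moment matching, using only the standard library.

# Sketch: fit a LogNormal by matching the mean and variance, then estimate Prob[Agg > 5].
import math
from statistics import NormalDist

mean, var = 2.94, 15.57
sigma = math.sqrt(math.log(1.0 + var / mean**2))      # 1.015
mu = math.log(mean) - 0.5 * sigma**2                   # 0.5634
prob = 1.0 - NormalDist().cdf((math.log(5.5) - mu) / sigma)
print(mu, sigma, prob)                                 # about 0.563, 1.015, 0.13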

7.11. C. c(0) + c(1) + c(2) + c(3) = 0.38835 + 0.110849 + 0.094983 + 0.0926988 =


0.6868808 < 70%.
c(0) + c(1) + c(2) + c(3) + c(4) = 0.38835 + 0.110849 + 0.094983 + 0.0926988 + 0.0713479 =
0.7582287 ≥ 70%.
The 70th percentile is 4, the first value such that the distribution function is ≥ 70%.

7.12. D. P(z) = {1 + 0.3(z - 1)}^10. c(0) = P(s(0)) = P(0.2) = {1 + 0.3(0.2 - 1)}^10 = 0.06429.

Alternately, the non-zero losses are Binomial with q = (80%)(0.3) = 0.24.
The only way to get an aggregate of 0 is to have no non-zero losses: (1 - 0.24)^10 = 0.06429.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 236

7.13. A. For the Binomial, a = -q/(1-q) = - 0.3/0.7 = - 0.42857.


b = (m+1)q/(1-q) = 33/7 = 4.71429.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = 0.92105 Σ_{j=1}^{x} (-0.42857 + 4.71429 j/x) s(j) c(x - j)

= 0.39474 Σ_{j=1}^{x} (-1 + 11 j/x) s(j) c(x - j).

c(1) = 0.39474{-1 + 11(1/1)}s(1)c(0) = 0.39474(10)(0.5)(0.06429) = 0.12689.


Alternately, the non-zero losses are Binomial with q = (80%)(0.3) = 0.24.
The severity distribution truncated to remove the zero losses is: 5/8 @ 1, 2/8 @2, and 1/8 @3.
The only way to get an aggregate of 1 is to have 1 non-zero loss of size 1:
10 (1 - 0.24)^9 (0.24)(5/8) = 0.12689.

7.14. B. c(2) = 0.39474{(-1 + 11(1/2))s(1)c(1) + (-1 + 11(2/2))s(2)c(0)} =


0.39474 {(4.5)(0.5)(0.12689) + (10)(0.2)(0.06429)} = 0.16345.
Alternately, in order to get an aggregate of 2 we have either 1 non-zero loss of size 2 or two
non-zero losses each of size 1: 10 (1 - 0.24)^9 (0.24)(2/8) + 45 (1 - 0.24)^8 (0.24)²(5/8)² = 0.16345.

7.15. B. c(3) = 0.39474{(-1 + 11(1/3))s(1)c(2) + (-1 + 11(2/3))s(2)c(1) + (-1 + 11(3/3))s(3)c(0)}


= 0.39474 {(2.6667)(0.5)(0.16345) + (6.3333)(0.2)(0.12689) + (10)(0.1)(0.06429)} = 0.17485.
Alternately, in order to get an aggregate of 3 we have either 1 non-zero loss of size 3, two non-zero
losses each of sizes 1 and 2 in either order, or three non-zero losses each of size 1:
10 (1 - 0.24)^9 (0.24)(1/8) + 45 (1 - 0.24)^8 (0.24)²(2)(5/8)(2/8) + 120 (1 - 0.24)^7 (0.24)³(5/8)³ =
0.17485.

7.16. C. c(4) = 0.39474{(-1 + 11(1/4))s(1)c(3) + (-1 + 11(2/4))s(2)c(2) +


(-1 + 11(3/4))s(3)c(1) + (-1 + 11(4/4))s(4)c(0)} =
0.39474 {(1.75)(.5)(0.17485) + (4.5)(0.2)(0.16345) + (7.25)(0.1)(0.12689) + (10)(0)(0.06429)}
= 0.15478.

7.17. B. c(5) = 0.39474{(-1 + 11(1/5))s(1)c(4) + (-1 + 11(2/5))s(2)c(3) +


(-1 + 11(3/5))s(3)c(2) + (-1 + 11(4/5))s(4)c(1) + (-1 + 11(5/5))s(5)c(0)} =
0.39474 {(1.2)(0.5)(0.15478) + (3.4)(0.2)(0.17485) + (5.6)(0.1)(0.16345)} = 0.11972.
Comment: Note that the terms involving s(4) and s(5) drop out, since s(4) = s(5) = 0.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 237

7.18. C. & 7.19. A. We are thinning a Negative Binomial; the snowstorms that close the school are
also Negative Binomial, but with r = 5 and β = (.3)(3) = 0.9.
f(0) = 1/(1 + 0.9)⁵ = 4.0%. f(1) = (5)(0.9)/(1 + 0.9)⁶ = 9.6%.
Alternately, the number of storms that close the school is a compound Negative Binomial - Bernoulli
Distribution. c(0) = Pp(s(0)) = Pp(0.7) = {1 - (3)(0.7 - 1)}^-5 = 1/1.9⁵ = 4.04%.
a = β/(1+β) = 3/4. b = (r-1)β/(1+β) = 3. Using the recursive method / Panjer Algorithm:
c(1) = {1/(1 - a s(0))} {(a + b) s(1) c(0)} = {1/(1 - (3/4)(0.7))} {(15/4)(0.3)(0.0404)} = 9.57%.

7.20. D. Apply the Panjer Algorithm.


c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = 1.12676 Σ_{j=1}^{x} (0.75 + 3.75 j/x) s(j) c(x - j)

= 0.84507 Σ_{j=1}^{x} (1 + 5 j/x) s(j) c(x - j).

c(10) = 0.84507{(1+ 5(1/10))s(1)c(9) + (1+ 5(2/10))s(2)c(8) + (1+ 5(3/10))s(3)c(7) +


(1+ 5(4/10))s(4)c(6)} =
.84507{(1 + 5(1/10))(.3)(0.127617) + (1 + 5(2/10))(.4)(0.107404) +
(1 + 5(3/10))(.1)(0.0875199) + (1+ 5(4/10))(.05)(0.0695986)} = 0.14845.
Comment: Note that the terms involving s(5), s(6), etc., drop out, since in this case the severity is
such that s(x) = 0, for x > 4. Frequency follows a Negative Binomial Distribution with r = 6 and β = 3.
The aggregate distribution from 0 to 10 is: 0.0049961, 0.0075997, 0.0168763, 0.0250745,
0.0385867, 0.052303, 0.0695986, 0.0875199, 0.107404, 0.127617, 0.148454.

7.21. E. & 7.22. B. We are thinning a Poisson; the hurricanes that hit the continental United States
are also Poisson, but with λ = (.15)(11) = 1.65.
f(0) = e-1.65 = 19.2%. f(1) = 1.65e-1.65 = 31.7%.
Alternately, the number of storms that hit the continental United States is a compound
Poisson-Bernoulli Distribution.
c(0) = Pp(s(0)) = Pp(.85) = exp[(11)(.85 - 1)] = e-1.65 = 19.205%.
a = 0. b = λ = 11. Using the recursive method / Panjer Algorithm:
c(1) = {1/(1 - as(0))}{(a + b)s(1)c(0)} = {1}{(11)(.15)(.19205)} = 31.688%.

7.23. D. For the Primary Geometric, P(z) = 1/{1 - β(z-1)} = 1/(2.7 - 1.7z).

The secondary Poisson has density at zero of e−λ = e-3.1. The density of the compound distribution
at zero is the p.g.f. of the primary distribution at e-3.1: 1/{2.7 - 1.7e-3.1} = 0.3812.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 238

7.24. C. For the Primary Geometric, a = β/(1+β) = 1.7/2.7 = 0.62963 and b = 0.

The secondary Poisson has density at zero of e−λ = e-3.1 = 0.045049.


1/(1 - as(0)) = 1/{1 - (.62963)(.045049)} = 1.02919. Use the Panjer Algorithm,
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = 1.02919 Σ_{j=1}^{x} 0.62963 s(j) c(x - j)

= 0.64801 Σ_{j=1}^{x} s(j) c(x - j).

c(1) = 0.64801 s(1) c(0) = (0.64801)(0.139653)(0.3812) = 0.03450.

Alternately, the probability that an accident has zero claimants is e-3.1.


Thinning the Geometric Distribution, the number of accidents with at least one claimant is Geometric
with β = (1.7)(1 - e-3.1) = 1.62342.
Therefore, Prob[1 non-zero accident] = 1.62342/2.62342² = 0.23588.
Prob[one claim in total]
= Prob[one non-zero accident] Prob[accident has one claimant | accident has at least one claimant]
= (0.23588){3.1 e-3.1 / (1 - e-3.1)} = (0.23588)(0.14624) = 0.03450.
Alternately, the compound distribution is one if and only if the Geometric is n ≥ 1, and of the resulting
n Poissons one is 1 and the rest are 0.

c(1) = Σ_{n=1}^{∞} Prob[Geometric = n] n Prob[Poisson = 1] Prob[Poisson = 0]^(n-1)

= Σ_{n=1}^{∞} {(1.7/2.7)^n / 2.7} n (3.1 e^-3.1) (e^-3.1)^(n-1) = (3.1/2.7) Σ_{n=1}^{∞} n (1.7 e^-3.1 / 2.7)^n

= (3.1/2.7){0.0283643 + 0.0016091 + 0.0000685 + 0.0000026 + 0.0000001 + ...} = 0.03450.
(3.1/2.7){0.0283643 + 0.0016091 + 0.0000685 + 0.0000026 + 0.0000001 + ...} = 0.03450.


Comment: The densities of the secondary Poisson Distribution with λ = 3.1 are:
n s(n)
0 0.045049
1 0.139653
2 0.216461
3 0.223677
4 0.173350
The formula for the Panjer Algorithm simplifies a little since for the Geometric b = 0.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 239

7.25. D. Use the Panjer Algorithm: c(2) = 0.64801 {s(1) c(1) + s(2)c(0)} =
(0.64801) {(0.139653)(0.03450) + (0.216461)(0.3812)} = 0.05659.
Alternately, the probability that an accident has zero claimants is e-3.1. Thinning the Geometric
Distribution, the number of accidents with at least one claimant is Geometric with β = (1.7)(1 - e-3.1) =
1.62342. Therefore, Prob[2 non-zero accidents] = 1.62342²/2.62342³ = 0.14597.
Prob[two claims in total]
= Prob[two non-zero accidents] Prob[accident has one claimant | accident has at least one claimant]²
+ Prob[one non-zero accident] Prob[accident has two claimants | accident has at least one claimant]
= (0.14597){3.1 e^-3.1 / (1 - e^-3.1)}² + (0.23588){(3.1² e^-3.1 / 2) / (1 - e^-3.1)} = 0.05659.

7.26. E. Use the Panjer Algorithm: c(3) = 0.64801 {s(1) c(2) + s(2)c(1) +s(3)c(0)} =
(0.64801) {(0.139653)(0.05659) + (0.216461)(0.03450) + (0.223677)(0.3812) } = 0.06521.

7.27. C. c(4) = 0.64801 {s(1) c(3) + s(2)c(2) +s(3)c(1) + s(4)c(0)} =


(0.64801){(.139653)(.06521) +(.216461)(.05659) + (.223677)(.03450) + (.173350)(.3812)} =
0.06166.

7.28. B. c(0) + c(1) + c(2) = 0.3812 + 0.03450 + 0.05659 = 0.47229 < 50%.
c(0) + c(1) + c(2) + c(3) = 0.3812 + 0.03450 + 0.05659 + 0.06521 = 0.5375 ≥ 50%.
The median is 3, the first value such that the distribution function is ≥ 50%.

7.29. E. Apply the Panjer Algorithm.


c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = 0.92105 Σ_{j=1}^{x} (-0.42857 + 4.71429 j/x) s(j) c(x - j)

= 0.39474 Σ_{j=1}^{x} (-1 + 11 j/x) s(j) c(x - j).

c(13) = 0.39474{(-1+ 11(1/13))s(1)c(12) + (-1+ 11(2/13))s(2)c(11) + (-1+ 11(3/13))s(3)c(10)} =


0.39474{(-.15385)(.5)(.00157109) + (.69231)(.2)(0.00364884) + (1.53846)(.1)(.00792610)} =
0.00063307.
Comment: Terms involving s(4), s(5), etc., drop out, since in this case the severity is such that
s(x) = 0, for x > 3. Frequency follows a Binomial Distribution with m = 10 and q = 0.3.
2016-C-3, Aggregate Distributions §7 Panjer Algorithm, HCM 10/21/15, Page 240

7.30. E. The secondary Binomial has density at zero of (1-q)^m = 0.9⁴ = 0.6561. The density of
the compound distribution at zero is the p.g.f. of the primary Poisson distribution at 0.6561:
exp[1.2(.6561 - 1)] = 0.66187.
For the Primary Poisson a = 0 and b = λ = 1.2. 1/(1-as(0)) = 1. Use the Panjer Algorithm,
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = 1.2 Σ_{j=1}^{x} (j/x) s(j) c(x - j).

c(1) = (1.2)(1/1) s(1) c(0) = (1.2){(4)(0.9³)(0.1)}(0.66187) = 0.23160.


Alternately, the p.g.f. of the compound distribution is:
P(z) = exp[1.2({1 + 0.1(z - 1)}⁴ - 1)]. P(0) = exp[1.2({1 + 0.1(0 - 1)}⁴ - 1)] = 0.66187.
Pʼ(z) = P(z)(1.2)(4)(0.1){1 + 0.1(z - 1)}³ = P(z)(0.48){1 + 0.1(z - 1)}³.
Pʼ(0) = P(0)(0.48){1 + 0.1(0 - 1)}³ = (0.66187)(0.48)(0.9³) = 0.23160.
f(n) = [d^n P(z)/dz^n, evaluated at z = 0] / n!, so that f(1) = Pʼ(0) = 0.23160.
Comment: Alternately, think of the Primary Poisson Distribution as the number of accidents, while the
secondary Binomial represents the number of claims on each accident. The only way for the
compound distribution to be one is if all but one accident has zero claims and the remaining
accident has 1 claim.
For example, the chance of 3 accidents is: (1.2³)e^-1.2/3! = 0.086744.
The chance of an accident having no claims is: 0.9⁴ = 0.6561. The chance of an accident having 1 claim is:
(4)(0.9³)(0.1) = 0.2916. Thus if one has 3 accidents, the chance that 2 accidents are for zero and 1
accident is for 1 is: (3)(0.2916)(0.6561²) = 0.37657. Thus the chance that there are 3 accidents and they
gives a density at one of the compound distribution of 0.23160:
Number of Accidents    Poisson    Chance of all but one at 0 claims and one at 1 claim    Chance of 1 claim in Aggregate
0 0.30119 0.00000 0.00000
1 0.36143 0.29160 0.10539
2 0.21686 0.38264 0.08298
3 0.08674 0.37657 0.03267
4 0.02602 0.32943 0.00857
5 0.00625 0.27017 0.00169
6 0.00125 0.21271 0.00027
7 0.00021 0.16282 0.00003
8 0.00003 0.12209 0.00000
Sum 1.00000 0.23160

7.31. A. Prob[S ≤ 75] = Prob[N=0] + Prob[N=1] +


Prob[N=2](1 - Prob[30,50 or 50,30 or 50,50]) + Prob[N=3]Prob[3@20 or 2@20 and 1@30] =
e^-2 + 2e^-2 + 2e^-2{1 - (2)(0.3)(0.2) - 0.2^2} + (4e^-2/3){0.5^3 + (3)(0.5^2)(0.3)} = 5.1467e^-2 = 0.6965.
Prob[S > 75] = 1 - 0.6965 = 30.35%.
Alternately, use the Panjer Algorithm, in units of 10:
For the Poisson Distribution, a = 0 and b = λ = 2.
c(0) = P(s(0)) = P(0) = e^(2(0-1)) = e^-2 = 0.135335.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (2/x) ∑_{j=1}^{x} j s(j) c(x-j).

c(1) = (2/1) (1) s(1) c(1-1) = 0.


c(2) = (2/2){(1)s(1)c(1) + (2)s(2)c(0)} = {0 + (2)(.5)(.135335)} = 0.135335.
c(3) = (2/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} = (2/3){0 + 0 + (3)(.3)(.135335)} = 0.081201.
c(4) = (2/4){(1)s(1)c(3) + (2)s(2)c(2) + (3)s(3)c(1) + (4)s(4)c(0)} =
0.5{0 + (2)(.5)(.135335) + 0 + 0} = 0.067668.
c(5) = (2/5){(1)s(1)c(4) + (2)s(2)c(3) + (3)s(3)c(2) + (4)s(4)c(1) + (5)s(5)c(0)} =
0.4{0 + (2)(.5)(.081201) + (3)(.3)(.135335) + 0 + (5)(.2)(.135335)} = 0.135335.
c(6) = (2/6){(1)s(1)c(5) + (2)s(2)c(4) + (3)s(3)c(3) + (4)s(4)c(2) + (5)s(5)c(1) + (6)s(6)c(0)} =
(1/3){0 + (2)(.5)(.067668) + (3)(.3)(.081201) + 0 + 0 + 0} = 0.046916.
c(7) = (2/7){(1)s(1)c(6) + (2)s(2)c(5) + (3)s(3)c(4) + (4)s(4)c(3) + (5)s(5)c(2) + (6)s(6)c(1) +
(7)s(7)c(0)} =
(2/7){0 + (2)(.5)(.135335) + (3)(.3)(.067668) + 0 + (5)(.2)(.135335) + 0 + 0} = 0.094735.
c(0) + c(1) + c(2) + c(3) + c(4) + c(5) + c(6) + c(7) = 0.696525.
1 - 0.696525 = 30.35%.
Alternately, use convolutions:
n 0 1 2 3 Aggregate Aggregate
Poisson 0.1353 0.2707 0.2707 0.1804 Density Distribution
x p*0 p p*p p*p*p
0 1 0.1353 0.1353
10 0.0000 0.1353
20 0.5 0.1353 0.2707
30 0.3 0.0812 0.3519
40 0.25 0.0677 0.4195
50 0.2 0.30 0.1353 0.5549
60 0.09 0.125 0.0469 0.6018
70 0.20 0.225 0.0947 0.6965
Sum 1 1 0.84 0.35
1 - 0.6965 = 30.35%.
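
For those who like to check such calculations with a short program, here is a minimal Python sketch (my own illustration, not part of the exam solution) of the convolution approach; the helper name convolve is my own choice.

```python
# A sketch of the convolution alternative above, assuming severity (in units of 10)
# of 0.5 at 2, 0.3 at 3, and 0.2 at 5, and a Poisson frequency with lambda = 2.
import math

sev = {2: 0.5, 3: 0.3, 5: 0.2}
lam, limit = 2.0, 7   # we only need Prob[S <= 75], i.e. 7 units of 10

def convolve(f, g, limit):
    """Convolution of two discrete densities, truncated above the limit."""
    out = {}
    for i, fi in f.items():
        for j, gj in g.items():
            if i + j <= limit:
                out[i + j] = out.get(i + j, 0.0) + fi * gj
    return out

agg = {x: 0.0 for x in range(limit + 1)}
conv = {0: 1.0}                      # the 0-fold convolution is a point mass at 0
for n in range(8):                   # 4 or more claims always exceed 7 units, so this is plenty
    poisson_n = math.exp(-lam) * lam ** n / math.factorial(n)
    for x, prob in conv.items():
        agg[x] += poisson_n * prob
    conv = convolve(conv, sev, limit)

print(sum(agg.values()))             # about 0.6965, so Prob[S > 75] is about 30.35%
```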

7.32. For the Negative Binomial Distribution, a = β/(1+β) = 4/5 = 0.8, b = (r - 1)β/(1+β) = -0.56,
and P(z) = {1 - β(z-1)}^-r = {1 - 4(z-1)}^-0.3 = (5 - 4z)^-0.3.
c(0) = Pp(s(0)) = Pp(0) = 1/5^0.3 = 0.6170339.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = ∑_{j=1}^{x} (0.8 - 0.56 j/x) s(j) c(x-j).

c(1) = (.8 - 0.56)s(1)c(0) = (.24)(.5)(.6170339) = 0.0740441.


c(2) = (.8 - 0.56/2)s(1)c(1) + (.8 - 0.56(2/2))s(2)c(0) = (.52)(.5)(.0740441) + (.24)(.3)(.6170339)
= 0.0636779.
c(3) = (.8 - 0.56/3)s(1)c(2) + (.8 - 0.56(2/3))s(2)c(1) + (.8 - 0.56(3/3))s(3)c(0)
= (.613333)(.5)(.0636779) + (.426667)(.3)(.0740441) + (.24)(.2)(.6170339) = 0.0586232.
c(4) = (.8 - 0.56/4)s(1)c(3) + (.8 - 0.56(2/4))s(2)c(2) + (.8 - 0.56(3/4))s(3)c(1)
+ (.8 - 0.56(4/4))s(4)c(0) = (.66)(.5)(.0586232) + (.52)(.3)(.0636779) + (.38)(.2)(.0740441) +
(.24)(0)(.6170339) = 0.0349068.
Continuing in this manner produces the following densities for the compound distribution from zero to
twenty: 0.617034, 0.0740441, 0.0636779, 0.0586232, 0.0349067, 0.0280473, 0.0224297,
0.0173693, 0.0140905, 0.0114694, 0.00937036, 0.00773602, 0.00641437,
0.00534136, 0.00446732, 0.00374844, 0.00315457, 0.00266188, 0.00225134,
0.00190808, 0.0016202.
The corresponding distribution functions from zero to twenty are: 0.617034, 0.691078, 0.754756,
0.813379, 0.848286, 0.876333, 0.898763, 0.916132, 0.930223, 0.941692, 0.951062,
0.958798, 0.965213, 0.970554, 0.975021, 0.97877, 0.981924, 0.984586, 0.986838, 0.988746,
0.990366.
Thus Underdog requires 20 pills to be 99% certain he will not run out during the week.
Comment: This compound distribution has a long righthand tail. Therefore, one would not get the
same result if one used the Normal Approximation. Note that since the secondary distribution has
only three non-zero densities, each recursion involves summing at most three non-zero terms.
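
Since the recursion is entirely mechanical, it is easy to automate. The following Python sketch (my own; the function name panjer_ab0 is my choice) reproduces the densities listed above.

```python
# A sketch of the (a, b, 0) recursion applied to this problem: Negative Binomial
# frequency with beta = 4 and r = 0.3, severity 0.5, 0.3, 0.2 at sizes 1, 2, 3.

def panjer_ab0(a, b, c0, sev, n_max):
    """Return the aggregate densities c(0), ..., c(n_max); sev maps size -> probability."""
    s0 = sev.get(0, 0.0)
    c = [c0]
    for x in range(1, n_max + 1):
        total = sum((a + b * j / x) * sev.get(j, 0.0) * c[x - j] for j in range(1, x + 1))
        c.append(total / (1.0 - a * s0))
    return c

beta, r = 4.0, 0.3
a, b = beta / (1.0 + beta), (r - 1.0) * beta / (1.0 + beta)
c0 = (1.0 + beta) ** (-r)                   # p.g.f. of the Negative Binomial at s(0) = 0
sev = {1: 0.5, 2: 0.3, 3: 0.2}
densities = panjer_ab0(a, b, c0, sev, 20)
# densities starts 0.617034, 0.0740441, 0.0636779, ...; the cumulative sums give the
# distribution function, whose first value at or above 99% occurs at 20.
```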

7.33. (a) The probabilities at 1, 2, and 3 of the zero-truncated Poisson are:
3 e^-3 / (1 - e^-3) = 0.15719, (3^2/2) e^-3 / (1 - e^-3) = 0.23578, and (3^3/6) e^-3 / (1 - e^-3) = 0.23578.
If there are zero claims, then the aggregate is 0; probability is: 0.8^6 = 0.26214.
For the aggregate to be 1, there must be one claim of size 1; the probability is:
(6)(0.8^5)(0.2)(0.15719) = 0.06181.
For the aggregate to be 2, there can be one claim of size 2 or two claims each of size 1;
the probability is: (6)(0.8^5)(0.2)(0.23578) + (15)(0.8^4)(0.2^2)(0.15719^2) = 0.09878.
For the aggregate to be 3, there can be one claim of size 3, or two claims of sizes 1 and 2 or 2 and 1,
or three claims each of size 1; the probability is:
(6)(0.8^5)(0.2)(0.23578) + (15)(0.8^4)(0.2^2)(2)(0.15719)(0.23578) + (20)(0.8^3)(0.2^3)(0.15719^3)
= 0.11125.
Thus, Prob[Agg ≤ 3] = 0.26214 + 0.06181 + 0.09878 + 0.11125 = 0.53398.
Alternately, use the Panjer Algorithm / recursive method, for a Binomial Frequency.
P(z) = {1 + 0.2(z-1)}^6. c(0) = P(s(0)) = P(0) = {1 + 0.2(0-1)}^6 = 0.26214.
For the Binomial, a = -q/(1-q) = -0.2/0.8 = -0.25. b = (m+1)q/(1-q) = (7)(0.25) = 1.75.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = {1/(1 - (-0.25)(0))} ∑_{j=1}^{x} (-0.25 + 1.75 j/x) s(j) c(x-j)
= 0.25 ∑_{j=1}^{x} (-1 + 7 j/x) s(j) c(x-j).

c(1) = (0.25) {-1 + 7(1/1)} s(1)c(0) = (0.25)(6)(0.15719)(0.26214) = 0.06181.


c(2) = (0.25) {-1 + 7(1/2)} s(1)c(1) + (0.25) {-1 + 7(2/2)} s(2)c(0)} =
(0.25) (2.5) (0.15719) (0.06181) + (0.25) (6) (0.23578) (0.26214) = 0.09878.
c(3) = (0.25){-1 + 7(1/3)} s(1)c(2)} + (0.25){-1 + 7(2/3)} s(2)c(1) + (0.25){-1 + 7(3/3)} s(3)c(0) =
(0.25)(4/3)(0.15719)(0.09878) + (0.25)(11/3)(0.23578)(0.06181)
+ (0.25)(6)(0.23578)(0.26214) = 0.11125.
Thus, Prob[Agg ≤ 3] = 0.26214 + 0.06181 + 0.09878 + 0.11125 = 0.53398.
(b) The mean frequency is: (6)(0.2) = 1.2. Variance of frequency is: (6)(0.2)(1 - 0.2) = 0.96.
Mean severity is: 3 / (1 - e^-3) = 3.1572. 2nd moment of severity is: (3 + 3^2) / (1 - e^-3) = 12.629.
Variance of Severity is: 12.629 - 3.1572^2 = 2.661.
Mean aggregate is: (1.2)(3.1572) = 3.789.
Variance of Aggregate is: (1.2)(2.661) + (3.1572^2)(0.96) = 12.762.
Prob[Agg ≤ 3] = Φ[(3.5 - 3.789) / √12.762] = Φ[-0.08] = 0.4681.
Comment: The Normal Approximation would be better if the expected frequency were larger.
Commonly one does not apply the continuity correction when working with aggregate losses.
Here the severity is 1, 2, 3, etc., so using the continuity correction makes sense.

7.34. Using the Panjer algorithm, for the Poisson a = 0 and b = λ = 3.


c(0) = P(s(0)) = P(0) = e^(3(0-1)) = e^-3 = 0.04979.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (3/x) ∑_{j=1}^{x} j s(j) c(x-j).
c(1) = (3/1)(1) s(1) c(1-1) = (3/1){(1)(0.6)(0.04979)} = 0.08962.
c(2) = (3/2){(1)s(1)c(1) + (2)s(2)c(0)} = (3/2){(1)(0.6)(0.08962) + (2)(0.3)(0.04979)} = 0.1255.
Alternately, if the aggregate losses are 2, then there is either one claim of size 2 or two claims each
of size 1. This has probability: (3e^-3)(0.3) + (3^2 e^-3/2)(0.6^2) = 0.04481 + 0.08066 = 0.1255.

7.35. Using the Panjer algorithm, for the Poisson a = 0 and b = λ = 0.6.
c(0) = P(s(0)) = P(0) = e^(0.6(0-1)) = e^-0.6 = 0.54881.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (0.6/x) ∑_{j=1}^{x} j s(j) c(x-j).
c(1) = (0.6/1)(1) s(1) c(1-1) = (0.6){(1)(0.25)(0.54881)} = 0.08232.
c(2) = (0.6/2){(1)s(1)c(1) + (2)s(2)c(0)} = (0.3){(1)(0.25)(0.08232) + (2)(0.35)(0.54881)} = 0.12142.
c(3) = (0.6/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} =
(0.2){(1)(0.25)(0.12142) + (2)(0.35)(0.08232) + (3)(0.40)(0.54881)} = 0.14931.
Comment: For example, one could instead calculate the probability of the aggregate losses being
three as: Prob[1 claim @3] + Prob[2 claims of sizes 1 and 2] + Prob[3 claims each @1] =
(0.4)(0.6e^-0.6) + (2)(0.25)(0.35)(0.6^2 e^-0.6/2) + (0.25^3)(0.6^3 e^-0.6/6) = 0.1493.

7.36. D. For the aggregate losses to be 4, one can have either 2 claims each of size 2,
3 claims of which 2 are size 1 and one is size 2 (there are 3 ways to order the claim sizes),
or 4 claims each of size one.
Thus fS(4) = (2^2 e^-2/2)(0.6^2) + (2^3 e^-2/6){(3)(0.4^2)(0.6)} + (2^4 e^-2/24)(0.4^4) = 1.121e^-2 = 0.152.

Alternately, use the Panjer Algorithm (recursive method): For the Poisson a = 0 and b = λ = 2.
c(0) = P(s(0)) = P(0) = e^(2(0-1)) = e^-2 = 0.13534.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (2/x) ∑_{j=1}^{x} j s(j) c(x-j).

c(1) = (2/1) (1) s(1) c(1-1) = (2/1) {(1)(.4)(.13534)} = 0.10827.


c(2) = (2/2){(1)s(1)c(1) + (2)s(2)c(0)} = (.4)(.10827) + (2)(.6)(0.13534) = 0.20572.
c(3) = (2/3){s(1)c(2) + 2s(2)c(1) + 3s(3)c(0)} = (2/3){(.4)(.20572) + (2)(.6)(.10827) + 0} =
0.14147.
c(4) = (2/4){s(1)c(3) + 2s(2)c(2) + 3s(3)c(1) + 4s(4)c(0)} =
(2/4){(.4)(.14147) + (2)(.6)(.20572) + 0 + 0} = 0.1517.
Alternately, weight together convolutions of the severity distribution:
(.1353)(0) + (.2707)(0) + (.2707)(.36) + (.1804)(.288) + (.0902)(.0256) = 0.1517.
n 0 1 2 3 4 Aggregate
Poisson 0.1353 0.2707 0.2707 0.1804 0.0902 Distribution
x p*0 p p*p p*p*p p*4
0 1 0.135335
1 0.4 0.108268
2 0.6 0.16 0.205710
3 0.48 0.064 0.141470
4 0.36 0.288 0.0256 0.151720
5 0.432 0.1536
6 0.216 0.3456
7 0.3456
8 0.1296
Sum 1 1 1 1 1
Comment: Since we only want the density at 4, and do not need the densities at 0, 1, 2, and 3 in
order to answer this question, the Panjer Algorithm involves more computation in this case.

7.37. The secondary Poisson has density at zero of e−λ = e-0.5 = 0.6065.
The densities of the secondary Poisson Distribution with λ = 0.5 are:
n s(n)
0 0.6065
1 0.3033
2 0.0758
3 0.0126
4 0.0016
5 0.0002
The density of the compound distribution at zero is the p.g.f. of the primary Geometric distribution,
P(z) = 1/{1 - β(z-1)}, at z = e-0.5 : 1/{4 - 3e-.5} = 0.4586.
For the Primary Geometric, a = β/(1+β) = 3/4 = 0.75 and b = 0.
1/(1 - a s(0)) = 1/{1 - (0.75)(0.6065)} = 1.8345.
Use the Panjer Algorithm:
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 1.8345 ∑_{j=1}^{x} 0.75 s(j) c(x-j) = 1.3759 ∑_{j=1}^{x} s(j) c(x-j).

c(1) = 1.3759 s(1) c(0) = (1.3759)(.3033)(.4586) = 0.1914.


c(2) = 1.3759 {s(1) c(1) + s(2)c(0)} = ( 1.3759){(.3033)(.1914)+(.0758)(.4586)} = 0.1277.
c(3) = 1.3759 {s(1) c(2) + s(2)c(1) +s(3)c(0)} =
( 1.3759){(.3033)(.1277) +(.0758)(.1914) + (.0126)(.4586) } = 0.0812.
The chance of 4 or more claims in a year is 1 - (c(0) + c(1) + c(2) + c(3)) =
1 - (0.4586 + 0.1914 + 0.1277 + 0.0812) = 0.1411.
Comment: Long! Using the Normal Approximation, one would proceed as follows.
The expected number of losses per year is:
(mean of Geometric)(Mean of Poisson) = (3)(0.5) = 1.5.
The variance of the compound frequency distribution is:
(mean of Poisson)2 (variance of geometric) + (mean of geometric)(variance of Poisson)
= (λ^2)β(1+β) + βλ = 3 + 1.5 = 4.5. Thus the chance of more than 3 losses is approximately:
1 - Φ[(3.5 - 1.5)/√4.5] = 1 - Φ[0.94] = 1 - 0.8264 = 0.1736. Due to the skewness of the
compound frequency distribution, the approximation is not particularly good.

7.38. The expected number of losses per year is:


(mean of Geometric)(Mean of Poisson) = (3)(0.5) = 1.5. Thus the expected annual aggregate
losses are: (100)(1.5) = 150. Since each loss is of size 100, if one has 4 or more losses, then the
aggregate losses are greater than 400. Therefore, the expected losses limited to 400:
0f(0) + 100f(1) + 200f(2) + 300f(3) + 400{1-(f(0)+f(1)+f(2)+f(3))} =
100{4 - 4f(0) - 3f(1) -2f(2) - f(3)} = 100{4 - 4(.4586) - 3(.1914) -2(.1277) - 0.0812} = 125.48.
Therefore, the expected losses excess of 400 are: 150 - 125.48 = 24.52.
Comment: Uses the intermediate results of the previous question. Since severity is constant, this
question is basically about the frequency. The question does not specify that it wants expected
annual excess losses.
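
As a quick check of the arithmetic, here is a short Python sketch (my own, not part of the solution), reusing the densities f(0) through f(3) from the previous solution.

```python
# A sketch of the limited expected value arithmetic, reusing f(0), ..., f(3) from 7.37.
f = [0.4586, 0.1914, 0.1277, 0.0812]         # densities of the number of losses
mean_aggregate = 3.0 * 0.5 * 100.0            # (mean Geometric)(mean Poisson)(severity) = 150

# Aggregate limited to 400: each loss is 100, so the aggregate is 100n, capped at 400.
limited = sum(min(100.0 * n, 400.0) * f[n] for n in range(4)) + 400.0 * (1.0 - sum(f))
excess = mean_aggregate - limited             # about 24.5, matching 150 - 125.48
```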

7.39. E. For the geometric distribution with β = 4 , P(z) = 1/(1 - β(z-1)) = 1/(5 - 4z).
a = β/(1 + β) = 0.8, b = 0. Using the Panjer algorithm, c(0) = Pf(s(0)) = P(0) = 0.2.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 0.8 ∑_{j=1}^{x} s(j) c(x-j).

c(1) = 0.8s(1)c(0) = (.8)(1/4)(.2) = 0.04.


c(2) = 0.8{s(1)c(1) + s(2)c(0)} = (.8){(1/4)(.04) + (1/4)(.2)} = 0.048.
c(3) = 0.8{s(1)c(2) + s(2)c(1) + s(3)c(0)} = (.8){(1/4)(.048) + (1/4)(.04) + (1/4)(.2)} = 0.0576.
Distribution of aggregate at 3 is: 0.2 + 0.04 + 0.048 + 0.0576 = 0.3456.
Alternately, one can use “semi-organized reasoning”
For the Geometric with β = 4: f(0) = 1/5 = .2, f(1) = 0.8f(0) = 0.16,
f(2) = 0.8f(1) = 0.128, f(3) = 0.8f(2) = 0.1024.
The ways in which the aggregate is ≤ 3:
0 claims: 0.2. 1 claim of size ≤ 3: (3/4)(.16) = 0.12.
2 claims of sizes 1 & 1, 1 & 2, or 2 & 1: (3/16)(.128) = 0.024.
3 claims of sizes 1 & 1 & 1: (1/64)(.1024) = 0.0016.
Distribution of aggregate at 3 is: 0.2 + 0.12 + 0.024 + 0.0016 = 0.3456.
Alternately, using convolutions:
n 0 1 2 3 Aggregate
Geometric 0.200 0.160 0.128 0.102 Distribution
x f*0 f f*f f*f*f
0 1 0.2000
1 0.25 0.0400
2 0.25 0.062 0.0480
3 0.25 0.125 0.0156 0.0576
Distribution of aggregate at 3 is: 0.2 + 0.04 + 0.048 + 0.0576 = 0.3456.

7.40. E. One can thin the Geometric Distribution.


The non-zero claims are Geometric with β = (5)(4/5) = 4.
The size distribution for the non-zero claims is: P(X = x) = 0.25, x = 1, 2, 3, 4.
Only the non-zero claims contribute to the aggregate distribution.
Thus this question has the same solution as the previous question 3, 11/02, Q. 36.
Distribution of aggregate at 3 is: 0.3456.
Comment: In general, thinning the frequency to only consider the non-zero claims can simplify the
use of convolutions or “semi-organized reasoning”. Thinning a Binomial affects q.
Thinning a Poisson affects λ. Thinning a Negative Binomial affects β.

7.41. E. For XYZ to pay out more than it receives, the aggregate has to be > 250 prior to the
application of the aggregate deductible. After the 500 per event deductible, the severity distribution
is: 0 @ 35%, 300 or more @ 65%.
Thus XYZ pays out more than it receives if and only if XYZ makes a single nonzero payment.
Nonzero payments are Poisson with mean: (65%)(.15) = 0.0975.
Probability of at least one nonzero payment is: 1 - e-.0975 = 9.29%.
Alternately, for the aggregate distribution after the per event deductible,
using the Panjer Algorithm, c(0) = P(s(0)) = Exp[.15(.35 - 1)] = 0.9071. 1 - 0.9071 = 9.29%.
Comment: The aggregate deductible applies after the per event deductible is applied.

7.42. B. Some densities of the Poisson frequency are: f(0) = e^-3 = 0.0498, f(1) = 3e^-3 = 0.1494,
f(2) = 3^2 e^-3/2 = 0.2240, f(3) = 3^3 e^-3/6 = 0.2240.
Ways in which the aggregate can be less than or equal to 3:
no claims: 0.0498.
1 claim of size less than 4: (0.1494)(0.9) = 0.1345.
2 claims of sizes 1 and 1, 1 and 2, 2 and 1: (0.2240){(0.4)(0.4) + (0.4)(0.3) + (0.3)(0.4)} = 0.0896.
3 claims each of size 1: (0.2240)(0.4^3) = 0.0143.
The sum of these probabilities is: 0.0498 + 0.1345 + 0.0896 + 0.0143 = 0.2882.
Alternately, using the Panjer algorithm, for the Poisson a = 0 and b = λ = 3. P(z) = exp[λ(z-1)].
c(0) = P(s(0)) = P(0) = e^(3(0-1)) = e^-3 = 0.04979.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (3/x) ∑_{j=1}^{x} j s(j) c(x-j).

c(1) = (3/1) (1) s(1) c(1-1) = (3) {(1)(.4)(.04979)} = 0.05975.


c(2) = (3/2){(1)s(1)c(1) + (2)s(2)c(0)} = (1.5) {(1)(.4)(.05975) + (2)(.3)(.04979)} = 0.08066.
c(3) = (3/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)}
= (1)(.4)(.08066) + (2)(.3)(.05975) + (3)(.2)(.04979) = 0.09799.
The sum of the densities of the aggregate at 0, 1, 2, and 3 is:
0.04979 + 0.05975 + 0.08066 + 0.09799 = 0.2882.

Section 8, Recursive Method / Panjer Algorithm, Advanced107

Additional items related to the Recursive Method / Panjer Algorithm will be discussed.

Aggregate Distribution, when Frequency is a Compound Distribution:108

Assume frequency is a Compound Geometric-Poisson Distribution with β = 0.8 and λ = 1.3.


Let severity have density: s(0) = 60%, s(1) = 10%, s(2) = 25%, and s(3) = 5%.

Then we can use the Panjer Algorithm twice, in order to compute the Aggregate Distribution.

First we use the Panjer Algorithm to calculate the density of an aggregate distribution with a Poisson
frequency with λ = 1.3, and this severity. As computed in the previous section, this aggregate
distribution is:109 0.594521, 0.0772877, 0.198243, 0.06398, ...

These then are used as the secondary distribution in the Panjer algorithm, together with a Geometric
with β = 0.8 as the primary distribution.
For the primary Geometric, P(z) = 1/{1 - β(z-1)} = 1/(1.8 - 0.8z), a = β/(1+β) = 0.8/1.8 = 0.4444444,
and b = 0.

c(0) = Pp (s(0)) = 1/{1.8 - (0.8)(0.594521)} = 0.7550685.

1/{1 - as(0)} = 1/{1- (0.4444444)(0.594521)} = 1.359123.


c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 1.359123 ∑_{j=1}^{x} 0.4444444 s(j) c(x-j) = 0.604055 ∑_{j=1}^{x} s(j) c(x-j).

c(1) = 0.604055 s(1) c(0) = (0.604055)(0.0772877)(0.7550685) = 0.035251.


c(2) = 0.604055{s(1)c(1)+s(2)c(0)} =
(0.604055){(0.0772877)(0.035251) + (0.198243)(0.7550685)} = 0.092065.
c(3) = 0.604055 {s(1)c(2) + s(2)c(1) + s(3)c(0)} =
(0.604055){(0.0772877)(0.092065) + (0.198243)(0.035251) + (0.06398)(0.7550685)} =
0.0377009.
One could calculate c(4), c(5), c(6), .... , in a similar manner.
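
The two-stage calculation is easy to carry out with a short program. The following Python sketch (my own; the helper name panjer_ab0 is my choice) reproduces the densities above by running the recursion twice.

```python
# A sketch of applying the Panjer Algorithm twice for the Geometric-Poisson frequency:
# severity 0.60, 0.10, 0.25, 0.05 at 0, 1, 2, 3; Poisson lambda = 1.3; Geometric beta = 0.8.
import math

def panjer_ab0(a, b, c0, sev, n_max):
    s0 = sev.get(0, 0.0)
    c = [c0]
    for x in range(1, n_max + 1):
        total = sum((a + b * j / x) * sev.get(j, 0.0) * c[x - j] for j in range(1, x + 1))
        c.append(total / (1.0 - a * s0))
    return c

sev = {0: 0.60, 1: 0.10, 2: 0.25, 3: 0.05}
lam, beta, n_max = 1.3, 0.8, 10

# Stage 1: Poisson frequency with the original severity.
inner = panjer_ab0(0.0, lam, math.exp(lam * (sev[0] - 1.0)), sev, n_max)

# Stage 2: Geometric frequency, with the stage-1 densities playing the role of the "severity".
inner_sev = dict(enumerate(inner))
c0 = 1.0 / (1.0 + beta * (1.0 - inner[0]))   # Geometric p.g.f. evaluated at inner(0)
aggregate = panjer_ab0(beta / (1.0 + beta), 0.0, c0, inner_sev, n_max)
# aggregate[0:4] is about 0.7551, 0.0353, 0.0921, 0.0377, matching the text.
```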
107 See Section 9.6 of Loss Models.
108 See Section 9.6.1 of Loss Models, not on the syllabus.
109 The densities at 0, 1, 2, and 3 were computed, while the densities from 4 through 10 were displayed.

Practical Issues:

Loss Models mentions a number of concerns that may arise in practical applications of the recursive
method/ Panjer Algorithm.

Whenever one uses recursive techniques, one has to be concerned about the propagation of
rounding errors. Small errors can compound at each stage and become very significant.110 While the
chance of this occurring can be minimized by keeping as many significant digits as possible, in
general the chance can not be eliminated.

In the case of the Panjer Algorithm, one is particularly concerned about the calculated right hand tail of
the aggregate distribution. In the case of a Poisson or Negative Binomial frequency distribution, the
relative errors in the tail of the aggregate distribution do not grow quickly; the algorithm is numerically
stable.111 However, in the case of a Binomial frequency, rarely the errors in the right hand tail will
“blow up”.112

Exercise: Aggregate losses are compound Poisson with λ = 1000. There is a 5% chance that the
size of a loss is zero. What is the probability that the aggregate losses are 0?
[Solution: P(Agg=0) = PN(fX(0)) = exp[1000(fX(0)-1)] = exp(-950) = 2.6 x 10-413.]

Thus for this case, the probability of the aggregate losses being zero is an extremely small number.
Depending on the computer and software used, e-950 may not be distinguishable from zero. If this
value is represented as zero, then the results of the Panjer Algorithm would be complete
nonsense.113

Exercise: Assume in the above exercise, the probability of aggregate losses at zero, c(0) is
mistakenly taken as zero. What is the aggregate distribution calculated by the Panjer algorithm?
[Solution: c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j).

Thus c(1) = {1/(1 - as(0))} (a + b/1) s(1) c(0) = 0.


Then, c(2) = {1/(1 - as(0))} {(a + b/2) s(1) c(1) + (a + 2b/2)s(2)c(0)} = 0.
In a similar manner, the whole aggregate distribution would be calculated as zero.]

110 This is a particular concern when one is applying the recursion formula many times. While on a typical exam question one would apply the recursion formula at most 4 times, in a practical application one could apply it thousands of times.
111 See Section 9.6.3 in Loss Models.
112 In this case, the calculated probabilities will alternate sign. Of course the probabilities are actually nonnegative.
113 Of course with such a large expected frequency, it is likely that the Normal, LogNormal or other approximation to the aggregate losses may be a superior technique to using the Panjer algorithm.

So taking c(0) = 0, rather than the correct c(0) = 2.6 x 10-413, would defeat the whole purpose of
using the Panjer algorithm. Thus we see that it is very important when applying the Panjer Algorithm
to such situations either to carefully distinguish between extremely small numbers and zero, or to be
a little “clever” in applying the algorithm.

Exercise: Aggregate losses are compound Poisson with λ = 1000. The severity distribution is:
f(0) = 5%, f(1) = 75%, and f(2) = 20%. What are the mean and variance of the aggregate losses?
[Solution: The mean of the severity is 1.15. The second moment of the severity is 1.55.
Therefore, the mean of the aggregate losses is 1150 and the variance of the aggregate losses is:
(1000)(1.55) = 1550.]

Thus in this case the mean of the aggregate losses minus 6 standard deviations is:
1150 - 6 1550 = 914. In general, we expect there to be extremely little probability more than 6
standard deviations below the mean.114

One could take c(x) = 0 for x ≤ 913, and c(914) = 1; basically we start the algorithm at 914. Then
when we apply the algorithm, the distribution of aggregate losses will not sum to unity, since we
arbitrarily chose c(914) = 1. However, at the end we can add up all of the calculated densities and
divide by the sum, in order to normalize the distribution of aggregate losses.

Exercise: Assume the aggregate losses have a mean of 100 and standard deviation of 5.
Explain how you would apply the Panjer algorithm?
[Solution: One assumes there will be very little probability below 100 - (6)(5) = 70. Thus we take
c(x) = 0 for x < 70, and c(70) = 1. Then we apply the Panjer algorithm starting at 70; we calculate,
c(71), c(72), c(73), ... , c(130). Then we sum up the probabilities. Perhaps they sum to 1,617,012.
Then we would divide each of these calculated values by 1,617,012.]
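
Here is a Python sketch of this work-around (my own illustration, assuming an (a, b, 0) frequency and a discrete severity dict): treat the densities below the chosen starting point as zero, set the starting density to 1, run the recursion forward, and rescale at the end.

```python
# A sketch of the work-around: start the recursion well below the mean with an
# arbitrary value of 1, and normalize the calculated densities at the end.

def panjer_shifted(a, b, sev, start, stop):
    """(a, b, 0) recursion treating c(x) = 0 for x < start and c(start) = 1."""
    s0 = sev.get(0, 0.0)
    c = {x: 0.0 for x in range(stop + 1)}
    c[start] = 1.0
    for x in range(start + 1, stop + 1):
        total = sum((a + b * j / x) * sev.get(j, 0.0) * c[x - j]
                    for j in range(1, x - start + 1))      # c(x - j) = 0 below start
        c[x] = total / (1.0 - a * s0)
    grand_total = sum(c.values())
    return {x: v / grand_total for x, v in c.items()}       # densities now sum to 1

# e.g. for the exercise above, one would call panjer_shifted(a, b, sev, 70, 130).
```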

Another way to solve this potential problem, would be first to perform the calculation for
λ = 1000/128 = 7.8125 rather than 1000.115 Let g(x) be the result of performing the Panjer algorithm
with λ = 7.8125. Then the desired distribution of aggregate losses, corresponding to λ =1000, can
be obtained as g(x)*128. Note that we can “power-up” the convolutions by successively taking
convolutions. For example, (g*8 ) * (g*8 ) = (g*16), and then in turn
(g*16) * (g*16) = (g*32). In this manner we need only perform 7 convolutions in order to get the
27 = 128th convolution. This technique relies on the property that the sum of independent, identically
distributed compound Poisson distributions is another compound Poisson distribution.116
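
Here is a Python sketch (my own) of this repeated-squaring idea; in practice one would also truncate each convolution once the probabilities become negligible.

```python
# A sketch of obtaining a 128th convolution by repeated squaring (2^7 = 128).

def convolve(f, g):
    """Convolution of two densities given as lists indexed 0, 1, 2, ..."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def power_of_two_convolution(g, n):
    """The n-th convolution of g, for n a power of two."""
    result = list(g)
    while n > 1:
        result = convolve(result, result)
        n //= 2
    return result

# e.g. run the Panjer algorithm with lambda = 1000/128 = 7.8125 to get a list g,
# then power_of_two_convolution(g, 128) approximates the lambda = 1000 aggregate.
```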

114 Φ(-6) = 9.87 x 10^-10. Loss Models, in Section 9.6.2, suggests starting at 6 standard deviations below the mean.
115 One would pick some sufficiently large power of 2, such as for example 128.
116 See “Mahlerʼs Guide to Frequency Distributions.”

Since the compound Negative Binomial shares the same property, one can apply a similar
technique.

Exercise: Assume you have a Compound Negative Binomial with β = 20 and r = 30.
How might you use the Panjer Algorithm to calculate the distribution of aggregate losses?
[Solution: One could apply the Panjer Algorithm to a Compound Negative Binomial with
β = 20 and r = 30/32 = 0.9375, and then take the 32nd convolution of the result.]

For the Binomial, since the m parameter has to be integer, one has to modify the technique slightly.

Exercise: Assume you have a Compound Binomial with q = 0.6 and m = 592. How might you use
the Panjer Algorithm to calculate the distribution of aggregate losses?
[Solution: One could apply the Panjer Algorithm to a Compound Binomial with
q = 0.6 and m = 1, and then take the 592nd convolution of the result.
Comment: One could get the 29 = 512nd convolution relatively quickly and then convolute that with
the 80th convolution. In base 2, 592 is written as 1001010000. Therefore, in order to get the
592nd convolution, one would retain the 512nd, 64th and 16th convolutions, and convolute
them at the end. ]

Panjer Algorithm (Recursive Method) for the (a,b,1) class:

If the frequency distribution or primary distribution, pk, is a member of the (a,b,1) class, then one can
modify the Panjer Algorithm:117 118

c(x) = s(x){p_1 - (a+b)p_0}/{1 - a s(0)} + {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j),
where p_0 = frequency density at zero, and p_1 = frequency density at one.

c(0) = Pp (s(0)) = p.g.f. of the frequency distribution at (density of severity distribution at zero.)

If p is a member of the (a, b, 0) class, then p1 = (a+b)p0 , and the first term of c(x) drops out.
Thus this formula reduces to the previously discussed formula for the (a, b, 0) class.

117 See Theorem 9.8 in Loss Models. While on the syllabus, it is unlikely that you will be asked about this.
118 The (a, b, 1) class of frequency distributions includes the (a, b, 0) class. For the (a, b, 1) class, the recursion relationship f(x+1) = f(x){a + b/(x+1)} need only hold for x ≥ 1, rather than x ≥ 0.

Exercise: Calculate the density at 1 for a zero-modified Negative Binomial with β = 2,


r = 3, and probability at zero of 22%.
[Solution: Without the modification, f(0) = 1/(1+2)^3 = 0.037037,
and f(1) = (3)(2)/(1+2)^4 = 0.074074. Thus with the zero-modification, the density at one is:
(0.074074)(1 - 0.22)/(1 - 0.037037) = 0.06.]

Exercise: What is the probability generating function for a zero-modified Negative Binomial with
β = 2, r = 3, and probability at zero of 22%?
[Solution: P(z) = 0.22 + (1 - 0.22)(p.g.f. of zero-truncated Negative Binomial) =
0.22 + (0.78){(1 - 2(z-1))^-3 - (1+2)^-3} / {1 - (1+2)^-3} = 0.22 + (0.81){(1 - 2(z-1))^-3 - 0.037037}.]

Exercise: Let severity have density: s(0) = 30%, s(1) = 60%, s(2) = 10%. Aggregate losses are
given by a compound zero-modified Negative Binomial distribution, with parameters
β = 2, r = 3, and the probability at zero for the zero-modified Negative Binomial is 22%.
Use the Panjer algorithm to calculate the density at 0 of the aggregate losses.
[Solution: From the previous exercise, the zero-modified Negative Binomial has p.g.f.
P(z) = 0.22 + (0.81){(1 - 2(z-1))^-3 - 0.037037}.
c(0) = Pp(s(0)) = Pp(0.3) = 0.22 + (0.81){(1 - 2(0.3-1))^-3 - 0.037037} = 0.24859.]

Exercise: Use the Panjer algorithm to calculate the density at 2 of the aggregate losses.
[Solution: For the zero-modified Negative Binomial, a = 2/(1+2) = 2/3 and
b = (3-1)(2)/(1+2) = 4/3.
c(x) = s(x){p_1 - (a+b)p_0}/{1 - a s(0)} + {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){0.06 - (2/3 + 4/3)(0.22)}/{1 - (2/3)(0.3)} + {1/(1 - (2/3)(0.3))} ∑_{j=1}^{x} {2/3 + (4/3)j/x} s(j) c(x-j)
= -0.475 s(x) + 0.83333 ∑_{j=1}^{x} (1 + 2j/x) s(j) c(x-j).

c(1) = -0.475 s(1) + (0.83333)(1 + (2)(1)/1) s(1) c(0) =


(-0.475)(0.6) + (0.83333)(3)(0.6)(0.24859) = 0.087885.
c(2) = -0.475 s(2) + (0.83333){(1 + (2)(1)/2)s(1) c(1) + (1 + (2)(2)/2)s(2) c(0)} =
(-0.475)(0.1) + (0.83333){(2)(0.6)(0.087885) + (3)(0.1)(0.24859)} = 0.10253.
Comment: The densities out to 10 are: 0.24859, 0.087885, 0.102533, 0.102533, 0.0939881,
0.0811716, 0.0671683, 0.0538092, 0.0420268, 0.0321601, 0.0241992.]

Here is a graph of the density of the aggregate losses:

[Figure: plot of the aggregate densities for x = 0 to 20, with vertical axis from 0 to 0.25.]

Other than the large probability of zero aggregate losses, the aggregate losses look like they could
be approximated by one of the size of loss distributions in Appendix A of Loss Models.
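
The (a, b, 1) recursion is just as easy to code. The following Python sketch (my own; the function name panjer_ab1 is my choice) reproduces the densities of the zero-modified Negative Binomial example above.

```python
# A sketch of the (a, b, 1) recursion for the zero-modified Negative Binomial example:
# beta = 2, r = 3, 22% probability at zero; severity 0.3, 0.6, 0.1 at sizes 0, 1, 2.

def panjer_ab1(a, b, p0, p1, c0, sev, n_max):
    """p0, p1 are the frequency densities at 0 and 1; c0 is the frequency p.g.f. at sev(0)."""
    s0 = sev.get(0, 0.0)
    c = [c0]
    for x in range(1, n_max + 1):
        total = sum((a + b * j / x) * sev.get(j, 0.0) * c[x - j] for j in range(1, x + 1))
        c.append((sev.get(x, 0.0) * (p1 - (a + b) * p0) + total) / (1.0 - a * s0))
    return c

beta, r, p0m, p1m = 2.0, 3.0, 0.22, 0.06
a, b = beta / (1.0 + beta), (r - 1.0) * beta / (1.0 + beta)
sev = {0: 0.3, 1: 0.6, 2: 0.1}
p_trunc = lambda z: ((1.0 - beta * (z - 1.0)) ** -r - (1.0 + beta) ** -r) / (1.0 - (1.0 + beta) ** -r)
c0 = p0m + (1.0 - p0m) * p_trunc(sev[0])     # p.g.f. of the zero-modified distribution at 0.3
print(panjer_ab1(a, b, p0m, p1m, c0, sev, 10))
# Roughly 0.24859, 0.08789, 0.10253, 0.10253, 0.09399, ..., as listed above.
```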

Continuous Severity Distributions:

If one has a continuous severity distribution s(x), and the frequency distribution, p, is a member of
the (a, b, 1) class,119 then one has an integral equation for the distribution of aggregate losses, c,
similar to the Panjer Algorithm:120
c(x) = p_1 s(x) + ∫_0^x (a + by/x) s(y) c(x-y) dy.

Loss Models merely states this result without using it. Instead as has been discussed, Loss Models
demonstrates how one can employ the Panjer algorithm using a discrete severity distribution. One
can either have started with a discrete severity distribution, or one can have approximated a
continuous severity distribution by a discrete severity distribution, as will be discussed in the next
section.

119 It also holds for the members of the (a, b, 0) class, which is a subset of the (a, b, 1) class.
120 See Theorem 9.10 in Loss Models. This is a Volterra integral equation of the second kind. See for example Appendix D of Insurance Risk Models, by Panjer and Willmot.

Problems:

Use the following information for the next 6 questions:


• Frequency follows a zero-truncated Poisson with λ = 0.8.
• Severity is discrete and takes on the following values:
Size Probability
0 20%
1 40%
2 20%
3 10%
4 10%
• Frequency and Severity are independent.

8.1 (2 points) What is the probability that the aggregate losses are zero?
A. less than 12%
B. at least 12% but less than 13%
C. at least 13% but less than 14%
D. at least 14% but less than 15%
E. at least 15%

8.2 (2 points) What is the probability that the aggregate losses are one?
A. less than 30%
B. at least 30% but less than 31%
C. at least 31% but less than 32%
D. at least 32% but less than 33%
E. at least 33%

8.3 (2 points) What is the probability that the aggregate losses are two?
A. less than 19%
B. at least 19% but less than 20%
C. at least 20% but less than 21%
D. at least 21% but less than 22%
E. at least 22%

8.4 (2 points) What is the probability that the aggregate losses are three?
A. less than 9%
B. at least 9% but less than 10%
C. at least 10% but less than 11%
D. at least 11% but less than 12%
E. at least 12%

8.5 (3 points) What is the probability that the aggregate losses are four?
A. less than 9%
B. at least 9% but less than 10%
C. at least 10% but less than 11%
D. at least 11% but less than 12%
E. at least 12%

8.6 (3 points) What is the probability that the aggregate losses are five?
A. less than 5%
B. at least 5% but less than 6%
C. at least 6% but less than 7%
D. at least 7% but less than 8%
E. at least 8%

8.7. (14 points) Use the following information:


• Frequency follows a zero-modified Binomial with m = 4, q = 0.4, and p_0^M = 0.2.

• Severity is discrete and takes on the following values:


Size Probability
0 1/8
100 2/8
200 3/8
500 2/8
• Frequency and Severity are independent.
Use the recursive method (Panjer Algorithm) to determine the densities of the aggregate losses at
0, 100, 200, 300, 400, 500, 600.

Use the following information for the next 4 questions:


• Frequency follows a zero-truncated Geometric with β = 2.
• Severity is discrete and takes on the following values:
Size Probability
0 30%
1 40%
2 20%
3 10%
• Frequency and Severity are independent.

8.8 (2 points) What is the probability that the aggregate losses are zero?
A. less than 12%
B. at least 12% but less than 13%
C. at least 13% but less than 14%
D. at least 14% but less than 15%
E. at least 15%

8.9 (2 points) What is the probability that the aggregate losses are one?
A. less than 20%
B. at least 20% but less than 21%
C. at least 21% but less than 22%
D. at least 22% but less than 23%
E. at least 23%

8.10 (2 points) What is the probability that the aggregate losses are two?
A. less than 14%
B. at least 14% but less than 15%
C. at least 15% but less than 16%
D. at least 16% but less than 17%
E. at least 17%

8.11 (2 points) What is the probability that the aggregate losses are three?
A. less than 12%
B. at least 12% but less than 13%
C. at least 13% but less than 14%
D. at least 14% but less than 15%
E. at least 15%

8.12 (10 points) At a food bank, people volunteer their time on a daily basis.
The number of people who volunteer on any day is a zero-truncated Binomial Distribution with
m = 10 and q = 0.3.
The number of hours that each person helps at the food bank is a zero-truncated Binomial
Distribution with m = 3 and q = 0.4.
The number of volunteers and the number of hours they each help are independent.
With the aid of a computer, determine all of the densities for the distribution of the total number of
hours volunteered per day.

Solutions to Problems:

8.1. D. For the zero-truncated Poisson, P(z) = (e^(λz) - 1) / (e^λ - 1), a = 0, and b = λ.
P(z) = (e^(0.8z) - 1) / (e^0.8 - 1).
c(0) = P(s(0)) = P(0.2) = (e^((0.8)(0.2)) - 1) / (e^0.8 - 1) = 0.141579.

8.2. B. For the zero-truncated Poisson, a = 0 and b = λ = 0.8.
p_0 = 0. p_1 = 0.8e^-0.8 / (1 - e^-0.8) = 0.652773.
c(x) = s(x){p_1 - (a+b)p_0}/{1 - a s(0)} + {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){0.652773 - (0.8)(0)}/{1 - (0)(0.2)} + {1/(1 - (0)(0.2))} ∑_{j=1}^{x} (0 + 0.8j/x) s(j) c(x-j)
= 0.652773 s(x) + (0.8/x) ∑_{j=1}^{x} j s(j) c(x-j).

c(1) = 0.652773 s(1) + (0.8/1)(1)s(1)c(0) = (0.652773)(0.4) + (0.8)(0.4)(0.141579) = 0.306414.

8.3. C. c(2) = 0.652773 s(2) + (0.8/2) {1s(1)c(1) + 2s(2)c(0)} =


(0.652773)(0.2) + (0.8/2) {(1)(0.4)(0.306414) + (2)(0.2)(0.141579)} = 0.202233.

8.4. E. c(3) = 0.652773 s(3) + (0.8/3) {1s(1)c(2) + 2s(2)c(1) + 3s(3)c(0)} =


(0.652773)(0.1) + (0.8/3){(1)(0.4)(0.202233) + (2)(0.2)(0.306414) + (3)(0.1)(0.141579)} =
0.130859.

8.5. E. c(4) = 0.652773 s(4) + (0.8/4) {1s(1)c(3) + 2s(2)c(2) + 3s(3)c(1) + 4s(4)c(0)} =


(0.652773)(0.1)
+ (0.8/4){(1)(0.4)(0.130859) + (2)(0.2)(0.202233) + (3)(0.1)(0.306414) + (4)(0.1)(0.141579)}
= 0.121636.

8.6. A. c(5) = 0.652773 s(5) + (0.8/5){5s(1)c(4) + 2s(2)c(3) + 3s(3)c(2) + 4s(4)c(1) + 5s(5)c(0)}


= (0.652773) (0) +
(0.8/5) {(1)(0.4)(0.121636) + (2)(0.2)(0.130859) + (3)(0.1)(0.202233)+ (4)(0.1)(0.306414)
+ (5)(0)(0.141579)} = 0.045477.
Comment: The distribution of aggregate losses from 0 to 15 is: 0.141579, 0.306414, 0.202234,
0.130859, 0.121636, 0.0454774, 0.0249329, 0.0133713, 0.00776193, 0.00303326,
0.00146421, 0.000689169, 0.000325073, 0.000126662, 0.0000556073, 0.0000237919.
Here is a graph:

[Figure: plot of the aggregate densities for x = 0 to 14, with vertical axis from 0 to 0.3.]

8.7. Put the severity in units of 100, so that s(0) = 1/8, s(1) = 2/8, s(2) = 3/8, s(5) = 2/8.
For the zero-modified Binomial: a = -q/(1-q) = -0.4/0.6 = -2/3, b = (m+1)q/(1-q) = (5)(0.4)/(0.6) = 10/3.
P^M(z) = p_0^M + (1 - p_0^M) P^T(z) = 0.2 + 0.8 [{1 + 0.4(z-1)}^4 - 0.6^4] / (1 - 0.6^4).
c(0) = P(s(0)) = P(1/8) = 0.2 + 0.8 [{1 + 0.4(1/8 - 1)}^4 - 0.6^4] / (1 - 0.6^4) = 0.244951.
p_0 = p_0^M = 0.2. p_1 = {(1 - 0.2)/(1 - 0.6^4)} (4)(0.4)(0.6^3) = 0.317647.

c(x) = s(x){p_1 - (a+b)p_0}/{1 - a s(0)} + {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){0.317647 - (-2/3 + 10/3)(0.2)}/{1 - (-2/3)(1/8)} + {1/(1 - (-2/3)(1/8))} ∑_{j=1}^{x} {-2/3 + (10/3)j/x} s(j) c(x-j)
= -0.199095 s(x) + (24/39) ∑_{j=1}^{x} {-1 + 5j/x} s(j) c(x-j).

c(1) = -0.199095 s(1) + (24/39) (4) s(1) c(0) = (-0.199095)(2/8) + (24/39) (4) (2/8) (0.244951) =
0.100965. c(2) = -0.199095 s(2) + (24/39) {(1.5) s(1) c(1) + (4) s(2) c(0)} =
(-0.199095)(3/8) + (24/39) {(1.5)(2/8)(0.100965) + (4)(3/8)(0.244951)} = 0.174748.
c(3) = -0.199095 s(3) + (24/39) {(2/3)s(1)c(2) + (7/3) s(2) c(1) + (4) s(3) c(0)} =
(-0.199095)(0) + (24/39) {(2/3)(2/8)(0.174748) + (7/3)(3/8)(0.100965) + (3)(0)(0.244951)} =
0.072289.
c(4) = -0.199095 s(4) + (24/39){(1/4)s(1)c(3) + (3/2) s(2) c(2) + (11/4) s(3)c(1) + (4) s(4) c(0)} =
(24/39) {(1/4)(2/8)(0.072289) + (3/2)(3/8)(0.174748)} = 0.063270.
c(5) = -0.199095s(5) + (24/39){(0)s(1)c(4) + (1)s(2) c(3) + (2)s(3)c(2) + (3)s(4)c(1) + (4)s(5)c(0)}
= (-0.199095)(2/8) + (24/39) {(3/8)(0.072289) + (4)(2/8)(0.244951)} = 0.117647.
c(6) = -0.199095s(6) +
(24/39){(-1/6)s(1)c(5) + (2/3)s(2)c(4) + (3/2)s(3)c(3) + (7/3)s(4)c(2) + (19/6)s(5)c(1)+ (4)s(6)c(0)}
= (24/39) {(-1/6)(2/8)(0.117647) + (2/3)(3/8)(0.063270) + (19/6)(2/8)(0.100965)} = 0.055905.
Comment: Similar to Example 9.11 in Loss Models.
In this case, the aggregate can not be more than: (4)(5) = 20, equivalent to $2000.
Here is the whole set of densities: 0.244951, 0.100965, 0.174747, 0.0722886, 0.0632698,
0.117647, 0.0559053, 0.0783088, 0.0223403, 0.0177849, 0.0257812, 0.00840993,
0.0113051, 0.00165441, 0.00124081, 0.00238971, 0.000367647, 0.000551471, 0, 0,
0.0000919118. The only way to get an aggregate of 20 is to have 4 claims each of size 5.
The probability of 4 claims is: {(1 - 0.2)/(1 - 0.6^4)} (0.4^4) = 0.023529412.
Thus the probability that the aggregate is 20 is: (0.023529412)(2/8)^4 = 0.0000919118.

8.8. B. For the zero-truncated Geometric: a = β/(1+β) = 2/3, and b = 0,


P(z) = [{1 - β(z-1)}^-1 - (1+β)^-1] / [1 - (1+β)^-1] = [{1 - 2(z-1)}^-1 - 1/3] / (1 - 1/3) = 1.5/{1 - 2(z-1)} - 0.5.
c(0) = P(s(0)) = P(0.3) = 1.5/{1 - 2(0.3 - 1)} - 0.5 = 0.125.

8.9. B. p0 = 0. p1 = 1/(1 + β) = 1/3.


c(x) = s(x){p_1 - (a+b)p_0}/{1 - a s(0)} + {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){1/3 - (2/3 + 0)(0)}/{1 - (2/3)(0.3)} + {1/(1 - (2/3)(0.3))} ∑_{j=1}^{x} (2/3 + 0·j/x) s(j) c(x-j)
= s(x)/2.4 + {(2/3)/0.8} ∑_{j=1}^{x} s(j) c(x-j).

c(1) = s(1)/2.4 + (1/1.2) s(1) c(0) = 0.4/2.4 + (1/1.2) (0.4)(0.125) = 0.208333.

8.10. E. c(2) = s(2)/2.4 + (1/1.2) {s(1)c(1) + s(2)c(0)} =


0.2/2.4 + ((1/1.2) {(0.4)(0.208333) + (0.2)(0.125)} = 0.173611.

8.11. D. c(3) = s(3)/2.4 + (1/1.2) {s(1)c(2) + s(2)c(1) + s(3)c(0)} =


0.1/2.4 + (1/1.2) {(0.4)(0.173611) + (0.2)(0.208333) + (0.1)(0.125)} = 0.144676.
Comment: The aggregate densities from 0 to 20: 0.125, 0.208333, 0.173611, 0.144676,
0.0945216, 0.0700874, 0.0511724, 0.0366155, 0.0265745, 0.0192251, 0.0138888,
0.0100483, 0.00726633, 0.00525422, 0.00379982, 0.00274784, 0.0019871, 0.00143699,
0.00103917, 0.00075148, 0.000543437.

8.12. For zero-truncated Binomial Distribution of number of volunteers:


a = -q/(1-q) = -0.3/0.7 = -3/7, b = (m+1)q/(1-q) = (11)(0.3)/0.7 = 33/7.
P(z) = [{1 + 0.3(z-1)}^10 - 0.7^10] / (1 - 0.7^10).
p_0 = 0, and p_1 = (10)(0.3)(0.7^9) / (1 - 0.7^10) = 0.1245799.

The densities at 1, 2, and 3 of the zero-truncated Binomial Distribution of number of hours:
0.5510204, 0.3673469, 0.0816327.
c(0) = P(s(0)) = P(0) = [{1 + 0.3(0-1)}^10 - 0.7^10] / (1 - 0.7^10) = 0.
c(x) = s(x){p_1 - (a+b)p_0}/{1 - a s(0)} + {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){0.1245799 - (-3/7 + 33/7)(0)}/{1 - (-3/7)(0)} + {1/(1 - (-3/7)(0))} ∑_{j=1}^{x} {-3/7 + (33/7)j/x} s(j) c(x-j)
= 0.1245799 s(x) + (3/7) ∑_{j=1}^{x} {-1 + 11j/x} s(j) c(x-j).

c(1) = 0.1245799 s(1) + (3/7)(10) s(1) c(0) = (0.1245799)(0.5510204) + (30/7)(0.5510204)(0) = 0.0686461.
c(2) = 0.1245799 s(2) + (3/7){(4.5) s(1) c(1) + (10) s(2) c(0)} =
(0.1245799)(0.3673469) + (3/7){(4.5)(0.5510204)(0.0686461) + (10)(0.3673469)(0)} = 0.118713.
The densities at 1 through 30 for the total number of hours are:
0.0686461, 0.118713, 0.153374, 0.164899, 0.152085, 0.123304, 0.0893742, 0.0585136,
0.0348984, 0.0190702, 0.00958846, 0.00444987, 0.00190984, 0.000758938, 0.000279338,
0.0000951842, 0.0000299911, 8.71974x10^-6, 2.33222x10^-6, 5.7145x10^-7, 1.27545x10^-7,
2.57369x10^-8, 4.64828x10^-9, 7.41092x10^-10, 1.02312x10^-10, 1.1894x10^-11, 1.11433x10^-12,
7.8157x10^-14, 3.59342x10^-15, 7.98539x10^-17.
Comment: The maximum total number of hours is: (3)(10) = 30.
This can only occur if 10 people volunteer and each help for three hours.
The probability of 10 people volunteering is: 0.3^10 / (1 - 0.7^10) = 6.07655 x 10^-6.
The probability that a person helps for 3 hours is: 0.4^3 / (1 - 0.6^3) = 0.0816327.
Thus the probability of 30 hours in total is: (6.07655 x 10^-6)(0.0816327^10) = 7.98539 x 10^-17.

Section 9, Discretization121

With a continuous severity distribution, in order to apply the Recursive Method / Panjer Algorithm,
one would first need to approximate this continuous distribution by a discrete severity distribution.
There are a number of methods one could use to do this.

Method of Rounding:122

Assume severity follows an Exponential distribution with θ = 100.


For example, we could use a discrete distribution g, with support 0, 20, 40, 60, 80, 100, etc.
Then we could take g(0) = F(20/2) = F(10) = 1 - e^(-10/100) = 1 - e^-0.1 = 0.095163.
We could let g(20) = F(30) - F(10) = e^-0.1 - e^-0.3 = 0.164019.123

Exercise: Continuing in this manner what is g(40)?

[Solution: g(40) = F(50) - F(30) = (1 - e^(-50/100)) - (1 - e^(-30/100)) = e^-0.3 - e^-0.5 = 0.134288.]

Graphically, one can think of this procedure as balls dropping from above, with their probability
horizontally following the density of this Exponential Distribution. The method of rounding is like
setting up a bunch of cups each of width the span of 20, centered at 0, 20, 40, 60, etc.

[Figure: the Exponential density, with cups of width 20 centered at 0, 20, 40, etc.]

Then the expected percentage of balls falling in each cup is the discrete probability produced by
the method of rounding. This discrete probability is placed at the center of each cup.
121 See Section 9.6.5 of Loss Models.
122 See Section 9.6.5.1 of Loss Models. Also called the method of mass dispersal.
123 Loss Models actually takes g(0) = F(10) - Prob(10), g(20) = (F(30) - Prob(30)) - (F(10) - Prob(10)), etc. This makes no difference for a continuous distribution such as the Exponential. It would make a difference if there happened to be a point mass of probability at either 10 or 30. Loss Models provides no explanation for this choice of including a point mass at 30 in the discretized distribution at 40. It is unclear that this choice is preferable to instead either including a point mass at 30 in the discretized distribution at 20 or splitting it equally between 20 and 40.

We could arrange this calculation in a spreadsheet:


x F(x+10) g(x) x F(x+10) g(x)
0 0.095163 0.095163 400 0.983427 0.003669
20 0.259182 0.164019 420 0.986431 0.003004
40 0.393469 0.134288 440 0.988891 0.002460
60 0.503415 0.109945 460 0.990905 0.002014
80 0.593430 0.090016 480 0.992553 0.001649
100 0.667129 0.073699 500 0.993903 0.001350
120 0.727468 0.060339 520 0.995008 0.001105
140 0.776870 0.049402 540 0.995913 0.000905
160 0.817316 0.040447 560 0.996654 0.000741
180 0.850431 0.033115 580 0.997261 0.000607
200 0.877544 0.027112 600 0.997757 0.000497
220 0.899741 0.022198 620 0.998164 0.000407
240 0.917915 0.018174 640 0.998497 0.000333
260 0.932794 0.014879 660 0.998769 0.000273
280 0.944977 0.012182 680 0.998992 0.000223
300 0.954951 0.009974 700 0.999175 0.000183
320 0.963117 0.008166 720 0.999324 0.000150
340 0.969803 0.006686 740 0.999447 0.000122
360 0.975276 0.005474 760 0.999547 0.000100
380 0.979758 0.004482 780 0.999629 0.000082
400 0.983427 0.003669 800 0.999696 0.000067

The discrete distribution g is the result of discretizing the continuous Exponential distribution.
Note that one could continue beyond 800 in the same manner, until the probabilities got sufficiently
small for a particular application.

This is an example of the Method of Rounding.


For the Method of Rounding with span h, construct the discrete distribution g:124
g(0) = F(h/2). g(ih) = F(h(i + 1/2)) - F(h(i - 1/2)).

We have that for example F(30) = G(30) = 1 - e-0.3. In this example, F and G match at 10, 30, 50,
70, etc. In general, the Distribution Functions match at all of the points halfway between
the support of the discretized distribution obtained from the method of rounding.125

In this example, the span was 20, the spacing between the chosen discrete sizes. If one had
instead taken a span of 200, there would have been one tenth as many points and the
approximation would have been worse. If instead one had taken a span of 2, there would have
been 10 times as many points and the approximation would have been much better.

124 “Span” here differs from the somewhat similar concept of the “bandwidth” used in kernel smoothing, discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
125 This contrasts with the method of local moment matching, to be discussed subsequently.

Since one discretizes in order to simplify calculations, usually one wants to have fewer points, and
thus a larger span. This goal conflicts with the desire to have a good approximation to the continuous
distribution, which requires a smaller span. Thus in practical applications one needs to select a
span that is neither too small nor too large. One can always test whether making the span
smaller would materially affect your results.

One could use this discretized severity distribution obtained from the method of rounding in the
Panjer Algorithm in order to approximate the aggregate distribution.

Exercise: Using the above discretized approximation to the Exponential distribution with
θ = 100, and a Geometric frequency with β = 9, calculate the first four densities of the aggregate
distribution via the Panjer Algorithm.
[Solution: The discretized distribution has span of 20, so we treat 20 as 1, 40 as 2, etc., for
purposes of the Panjer Algorithm.
The p.g.f. of the Geometric Distribution is: P(z) = 1/{1 - 9(z - 1)} = 1/(10 - 9z).
c(0) = P(s(0)) = P(0.095163) = 1/{10 - 9(0.095163)} = 0.10937.
For the Geometric Distribution: a = β/(1 + β) = 9/10 = 0.9 and b = 0.
1/(1 - as(0)) = 1/{1 - (0.9)(0.095163)} = 1.10967.
c(x) = {1/(1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 1.10967 ∑_{j=1}^{x} 0.9 s(j) c(x-j) = 0.98430 ∑_{j=1}^{x} s(j) c(x-j).

c(1) = 0.98430 s(1)c(0) = (0.98430)(0.164019)(0.10937) = 0.01766.


c(2) = 0.98430 {s(1)c(1) + s(2)c(0)} =
(0.98430){(0.164019)(0.01766) + (0.134288)(0.10937)} = 0.01731.
c(3) = 0.98430 {s(1)c(2)+ s(2)c(1) + s(3)c(0)} =
(0.98430){(0.164019)(0.01731) + (0.134288)(0.01766) + (0.109945)(0.10937)} = 0.01695.]

Thus the approximate discrete densities of the aggregate distribution at 0, 20, 40, and 60 are:
0.10937, 0.01766, 0.01731, 0.01695.
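
Here is a Python sketch (my own illustration) of the calculation just performed: discretize the Exponential by the method of rounding with a span of 20, then run the Panjer Algorithm with the Geometric frequency.

```python
# A sketch: Exponential severity (theta = 100) discretized by the method of rounding
# with span h = 20, then the Panjer Algorithm with a Geometric frequency (beta = 9).
import math

theta, h, beta, n_max = 100.0, 20.0, 9.0, 40
F = lambda x: 1.0 - math.exp(-x / theta)

# Method of rounding: g(0) = F(h/2), g(ih) = F((i + 1/2)h) - F((i - 1/2)h).
sev = {0: F(h / 2.0)}
for i in range(1, n_max + 1):
    sev[i] = F((i + 0.5) * h) - F((i - 0.5) * h)

a = beta / (1.0 + beta)                            # Geometric: a = beta/(1+beta), b = 0
c = [1.0 / (1.0 + beta * (1.0 - sev[0]))]          # Geometric p.g.f. at sev(0)
for x in range(1, n_max + 1):
    c.append(sum(a * sev[j] * c[x - j] for j in range(1, x + 1)) / (1.0 - a * sev[0]))

# c[0:4] is about 0.10937, 0.01766, 0.01731, 0.01695, close to the exact
# aggregate densities 0.10896, 0.01764, 0.01729, 0.01695 derived just below.
```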

Exercise: What is the moment generating function of aggregate losses if the severity is
Exponential with θ = 100 and frequency is Geometric with β = 9?
[Solution: For the Exponential Distribution the m.g.f. is: MX(t) = (1 - 100t)^-1.
The p.g.f. of the Geometric Distribution is: P(z) = 1/{1 - 9(z - 1)} = 1/(10 - 9z).
MA(t) = 1/{10 - 9(1 - 100t)^-1} = (1 - 100t) / (1 - 1000t).]

Note that (1 - 100t)/(1 - 1000t) = 0.1 + (0.9){1/(1 - 1000t)}. This is the weighted average of the
m.g.f. of a point mass at zero and the m.g.f. of an Exponential distribution with mean 1000.

Therefore, the aggregate distribution is a weighted average of a point mass at zero and an
Exponential distribution with mean 1000, using weights 10% and 90%.126
Thus the distribution function of aggregate losses for x > 0 is:
C(x) = 0.1 + 0.9(1 - e-x/1000) = 1 - 0.9e-x/1000.

One can create a discrete approximation to this aggregate distribution via the method of
rounding with a span of 20. Here are the first four discrete densities:
g(0) = C(10) = 1 - 0.9 e^-0.01 = 0.10896.
g(20) = C(30) - C(10) = 0.9(e^-0.01 - e^-0.03) = 0.01764.
g(40) = C(50) - C(30) = 0.9(e^-0.03 - e^-0.05) = 0.01729.
g(60) = C(70) - C(50) = 0.9(e^-0.05 - e^-0.07) = 0.01695.

This better discrete approximation to the aggregate distribution is similar to the previous
approximation obtained by applying the Panjer Algorithm using the approximate severity
distribution:

Discrete Density from Applying Panjer Discrete Density from Applying Method of
x Algorithm to the Approximate Severity Rounding to the Exact Aggregate Distribution
0 0.10937 0.10896
20 0.01766 0.01764
40 0.01731 0.01729
60 0.01695 0.01695

126 This is an example of a general result discussed in my section on Analytic Results.

Exercise: Create a discrete approximation to a Pareto Distribution with θ = 40 and


α = 3, using the method of rounding with a span of 50. Stop at 1000.
[Solution: F(x) = 1 - {40/(40 + x)}^3.
For example, g(50) = F(75) - F(25) = {40/(40 + 25)}^3 - {40/(40 + 75)}^3 = 0.190964.
g(0) = F(25) = 0.767.
F(25) = 0.767
g(50) = F(75) - F(25) = 0.191.
F(75) = 0.958
g(100) = F(125) - F(75) = 0.028.
F(125) = 0.986
g(150) = F(175) - F(125) = 0.008.
F(175) = 0.994
g(200) = F(225) - F(175) = 0.003.
F(225) = 0.997
etc.

x F(x+25) g(x) x F(x+25) g(x)


0 0.766955 0.766955 550 0.999725 0.000080
50 0.957919 0.190964 600 0.999782 0.000058
100 0.985753 0.027834 650 0.999825 0.000043
150 0.993560 0.007807 700 0.999857 0.000032
200 0.996561 0.003001 750 0.999882 0.000025
250 0.997952 0.001391 800 0.999901 0.000019
300 0.998684 0.000731 850 0.999916 0.000015
350 0.999105 0.000421 900 0.999929 0.000012
400 0.999363 0.000259 950 0.999939 0.000010
450 0.999531 0.000168 1000 0.999947 0.000008
500 0.999645 0.000114
Comment: By stopping at 1000, there is 1 - 0.999947 = 0.000053 of probability not included in
the discrete approximation. One could place this additional probability at some convenient spot.
For example, we could figure out where 1 - F(x) = 0.000053/2. This occurs at x = 1302. Thus
one might put a probability of 0.000053 at 1300.]

The sum of the first n densities that result from the method of rounding is:
F(h/2) + F(3h/2) - F(h/2) + F(5h/2) - F(3h/2) + ... + F(h(n + 1/2)) - F(h(n - 1/2)) = F(h(n + 1/2)).
As n goes to infinity, this sum approaches F(∞) = 1.
Thus the method of rounding includes in the discrete distribution all of the probability.

Average of the Result of the Method of Rounding:

The mean of the discrete distribution that results from the method of rounding is:
0 F(h/2) + h{F(3h/2) - F(h/2)} + 2h{F(5h/2) - F(3h/2)} + 3h{F(7h/2) - F(5h/2)} + ... =
h{S(h/2) - S(3h/2)} + 2h{S(3h/2) - S(5h/2)} + 3h{S(5h/2) - S(7h/2)} + ... =
h{S(h/2) + S(3h/2) + S(5h/2) + S(7h/2) + ...} ≅ ∫_0^∞ S(x) dx = E[X].

Thus the method of rounding produces a discrete distribution with approximately the same mean as
the continuous distribution we are approximating. The smaller the span, h, the better the
approximation will be.

Here is a computation of the mean of the previous result of applying the method of rounding with
span 20 to an Exponential distribution with θ = 100.
x F(x+10) g(x) Extension x F(x+10) g(x) Extension
0 0.0952 0.0952 0.0000 400 0.98343 0.00367 1.4677
20 0.2592 0.1640 3.2804 420 0.98643 0.00300 1.2617
40 0.3935 0.1343 5.3715 440 0.98889 0.00246 1.0822
60 0.5034 0.1099 6.5967 460 0.99090 0.00201 0.9263
80 0.5934 0.0900 7.2013 480 0.99255 0.00165 0.7914
100 0.6671 0.0737 7.3699 500 0.99390 0.00135 0.6749
120 0.7275 0.0603 7.2407 520 0.99501 0.00111 0.5747
140 0.7769 0.0494 6.9162 540 0.99591 0.00090 0.4886
160 0.8173 0.0404 6.4715 560 0.99665 0.00074 0.4149
180 0.8504 0.0331 5.9607 580 0.99726 0.00061 0.3518
200 0.8775 0.0271 5.4224 600 0.99776 0.00050 0.2979
220 0.8997 0.0222 4.8835 620 0.99816 0.00041 0.2521
240 0.9179 0.0182 4.3617 640 0.99850 0.00033 0.2130
260 0.9328 0.0149 3.8687 660 0.99877 0.00027 0.1799
280 0.9450 0.0122 3.4110 680 0.99899 0.00022 0.1517
300 0.9550 0.0100 2.9922 700 0.99917 0.00018 0.1279
320 0.9631 0.0082 2.6131 720 0.99932 0.00015 0.1077
340 0.9698 0.0067 2.2732 740 0.99945 0.00012 0.0906
360 0.9753 0.0055 1.9706 760 0.99955 0.00010 0.0762
380 0.9798 0.0045 1.7030 780 0.99963 0.00008 0.0640
400 0.9834 0.0037 1.4677 800 0.99970 0.00007 0.0538
Sum 101.0249003

In this case, the mean of the discrete distribution is 101, compared to 100 for the Exponential.
For a longer-tailed distribution such as a Pareto, the approximation might not be this close.

Method of Local Moment Matching:127

The method of local moment matching is another technique for approximating a continuous
distribution by a discrete distribution with a span of h. In the method of moment matching, the
approximating distribution will have the same lower moments as the original distribution.

In the simplest case, one requires that the means match. In a more complicated version, one could
require that both the first and second moments match.

In order to have the means match, using a span of h, the approximating densities are:128
g(0) = 1 - E[X ∧ h]/h.
g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]} / h.

For example, for an Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)).

g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]} / h = θ e^(-ih/θ) {e^(h/θ) + e^(-h/θ) - 2} / h.

For an Exponential Distribution with θ = 100 using a span of 20:

g(0) = 1 - E[X ∧ 20]/20 = 1 - (100)(1 - e^-0.2)/20 = 0.093654.
g(ih) = θ e^(-ih/θ) {e^(h/θ) + e^(-h/θ) - 2} / h = (100) e^(-i/5) {e^0.2 + e^-0.2 - 2}/20 = 0.2000668 e^(-i/5).
g(20) = 0.2000668 e^(-1/5) = 0.164293.
g(40) = 0.2000668 e^(-2/5) = 0.134511.

Out to 800, the approximating distribution is:


0.093654, 0.164293, 0.134511, 0.110129, 0.090166, 0.073821, 0.060440, 0.049484,
0.040514, 0.033170, 0.027157, 0.022235, 0.018204, 0.014904, 0.012203, 0.009991,
0.008180, 0.006697, 0.005483, 0.004489, 0.003675, 0.003009, 0.002464, 0.002017,
0.001651, 0.001352, 0.001107, 0.000906, 0.000742, 0.000608, 0.000497, 0.000407,
0.000333, 0.000273, 0.000223, 0.000183, 0.000150, 0.000123, 0.000100, 0.000082,
0.000067.129

In general, calculating the mean matching discrete distribution requires that one calculate the limited
expected value of the original distribution at each of the spanning points.
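
Here is a Python sketch (my own; the function name discretize_mean_matching is my choice) of this mean-matching discretization, written directly in terms of the limited expected value and applied to the Exponential example above.

```python
# A sketch of mean-matching discretization via limited expected values,
# applied to the Exponential with theta = 100 and span h = 20.
import math

def discretize_mean_matching(lev, h, n_max):
    """g(0) = 1 - E[X ∧ h]/h; g(ih) = {2 E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]} / h."""
    g = [1.0 - lev(h) / h]
    for i in range(1, n_max + 1):
        g.append((2.0 * lev(i * h) - lev((i - 1) * h) - lev((i + 1) * h)) / h)
    return g

theta = 100.0
lev_exponential = lambda x: theta * (1.0 - math.exp(-x / theta))     # E[X ∧ x]
g = discretize_mean_matching(lev_exponential, 20.0, 40)
# g[0:3] is about 0.093654, 0.164293, 0.134511, as listed above.
```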

127 See Section 9.6.5.2 of Loss Models.
128 For a distribution with positive support, for example x > 0. Obviously, this would only be applied to a distribution with a finite mean.
129 While this is close to the method of rounding approximation calculated previously, they differ.

Exercise: Create a discrete approximation to a Pareto Distribution with θ = 40 and α = 3,


matching the mean, with a span of 10. Stop at 400.
[Solution: E[X ∧ x] = {θ/(α-1)}(1 - {θ/(x+θ)}^(α-1)) = 20(1 - {40/(x+40)}^2).
x LEV(x) g(x) Extension x LEV(x) g(x) Extension
0 0.0000 28.000% 0.0000 200 19.4444
10 7.2000 32.889% 3.2889 210 19.4880 0.049% 0.1035
20 11.1111 15.528% 3.1057 220 19.5266 0.042% 0.0927
30 13.4694 8.277% 2.4830 230 19.5610 0.036% 0.0833
40 15.0000 4.812% 1.9249 240 19.5918 0.031% 0.0751
50 16.0494 2.988% 1.4938 250 19.6195 0.027% 0.0680
60 16.8000 1.952% 1.1715 260 19.6444 0.024% 0.0617
70 17.3554 1.330% 0.9308 270 19.6670 0.021% 0.0562
80 17.7778 0.937% 0.7494 280 19.6875 0.018% 0.0514
90 18.1065 0.679% 0.6110 290 19.7062 0.016% 0.0470
100 18.3673 0.504% 0.5041 300 19.7232 0.014% 0.0432
110 18.5778 0.382% 0.4203 310 19.7388 0.013% 0.0397
120 18.7500 0.295% 0.3539 320 19.7531 0.011% 0.0366
130 18.8927 0.231% 0.3006 330 19.7663 0.010% 0.0338
140 19.0123 0.184% 0.2574 340 19.7784 0.009% 0.0313
150 19.1136 0.148% 0.2220 350 19.7896 0.008% 0.0291
160 19.2000 0.121% 0.1928 360 19.8000 0.008% 0.0270
170 19.2744 0.099% 0.1685 370 19.8096 0.007% 0.0252
180 19.3388 0.082% 0.1480 380 19.8186 0.006% 0.0235
190 19.3951 0.069% 0.1308 390 19.8269 0.006% 0.0219
200 19.4444 0.058% 0.1161 400 19.8347 0.005% 0.0205
210 19.4880 410 19.8420 Sum 19.5441

For example, g(0) = 1 - E[X ∧ 10]/10 = 1 - 7.2/10 = 28%.


g(10) = {2E[X ∧ 10] - E[X ∧ 20] - E[X ∧ 0]}/10 = {(2)(7.2) - 11.111 - 0}/10 = 32.89%.
g(20) = {2E[X ∧ 20] - E[X ∧ 30] - E[X ∧ 10]}/10 = {(2)(11.111) - 13.469 - 7.2}/10 = 15.53%.
Comment: Summing through 400, the mean of the approximating distribution is 19.544 < 20, the
mean of the Pareto. The Pareto is a long-tailed distribution, and we would need to include values of
the approximating distribution beyond g(400), in order to get closer to the mean.]
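
Here is a Python sketch (my own) of the point made in the comment above, checking the partial means of this Pareto discretization.

```python
# A sketch checking the partial means of the mean-matching discretization of the
# Pareto with theta = 40, alpha = 3, span h = 10.
theta, alpha, h = 40.0, 3.0, 10.0
lev = lambda x: (theta / (alpha - 1.0)) * (1.0 - (theta / (x + theta)) ** (alpha - 1.0))

def g(i):
    if i == 0:
        return 1.0 - lev(h) / h
    return (2.0 * lev(i * h) - lev((i - 1) * h) - lev((i + 1) * h)) / h

partial_to_400 = sum(i * h * g(i) for i in range(41))     # about 19.54
partial_to_2000 = sum(i * h * g(i) for i in range(201))   # noticeably closer to the mean of 20
```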

Relationship to Layers of the Method of Mean Matching:

Note that the numerator of g(ih) can be written as a difference of layers of loss:130
g(ih) = {(E[X ∧ ih] - E[X ∧ (i-1)h]) - (E[X ∧ (i+1)h] - E[X ∧ ih])} / h = { ∫_{ih-h}^{ih} S(x) dx - ∫_{ih}^{ih+h} S(x) dx } / h.

This numerator is nonnegative, since S(x) is a nonincreasing function of x; the integral of S over (ih-h, ih] is at least as large as the integral of S over (ih, ih+h].


Thus all of the approximating discrete densities are nonnegative, when we match the mean.131

h g(ih) = {(E[X ∧ ih] - E[X ∧ (i-1)h]) - (E[X ∧ (i+1)h] - E[X ∧ ih])} = Layi - Layi+1,
where Layi is the layer from (i-1)h to ih, i = 1, 2, 3, ...

The first four of these successive layers are shown on the following Lee Diagram:132

[Lee Diagram: the horizontal axis is probability (0 to 1), the vertical axis is size; Layers 1 through 4 are the horizontal slices of height h between 0 and h, h and 2h, 2h and 3h, and 3h and 4h under the survival curve.]

g(ih) = Layi /h - Layi+1 /h = (average width of area i) - (average width of area i+1)
= (average contribution of S(x) to Layer i) - (average contribution of S(x) to Layer i+1).

130
This formula works even when i = 0, since we have assumed S(x) = 1 for x ≤ 0.
131
If one matches the first two moments, this nice property does not necessarily hold.
132
Lee Diagrams are not on the syllabus of this exam. See “Mahlerʼs Guide to Loss Distributions”.

Demonstration that the Densities Given by the Formulas Do Match the Mean:

∑_{i=0}^{n} g(ih) = ∑_{i=0}^{n} { ∫_{ih-h}^{ih} S(x) dx - ∫_{ih}^{ih+h} S(x) dx } / h = { ∫_{-h}^{0} S(x) dx - ∫_{nh}^{nh+h} S(x) dx } / h = 1 - ∫_{nh}^{nh+h} S(x) dx / h.

The final term goes to zero as n approaches ∞, since S(x) goes to zero as x approaches ∞.
Therefore, the densities of the approximating distribution do sum to 1.

∑_{i=0}^{n} ih g(ih) = ∑_{i=1}^{n} i { ∫_{ih-h}^{ih} S(x) dx - ∫_{ih}^{ih+h} S(x) dx } = ∑_{i=0}^{n-1} (i+1) ∫_{ih}^{ih+h} S(x) dx - ∑_{i=1}^{n} i ∫_{ih}^{ih+h} S(x) dx

= ∑_{i=0}^{n-1} ∫_{ih}^{ih+h} S(x) dx - n ∫_{nh}^{nh+h} S(x) dx = ∫_{0}^{nh} S(x) dx - n ∫_{nh}^{nh+h} S(x) dx.

As n approaches infinity, the first term goes to the integral from zero to infinity of the survival function,
which is the mean. Assuming the mean exists, xS(x) goes to zero as x approaches infinity.133
Therefore, the second term goes to zero as n approaches infinity.134 Therefore, the mean of the
discretized distribution, g, matches the mean of the original distribution.

One can rewrite the above as: ∑_{i=0}^{n} ih g(ih) = E[X ∧ nh] - n{E[X ∧ (nh+h)] - E[X ∧ nh]} = (n+1) E[X ∧ nh] - n E[X ∧ (nh+h)].

For example, when the Pareto was approximated, the sum up to n = 40 was:
(41)E[X ∧ 400] - (40)E[X ∧ 410] = (41)(19.8347) - (40)(19.8420) = 19.54.

Another way of showing that the mean of the approximating discrete distribution matches that of the
original continuous distribution:
∑_{i=0}^{∞} ih g(ih) = ∑_{i=1}^{∞} i (Layi - Layi+1) = ∑_{i=1}^{∞} i Layi - ∑_{i=1}^{∞} i Layi+1 = ∑_{i=1}^{∞} i Layi - ∑_{i=1}^{∞} (i-1) Layi = ∑_{i=1}^{∞} Layi = Mean.
133
If S(x) ~ 1/x for large x, then the integral of S(x) to infinity would not exist, and therefore neither would the mean.
134
The second term is n times the layer from nh to nh+h. As n approaches infinity, the layer starting at nh of width h
has to go to zero faster than 1/n. Otherwise when we add them up, we get an infinite sum. (The sum of 1/n
diverges.) If we got an infinite sum, then the mean would not exist.

Matching the First Two Moments:

According to Loss Models, matching the first two moments of the discretized distribution produces more accurate results when, for example, calculating stop loss premiums. While the equations for moment matching shown in Loss Models can be written out for the case of matching the first two moments and then programmed on a computer, this is well beyond the level of calculation you should be expected to perform on the exam!135

Matching the first two moments, the densities of the approximating distribution are:136
g(0) = ∫_{0}^{2h} (x² - 3hx + 2h²) f(x) dx / (2h²).

For i odd, g(ih) = - ∫_{ih-h}^{ih+h} {x² - 2ihx + (i² - 1)h²} f(x) dx / h².

For i even, g(ih) = { ∫_{ih-2h}^{ih} {x² - (2i-3)hx + (i-1)(i-2)h²} f(x) dx + ∫_{ih}^{ih+2h} {x² - (2i+3)hx + (i+1)(i+2)h²} f(x) dx } / (2h²).

Applying these formulas to an Exponential Distribution with mean 50, using a span of 10,
f(x) = e-x/50/50 and h = 10:

g(0) = ∫_{0}^{2h} (x² - 3hx + 2h²) f(x) dx / (2h²) = ∫_{0}^{20} (x² - 30x + 200) {e^(-x/50) / 50} dx / 200 = 0.0661987.

g(10) = - ∫_{10-h}^{10+h} {x² - 2ihx + (i² - 1)h²} f(x) dx / h² = - ∫_{0}^{20} (x² - 20x) {e^(-x/50) / 50} dx / 100 = 0.219203.

135
Loss Models does not show an example of matching the first two moments.
Matching the first three moments is even more complicated.
136
Derived from equations 9.27 and 9.28 in Loss Models.

g(20) = { ∫_{0}^{20} (x² - 10x) {e^(-x/50) / 50} dx + ∫_{20}^{40} (x² - 70x + 1200) {e^(-x/50) / 50} dx } / 200 = 0.0886528.

Note these formulas involve various integrals of x2 f(x) and x f(x). Thus, one needs to be able to
calculate such integrals. For an Exponential Distribution:
∫_{a}^{b} x² f(x) dx = ∫_{a}^{b} x² e^(-x/θ) / θ dx = [-(x² + 2xθ + 2θ²) e^(-x/θ)] evaluated from x = a to x = b
= (a² + 2aθ + 2θ²) e^(-a/θ) - (b² + 2bθ + 2θ²) e^(-b/θ).

∫_{a}^{b} x f(x) dx = ∫_{a}^{b} x e^(-x/θ) / θ dx = [-(x + θ) e^(-x/θ)] evaluated from x = a to x = b
= (a + θ) e^(-a/θ) - (b + θ) e^(-b/θ).

The resulting approximating densities at 0, 10, 20, 30,..., 600 were:


0.0661987, 0.219203, 0.0886528, 0.146936, 0.0594257, 0.0984942, 0.0398343,
0.0660226, 0.0267017, 0.0442563, 0.0178987, 0.0296659, 0.0119979, 0.0198856,
0.0080424, 0.0133297, 0.00539098, 0.00893519, 0.00361368, 0.00598944,
0.00242232, 0.00401484, 0.00162373, 0.00269123, 0.00108842, 0.00180398,
0.00072959, 0.00120925, 0.000489059, 0.000810582, 0.000327826.
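
These integrals are easy to automate. Below is a minimal Python sketch (standard library only; the helper names are my own) that uses the closed-form Exponential integrals above to reproduce g(0), g(10), and g(20) for θ = 50 and h = 10:

import math

theta, h = 50.0, 10.0

def int_x2(a, b):
    # integral from a to b of x^2 e^(-x/theta)/theta dx
    return ((a*a + 2*a*theta + 2*theta*theta) * math.exp(-a/theta)
            - (b*b + 2*b*theta + 2*theta*theta) * math.exp(-b/theta))

def int_x1(a, b):
    # integral from a to b of x e^(-x/theta)/theta dx
    return (a + theta) * math.exp(-a/theta) - (b + theta) * math.exp(-b/theta)

def int_x0(a, b):
    # integral from a to b of e^(-x/theta)/theta dx
    return math.exp(-a/theta) - math.exp(-b/theta)

g0 = (int_x2(0, 2*h) - 3*h*int_x1(0, 2*h) + 2*h*h*int_x0(0, 2*h)) / (2*h*h)

def g_odd(i):
    a, b = (i - 1)*h, (i + 1)*h
    return -(int_x2(a, b) - 2*i*h*int_x1(a, b) + (i*i - 1)*h*h*int_x0(a, b)) / (h*h)

def g_even(i):
    a, b = (i - 2)*h, i*h
    left = int_x2(a, b) - (2*i - 3)*h*int_x1(a, b) + (i - 1)*(i - 2)*h*h*int_x0(a, b)
    a, b = i*h, (i + 2)*h
    right = int_x2(a, b) - (2*i + 3)*h*int_x1(a, b) + (i + 1)*(i + 2)*h*h*int_x0(a, b)
    return (left + right) / (2*h*h)

print(g0, g_odd(1), g_even(2))   # approximately 0.06620, 0.21920, 0.08865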

Coverage Modifications:

I have previously discussed the effect of deductibles and maximum covered losses on the
aggregate distribution. Once one has the modified frequency and severity distributions, one can
apply the Panjer Algorithm or other technique of estimating the aggregate losses in the usual
manner.

In the case of a continuous severity, one could perform the modification and discretization in either
order. However, Loss Models recommends that you perform the modification first and discretization
second.137

137
See Section 9.7 of Loss Models.
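
As an illustration of this order of operations, here is a minimal Python sketch (standard library only; parameters chosen to match Problem 9.14 below) that first applies a deductible, coinsurance, and maximum covered loss to a Pareto per-loss variable, and then discretizes the modified distribution by the method of rounding:

import math

alpha, theta = 3.0, 100.0       # Pareto severity
d, c, u = 5.0, 0.8, 100.0       # deductible, coinsurance, maximum covered loss
span = 4.0

def F(x):                       # Pareto distribution function of the ground-up loss
    return 1.0 - (theta / (theta + x)) ** alpha

max_pay = c * (u - d)           # largest possible payment per loss

def H(y):                       # distribution of the per-loss payment (modification applied first)
    if y < 0:
        return 0.0
    if y >= max_pay:
        return 1.0
    return F(d + y / c)

g = {0.0: H(span / 2)}          # then discretize by rounding with the given span
y = span
while y < max_pay:
    g[y] = H(y + span / 2) - H(y - span / 2)
    y += span
g[max_pay] = 1.0 - H(max_pay - span / 2)   # all remaining probability sits at the maximum payment

print(g[40.0])                  # about 0.026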

Problems:

Use the following information for the next two questions:


One creates a discrete approximation to a Weibull Distribution with
θ = 91 and τ = 1.4, using the method of rounding with a span of 25.

9.1 (1 point) What is the density of this discrete approximation at 150?


A. less than 6.0%
B. at least 6.0% but less than 6.1%
C. at least 6.1% but less than 6.2%
D. at least 6.2% but less than 6.3%
E. at least 6.3%

9.2 (1 point) For this discrete approximation, what is the probability of a loss less than or equal to
75?
A. less than 54%
B. at least 54% but less than 56%
C. at least 56% but less than 58%
D. at least 58% but less than 60%
E. at least 60%

9.3 (2 points) An Exponential Distribution with θ = 70 is approximated using the method of matching
means with a span of 5. What is the density of the approximating distribution at 60?
A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0%

9.4 (2 points) A LogNormal Distribution with µ = 8 and σ = 2 is approximated using the method of
rounding with a span of 2000.
What is the density of the approximating distribution at 20,000?
A. 1.3% B. 1.5% C. 1.7% D. 1.9% E. 2.1%

Use the following information for the next two questions:


A Pareto Distribution with θ = 1000 and α = 2 is approximated using the method of matching means
with a span of 100.

9.5 (2 points) What is the density of the approximating distribution at 500?


A. 4.0% B. 4.5% C. 5.0% D. 5.5% E. 6.0%

9.6 (2 points) What is the density of the approximating distribution at 0?


A. 8.0% B. 8.5% C. 9.0% D. 9.5% E. 10.0%

9.7 (1 point) An Exponential Distribution with θ = 300 is approximated using the method of rounding
with a span of 50. What is the density of the approximating distribution at 400?
A. 3.0% B. 3.5% C. 4.0% D. 4.5% E. 5.0%

Use the following information for the next 6 questions:


• Frequency follows a Poisson Distribution with λ = 0.8.
• Severity is Exponential Distribution with θ = 3.
• Frequency and Severity are independent.
• The severity distribution is to be approximated via the method of rounding with a span of 1.

9.8 (2 points) What is the probability that the aggregate losses are zero?
A. less than 35%
B. at least 35% but less than 40%
C. at least 40% but less than 45%
D. at least 45% but less than 50%
E. at least 50%

9.9 (2 points) What is the probability that the aggregate losses are one?
A. less than 8%
B. at least 8% but less than 9%
C. at least 9% but less than 10%
D. at least 10% but less than 11%
E. at least 11%

9.10 (2 points) What is the probability that the aggregate losses are two?
A. less than 8%
B. at least 8% but less than 9%
C. at least 9% but less than 10%
D. at least 10% but less than 11%
E. at least 11%

9.11 (2 points) What is the probability that the aggregate losses are three?
A. less than 5%
B. at least 5% but less than 6%
C. at least 6% but less than 7%
D. at least 7% but less than 8%
E. at least 8%

9.12 (3 points) What is the probability that the aggregate losses are four?
A. less than 5%
B. at least 5% but less than 6%
C. at least 6% but less than 7%
D. at least 7% but less than 8%
E. at least 8%

9.13 (3 points) What is the probability that the aggregate losses are five?
A. less than 5%
B. at least 5% but less than 6%
C. at least 6% but less than 7%
D. at least 7% but less than 8%
E. at least 8%

9.14 (3 points) Losses follow a Pareto Distribution with θ = 100 and α = 3.


There is a deductible of 5, coinsurance of 80%, and a maximum covered loss of 100.
The per loss variable is approximated using the method of rounding with a span of 4.
What is the density of the approximating distribution at 40?
A. 2.4% B. 2.6% C. 2.8% D. 3.0% E. 3.2%

Use the following information for the next two questions:


A Pareto Distribution with θ = 1000 and α = 4 is approximated using the method of rounding with a
span of 100.

9.15 (2 points) What is the density of the approximating distribution at 500?


A. 4.7% B. 4.9% C. 5.1% D. 5.3% E. 5.5%

9.16 (1 point) What is the density of the approximating distribution at 0?


A. 16.9% B. 17.1% C. 17.3% D. 17.5% E. 17.7%

9.17 (3 points) A LogNormal Distribution with µ = 7 and σ = 0.5 is approximated using the method
of matching means with a span of 200.
What is the density of the approximating distribution at 2000?
A. 3.5% B. 3.7% C. 3.9% D. 4.1% E. 4.3%

9.18 (3 points) Losses follow a Pareto Distribution with θ = 100 and α = 3.


There is a deductible of 5, coinsurance of 80%, and a maximum covered loss of 100.
The per payment variable is approximated using the method of rounding with a span of 4.
What is the density of the approximating distribution at 60?
A. 1.7% B. 1.9% C. 2.1% D. 2.3% E. 2.5%

9.19 (3 points) An Exponential Distribution with θ = 100 is approximated using the method of
matching means with a span of 25.
Let R be the density of the approximating distribution at 0.
Let S be the density of the approximating distribution at 75.
What is R + S?
A. 23% B. 25% C. 27% D. 29% E. 31%

Solutions to Problems:

9.1. E. F(x) = 1 - exp[-(x/91)1.4]. g(150) = F(150+12.5) - F(150-12.5) =


exp[-(137.5/91)1.4] - exp[-(162.5/91)1.4] = 0.16826 - 0.10521 = 6.305%.
Comment: Here is a table of some values of the approximating distribution:
x F(x+12.5) g(x)
0 0.060201 0.060201
25 0.251032 0.190831
50 0.446217 0.195185
75 0.611931 0.165714
100 0.739649 0.127718
125 0.831739 0.092090
150 0.894794 0.063055
175 0.936158 0.041363
200 0.962307 0.026149
225 0.978304 0.015998
250 0.987806 0.009502
275 0.993298 0.005492
300 0.996394 0.003096
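
A minimal Python sketch of the method of rounding used in this solution (standard library only):

import math

theta, tau, span = 91.0, 1.4, 25.0

def F(x):                                    # Weibull distribution function
    return 1.0 - math.exp(-(x / theta) ** tau)

def g(x):                                    # rounding with span 25: mass at 0, 25, 50, ...
    return F(span / 2) if x == 0 else F(x + span / 2) - F(x - span / 2)

print(g(150))                                # about 0.063, answer E
print(sum(g(25 * k) for k in range(4)))      # F(87.5), about 0.612, used in the next solution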

9.2. E. The distribution function of the discrete approximating density at 75 is:


g(0) + g(25) + g(50) + g(75)
= F(12.5) + {F(37.5) - F(12.5)} + {F(62.5) - F(37.5)} + {F(87.5) - F(62.5)}
= F(87.5) = 1 - exp[-(87.5/91)1.4] = 61.2%.
Alternately, the approximating distribution and the Weibull Distribution are equal at the points
midway between the span points of: 50, 75, 100, etc.
The distribution function of the approximating distribution at 75 = the distribution function of the
approximating distribution at 87.5 = the Weibull Distribution at 87.5.
F(x) = 1 - exp[-(x/91)1.4]. F(87.5) = 1 - exp[-(87.5/91)1.4] = 61.2%.
Comment: The following diagram might be helpful:
x 0 25 50 75
x+12.5 12.5 37.5 62.5 87.5
F(x+12.5) 6.0% 25.1% 44.6% 61.2%
g(x) 6.0% 19.1% 19.5% 16.6%

9.3. C. E[X ∧ x] = θ(1 - e-x/θ) = 70(1 - e-x/70).


E[X ∧ 55] = 38.0944. E[X ∧ 60] = 40.2939. E[X ∧ 65] = 42.3418.
g(60) = {2E[X ∧ 60] - E[X ∧ 55] - E[X ∧ 65]}/5 = {(2)(40.2939) - 38.0944 - 42.3418}/5 = 3.03%.

9.4. A. g(20000) = F(21000) - F(19000) = Φ[(ln(21000) - 8)/2] - Φ[(ln(19000) - 8)/2] =


Φ[.98] - Φ[.93] = 0.8365 - 0.8238 = 0.0127.

9.5. E. E[X ∧ x] = {θ/(α-1)}(1 - {θ/(x+θ)}α−1) = 1000(1 - 1000/(x+1000)) = 1000x/(x+1000).


E[X ∧ 400] = 285.7. E[X ∧ 500] = 333.3. E[X ∧ 600] = 375.
g(500) = {2E[X ∧ 500] - E[X ∧ 400] - E[X ∧ 600]}/100 = {(2)(333.3) - 285.7 - 375}/100 = 5.9%.

9.6. C. E[X ∧ 100] = {1000/(2-1)}( 1 - {1000/(1000 + 100)}2-1) = 90.91.


g(0) = 1 - E[X ∧ 100]/100 = 1 - 90.91/100 = 9.09%.

9.7. D. g(400) = F(425) - F(375) = e-375/300 - e-425/300 = 0.044.

9.8. E. P(z) = eλ(z-1) = e.8(z-1).


The method of rounding assigns probability to zero of: F(.5) = 1 - e-.5/3 = 0.153518.
c(0) = P(s(0)) = P(.153518) = e.8(.153518-1) = 0.508045.

9.9. C. The method of rounding assigns probability to 1 of F(1.5) - F(.5) = e-.5/3 - e-1.5/3 =
0.846482 - 0.606531 = 0.239951.
x F(x+.5) s(x) x F(x+.5) s(x)
0 0.153518 0.153518 7 0.917915 0.032474
1 0.393469 0.239951 8 0.941184 0.023269
2 0.565402 0.171932 9 0.957856 0.016673
3 0.688597 0.123195 10 0.969803 0.011946
4 0.776870 0.088273 11 0.978363 0.008560
5 0.840120 0.063250 12 0.984496 0.006134
6 0.885441 0.045321 13 0.988891 0.004395
For the Poisson, a = 0 and b = λ = 0.8.
c(x) = {1 / (1 - a s(0))} ∑_{j=1}^{x} (a + jb/x) s(j) c(x-j) = {1 / (1 - (0)(0.153518))} ∑_{j=1}^{x} (0 + 0.8j/x) s(j) c(x-j) = 0.8 ∑_{j=1}^{x} (j/x) s(j) c(x-j).

c(1) = (0.8)(1/1)s(1)c(0) = (0.8)(1)(0.239951)(0.508045) = 0.097525.

9.10. A. c(2) = (0.8){(1/2)s(1)c(1) + (2/2)s(2)c(0)} =


(.8){(1/2)(0.239951)(0.097525) + (1)(0.171932)(0.508045)} = 0.079240.

9.11. C. c(3) = (.8){(1/3)s(1)c(2) + (2/3)s(2)c(1) + (3/3)s(3)c(0)} =


(.8){(1/3)(.239951)(.079240) + (2/3)(.171932)(.097525) + (3/3)(.123195)(.508045)} =
0.064084.

9.12. B. c(4) = (.8){(1/4)s(1)c(3) + (2/4)s(2)c(2) + (3/4)s(3)c(1) + (4/4)s(4)c(0)} =


(.8){(1/4)(.239951)(.064084) + (2/4)(.171932)(.079240) + (3/4)(.123195)(.097525)+
(4/4)(.088273)(.508045)} = 0.051611.

9.13. A. c(5) = (.8){(1/5)s(1)c(4) + (2/5)s(2)c(3) + (3/5)s(3)c(2) + (4/5)s(4)c(1) + (5/5)s(5)c(0) =


(.8){(1/5)(.239951)(.051611) + (2/5)(.171932)(.064084) + (3/5)(.123195)(.079240) +
(4/5)(.088273)(.097525) + (5/5)(.063250)(.508045)} = 0.0414097.
Comment: The distribution of aggregate losses from 0 to 30 is: 0.508045, 0.0975247, 0.07924,
0.064084, 0.0516111, 0.0414099, 0.033112, 0.0263947, 0.0209802, 0.0166327, 0.0131542,
0.0103797, 0.0081733, 0.00642328, 0.0050387, 0.00394576, 0.00308486, 0.0024081,
0.00187708, 0.00146114, 0.00113589, 0.000881934, 0.000683946, 0.000529806,
0.00040996, 0.000316896, 0.000244715, 0.000188795, 0.00014552, 0.000112066,
0.00008623.
[Graph: the aggregate probabilities listed above, plotted on a logarithmic scale, for aggregate losses from 0 to 30.]
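
The recursion used in solutions 9.8 through 9.13 is easy to program. Here is a minimal Python sketch (standard library only) of the Panjer (recursive) algorithm for this Poisson frequency and rounded Exponential severity:

import math

lam, theta, n_max = 0.8, 3.0, 30

def F(x):                        # Exponential distribution function
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

# severity discretized by the method of rounding with a span of 1
s = [F(0.5)] + [F(k + 0.5) - F(k - 0.5) for k in range(1, n_max + 1)]

# Panjer recursion for a Poisson primary distribution: a = 0, b = lambda
c = [math.exp(lam * (s[0] - 1.0))]           # c(0) = P(s(0)) = exp[lambda (s(0) - 1)]
for x in range(1, n_max + 1):
    c.append(lam * sum((j / x) * s[j] * c[x - j] for j in range(1, x + 1)))

print(c[:6])   # 0.5080, 0.0975, 0.0792, 0.0641, 0.0516, 0.0414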

9.14. B. Prior to the policy modifications, F(x) = 1 - {100/(100 + x)}3 .


Let y be the payment per loss.
y = 0 for x ≤ 5.
y = 0.8(x - 5) = 0.8x - 4, for 5 < x ≤ 100.
y = (0.8)(95) = 76, for 100 ≤ x.
Let H(y) be the distribution of the payments per loss.
H(0) = F(5) = 1 - (100/(100 + 5))3 = 0.1362.
H(y) = F(x) = F((y+4)/.8) = F(1.25y + 5) = 1 - (100/(105 + 1.25y))3 , for 0 < y < 76.
H(76) = 1.
g(40) = H(42) - H(38) = (100/{105 + (1.25)(38)})3 - (100/{105 + (1.25)(42)})3 = 2.6%.
Comment: Note that we apply the modifications first and then discretize.
g(0) = H(2) = 0.1950. Note that at 76 we would include all the remaining probability:
1 - H(74) = {100/(105 + 1.25(74))}3 = 0.1298.
Here is the whole approximating distribution:
y H(y+2) g(y)
0 0.1950 0.1950
4 0.2977 0.1026
8 0.3836 0.0859
12 0.4560 0.0724
16 0.5175 0.0615
20 0.5701 0.0526
24 0.6153 0.0452
28 0.6544 0.0391
32 0.6884 0.0340
36 0.7180 0.0297
40 0.7440 0.0260
44 0.7670 0.0229
48 0.7872 0.0203
52 0.8052 0.0180
56 0.8212 0.0160
60 0.8355 0.0143
64 0.8483 0.0128
68 0.8598 0.0115
72 0.8702 0.0104
76 0.1298

9.15. D. F(x) = 1 - {1000/(1000 + x)}4 .


g(500) = F(550) - F(450) = (1000/(1000 + 450))4 - (1000/(1000 + 550))4 = 0.053.

9.16. E. g(0) = F(50) = 1 - (1000/(1000 + 50))4 = 17.7%.



9.17. C. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]} =


1242.6 Φ[2 lnx - 14.5] + x{1 - Φ[2 lnx - 14]}.
E[X ∧ 1800] = 1242.6 Φ[.49] + 1800{1 - Φ[.99]} = (1242.6)(.6879) + (1800)(1 - 0.8389) =
1144.8.
E[X ∧ 2000] = 1242.6 Φ[.70] + 2000{1 - Φ[1.20]} = (1242.6)(.7580) + (2000)(1 - 0.8849) =
1172.1.
E[X ∧ 2200] = 1242.6 Φ[.89] + 2200{1 - Φ[1.39]} = (1242.6)(.8133) + (2200)(1 - 0.9177) =
1191.7.
g(2000) = {2E[X ∧ 2000] - E[X ∧ 1800] - E[X ∧ 2200]}/200 =
{(2)(1172.1) - 1144.8 - 1191.7}/200 = 3.9%.

9.18. A. Prior to the policy modifications, F(x) = 1 - {100/(100 + x)}3 .


Let y be the (non-zero) payment. y is undefined for x ≤ 5.
y = 0.8(x - 5) = 0.8x - 4, for 5 < x ≤ 100. y = (0.8)(95) = 76, for 100 ≤ x.
Let H(y) be the distribution of the non-zero payments.
H(y) = {F(x) - F(5)}/S(5) = {F((y+4)/.8) - 0.1362}/.8638 = F(1.25y + 5)/0.8638 - 0.1577 =
{1 - (100/(105 + 1.25y))3 }/0.8638 - 0.1577 = 1 - {100/(105 + 1.25y)}3 /0.8638, for 0 < y < 76.
H(76) = 1.
g(60) = H(62) - H(58) = {(100/{105 + (1.25)(58)})3 - (100/{105 + (1.25)(62)})3 }/.8638 = 1.7%.
Comment: See the latter portion of Example 9.14 in Loss Models.
Note that we apply the modifications first and then discretize.
g(0) = H(2) = 0.0682. Note that at 76 we would include all the remaining probability:
1 - H(74) = {100/(105 + 1.25(74))}3 /0.8638 = 0.1503.
Here is the whole approximating distribution:
y H(y+2) g(y)
0 0.0682 0.0682
4 0.1870 0.1188
8 0.2864 0.0994
12 0.3703 0.0839
16 0.4415 0.0712
20 0.5024 0.0609
24 0.5547 0.0523
28 0.5999 0.0452
32 0.6393 0.0393
36 0.6736 0.0343
40 0.7037 0.0301
44 0.7302 0.0265
48 0.7537 0.0234
52 0.7745 0.0208
56 0.7930 0.0185
60 0.8096 0.0166
64 0.8244 0.0148
68 0.8377 0.0133
72 0.8497 0.0120
76 0.1503

9.19. A. E[X ∧ x] = θ(1 - e-x/θ) = 100(1 - e-x/100).


E[X ∧ 25] = 22.120. E[X ∧ 50] = 39.347. E[X ∧ 75] = 52.763. E[X ∧ 100] = 63.212.
g(0) = 1 - E[X ∧ 25]/25 = 1 - 22.120/25 = 0.1152.
g(75) = {2E[X ∧ 75] - E[X ∧ 50] - E[X ∧ 100]}/25 = {(2)(52.763) - 39.347 - 63.212}/25 = 0.1187.
0.1152 + 0.1187 = 0.2339.

Section 10, Analytic Results138

In some special situations the aggregate distribution has a somewhat simpler form.

Geometric-Exponential:139

One interesting special case of the collective risk model has a Geometric frequency and an
Exponential severity.

Exercise: Let frequency be given by a Geometric Distribution with β = 3.


Let severity be given by an Exponential with mean 10.
Frequency and severity are independent.
What is the moment generating function of the aggregate losses?
[Solution: For the Geometric Distribution, P(z) = 1 / {1 - β(z-1)}. For β = 3, P(z) = 1 / (4 - 3z).
For the Exponential Distribution, MX(t) = 1 / (1 - θt). For θ = 10, MX(t) = 1 / (1 - 10t).
MAgg(t) = PN(MX(t)) = 1 / {4 - 3/(1 - 10t)} = (1 - 10t) / (4 - 40t - 3) = (1 - 10t) / (1 - 40t).]

Note that (1 - 10t) / (1 - 40t) = {0.25 - 10t + 0.75} / (1 - 40t) = {(1/4)(1 - 40t) + (3/4)} / (1 - 40t) = 1/4 + (3/4) / (1 - 40t).

This is the weighted average of the m.g.f. of a point mass at zero and the m.g.f. of an Exponential
distribution with mean 40, with weights of 1/4 and 3/4.140 The moment generating function of the
mixture is the mixture of the moment generating functions. Thus the combination of a Geometric
frequency and an Exponential Severity gives an aggregate loss distribution that is a mixture of a
point mass at zero and an exponential distribution.

In this case, there is a point mass of probability 25% at zero. SA(y) = 0.75 e-y/40.

For example, SA(0) = 0.75, and SA(40) = 0.75e-1 = 0.276.
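
One quick way to convince yourself of this result is to simulate it. A minimal sketch, assuming numpy is available; note that the Loss Models Geometric (support 0, 1, 2, ...) is numpy's geometric variable minus 1, with p = 1/(1+β):

import numpy as np

beta, theta = 3.0, 10.0
rng = np.random.default_rng(1)
n_sims = 200_000

n_claims = rng.geometric(1.0 / (1.0 + beta), size=n_sims) - 1          # Geometric with mean beta
agg = np.fromiter((rng.exponential(theta, n).sum() for n in n_claims), dtype=float)

print((agg == 0).mean())   # about 0.25, the point mass at zero
print((agg > 40).mean())   # about 0.276 = 0.75 e^(-1), matching SA(40) above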

138
See Section 9.4 of Loss Models.
139
See Example 9.7 in Loss Models.
140
The m.g.f. is the expected value of exp[xt]. Thus the m.g.f. of a point mass at zero is E[e0t] = E[1] = 1.
In general, the m.g.f. of a point mass at c is ect.
The m.g.f. of an Exponential with θ = 40 is: 1/(1 - 40t).

This aggregate loss distribution is discontinuous at zero.141 This will generally be the case when there
is a chance of zero claims. If instead the frequency distribution has no probability at zero and the
severity is a continuous distribution with support from 0 to ∞, then the aggregate losses will be a
continuous distribution from 0 to ∞.142

In general, with a Geometric frequency and an Exponential severity:


MAgg(t) = PN(MX(t)) = 1 / [1 - β{MX(t) - 1}] = 1 / [1 - β{1/(1 - θt) - 1}] = (1 - θt) / [1 - θt - β{1 - (1 - θt)}]

= (1 - θt) / {1 - (1+β)θt} = (1+β)(1 - θt) / [(1+β){1 - (1+β)θt}] = (1 + β - θt - βθt) / [(1+β){1 - (1+β)θt}]

= (1 - θt - βθt) / [(1+β){1 - (1+β)θt}] + β / [(1+β){1 - (1+β)θt}] = 1/(1+β) + {β/(1+β)} / {1 - (1+β)θt}.

This is the weighted average of the moment generating function of a point mass at zero and the
moment generating function of an Exponential with mean (1+β)θ.
The weights are 1/(1+β) and β/(1+β).

Therefore:
FAgg(0) = 1/(1+β). FAgg(y) = 1/(1+β) + {β/(1+β)} (1 - e^(-y/{(1+β)θ})) = 1 - {β/(1+β)} e^(-y/{(1+β)θ}).

This mixture is mathematically equivalent to an aggregate situation with a Bernoulli frequency with q = β/(1+β) and an Exponential Severity with mean (1+β)θ.143 In the latter situation there is a probability of 1/(1+β) of no claims, in which case the aggregate is 0, and a probability of β/(1+β) of 1 claim, in which case the aggregate is an Exponential with mean (1+β)θ.

141
The limit approaching from below zero is not equal to the limit approaching from above zero.
142
The aggregate distribution is discontinuous when the severity distribution is discontinuous.
143
This general technique can be applied to a mixture of a point mass at zero and another distribution.

Negative Binomial-Exponential:144

One can generalize the previous situation to a Negative Binomial frequency with r integer.

The Negative Binomial is a sum of r independent Geometrics, each with β. Thus the aggregate is the sum of r independent situations as before, each of which has a Bernoulli frequency with q = β/(1+β) and an Exponential Severity with mean (1+β)θ. Thus the aggregate is mathematically the same as a Binomial frequency with m = r and q = β/(1+β), and an Exponential Severity with mean (1+β)θ.

Exercise: Determine the moment generating function of an aggregate distribution with a Binomial frequency with m = r and q = β/(1+β), and an Exponential Severity with mean (1+β)θ.

[Solution: For a Binomial Distribution, P(z) = {1 + q(z-1)}^m.

For this Binomial Distribution, P(z) = {1 + (β/(1+β))(z-1)}^r = (1 + β + βz - β)^r / (1+β)^r = (1 + βz)^r / (1+β)^r.

For an Exponential Distribution with mean θ, MX(t) = 1 / (1 - θt).

For this Exponential Distribution, with mean (1+β)θ, MX(t) = 1 / {1 - (1+β)θt} = 1 / (1 - θt - βθt).

MAgg(t) = PN(MX(t)) = {1 + β/(1 - θt - βθt)}^r / (1+β)^r = {1 - θt - βθt + β}^r / {(1+β)(1 - θt - βθt)}^r

= {(1+β)(1 - θt)}^r / {(1+β)(1 - θt - βθt)}^r = (1 - θt)^r / (1 - θt - βθt)^r.]

144
See Example 9.7 in Loss Models.

Exercise: Determine the moment generating function of an aggregate distribution with a


Negative Binomial frequency and an Exponential Severity.
[Solution: For the Negative Binomial Distribution, P(z) = 1 / {1 - β(z-1)}^r.

For the Exponential Distribution, MX(t) = 1 / (1 - θt).

MAgg(t) = PN(MX(t)) = 1 / {1 - β(1/(1 - θt) - 1)}^r = (1 - θt)^r / {1 - θt - β(1 - (1 - θt))}^r = (1 - θt)^r / (1 - θt - βθt)^r.]

We have shown that the moment generating functions are the same, thus proving, as stated above, that with a Negative Binomial frequency with r integer and an Exponential Severity, the aggregate is mathematically the same as a Binomial frequency with m = r and q = β/(1+β), and an Exponential Severity with mean (1+β)θ.

In the latter situation, the frequency has finite support, and severity is Exponential, so one can write
the aggregate in terms of convolutions as:

fAgg(x) = fN(0) fX*0(x) + ∑_{n=1}^{m} fN(n) e^(-x/{(1+β)θ}) x^(n-1) / {(1+β)^n θ^n (n-1)!}

= {1/(1+β)^r} {point mass of prob. 1 @ 0} + ∑_{n=1}^{r} [r! β^n / {n! (r-n)! (1+β)^r}] e^(-x/{(1+β)θ}) x^(n-1) / {(1+β)^n θ^n (n-1)!}.
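
A minimal Python sketch of this convolution formula (standard library only), using r = 3, β = 1.4, and θ = 5 as an illustrative choice of parameters:

import math

r, beta, theta = 3, 1.4, 5.0
q = beta / (1.0 + beta)              # q of the equivalent Binomial frequency
scale = (1.0 + beta) * theta         # mean of the equivalent Exponential severity

def f_agg(x):
    # density of the continuous part of the aggregate for x > 0;
    # in addition there is a point mass of (1 - q)^r = 1/(1+beta)^r at zero
    total = 0.0
    for n in range(1, r + 1):
        binom = math.comb(r, n) * q**n * (1.0 - q)**(r - n)
        gamma_density = math.exp(-x / scale) * x**(n - 1) / (scale**n * math.factorial(n - 1))
        total += binom * gamma_density
    return total

print((1.0 - q)**r)    # point mass at zero, about 0.0723
print(f_agg(10.0))     # about 0.0263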

Adding Compound Poisson Distributions:145

A Compound Poisson means a Poisson frequency together with some severity distribution,
such that frequency and severity are independent.

Let us assume we have two independent Compound Poisson Distributions.146


The Poisson parameters are λ1 and λ2.
The severity distributions are F1 (x) and F2 (x).
Then the sum of these two Compound Poissons is another Compound Poisson,
with mean frequency of λ1 + λ2,
and severity distribution that is a mixture of the individual severity distributions:
F(x) = {λ1/(λ1 + λ2)} F1(x) + {λ2/(λ1 + λ2)} F2(x).

Exercise: A Compound Poisson Distribution has λ = 3 and discrete severity with a 60% chance of 10 and a 40% chance of 20. Another independent Compound Poisson Distribution has λ = 2 and discrete severity with a 30% chance of 10 and a 70% chance of 20. What is their sum?
[Solution: Their sum is also a Compound Poisson Distribution. λ = 3 + 2 = 5.
For the severity, the probability of 10 is: (3/5)(60%) + (2/5)(30%) = 48%,
and the probability of 20 is: (3/5)(40%) + (2/5)(70%) = 52%.]

145
See Theorem 9.7 in Loss Models.
146
This result extends naturally to adding three or more independent Compound Poisson Distributions.

Exercise: A Compound Poisson Distribution has λ = 3 and discrete severity with a 60% chance of 10 and a 40% chance of 20. Another independent Compound Poisson Distribution has λ = 2 and discrete severity with a 30% chance of 10 and a 70% chance of 20.
What is the probability that the sum of the two Compound Poisson Distributions is 20?
[Solution: Total payment can be 20 if there are two payments of size 10 or one payment of size 20.
Using the solution to the prior exercise, probability is: (52 e-5 / 2)(0.482 ) + (5 e-5)(0.52) = 0.0369.
Alternately, the total payment can be 20 if the two Poisson Process are:
20 and 0, 10 and 10, or 0 and 20.
Prob[first is 0] = Prob[0 payments] = e-3.
Prob[first is 10] = Prob[1 payment of size 10] = (3e-3)(0.6) = 1.8e-3.
Prob[first is 20] = Prob[2 payments of size 10 or 1 payment of size 20] =
(32 e-3 / 2)(0.62 ) + (3e-3)(0.4) = 2.82e-3.
Prob[second is 0] = Prob[0 payments] = e-2.
Prob[second is 10] = Prob[1 payment of size 10] = (2e-2)(0.3) = 0.6e-2.
Prob[second is 20] = Prob[2 payments of size 10 or 1 payment of size 20] =
(22 e-2 / 2)(0.32 ) + (2e-2)(0.7) = 1.58e-2.
Thus, the probability that the total of the two processes will be 20 is:
Prob[1st is 0] Prob[2nd is 20] + Prob[1st is 10] Prob[2nd is 10] + Prob[1st is 20] Prob[2nd is 0] =
(e-3)(1.58e-2) + (1.8e-3)(0.6e-2) + (2.82e-3)(e-2) = 5.48e-5 = 0.0369.
Comment: Similar to Example 9.9 in Loss Models. As in Example 9.10 in Loss Models, one could
instead use the Panjer Algorithm (Recursive Method), although that is not needed here.]
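
The thinning argument in the alternate solution is also easy to program. A minimal Python sketch for this exercise (standard library only):

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, sev = 5.0, {10: 0.48, 20: 0.52}      # the combined Compound Poisson from the prior exercise

# claims of size 10 and of size 20 form independent Poisson processes (thinning)
lam10, lam20 = lam * sev[10], lam * sev[20]

# the aggregate is 20 if there are two 10s and no 20, or no 10s and one 20
prob = (poisson_pmf(2, lam10) * poisson_pmf(0, lam20)
        + poisson_pmf(0, lam10) * poisson_pmf(1, lam20))
print(prob)   # about 0.0369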

Exercise: A Compound Poisson Distribution has λ = 7 and an Exponential severity with mean 100.
Another independent Compound Poisson Distribution has λ = 3 and a Pareto severity with α = 3
and θ = 500. What is their sum?
[Solution: Their sum is also a Compound Poisson Distribution. λ = 7 + 3 = 10.
The severity is a mixture of an Exponential Distribution with mean 100 and a Pareto Distribution with
α = 3 and θ = 500; the weights are 7/10 and 3/10.]

Exercise: Compute the variance of the sum in the previous exercise.


[Solution: The variance of the first Compound Poisson is: (7)(2)(1002 ) = 140,000.
The variance of the second Compound Poisson is: (3)(2)(5002 )/{(3-1)(2-1)} = 750,000.
Since they are independent, their variances add: 140,000 + 750,000 = 890,000.
Alternately, the 2nd moment of the mixture is: (0.7)(2)(1002 ) + (0.3)(2)(5002 )/{(3-1)(2-1)} =
89,000. Thus the variance of the aggregate is: (10)(89,000) = 890,000.]

Exercise: A Compound Poisson Distribution has λ = 7 and an Exponential severity with mean 100.
Another independent Compound Poisson Distribution has λ = 3 and a Pareto severity with α = 3
and θ = 500. For their sum, what is the probability that a claim is greater than 200?
[Solution: Their sum is also a Compound Poisson Distribution. λ = 7 + 3 = 10.
The severity is a mixture of an Exponential Distribution with mean 100 and a Pareto Distribution with
α = 3 and θ = 500; the weights are 7/10 and 3/10.
For the Exponential, S(200) = exp[-200/100] = 0.1353.
For the Pareto, S(200) = (500/700)3 = 0.3644.
For the mixture, S(200) = (7/10)(0.1353) + (3/10)(0.3644) = 0.2040.]

Proof of the Result for Adding Compound Poissons:147

Let us assume that we have two independent Compound Poisson Distributions.


The Poisson parameters are λ1 and λ2.
The severity distributions are F1 (x) and F2 (x).
Then as will be shown the sum of these two Compound Poissons is another Compound Poisson,
with mean frequency of λ1 + λ2,
and severity distribution that is a mixture of the individual severity distributions:
F(x) = {λ1/(λ1 + λ2)} F1(x) + {λ2/(λ1 + λ2)} F2(x).

The sum of two independent Poissons is another Poisson with the sum of the lambdas.
We can thin their sum, based on whether a claim came from the first or second process, and get two
independent Poissons.148 Therefore, since for each Compound Poisson Distribution frequency and
severity are independent, for their sum frequency and severity are independent. Thus the sum is a
Compound Poisson Process.

The chance that a claim came from the first process is λ1/(λ1 +λ 2).
Thus, the chance that a claim picked at random is less than or equal to x is:
F(x) = {λ1/(λ1 + λ2)} F1(x) + {λ2/(λ1 + λ2)} F2(x).
Thus, for the sum of the two Compound Poissons, severity is a mixture of two individual severities,
with weights: λ1/(λ1 +λ 2) and λ2/(λ1 +λ 2).
147
Loss Models proof of Theorem 9.7 relies on the existence of moment generating functions for the severities.
148
Recall that when thinning either a Binomial or Negative Binomial, independence does not hold. Thus adding
independent Compound Negative Binomials will not in general produce another Compound Negative Binomial.

Closed Under Convolution:149

If one adds two independent Gamma Distributions with the same θ, then one gets another Gamma
Distribution with the sum of the α parameters.
Gamma(α1, θ) + Gamma (α2, θ) = Gamma(α1 + α2, θ).

A distribution is closed under convolution, if when one adds independent identically distributed
copies, one gets a member of the same family.
Gamma(α, θ) + Gamma (α, θ) = Gamma(2α, θ).
Thus a Gamma Distribution is closed under convolution.
Distributions that are closed under convolution include: Gamma, Inverse Gaussian, Normal, Binomial,
Poisson, and Negative Binomial.

If severity is Gamma(α, θ), then fX* n (x) = Gamma(nα, θ), for n ≥ 1.

Gamma(nα, θ) has density: e-x/θ xnα-1 / {θnα Γ(nα)}.


Thus if severity is Gamma, the aggregate distribution can be written in terms of convolutions:

fA(x) = ∑_{n=0}^{∞} fN(n) fX*n(x) = fN(0) {point mass of prob. 1 @ 0} + ∑_{n=1}^{∞} fN(n) e^(-x/θ) x^(nα-1) / {θ^(nα) Γ(nα)}.150

This is particularly useful when α is an integer. α = 1 is an Exponential Distribution.

Exercise: Severity is Exponential. Write a formula for the density of the aggregate distribution.
[Solution: fA(x) = fN(0) {point mass of probability 1 @ 0} + ∑_{n=1}^{∞} fN(n) e^(-x/θ) x^(n-1) / {θ^n (n-1)!}.]

149
Not on the syllabus. While closed under convolution is covered in Example 9.10 and Exercise 9.37 of the 3rd
edition of Loss Models, it is not covered in the 4th edition.
150
Where for fX*0 (x) has a point mass of probability 1 at zero.

Problems:

10.1 (1 point) Which of the following distributions is not closed under convolution?
A. Binomial B. Gamma C. Inverse Gaussian D. Negative Binomial E. Pareto

10.2 (3 points) Frequency is: Prob[n = 0] = 60%, Prob[n = 1] = 30%, and Prob[n = 2] = 10%.
Severity is Gamma with α = 3 and θ = 10.
Frequency and severity are independent.
Determine the form of the aggregate distribution.
For example, what are the densities of the aggregate distribution at 10, 50, and 100?

10.3 (2 points) Claims due to illness follow a Compound Poisson Distribution with λ = 5%
and an Exponential Severity with mean 100.
Claims due to accident follow another independent Compound Poisson Distribution with λ = 8%
and an Exponential Severity with mean 200.
What is the probability that a claim is of size between 40 and 90?
A. 20% B. 21% C. 22% D. 23% E. 24%

10.4 (3 points) Calculate the density at 6 for a Compound Binomial-Poisson frequency distribution
with parameters m = 4, q = 0.6, and λ = 3.
(Binomial frequency with m = 4 and q = 0.6, and Poisson severity with λ = 3.)
A. 6% B. 7% C. 8% D. 9% E. 10%

10.5 (3 points) Frequency is Geometric with β = 1.4.


Severity is Exponential with θ = 5.
Frequency and severity are independent.
What is the density of the aggregate distribution at 10?
A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0%

10.6 (3 points)
(a) A Compound Geometric Distribution has β = 3 and severity always 10.
Another independent Compound Geometric Distribution has β = 3 and severity always 20.
Determine the variance of the sum of these two Compound Geometric Distributions.
(b) A Compound Negative Binomial has r = 2 and β = 3,
with severity 10 half of the time and 20 the other half of the time.
Determine its variance.

10.7 (4 points) Frequency is Negative Binomial with r = 3 and β = 1.4.


Severity is Exponential with θ = 5.
Frequency and severity are independent.
What is the density of the aggregate distribution at 10?
A. 2.2% B. 2.6% C. 3.0% D. 3.4% E. 3.8%

10.8 (3 points) Frequency is Poisson with λ = 2.


Severity is Exponential with θ = 10.
Frequency and severity are independent.
What is the density of the aggregate distribution at 30?
A. 1.0% B. 1.2% C. 1.4% D. 1.6% E. 1.8%

10.9 (2 points) There are three independent Compound Poisson Distributions.


The first has λ = 1 and discrete severity with a 60% of 5 and 40% chance of 10.
The second has λ = 3 and discrete severity with a 20% of 5 and 80% chance of 25.
The third has λ = 4 and discrete severity with a 30% of 10 and 70% chance of 25.
What is the sum of these three independent Compound Poisson Distributions?

10.10 (2 points) There are two independent Compound Poisson Distributions.


The first has λ = 0.5 and a Exponential Distribution with θ = 1000.
The second has λ = 0.2 and a Pareto Distribution with α = 3 and θ = 2000.
For the sum of these two Compound Poisson Distribution, what is the probability that a claim is of
size less than 700?
A. 51% B. 52% C. 53% D. 54% E. 55%

10.11 (3 points) Frequency is logarithmic with β = 9. Severity is Exponential with θ = 5.


Frequency and severity are independent.
Determine the probability density function of the aggregate losses at 30.
A. 0.006 B. 0.007 C. 0.008 D. 0.009 E. 0.010

10.12 (2 points) There are two independent Compound Poisson Distributions.


The first has λ = 14 and a Weibull Distribution with θ = 25 and τ = 3.
The second has λ = 6 and an Inverse Gamma Distribution with α = 5 and θ = 90.
What is the sum of these two independent Compound Poisson Distributions?

10.13 (3 points)
(a) A Compound Bernoulli Distribution has q = 1/2 and severity always 1.
Another independent Compound Bernoulli Distribution has q = 1/2 and severity always 2.
Determine the variance of the sum of these two Compound Bernoulli Distributions.
(b) A Compound Binomial has m = 2 and q = 1/2,
with severity 1 half of the time and 2 the other half of the time.
Determine its variance.

10.14 (3 points) There are two buildings.


Each has an independent 20% chance of a claim each year.
Each building can have at most one claim per year.
The severity for each building is Exponential with mean 50.
Determine the probability that the aggregate annual losses exceed 150.
A. 1.8% B. 2.0% C. 2.2% D. 2.4% E. 2.6%

10.15 (2 points) Policies A and B are independent.


Policy A has a Compound Poisson Distributions with λ = 0.5 and severity probabilities 0.8 on a
payment of 50 and 0.2 on a payment of 100.
Policy B has a Compound Poisson Distributions with λ = 1.5 and severity probabilities 0.4 on a
payment of 50 and 0.6 on a payment of 100.
Determine the probability that the total payment on the two policies will be 100.
A. 18% B. 20% C. 22% D. 24% E. 26%

10.16 (2 points) There are two independent Compound Poisson Distributions.


The first has λ = 4 and a Weibull Distribution with θ = 100 and τ = 1/2.
The second has λ = 7 and a Loglogistic Distribution with γ = 2 and θ = 150.
For the sum of these two Compound Poisson Distribution, what is the probability that a claim is of
size greater than 200?
A. 24% B. 26% C. 28% D. 30% E. 32%

10.17 (5A, 11/96, Q.36) The frequency distribution is Geometric with parameter β.
The severity distribution is Exponential with a mean of 1.
Frequency and severity are independent.
(1/2 point) What is the Moment Generating Function of the frequency?
(1/2 point) What is the Moment Generating Function of the severity?
(1 point) What is the Moment Generating Function of the aggregate losses?

10.18 (Course 151 Sample Exam #1, Q.19) (2.5 points) SA and SB are independent random
variables and each has a compound Poisson distribution. You are given:
(i) λA = 3, λB =1
(ii) The severity distribution of SA is: pA(1) = 1.0.
(iii) The severity distribution of SB is: pB(1) = pB(2) = 0.5.
(iv) S = SA + SB
Determine FS(2).
(A) 0.12 (B) 0.14 (C) 0.16 (D) 0.18 (E) 0.20

10.19 (5A, 11/99, Q.24) (1 point)


Which of the following are true regarding collective risk models?
A. If we combine insurance portfolios, where the aggregate claims of each of the portfolios have
compound Poisson Distributions and are mutually independent, then the aggregate claims for
the combined portfolio will also have a compound Poisson Distribution.
B. When the variance of the number of claims exceeds its mean, the Poisson distribution is
appropriate.
C. If the claim amount distribution is continuous, it can be concluded that the distribution of the
aggregate claims is continuous.
D. A Normal Distribution is usually the best approximation to the aggregate claim distribution.
E. All of the above are false.

Solutions to Problems:

10.1. E. The sum of two independent Pareto Distributions is not another Pareto Distribution.

10.2. The sum of two of the Gammas is also Gamma, with α = 6 and θ =10.
Thus the aggregate distribution is:
(0.6)(point mass of prob. 1 @ 0) + (0.3)Gamma(3, 10) + (0.1)Gamma(6, 10).
For y > 0, the density of the aggregate distribution is:
fA(y) = (0.3){y2 e-y/10 / (103 Γ(3))} + (0.1){y5 e-y/10 / (106 Γ(6))} =

e-y/10{1,500,000y2 + 8.3333y5 } / 1010.


fA(10) = 0.00555. fA(50) = 0.004281. fA(100) = 0.000446.
Comment: This density integrated from 0 to infinity is 0.4. The remaining 60% of the probability is in
the point mass at zero, corresponding to the probability of zero claims.

10.3. B. Their sum is also a Compound Poisson Distribution. λ = 5% + 8% = 13%.


Severity is a mixture of the two Exponentials, with weights 5/13 and 8/13.
For the first Exponential, F(90) - F(40) = exp[-40/100] - exp[-90/100] = 0.2638.
For the second Exponential, F(90) - F(40) = exp[-40/200] - exp[-90/200] = 0.1811.
For the mixture: (5/13)(0.2638) + (8/13)(0.1811) = 0.213.

10.4. E. When the Binomial primary distribution is 2, the compound distribution is the sum of two
independent Poisson distributions each with λ = 3, which is Poisson with λ = 6.
Density of the compound at 6 is:
Prob[Binomial = 1] (Density at 6 of a Poisson with λ = 3) +
Prob[Binomial = 2] (Density at 6 of a Poisson with λ = 6) +
Prob[Binomial = 3] (Density at 6 of a Poisson with λ = 9) +
Prob[Binomial = 4] (Density at 6 of a Poisson with λ = 12) =
(0.1536)(0.05041) + (0.3456)(0.16062) + (0.3456)(0.09109) + (0.1296)(0.02548) = 0.0980.
Comment: Here is the density of the compound distribution, out to 16:
n 0 1 2 3 4 Aggregate
Binomial 0.0256 0.1536 0.3456 0.3456 0.1296 Distribution
x f*0 λ=3 λ=6 λ=9 λ = 12
0 1 0.04979 0.00248 0.00012 0.00001 0.034147
1 0.14936 0.01487 0.00111 0.00007 0.028475
2 0.22404 0.04462 0.00500 0.00044 0.051617
3 0.22404 0.08924 0.01499 0.00177 0.070664
4 0.16803 0.13385 0.03374 0.00531 0.084417
5 0.10082 0.16062 0.06073 0.01274 0.093636
6 0.05041 0.16062 0.09109 0.02548 0.098037
7 0.02160 0.13768 0.11712 0.04368 0.097036
8 0.00810 0.10326 0.13176 0.06552 0.090957
9 0.00270 0.06884 0.13176 0.08736 0.081063
10 0.00081 0.04130 0.11858 0.10484 0.068967
11 0.00022 0.02253 0.09702 0.11437 0.056172
12 0.00006 0.01126 0.07277 0.11437 0.043871
13 0.00001 0.00520 0.05038 0.10557 0.032891
14 0.00000 0.00223 0.03238 0.09049 0.023690
15 0.00000 0.00089 0.01943 0.07239 0.016405
16 0.00000 0.00033 0.01093 0.05429 0.010929
Sum 1 1.00000 0.99983 0.98889 0.89871 0.982974

10.5. A. In general, with a Geometric frequency and an Exponential severity:


M A(t) = PN(MX(t)) = 1/{1 - β(1/(1 - θt) -1)} = (1 - θt)/{1 - (1+β)θt} =

(1+β)(1 - θt)/{(1+β){1 - (1+β)θt}} = (1 + β - θt - βθt)/{(1+β){1 - (1+β)θt}} =


(1 - θt - βθt)/{(1+β){1 - (1+β)θt}} + β/{(1+β){1 - (1+β)θt}} = 1/(1+β) + {β/(1+β)}/{1 - (1+β)θt}.
This is the weighted average of the moment generating function of a point mass at zero and the
moment generating function of an Exponential with mean (1+β)θ.
In this case, the point mass at zero is: 1/(1+β) = 1/2.4 = 5/12, and the Exponential with mean
(1+β)θ = (2.4)(5) = 12 is given weight: {β/(1+β)} = 1.4/2.4 = 7/12.
Therefore, the density of the aggregate distribution at 10 is: (7/12)e-10/12/12 = 0.0211.
Comment: Similar to Example 9.7 in Loss Models.

10.6. (a) The variance of the first is: (102 )(3)(4) = 1200.
The variance of the second is: (202 )(3)(4) = 4800.
Since they are independent, their variances add: 1200 + 4800 = 6000.
(b) Frequency has mean: (2)(3) = 6, and variance: (2)(3)(4) = 24.
Severity has mean: 15, and variance: 52 = 25.
Thus the variance of the Compound Negative Binomial is: (6)(25) + (152 )(24) = 5500.
Comment: Since the variances in the two parts are unequal, the sum of the two independent
Compound Geometric Distributions in part (a) is not equal to the Compound Negative Binomial in
part (b). It is true that the sum of the two independent Geometrics with the same beta is a Negative
Binomial with r = 2; however, Theorem 9.7 in Loss Models, the nice result for summing independent
Compound Poissons, does not hold for summing independent Compound Negative Binomials
with the same β, nor for summing independent Compound Binomials with the same q.

10.7. B. The Negative Binomial is the sum of three independent Geometric Distributions with
β = 1.4. In the previous solution, the aggregate was equivalent to a Bernoulli frequency with
q = 7/12 and an Exponential Severity with mean 12.
This is the sum of three independent versions of the previous solution, which is equivalent to a
Binomial frequency with m = 3 and q = 7/12, and an Exponential Severity with mean 12.
For the Binomial, Prob[n = 0] = (5/12)3 = 0.0723, Prob[n = 1] = (3)(7/12)(5/12)2 = 0.3038,
Prob[n = 2] = (3)(7/12)2 (5/12) = 0.4253, Prob[n = 3] = (7/12)3 = 0.1985.
When n = 1 the aggregate is Exponential with θ = 12, with density e-x/12/12.
When n = 2 the aggregate is Gamma with α = 2 and θ = 12, with density x e-x/12/144.
When n = 3 the aggregate is Gamma with α = 3 and θ = 12, with density x2 e-x/12/3456.
Therefore, the density of the aggregate distribution at 10 is:
(0.3038)e-10/12/12 + (0.4253)(10)e-10/12/144 + (0.1985)(100)e-10/12/3456 = 0.0606e-10/12 =
0.0263.
Comment: Similar to Example 9.7 in Loss Models. Beyond what you are likely to be asked.

10.8. B. Since the sum of n Exponentials is a Gamma with α = n, the density of the aggregate at
x > 0 is:
∑_{n=1}^{∞} fN(n) e^(-x/θ) x^(n-1) / {θ^n (n-1)!} = ∑_{n=1}^{∞} {e^(-2) 2^n / n!} e^(-x/10) x^(n-1) / {10^n (n-1)!}

= {e^(-(2 + x/10)) / x} ∑_{n=1}^{∞} (0.2x)^n / {n! (n-1)!}.

For x = 30 this is: (1/4452.395) ∑_{n=1}^{∞} 6^n / {n! (n-1)!} =

(1/4452.395) {6 + 36/2 + 216/12 + 1296/144 + 7776/2880 + 46656/86400 + ...} = 0.0122.


Comment: There is a point mass of probability at zero of: e−λ = e-2 = 13.53%.
An example of what is called a Tweedie Distribution, where more generally the severity is
Gamma. The Tweedie distribution is used in Generalized Linear Models. See for example,
“A Practitioners Guide to Generalized Linear Models,” by Duncan Anderson, Sholom Feldblum,
Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi,
or “A Primer on the Exponential Family of Distributions,” by David R. Clark and Charles A.
Thayer, both in the 2004 CAS Discussion Paper Program.

10.9. Their sum is also a Compound Poisson Distribution. λ = 1 + 3 + 4 = 8.


For the severity, the probability of 5 is: (1/8)(60%) + (3/8)(20%) = 15%,
the probability of 10 is: (1/8)(40%) + (4/8)(30%) = 20%,
and the probability of 25 is: (3/8)(80%) + (4/8)(70%) = 65%.

10.10. C. The sum of the two independent Compound Poisson Distributions is a Compound
Poisson Distribution with λ = 0.5 + 0.2 = 0.7, and severity a mixture of the two severities, with
weights 5/7 and 2/7.
For the Exponential, F(700) = 1 - exp[-700/1000] = 0.5034.
For the Pareto, F(700) = 1 - {2000 / (2000 + 700)}3 = 0.5936.
For the mixture, F(700) = (5/7)(0.5034) + (2/7)(0.5936) = 0.5292.

10.11. The density of the logarithmic distribution is: pk = {β/(1+β)}^k / {k ln(1+β)} = 0.9^k / {k ln(10)}, k = 1, 2, 3, ...

The sum of k Exponentials is Gamma with α = k, with density x^(k-1) e^(-x/θ) / {θ^k Γ[k]}; at x = 30 this is 30^(k-1) e^(-30/5) / {5^k (k-1)!}.

Thus fAgg(30) = ∑_{k=1}^{∞} [0.9^k / {k ln(10)}] [30^(k-1) e^(-6) / {5^k (k-1)!}] = {e^(-6) / (30 ln(10))} ∑_{k=1}^{∞} 5.4^k / k! = {e^(-6) / (30 ln(10))} (e^(5.4) - 1) = 0.0079.

Comment: See Exercise 9.38 in Loss Models.


In general, for a Logarithmic frequency with parameter β and Exponential severity with mean θ,
fAgg(x) = {exp[-x / ((1+β)θ)] - exp[-x/θ]} / {x ln(1+β)}.

10.12. Their sum is also a Compound Poisson Distribution. λ = 14 + 6 = 20.


The severity is a mixture of a Weibull Distribution with θ = 25 and τ = 3 and an Inverse Gamma
Distribution with α = 5 and θ = 90; the weights are 14/20 and 6/20.

10.13. (a) The variance of the first is: (12 )(1/2)(1 - 1/2) = 0.25.
The variance of the second is: (22 )(1/2)(1 - 1/2) = 1.
Since they are independent, their variances add: 0.25 + 1 = 1.25.
Alternately, we can list all of the possibilities.
0 claims from #1 and 0 claims from #2; 25% probability.
claim of size 1 from #1 and 0 claims from #2; 25% probability.
0 claims from #1 and claim of size 2 from #2; 25% probability.
claim of size 1 from #1 and claim of size 2 from #2; 25% probability.
Thus equally likely to have an aggregate of 0, 1, 2, or 3.
Mean aggregate is 1.5.
Variance of aggregate is: (1.52 + 0.52 + 0.52 + 1.52 )/4 = 1.25.
(b) Frequency has mean: (2)(1/2) = 1, and variance: (2)(1/2)(1 - 1/2) = 1/2.
Severity has mean: 1.5, and variance: 0.52 = 0.25.
Thus the variance of the Compound Binomial is: (1)(0.25) + (1.52 )(1/2) = 1.375.
Alternately, we can list all of the possibilities.
0 claims; 25% probability.
one claim of size 1; 50%/2 probability. one claim of size 2; 50%/2 probability.
2 claims each of size 1; 25%/4 probability.
2 claims one of size 1 and one of size 2; 25%/2 probability.
2 claims each of size 2; 25%/4 probability.
Thus distribution of aggregate: 0 @ 25%, 1 @ 25%, 2 @ 31.25%, 3 @12.5%, 4 @ 6.25%.
Mean aggregate is 1.5. Second moment of aggregate is 3.625.
Variance of aggregate is: 3.625 - 1.52 = 1.375.
Comment: By listing the possibilities, we can see that the distributions in part (a) and (b) are not
equal. In any case, since the variances in the two parts are unequal, the sum of the two independent
Compound Bernoulli Distributions in part (a) is not equal to the Compound Binomial in part (b).
It is true that the sum of the two independent Bernoullis with the same q is a Binomial with m = 2;
however, Theorem 9.7 in Loss Models, the nice result for summing independent Compound
Poissons, does not hold for summing independent Compound Binomials with the same q, nor for
summing independent Compound Negative Binomials with the same β.

10.14. D. The number of claims is Binomial, with m = 2 and q = 0.2.


f(0) = 0.82 = 0.64. f(1) = (2)(0.8)(0.2) = 0.32. f(2) = 0.22 = 0.04.
If there is one claim, then the chance that the aggregate is greater than 150 is: e-150/5 = e-3.
If there are two claims, then the aggregate distribution is Gamma with α = 2 and θ = 50.
S(150) = 1 - Γ[2; 150/50] = 1 - Γ[2 ; 3] = e-3 + 3 e-3 = 4e-3.
Combining the two cases: Prob[Agg > 150] = (0.32)(e-3) + (0.04)(4e-3) = 0.48 e-3 = 2.39%.
Comment: I have used the result that Γ(α ; x) = 1 - ∑_{i=0}^{α-1} x^i e^(-x) / i!.

Therefore, Γ(2 ; x) = 1 - e-x - xe-x. This result can also be obtained from the density of the Gamma
Distribution for α = 2 by using integration by parts.

10.15. B. Their sum is also a Compound Poisson Distribution. λ = 0.5 +1.5 = 2.


For the severity, the probability of 50 is: (0.5/2)(80%) + (1.5/2)(40%) = 50%,
and the probability of 100 is: (0.5/2)(20%) + (1.5/2)(60%) = 50%.
The total payment can be 100 if there are two payments of size 50 or one payment of size 100.
Probability is: (22 e-2 / 2)(0.52 ) + (2 e-2)(0.5) = 0.203.
Alternately, the total payment can be 100 if A = 100 and B = 0, A = 50 and B = 50,
or A = 0 and B = 100.
Prob[A = 0] = Prob[0 payments] = e-0.5.
Prob[A = 50] = Prob[1 payment of size 50] = (0.5e-0.5)(0.8) = 0.4e-0.5.
Prob[A = 100] = Prob[2 payments of size 50 or 1 payment of size 100] =
(0.52 e-0.5 / 2)(0.82 ) + (0.5e-0.5)(0.2) = 0.18e-0.5.
Prob[B = 0] = Prob[0 payments] = e-1.5.
Prob[B = 50] = Prob[1 payment of size 50] = (1.5e-1.5)(0.4) = 0.6e-1.5.
Prob[B = 100] = Prob[2 payments of size 50 or 1 payment of size 100] =
(1.52 e-1.5 / 2)(0.42 ) + (1.5e-1.5)(0.6) = 1.08e-1.5.
Thus, the probability that the total payment on the two policies will be 100 is:
Prob[A = 0] Prob[B = 100] + Prob[A = 50] Prob[B = 50] + Prob[A = 100] Prob[B = 0] =
(e-0.5)(1.08e-1.5) + (0.4e-0.5)(0.6e-1.5) + (0.18e-0.5)(e-1.5) = 1.5e-2 = 0.203.
Comment: Similar to Example 9.9 in Loss Models. As in Example 9.10 in Loss Models, one could
instead use the Panjer Algorithm (Recursive Method), although that is not needed here.

10.16. E. The sum of the two independent Compound Poisson Distributions is a Compound
Poisson Distribution with λ = 4 + 7 = 11, and severity a mixture of the two severities, with weights
4/11 and 7 /11.
For the Weibull, S(200) = exp[-(200/100)1/2] = 0.2431.
For the Loglogistic, S(200) = 1 / {1 + (200/150)2 } = 0.3600.
For the mixture, S(200) = (4/11)(0.2431) + (7/11)(0.3600) = 0.3175.

10.17. As shown in Appendix B of Loss Models, for a Geometric frequency P(z) = 1/(1-β(z-1)).
For an Exponential with θ = 1, M(t) = 1/(1- θt) = 1/(1-t).
For the aggregate losses, MA(t) = PN(MX(t)) = 1/(1-β(1/(1-t)-1)) = (1 - t) / (1 - t - βt).

10.18. E. Since pA(1) = 1, SA is just a Poisson Distribution with mean 3.


Prob(SA = 0) = e-3. Prob(SA = 1) = 3e-3. Prob(SA = 2) = (9/2)e-3.

For B the chance of 0 claims is e-1, the chance of one claim is e-1, and the chance of two claims is e-1/2.
Prob(SB = 0) = Prob(zero claims) = e-1.

Prob(SB = 1) = Prob(one claim)Prob(claim amount = 1) = e-1/2.


Prob(SB = 2) = Prob(one claim)Prob(amount = 2) + Prob(two claims)Prob( both amounts = 1) =
e-1/2 + (e-1/2)(1/4) = 5e-1/8.
Prob(S≤2) = Prob(SA = 0)Prob(SB ≤ 2) + Prob(SA = 1)Prob(SB ≤ 1) + Prob(SA = 2)Prob(SB = 0)
= (e-3)(17e-1/8) + (3e-3)(12e-1/8) + (9e-3/2)(e-1) = 89e-4/8 = 0.2038.
Alternately, the combined process is the sum of two independent compound Poisson processes,
so it in turn is a compound Poisson process. It is has claims of sizes
1 and 2. The expected number of claims of size 1 is: (3)(1) + (1)(.5) = 3.5.
The expected number of claims of size 2 is: (3)(0) + (1)(.5) = 0.5.
The small claims (those of size 1) and the large claims (those of size 2), form independent Poisson
Processes.
Prob(S ≤ 2) = Prob(no claims of size 1)Prob(0 or 1 claims of size 2) +
Prob(1 claim of size 1)Prob(no claims of size 2) +
Prob(2 claims of size 1)Prob(no claims of size 2) =
(e-3.5)(e-0.5 + 0.5e-0.5) + (3.5e-3.5)(e-0.5) + (3.52 e-3.5/2)(e-0.5) = 11.125e-4 = 0.2038.
Comment: One could use either the Panjer algorithm or convolutions in order to compute the
distribution of SB.

10.19. A. The sum of independent Compound Poissons is also Compound Poisson, so


Statement A is True. When the variance of the number of claims equals its mean, the Poisson
distribution is appropriate, so Statement B is False. If there is a chance of no claims, then there is an
extra point mass of probability at zero in the aggregate distribution, and the distribution of aggregate
losses is not continuous at zero, so Statement C is not True. When, as is common, the distribution of
aggregate losses is significantly skewed, the Normal Distribution is not the best approximation, so
Statement D is not True.

Section 11, Stop Loss Premiums

Loss Models discusses stop loss insurance, in which the aggregate losses excess of an
aggregate deductible are being covered.151 The net stop loss premium is the expected
aggregate losses excess of an aggregate deductible, or the expected cost for stop loss
insurance, ignoring expenses, taxes, risk loads, etc.

For example, assume Merlinʼs Mall buys stop loss insurance from Halfmoon Insurance, such that
Halfmoon will pay for any aggregate losses excess of a $100,000 aggregate deductible per year.

Exercise: If Merlinʼs Mall has aggregate losses of $302,000 in 2003, how much does Halfmoon
Insurance pay?
[Solution: 302,000 - 100,000 = $202,000.
Comment: If instead Merlinʼs Mall had $75,000 in aggregate losses, Halfmoon would pay nothing.]

In many cases, the stop loss premium just involves the application to somewhat different situations
of mathematical concepts that have already been discussed with respect to a per claim deductible.152
One can have either a continuous or a discrete distribution of aggregate losses.

Discrete Distributions of Aggregate Losses:

Exercise: Assume the aggregate losses in thousands of dollars for Merlinʼs Mall are approximated
by the following discrete distribution: f(50) = 0.6, f(100) = 0.2, f(150) = 0.1, f(200) = 0.05,
f(250) = 0.03, f(300) = 0.02.
What is the stop loss premium, for a deductible of 100 thousand?
[Solution: For aggregate losses of: 50, 100, 150, 200, 250, and 300, the amounts paid by Halfmoon
Insurance are respectively: 0, 0, 50, 100, 150, 200.
Thus the expected amount paid by Halfmoon is:
(0)(0.6) + (0)(0.2) + (50)(0.1) + (100)(0.05) + (150)(0.03) + (200)(0.02) = 18.5 thousand.]

In general, for any discrete distribution, one can compute the losses excess of d, the stop loss
premium for a deductible of d, by taking a sum of the payments times the density function:

E[(Agg - d)+] = ∑ (agg - d) f(agg), where the sum is taken over all agg > d.
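For instance, this sum is easy to evaluate directly; here is a minimal Python sketch (the function name is just illustrative, and the distribution is the Merlinʼs Mall example above):

def stop_loss_premium(dist, d):
    # dist: dictionary mapping aggregate loss amount -> probability
    return sum((agg - d) * p for agg, p in dist.items() if agg > d)

# Merlin's Mall example, amounts in thousands of dollars:
merlin = {50: 0.60, 100: 0.20, 150: 0.10, 200: 0.05, 250: 0.03, 300: 0.02}
print(stop_loss_premium(merlin, 100))   # 18.5 thousand
print(stop_loss_premium(merlin, 150))   # 8.5 thousand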

151 See Definition 9.3 in Loss Models. Stop loss insurance is mathematically identical to stop loss reinsurance. Reinsurance is protection insurance companies buy from reinsurers.
152 See “Mahlerʼs Guide to Loss Distributions.” Those who found Lee Diagrams useful for understanding excess losses for severity distributions, will probably also find them helpful here.

For example, the stop loss premium for Merlinʼs Mall with a deductible of 150 thousand is:
(50)(0.05) + (100)(0.03) + (150)(0.02) = 8.5 thousand.

Note that one could arrange this calculation in a spreadsheet as follows:


Aggregate Probability Amount Paid by Stop Loss Amount paid
Losses Insurance, Ded. of 150 times Probability
50 0.6 0 0
100 0.2 0 0
150 0.1 0 0
200 0.05 50 2.5
250 0.03 100 3
300 0.02 150 3
Sum 1 8.5

This technique is generally applicable to stop loss premium calculations involving discrete
distributions of aggregate losses.

Exercise: What is the stop loss premium for Merlinʼs Mall with a deductible of 120 thousand?
[Solution: (30)(0.1) + (80)(0.05) + (130)(0.03) + (180)(0.02) = 14.5 thousand.
Aggregate Probability Amount Paid by Stop Loss Amount paid
Losses Insurance, Ded. of 120 times Probability
50 0.6 0 0
100 0.2 0 0
150 0.1 30 3
200 0.05 80 4
250 0.03 130 3.9
300 0.02 180 3.6
Sum 1 14.5

Note that in this case there is no probability between $100,000 and $150,000. $120,000 is 40% of
the way from $100,000 to $150,000. The stop loss premium for a deductible of $120,000 is
$14,500, 40% of the way from the stop loss premium for a deductible of $100,000 to the stop
loss premium at a deductible of $150,000; 14,500 = (0.6)(18,500) + (0.4)(8,500).

In general, when there is no probability for the aggregate losses in an interval, the stop
loss premium for deductibles in this interval can be gotten by linear interpolation.153

Exercise: What is the stop loss premium for Merlinʼs Mall with a deductible of 140 thousand ?
[Solution: (0.2)(18.5) + (0.8)(8.5) = 10.5 thousand.]

153 See Theorem 9.4 in Loss Models.

Thus for a discrete probability distribution, the excess losses and the excess ratio decline linearly
over intervals in which there is no probability; the slope changes at any point which is part of the
support of the distribution. For continuous distributions, the excess losses and excess ratio decline
faster than linearly; the graphs of the excess losses and excess ratio are concave upwards.154

One can also calculate the stop loss premium, as the mean aggregate loss minus the expected
value of the aggregate loss limited to d. E[(Agg - d)+ ] = E[Agg] - E[Agg ∧ d].

For the prior example, the mean aggregate loss is:


(50)(0.6) + (100)(0.2) + (150)(0.1) + (200)(0.05) + (250)(0.03) + (300)(0.02) = 88.5 thousand.

One would calculate the expected value of the aggregate loss limited to 150 as follows:
A B C D
Aggregate Probability Aggregate Product of
Losses Loss Limited Col. B
to 150 & Col. C
50 0.6 50 30
100 0.2 100 20
150 0.1 150 15
200 0.05 150 7.5
250 0.03 150 4.5
300 0.02 150 3
Sum 1 80

E[(Agg - 150)+ ] = E[Agg] - E[Agg ∧ 150] = 88.5 - 80 = 8.5.

Recursion Formula:

When one has a discrete distribution with support spaced at regular intervals, then Loss Models
presents a systematic way to calculate the excess losses. As above, assume the aggregate losses
in thousands of dollars for Merlinʼs Mall are approximated by the following discrete distribution:
f(50) = 0.6, f(100) = 0.2, f(150) = 0.1, f(200) = 0.05, f(250) = 0.03, f(300) = 0.02.
In this case, the density is only positive at a finite number of points, each 50 thousand apart.


Losses excess of 150 thousand = 50,000 ∑ S(150 + 50j), where the sum runs over j = 0, 1, 2, ...

= 50,000 {S(150) + S(200) + S(250) + S(300)} = 50,000(0.1 + 0.05 + 0.02 + 0) = 8,500.155


154 Excess losses are the integral of the survival function from x to infinity. With x at the lower limit of integration, the derivative of the excess losses is -S(x) < 0. The second derivative of the excess losses is f(x) > 0.
155 Which matches the result calculated directly. One could rearrange the numbers that entered into the two calculations in order to see why the results are equal.

Thus in analogy to the continuous case, where one can write the excess losses as an integral of the
survival function, in the discrete case, one can write the excess losses as a sum of survival functions,
times ΔAgg:156

E[(Agg - j ΔAgg)+] = ΔAgg ∑ S(j ΔAgg + k ΔAgg), where the sum runs over k = 0, 1, 2, ...

This result can be turned into a recursion formula.


For the above example, Losses excess of 150,000 = 50,000(0.1 + 0.05 + 0.02 + 0) = 8500.
The losses excess of 200,000 = 50,000(0.05 + 0.02 + 0) = 3500 = 8500 - 5000 =
(losses excess of 150,000) - (50,000)(0.1) = losses excess of 150,000 - ΔAgg S(150,000).

More generally, we can write the excess losses at the larger deductible, (j+1) ΔAgg in terms of
those at the smaller deductible, j ΔAgg, and the Survival Function at the smaller deductible:157

E[(Agg - (j+1) ΔAgg)+ ] = E[(Agg - j ΔAgg)+ ] - ΔAgg S(j ΔAgg).

In other words, in this type of situation, raising the aggregate deductible of the insured by ΔAgg,
eliminates additional losses of ΔAgg S(j ΔAgg), from the point of view of the insurer.158
This recursion can be very useful if there is some maximum value Agg can take on, in which case we
could start at the top and work our way down. In the example, we know there are no aggregate
losses excess of 300,000; the stop loss premium for a deductible of 300,000 is zero. Then we
could calculate successively the stop loss premiums for deductibles of 250,000, 200,000, 150,000,
etc. Any other deductibles can then be handled via linear interpolation.

However, it is more generally useful to start at a deductible of zero and work oneʼs way up.
The stop loss premium at a deductible of zero is the mean. Usually we would have already
calculated the mean aggregate loss as the product of the mean frequency and the mean severity.
In the example, the mean aggregate loss is:
(50)(0.6) + (100)(0.2) + (150)(0.1) + (200)(0.05) + (250)(0.03) + (300)(0.02) = 88.5 thousand.

Then the stop loss premium at a deductible of 50,000 is: 88.5 - (50)(1) = 38.5 thousand.

The stop loss premium at a deductible of 100 is: 38.5 - (50)(0.4) = 18.5.

156 See Theorem 9.5 in Loss Models. In the corresponding Lee Diagram, the excess losses are a sum of horizontal rectangles of width S(Ai) and height ΔAgg.
157 See Corollary 9.6 in Loss Models.
158 Or adds additional losses of ΔAgg S(j ΔAgg), from the point of view of the insured.

This calculation can be arranged in a spreadsheet as follows:

Deductible Survival Function Stop Loss Premium


0 1 88.5
50 0.4 38.5
100 0.2 18.5
150 0.1 8.5
200 0.05 3.5
250 0.02 1
300 0 0

Note that the stop loss premium at a deductible of 300 is 0.

In general, the stop loss premium at a deductible of ∞ (or the largest possible aggregate
loss) is zero.
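Here is a minimal Python sketch of this recursion (names are illustrative); starting from the mean at a deductible of zero, it reproduces the table above:

def stop_loss_by_recursion(dist, step, max_agg):
    # dist: aggregate amount -> probability, with support spaced 'step' apart
    mean = sum(a * p for a, p in dist.items())
    survival = lambda d: sum(p for a, p in dist.items() if a > d)
    premiums = {0: mean}
    d = 0
    while d < max_agg:
        # E[(Agg - (j+1) dAgg)+] = E[(Agg - j dAgg)+] - dAgg S(j dAgg)
        premiums[d + step] = premiums[d] - step * survival(d)
        d += step
    return premiums

merlin = {50: 0.60, 100: 0.20, 150: 0.10, 200: 0.05, 250: 0.03, 300: 0.02}
print(stop_loss_by_recursion(merlin, 50, 300))
# matches the table (up to floating point): 88.5, 38.5, 18.5, 8.5, 3.5, 1.0, 0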

Continuous Distributions of Aggregate Losses:

Assume the aggregate annual losses for Halfmoon Insurance are closely approximated by a
LogNormal Distribution with µ = 16 and σ = 1.5.

Exercise: What are the mean aggregate annual losses for Halfmoon?
[Solution: exp(16 + 1.52 /2) = 27,371,147.]

Exercise: For Halfmoon, what is the probability that the aggregate losses in a year will be larger than
$100,000,000?
[Solution: 1 - Φ[(ln(100,000,000) - 16)/1.5] = 1 - Φ[1.61] = 5.37%.]

Halfmoon Insurance might buy stop loss reinsurance from Global Reinsurance.159
For example, assume Halfmoon buys stop loss reinsurance excess of $100 million.
If Halfmoonʼs aggregate losses exceed $100 million in any given year, then Global Reinsurance will
pay Halfmoon the amount by which the aggregate losses exceed $100 million.

Exercise: Halfmoonʼs aggregate losses in 2002 are $273 million.


How much does Global Reinsurance pay Halfmoon?
[Solution: $273 million - $100 million = $173 million.]

159 Also called aggregate excess reinsurance. Stop loss reinsurance is mathematically identical to the purchase of insurance excess of an aggregate deductible.

Mathematically, the payments by Global Reinsurance are the same as the losses excess of a
deductible or maximum covered loss of $100 million. The expected excess losses are the mean
minus the limited expected value.160 The expected losses retained by Halfmoon are the limited
expected value.

Exercise: What are the expected losses retained by Halfmoon and the expected payments by
Global Reinsurance?
[Solution: For the LogNormal Distribution, the limited expected value is:

E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}


E[X ∧ 100,000,000] = (27.37 million)Φ[(ln(100 million) - 16 - 1.52 )/1.5] +
(100 million){1-Φ[(ln(100 million)- 16)/ 1.5]} =
(27.37 million)Φ[0.11] + (100 million){1 - Φ[1.61]} =
(27.37 million)(0.5438) + (100 million)(0.0537) = $20.25 million.
Thus Halfmoon retains on average $20.25 million of losses. Global Reinsurance pays on average
E[X] - E[X ∧ 100,000,000] = 27.37 million - 20.25 million = $7.12 million.
Comment: The formula for the limited expected value for the LogNormal Distribution is given in
Appendix A of Loss Models.]
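For those who like to check such calculations numerically, here is a minimal Python sketch of the LogNormal stop loss premium (standard library only; Phi is the standard Normal distribution function via erf). Using the exact Normal distribution function rather than the rounded table gives about the same $7.1 million:

from math import erf, exp, log, sqrt

def Phi(x):
    # standard Normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_lev(x, mu, sigma):
    # limited expected value E[X ^ x] for a LogNormal(mu, sigma)
    return (exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma)
            + x * (1.0 - Phi((log(x) - mu) / sigma)))

mu, sigma = 16.0, 1.5
mean = exp(mu + sigma**2 / 2)                # about 27.37 million
retained = lognormal_lev(100e6, mu, sigma)   # about 20.25 million
print(mean - retained)                       # net stop loss premium, about 7.1 million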

Thus ignoring Global Reinsuranceʼs expenses, etc., the net stop loss premium Global
Reinsurance would charge Halfmoon would be in this case $7.12 million.161
In general, the stop loss premium depends on both the deductible and the distribution of the
aggregate losses. For example, the stop loss premium for a deductible of $200 million would have
been less than that for a deductible of $100 million.

Given the LogNormal Distribution one could calculate variances and higher moments for either the
losses excess of the deductible or below the deductible. One could also do calculations concerning
layers of loss. Mathematically these are the same type of calculations as were performed on
severity distributions.162

160 See “Mahlerʼs Guide to Loss Distributions.”
161 See Definition 9.3 in Loss Models.
162 See “Mahlerʼs Guide to Loss Distributions.”

Exercise: What is the variance of the losses retained by Halfmoon?


What is the variance of the payments by Global Reinsurance?
[Solution: For the LogNormal Distribution:
E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) − (µ+ 2σ2)} / σ] + x2 {1- Φ[{ln(x) − µ} / σ] }
For µ = 16 and σ = 1.5,

E[(X ∧ 100 million)2 ] = 1.122 x 1015. E[X2 ] = exp[2µ + 2σ2] = 7.109 x 1015.

E[X ∧ 100 million] = 2.025 x 107 . E[X] = exp[µ + 0.5σ2] = 27.37 million
The variance of Halfmoonʼs retained losses is:
E[(X ∧ 100 million)2 ] - E[X ∧ 100 million]2 = 1.122 x 1015 - (2.025 x 107 )2 =
7.12 x 1014. The second moment of Globalʼs payments is:
E[X2 ] - E[(X ∧ 100 m)2 ] - 2(100 million){E[X] - E[X ∧ 100 m]} =
7.109 x 1015 - 1.122 x 1015 - (2 x 108 )(2.737 x 107 - 2.025 x 107 ) = 4.563 x 1015.
From the previous solution, the mean of Globalʼs payments is $7.12 million.
Therefore, the variance of Globalʼs payments is: 4.563 x 1015 - (7.12 x 106 )2 = 4.512 x 1015.]

There is nothing special about the LogNormal Distribution. One could apply the same ideas to the
Uniform, Exponential, or other continuous distributions.

Exercise: Aggregate losses are uniformly distributed on (50, 100).


What is the net stop loss premium for a deductible of 70?
[Solution: losses excess of 70 = ∫[70, 100] (t - 70) f(t) dt = ∫[70, 100] (t - 70)/50 dt = 9.
Alternately, for the uniform distribution, E[X ∧ x] = (2xb - a2 - x2 ) / {2(b-a)}, for a ≤ x ≤ b.
E[X ∧ 70] = (2(70)(100) - 502 - 702 ) / {2(100-50)} = 66.
E[X] = (50+100)/2 = 75.
E[X] - E[X ∧ 70] = 75 - 66 = 9.]

For any continuous distribution, F(x), the mean, limited expected value, and therefore the excess
losses can be written as an integral of the survival function S(x) = 1 - F(x).163

E[Agg] = ∫[0, ∞] S(t) dt.

163 See “Mahlerʼs Guide to Loss Distributions.”

E[Agg ∧ d] = ∫[0, d] S(t) dt.

losses excess of d = E[Agg] - E[Agg ∧ d] = ∫[d, ∞] S(t) dt.

Loss Models also uses the notation E[(Agg - d)+] for the excess losses, where y+ is defined as
0 if y < 0 and y if y ≥ 0.
losses excess of d = E[(Agg - d)+] = ∫[d, ∞] (t - d) f(t) dt = ∫[d, ∞] S(t) dt.

The stop loss premium at 0 is the mean: E[(Agg - 0)+ ] = E[Agg].


The stop loss premium at ∞ is 0: E[(Agg - ∞)+ ] = E[0] = 0.
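As an illustration of the survival-function form (a minimal Python sketch, not from the text; the Exponential is used purely as an example because its stop loss premium has the known closed form θe raised to -d/θ):

from math import exp

def stop_loss_from_survival(survival, d, upper, n=100000):
    # numerically integrate S(t) from d to 'upper', a point beyond which S is negligible
    h = (upper - d) / n
    return h * sum(survival(d + (i + 0.5) * h) for i in range(n))   # midpoint rule

theta = 100.0
S = lambda t: exp(-t / theta)                     # Exponential survival function
print(stop_loss_from_survival(S, 250.0, 5000.0))  # about 8.21
print(theta * exp(-250.0 / theta))                # closed form: 100 exp(-2.5) = 8.208...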

Other Quantities of Interest:

Once one has the distribution of aggregate losses, either discrete or continuous, one can calculate
other quantities than the expected losses excess of an aggregate deductible; i.e., other than the
stop loss premium. Basically any quantity we could calculate for a severity distribution,164 we could
calculate for an aggregate distribution.

For example, one can calculate higher moments. In particular one could calculate the variance of
aggregate losses excess of an aggregate deductible.

164 See “Mahlerʼs Guide to Loss Distributions.”

Exercise: Assume the aggregate losses in thousands of dollars for Merlinʼs Mall are approximated
by the following discrete distribution: f(50) = 0.6, f(100) = 0.2, f(150) = 0.1,
f(200) = 0.05, f(250) = 0.03, f(300) = 0.02. Merlinʼs Mall buys stop loss insurance from Halfmoon
Insurance, such that Halfmoon will pay for any aggregate losses excess of a $100 thousand
deductible per year. What is the variance of payments by Halfmoon?
[Solution: For aggregate losses of: 50, 100, 150, 200, 250, and 300, the amounts paid by Halfmoon
Insurance are respectively: 0, 0, 50, 100, 150, 200.
Thus the expected amount paid by Halfmoon is:
(0)(0.6) + (0)(0.2) + (50)(0.1) + (100)(0.05) + (150)(0.03) + (200)(0.02) = 18.5 thousand.
The second moment is:
(02 )(0.6) + (02 )(0.2) + (502 )(0.1) + (1002 )(0.05) + (1502 )(0.03) + (2002 )(0.02) = 2225 million.
Therefore, the variance is: 2225 million - 342.25 million = 1882.75 million.]

One could calculate the mean and variance of aggregate losses subject to an aggregate limit. The
losses not paid by Halfmoon Insurance due to the aggregate deductible are paid for by the insured,
Merlinʼs Mall. Thus from Merlinʼs Mallʼs point of view, it pays for aggregate losses subject to an
aggregate maximum of $100,000.

Exercise: In the previous exercise, what are the mean and variance of Merlinʼs Mallʼs aggregate
losses after the impact of insurance?
[Solution: For aggregate losses of: 50, 100, 150, 200, 250, and 300, the amounts paid Merlinʼs Mall
after the effect of insurance are respectively: 50, 100, 100, 100, 100, 100.
Thus the expected amount paid by Merlinʼs Mall is:
(50)(0.6) + (100)(0.2) + (100)(0.1) + (100)(0.05) + (100)( 0.03) + (100)(0.02) = 70 thousand.
The second moment is: (502 )(0.6) + (1002 )(0.4) = 5500 million.
Therefore, the variance is: 5500 million - 4900 million = 600 million.]
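Both of the last two exercises can be verified with a short sketch (illustrative names), computing the first two moments of the payments (Agg - d)+ and of the retained amount Agg ∧ d directly from the discrete distribution:

def excess_and_limited_moments(dist, d):
    # returns (mean, variance) of (Agg - d)+ and of Agg ^ d for a discrete distribution
    def mean_var(transform):
        m1 = sum(transform(a) * p for a, p in dist.items())
        m2 = sum(transform(a) ** 2 * p for a, p in dist.items())
        return m1, m2 - m1 ** 2
    return mean_var(lambda a: max(a - d, 0)), mean_var(lambda a: min(a, d))

merlin = {50: 0.60, 100: 0.20, 150: 0.10, 200: 0.05, 250: 0.03, 300: 0.02}
(pay_mean, pay_var), (ret_mean, ret_var) = excess_and_limited_moments(merlin, 100)
print(pay_mean, pay_var)   # 18.5 thousand and 1882.75 million (dollars squared)
print(ret_mean, ret_var)   # 70 thousand and 600 million (dollars squared)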

Here is an example of how one can do calculations related to layers of loss.

Assume the aggregate annual losses for Halfmoon Insurance are closely approximated by a
LogNormal Distribution with µ = 16 and σ = 1.5.
If Halfmoonʼs aggregate losses exceed $100 million in any given year, then Global Reinsurance will
pay Halfmoon the amount by which the aggregate losses exceed $100 million.
However, Global will pay no more than $250 million per year.

Exercise: Halfmoonʼs aggregate losses in 2002 are $273 million.


How much does Global Reinsurance pay Halfmoon?
[Solution: $273 million - $100 million = $173 million.]

Exercise: Halfmoonʼs aggregate losses in 2004 are $517 million.


How much does Global Reinsurance pay Halfmoon?
[Solution: $517 million - $100 million = $417 million.
However, Globalʼs payment is limited to $250 million.
Comment: Unless Halfmoon has additional reinsurance, Halfmoon pays $517 - $250 = $267 million
in losses, net of reinsurance.]

Mathematically, the payments by Global Reinsurance are the same as the layer of losses from
$100 to $350 million.
The expected losses for Global are: E[X ∧ 350 million] - E[X ∧ 100 million].165
The expected losses retained by Halfmoon are: E[X] + E[X ∧ 100 million] - E[X ∧ 350 million].

Exercise: What are the expected losses retained by Halfmoon and the expected payments by
Global Reinsurance?
[Solution: For the LogNormal Distribution, the limited expected value is:

E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.


For µ = 16 and σ = 1.5:
E[X ∧ 100 million] = $20.25 million. E[X ∧ 350 million] = $25.17 million.
E[X] = exp(µ + σ2/2) = $27.37 million.
Thus Global Reinsurance pays on average E[X ∧ 350 million] - E[X ∧ 100 million] =
25.17 million - 20.25 million = $4.92 million.
Halfmoon retains on average $27.37 - 4.92 = $22.45 million of losses.]

Similarly, one could calculate the variance of the layers of losses.


The second moment of the layer of loss from d to u is:
E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - (E[X ∧ d]}.166

165 See “Mahlerʼs Guide to Loss Distributions.”
166 See “Mahlerʼs Guide to Loss Distributions.”

Exercise: What is the variance of payments by Global Reinsurance?


[Solution: For the LogNormal Distribution:
E[(X ∧ x)2 ] = exp[2µ + 2σ2] Φ[{ln(x) - (µ+ 2σ2)} / σ] + x2 {1- Φ[{ln(x) - µ} / σ] }.
For µ = 16 and σ = 1.5,
E[(X ∧ 100 million)2 ] = 1.122 x 1015. E[(X ∧ 350 million)2 ] = 2.940 x 1015.
E[X ∧ 100 million] = 2.025 x 107 . E[X ∧ 350 million] = 2.517 x 107 .
The second moment of Globalʼs payments is:
E[(X ∧ 350 m)2 ] - E[(X ∧ 100 m)2 ] - 2(100 million){E[X ∧ 350 m] - E[X ∧ 100 m]} =
2.940 x 1015 - 1.122 x 1015 - (2 x 108 )(2.517 x 107 - 2.025 x 107 ) = 8.34 x 1014.
From the previous solution, the mean of Globalʼs payments is $4.92 million.
Therefore, the variance of Globalʼs payments is: 8.34 x 1014 - (4.92 x 106 )2 = 8.10 x 1014.
Comment: The formula for the limited moments for the LogNormal Distribution is given in
Appendix A of Loss Models.]
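Here is a sketch of the layer calculation (standard library only; since it uses exact Normal distribution function values rather than the rounded table entries, the results differ slightly in the last digit):

from math import erf, exp, log, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lev1(x, mu, sigma):
    # E[X ^ x] for a LogNormal
    return (exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma)
            + x * (1.0 - Phi((log(x) - mu) / sigma)))

def lev2(x, mu, sigma):
    # E[(X ^ x)^2] for a LogNormal
    return (exp(2*mu + 2*sigma**2) * Phi((log(x) - mu - 2*sigma**2) / sigma)
            + x**2 * (1.0 - Phi((log(x) - mu) / sigma)))

mu, sigma, d, u = 16.0, 1.5, 100e6, 350e6
layer_mean = lev1(u, mu, sigma) - lev1(d, mu, sigma)                    # about 4.9 million
layer_2nd = lev2(u, mu, sigma) - lev2(d, mu, sigma) - 2*d*layer_mean    # about 8.3 x 10^14
print(layer_mean, layer_2nd - layer_mean**2)                            # variance about 8.1 x 10^14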

Problems:

11.1 (1 point) The stop loss premium for a deductible of $1 million is $120,000.
The stop loss premium for a deductible of $1.1 million is $111,000.
Assuming the aggregate losses are very unlikely to be between $1 million and $1.1 million dollars,
what is the stop loss premium for a deductible of $1.08 million?
A. Less than 112,500
B. At least 112,500, but less than 112,600
C. At least 112,600, but less than 112,700
D. At least 112,700, but less than 112,800
E. At least 112,800

11.2 (3 points) The aggregate annual losses have a mean of 13,000 and a standard deviation of
92,000. Approximate the distribution of aggregate losses by a LogNormal Distribution, and then
estimate the stop loss premium for a deductible of 25,000.
A. Less than 7000
B. At least 7000, but less than 7100
C. At least 7100, but less than 7200
D. At least 7200, but less than 7300
E. At least 7300

Use the following information for the next 9 questions:

The aggregate losses have been approximated by the following discrete distribution:
f(0) = 0.3, f(10) = 0.4, f(20) = 0.1, f(30) = 0.08, f(40) = 0.06, f(50) = 0.04, f(60) = 0.02.

11.3 (1 point) What is the mean aggregate loss?


A. Less than 15
B. At least 15, but less than 16
C. At least 16, but less than 17
D. At least 17, but less than 18
E. At least 18

11.4 (1 point) What is the stop loss premium, for a deductible of 10?
A. Less than 4
B. At least 4, but less than 5
C. At least 5, but less than 6
D. At least 6, but less than 7
E. At least 7

11.5 (1 point) What is the stop loss premium, for a deductible of 20?
A. Less than 2
B. At least 2, but less than 3
C. At least 3, but less than 4
D. At least 4, but less than 5
E. At least 5

11.6 (1 point) What is the stop loss premium, for a deductible of 30?
A. Less than 0.5
B. At least 0.5, but less than 1.0
C. At least 1.0, but less than 1.5
D. At least 1.5, but less than 2.0
E. At least 2.0

11.7 (1 point) What is the stop loss premium, for a deductible of 40?
A. Less than 0.25
B. At least 0.25, but less than 0.50
C. At least 0.50, but less than 0.75
D. At least 0.75, but less than 1.00
E. At least 1.00

11.8 (1 point) What is the stop loss premium, for a deductible of 50?
A. Less than 0.1
B. At least 0.1, but less than 0.2
C. At least 0.2, but less than 0.3
D. At least 0.3, but less than 0.4
E. At least 0.4

11.9 (1 point) What is the stop loss premium, for a deductible of 60?
A. Less than 0.1
B. At least 0.1, but less than 0.2
C. At least 0.2, but less than 0.3
D. At least 0.3, but less than 0.4
E. At least 0.4

11.10 (1 point) What is the stop loss premium, for a deductible of 33?
A. Less than 1.4
B. At least 1.4, but less than 1.5
C. At least 1.5, but less than 1.6
D. At least 1.6, but less than 1.7
E. At least 1.7

11.11 (1 point) If the stop loss premium is 3.7, what is the corresponding deductible?
A. Less than 19
B. At least 19, but less than 20
C. At least 20, but less than 21
D. At least 21, but less than 22
E. At least 22

Use the following information for the next three questions:


• Aggregate losses follow an Exponential distribution.
• There is an aggregate deductible of 250.
• An insurer has priced the stop loss insurance assuming a gross premium 30% more
than the stop loss premium.

11.12 (1 point) What is the stop loss premium if the mean of the Exponential is 100?
A. Less than 8
B. At least 8, but less than 9
C. At least 9, but less than 10
D. At least 10, but less than 11
E. At least 11

11.13 (1 point) What is the stop loss premium if the mean of the Exponential is 110?
A. Less than 12
B. At least 12, but less than 13
C. At least 13, but less than 14
D. At least 14, but less than 15
E. At least 15

11.14 (1 point) If the insurer assumed that the mean of the exponential was 100, but it was actually
110, then what is the ratio of gross premium charged to the (correct) stop loss premium?
A. Less than 94%
B. At least 94%, but less than 95%
C. At least 95%, but less than 96%
D. At least 96%, but less than 97%
E. At least 97%

11.15 (2 points) The stop loss premium at a deductible of 150 is 11.5. The stop loss premium for
a deductible of 180 is 9.1. There is no chance that the aggregate losses are between 140 and 180.
What is the probability that the aggregate losses are less than or equal to 140?
A. Less than 90% B. 90% C. 91% D. 92% E. More than 92%

11.16 (2 points) The average disability lasts 47 days. The insurer will pay for all days beyond the
first 10. The insurer will only pay for 75% of the cost of the first 10 days. The cost per day is $80.
60% of disabilities are 10 days or less. Assume that those disabilities of 10 days or less are
uniformly distributed from 1 to 10.
What is the expected cost for the insurer per disability?
A. Less than 3580
B. At least 3580, but less than 3600
C. At least 3600, but less than 3620
D. At least 3620, but less than 3640
E. At least 3640

Use the following information for the next 6 questions:


• Aggregate losses for Slippery Rock Insurance have the following distribution:
f(0) = 47% f(5) = 13% f(50) = 1%
f(1) = 10% f(10) = 6%
f(2) = 20% f(25) = 3%
• Slippery Rock Insurance buys aggregate reinsurance from Global Reinsurance.
Global will pay those aggregate losses in excess of 8 per year.
• Slippery Rock collects premiums equal to 110% of its expected losses prior to
the impacts of reinsurance.
• Global Reinsurance collects from Slippery Rock Insurance 125% of the losses
Global Reinsurance expects to pay.

11.17 (1 point) How much premium does Slippery Rock Insurance collect?
A. Less than 3.5
B. At least 3.5, but less than 3.6
C. At least 3.6, but less than 3.7
D. At least 3.7, but less than 3.8
E. At least 3.8

11.18 (1 point) What is the variance of Slippery Rockʼs aggregate losses, prior to the impact of
reinsurance?
A. Less than 30
B. At least 30, but less than 40
C. At least 40, but less than 50
D. At least 50, but less than 60
E. At least 60

11.19 (1 point) What are the expected aggregate losses for Slippery Rock after the impact of
reinsurance?
A. Less than 1.5
B. At least 1.5, but less than 1.6
C. At least 1.6, but less than 1.7
D. At least 1.7, but less than 1.8
E. At least 1.8

11.20 (1 point) What are the variance of aggregate losses for Slippery Rock after the impact of
reinsurance?
A. Less than 6.5
B. At least 6.5, but less than 6.6
C. At least 6.6, but less than 6.7
D. At least 6.7, but less than 6.8
E. At least 6.8

11.21 (1 point) How much does Slippery Rock pay Global Reinsurance?
A. Less than 1.4
B. At least 1.4, but less than 1.5
C. At least 1.5, but less than 1.6
D. At least 1.6, but less than 1.7
E. At least 1.7

11.22 (1 point) Global Reinsurance in turns buys reinsurance from Cosmos Assurance covering
payments due to Globalʼs contract with Slippery Rock.
Cosmos Assurance will reimburse Global for the portion of its payments in excess of 12.
What are Globalʼs expected aggregate losses, after the impact of its reinsurance with Cosmos?
A. Less than 0.6
B. At least 0.6, but less than 0.7
C. At least 0.7, but less than 0.8
D. At least 0.8, but less than 0.9
E. At least 0.9

11.23 (2 points) The aggregate annual losses follow approximately a LogNormal Distribution with
parameters µ = 9.902 and σ = 1.483.
Estimate the stop loss premium for a deductible of 100,000.
(A) 25,000 (B) 27,000 (C) 29,000 (D) 31,000 (E) 33,000

11.24 (2 points) The aggregate losses for Mercer Trucking are given by a Compound Poisson
Distribution with λ = 3. The mean severity is $10. The net stop loss premium at $25 is $14.2.
The insurer will pay Mercer Trucking a dividend if Mercer Truckingʼs aggregate losses are less than
$25. The dividend will be 30% of the amount by which $25 exceeds Mercer Truckingʼs aggregate
losses. What is the expected value of next yearʼs dividend?
A. Less than 2.8
B. At least 2.8, but less than 2.9
C. At least 2.9, but less than 3.0
D. At least 3.0, but less than 3.1
E. At least 3.1

Use the following information for the next 3 questions:


Aggregate Stop Loss Aggregate Stop Loss
Deductible Premium Deductible Premium
100,000 2643 1,000,000 141
150,000 1633 2,000,000 53
200,000 1070 3,000,000 26
250,000 750 4,000,000 15
300,000 563 5,000,000 10
500,000 293
Assume there is no probability between the given amounts.

11.25 (1 point) A stop loss insurance pays the excess of aggregate losses above 700,000.
Determine the amount the insurer expects to pay.
(A) 200 (B) 210 (C) 220 (D) 230 (E) 240

11.26 (1 point) A stop loss insurance pays the excess of aggregate losses above 250,000
subject to a maximum payment of 750,000. Determine the amount the insurer expects to pay.
(A) 600 (B) 610 (C) 620 (D) 630 (E) 640

11.27 (2 points) A stop loss insurance pays 75% of the excess of aggregate losses above
500,000 subject to a maximum payment of 1,000,000.
Determine the amount the insurer expects to pay.
(A) 140 (B) 150 (C) 160 (D) 170 (E) 180

11.28 (3 points) A manufacturer will buy stop loss insurance with an annual aggregate deductible of
D. If annual aggregate losses are less than D, then the manufacturer will pay its workers a safety
bonus of one third the amount by which annual losses are less than D.
D is chosen so as to minimize the sum of the stop loss premium and the expected bonuses.
What portion of the time will the annual aggregate losses exceed D?
(A) 1/4 (B) 1/3 (C) 2/5 (D) 1/2 (E) 3/5

11.29 (5 points) The Duff Brewery buys Workersʼ Compensation Insurance.


Duffʼs annual aggregate losses are LogNormally Distributed with µ = 13.5 and σ = 0.75.
Duffʼs premiums depend on its actual aggregate annual losses, A.
Premium = 1.05(200,000 + 1.1A), subject to a minimum premium of 500,000 and a maximum
premium of 2,500,000.
What is Duffʼs average premium?
A. Less than 1.3 million
B. At least 1.3 million, but less than 1.4 million
C. At least 1.4 million, but less than 1.5 million
D. At least 1.5 million, but less than 1.6 million
E. At least 1.6 million

11.30 (2 points) The distribution of annual aggregate losses is discrete:


Amount Probability
0 10%
10 20%
20 30%
30 20%
50 10%
100 10%
For what annual aggregate deductible is the stop loss premium equal to 10?
A. 17.5 B. 20.0 C. 22.5 D. 25.0 E. 27.5

Use the following information for the next two questions:


• Frequency is equally likely to be: 0, 1, 2, 3, 4, or 5.
• Severity is equally likely to be: 10, 20, 30, 40, 50, or 60.
• Frequency and severity are independent.
• An insurance policy has an annual aggregate deductible of 40.

11.31 (3 points) Determine the expected annual payments by the insurer.


A. 50 B. 52 C. 54 D. 56 E. 58

11.32 (3 points) Determine the variance of the distribution of these annual payments.
A. 1900 B. 2200 C. 2500 D. 2800 E. 3100

11.33 (3 points) For a collective risk model:


(i) The number of losses has a Poisson distribution with λ = 3.
(ii) The common distribution of the individual losses is:
x fx(x)
10 0.7
20 0.3
An insurance covers aggregate losses subject to an aggregate deductible of 30.
Calculate the expected aggregate payments of the insurance.
(A) 13 (B) 14 (C) 15 (D) 16 (E) 17

11.34 (2 points) A reinsurer pays aggregate claim amounts in excess of d, and in return it receives a
stop-loss premium E[(S - d)+].
E[(S - 200)+] = 1000.
E[(S - 300)+] = 930.
The probability that the aggregate claim amounts are greater than 200 and less than 300 is zero.
Determine the probability that the aggregate claim amounts are greater than 200.
A. 40% B. 50% C. 60% D. 70% E. 80%

11.35 (2 points) X is uniform from zero to one thousand.


For some positive constant c, a policy pays for a loss of size x: 0 if x < 600/c, and cx - 600 if x ≥ 600/c.
If the expected amount paid under the policy is equal to 500/c, determine c.
A. 1.0 B. 1.2 C. 1.4 D. 1.6 E. 1.8

11.36 (2 points) A reinsurance policy reimburses aggregate insured expenses at a rate of 50% for
the first 4000 in excess of 1000, 75% of the next 20,000, and 100% thereafter.
Express the expected cost of this coverage in terms of E[(S - d)+] for different values of d.

11.37 (3 points) An employer self insures a life-insurance program:


• The annual frequency distribution is Geometric with β = 2.
• Severity is equally likely to be 10,000 or 20,000.
• Frequency and severity are independent.
For 13,500, the employer purchases aggregate stop-loss coverage that limits the employerʼs
annual claim cost to 40,000.
Determine the employerʼs expected annual cost of the program, including the cost of the stop-loss
coverage.
A. 31,500 B. 32,000 C. 32,500 D. 33,000 E. 33,500

11.38 (2 points) An insurance pays 80% of aggregate losses excess of 5000, subject to a
maximum payment of 2000.
E[(S - 5000)+] = 2000.
E[(S - 8000)+] = 1400.
The probability that the aggregate claim amount is between 5000 and 8000 is zero.
Determine the expected amount paid by this insurance.
A. 400 B. 420 C. 440 D. 460 E. 480

11.39 (Course 151 Sample Exam #1, Q.14) (1.7 points)


Aggregate claims have a compound Poisson distribution with λ = 4, and a severity distribution:
p(1) = 3/4 and p(2) = 1/4.
Determine the stop loss premium at 2.
(A) 3.05 (B) 3.07 (C) 3.09 (D) 3.11 (E) 3.13

11.40 (Course 151 Sample Exam #1, Q.16) (1.7 points) A stop-loss reinsurance pays 80% of
the excess of aggregate claims above 20, subject to a maximum payment of 5.
All claim amounts are non-negative integers.
Let In be the stop loss premium for a deductible of n (and no limit); then you are given:
E[I16] = 3.89 E[I25] = 2.75
E[I20] = 3.33 E[I26] = 2.69
E[I24] = 2.84 E[I27] = 2.65
Determine the total amount of claims the reinsurer expects to pay.
(A) 0.46 (B) 0.49 (C) 0.52 (D) 0.54 (E) 0.56

11.41 (Course 151 Sample Exam #2, Q.8) (0.8 points)


A random loss is uniformly distributed over (0 , 80).
Two types of insurance are available.
Type Premium
Stop loss with deductible 10 Insurer's expected claim plus 14.6
Complete Insurer's expected claim times (1+k)
The two premiums are equal.
Determine k.
(A) 0.07 (B) 0.09 (C) 0.11 (D) 0.13 (E) 0.15

11.42 (Course 151 Sample Exam #2, Q.22) (1.7 points)


Aggregate claims has a compound Negative Binomial distribution with r = 2 and β = 7/3,
and individual claim distribution:
x p(x)
2 2/3
5 1/3
Determine the stop loss premium at 2.
(A) 11.4 (B) 11.8 (C) 12.2 (D) 12.6 (E) 13.0

11.43 (Course 151 Sample Exam #3, Q.21) (1.7 points)


For aggregate claims S, you are given:
(i) S can only take on positive integer values.
(ii) The stop loss premium at zero is 5/3.
(iii) The stop loss premium at two is 1/6.
(iv) The stop loss premium at three is 0.
Determine fS(1).
(A) 1/6 (B) 7/18 (C) 1/2 (D) 11/18 (E) 5/6

11.44 (5A, 5/94, Q.24) (1 point) Suppose S has a compound Poisson distribution with Poisson
parameter of 2 and E(S) = $200. Net stop-loss premiums with deductibles of $400 and $500 are
$100 and $25, respectively. The premium is $500.
The insurer agrees to pay a dividend equal to the excess of 80% of the premium over the claims.
What is the expected value of the dividend?
A. Less than $200
B. At least $200, but less than $250
C. At least $250, but less than $300
D. At least $300, but less than $350
E. $350 or more

11.45 (5A, 5/94, Q.38) (2 points) Assume that the aggregate claims for an insurer have a
compound Poisson Distribution with lambda = 2. Individual claim amounts are equal to 1, 2, 3 with
probabilities 0.4, 0.3, 0.3, respectively. Calculate the net stop-loss premium for a deductible of 2.

11.46 (CAS9, 11/98, Q.30a) (1 point) Your company has an expected loss ratio of 50%.
You have analyzed year-to-year variation and determined that any particular accident yearʼs loss
ratio will be uniformly distributed on the interval 40% to 60%.
If expected losses are $5.0 million on subject premium of $10.0 million, what is the expected value
of losses ceded to an aggregate stop-loss cover with a retention of a 55% loss ratio?

Use the following information for the next two questions:


• An aggregate loss distribution has a compound Poisson distribution with expected number
of claims equal to 1.25.
• Individual claim amounts can take only the values 1, 2 or 3, with equal probability.

11.47 (Course 3 Sample Exam, Q.14)


Determine the probability that aggregate losses exceed 3.

11.48 (Course 3 Sample Exam, Q.15)


Calculate the expected aggregate losses if an aggregate deductible of 1.6 is applied.

11.49 (3, 5/00, Q.11) (2.5 points) A company provides insurance to a concert hall for losses due
to power failure. You are given:
(i) The number of power failures in a year has a Poisson distribution with mean 1.
(ii) The distribution of ground up losses due to a single power failure is:
x Probability of x
10 0.3
20 0.3
50 0.4
(iii) The number of power failures and the amounts of losses are independent.
(iv) There is an annual deductible of 30.
Calculate the expected amount of claims paid by the insurer in one year.
(A) 5 (B) 8 (C) 10 (D) 12 (E) 14

11.50 (3 points) In the previous question, calculate the expected amount of claims paid by the
insurer in one year, if there were an annual deductible of 50 rather than 30.
A. 6.5 B. 7.0 C. 7.5 D. 8.0 E. 8.5

11.51 (3, 5/01, Q.19 & 2009 Sample Q.107) (2.5 points)
For a stop-loss insurance on a three person group:
(i) Loss amounts are independent.
(ii) The distribution of loss amount for each person is:
Loss Amount Probability
0 0.4
1 0.3
2 0.2
3 0.1
(iii) The stop-loss insurance has a deductible of 1 for the group.
Calculate the net stop-loss premium.
(A) 2.00 (B) 2.03 (C) 2.06 (D) 2.09 (E) 2.1

11.52 (3, 5/01, Q.30) (2.5 points)


You are the producer of a television quiz show that gives cash prizes.
The number of prizes, N, and prize amounts, X, have the following distributions:
n Pr(N = n) x Pr (X=x)
1 0.8 0 0.2
2 0.2 100 0.7
1000 0.1
You buy stop-loss insurance for prizes with a deductible of 200.
The relative security loading is defined as: (premiums / expected losses) - 1.
The cost of insurance includes a 175% relative security load.
Calculate the cost of the insurance.
(A) 204 (B) 227 (C) 245 (D) 273 (E) 357

11.53 (3, 11/01, Q.18 & 2009 Sample Q.99) (2.5 points) For a certain company, losses follow a
Poisson frequency distribution with mean 2 per year, and the amount of a loss is 1, 2, or 3, each with
probability 1/3. Loss amounts are independent of the number of losses, and of each other.
An insurance policy covers all losses in a year, subject to an annual aggregate deductible
of 2. Calculate the expected claim payments for this insurance policy.
(A) 2.00 (B) 2.36 (C) 2.45 (D) 2.81 (E) 2.96

11.54 (2 points) In the previous question, 3, 11/01, Q.18, let Y be the claim payments for this
insurance policy. Determine E[Y | Y > 0].
(A) 3.5 (B) 3.6 (C) 3.7 (D) 3.8 (E) 3.9

11.55 (3, 11/02, Q.16 & 2009 Sample Q.92) (2.5 points) Prescription drug losses, S, are
modeled assuming the number of claims has a geometric distribution with mean 4, and the amount
of each prescription is 40. Calculate E[(S - 100)+].
(A) 60 (B) 82 (C) 92 (D) 114 (E) 146

11.56 (SOA M, 5/05, Q.18 & 2009 Sample Q.165) (2.5 points) For a collective risk model:
(i) The number of losses has a Poisson distribution with λ = 2.
(ii) The common distribution of the individual losses is:
x fx(x)
1 0.6
2 0.4
An insurance covers aggregate losses subject to an aggregate deductible of 3.
Calculate the expected aggregate payments of the insurance.
(A) 0.74 (B) 0.79 (C) 0.84 (D) 0.89 (E) 0.94
Comment: I have slightly rewritten this past exam question.

11.57 (2 points) In the previous question, SOA M, 5/05, Q.18, for those cases where the
aggregate payment is positive, what is the expected aggregate payment?
(A) 2.1 (B) 2.3 (C) 2.5 (D) 2.7 (E) 2.9

11.58 (SOA M, 11/05, Q.19 & 2009 Sample Q.206) (2.5 points) In a given week, the number
of projects that require you to work overtime has a geometric distribution with β = 2.
For each project, the distribution of the number of overtime hours in the week is the following:
x f(x)
5 0.2
10 0.3
20 0.5
The number of projects and number of overtime hours are independent.
You will get paid for overtime hours in excess of 15 hours in the week.
Calculate the expected number of overtime hours for which you will get paid in the week.
(A) 18.5 (B) 18.8 (C) 22.1 (D) 26.2 (E) 28.0

11.59 (SOA M, 11/06, Q.7 & 2009 Sample Q.280) (2.5 points) A compound Poisson claim
distribution has λ = 5 and individual claim amounts distributed as follows:
x fX(x)
5 0.6
k 0.4 where k > 5
The expected cost of an aggregate stop-loss insurance subject to a deductible of 5 is 28.03.
Calculate k.
(A) 6 (B) 7 (C) 8 (D) 9 (E) 10

Solutions to Problems:

11.1. E. Linearly interpolate: (0.2)(120,000) + (0.8)(111,000) = 112,800.


Comment: If as is commonly the case, there is some probability that the aggregate losses are
between $1 and $1.1 million, then the stop loss premium at $1.08 million is likely to be somewhat
closer to $111,000 than calculated here.

11.2. E. Set the observed and theoretical first two moments equal:
mean = 13,000 = exp(µ + σ2/2).

second moment = exp(2µ + 2σ2) = 92,0002 + 13,0002 = 8633 million.

σ² = ln(8633 million) - 2 ln(13000) = 3.933. σ = √3.933 = 1.983.

µ = ln(13000) - σ2/2 = 7.507. E[X] = exp(µ + σ2/2) = 13000.

E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}.


E[X ∧ 25,000] = 13000Φ[-.66] + (25000){1 - Φ[1.32]} =
(13000)(1 - 0.7454) + (25000)(1 - 0.9066) = 5645.
E[X] - E[X ∧ 25,000] = 13000 - 5645 = 7355.
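A quick numerical check of this solution (a sketch, not part of the original solution; with exact Normal distribution function values the stop loss premium comes out near 7370 rather than 7355, still answer E):

from math import erf, exp, log, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_lev(x, mu, sigma):
    return (exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma)
            + x * (1.0 - Phi((log(x) - mu) / sigma)))

mean, sd = 13000.0, 92000.0
second_moment = sd**2 + mean**2
sigma = sqrt(log(second_moment) - 2 * log(mean))   # about 1.983
mu = log(mean) - sigma**2 / 2                      # about 7.5
print(mean - lognormal_lev(25000.0, mu, sigma))    # about 7370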

11.3. A. (0)(.3) + (10)(.4) + (20)(.1) + (30)(.08) + (40)(.06) + (50)(.04) + (60)(.02) = 14.0.

11.4. E., 11.5. D., 11.6. E., 11.7. D., 11.8. C., and 11.9. A.
The fastest way to do this set of problems is to use the recursion formula:
E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S( jΔA).
Deductible Survival Function Stop Loss Premium
0 0.7 14.00
10 0.3 7.00
20 0.2 4.00
30 0.12 2.00
40 0.06 0.80
50 0.02 0.20
60 0 0.00

11.10. D. Since there is no probability between 30 and 40, we can linearly interpolate:
(0.7)(2.00) + (0.3)(0.8) = 1.64.

11.11. D. Linearly interpolating between a deductible of 20 and 30:


d = {(4 - 3.7)(30) + (3.7 - 2)(20)} / (4 - 2) = 21.5.

11.12. B. The mean excess loss for an Exponential is equal to its mean.
The losses excess of 250 are: e(250) S(250) = 100( e-250/100) = 8.21.

11.13. A. The mean excess loss for an Exponential is equal to its mean.
The losses excess of 250 are: e(250)S(250) = 110(e-250/110) = 11.33.

11.14. B. The insurer would charge a gross premium of: (1.3)(8.21) = 10.67.
The mean excess losses are: 11.33. 10.67 / 11.33 = 0.942.
Comment: Thus the charged premium is only 94.2% of the expected excess losses; equivalently, the
expected excess losses are 106.2% of the charged premium, rather than the desired: 1/1.3 = 76.9%.
Even a 10% mistake in estimating the mean had a large effect on excess losses.
This is the same mathematical reason why the inflation rate of excess losses over a fixed limit is
greater than that of the total losses.

11.15. D. Since there is no probability in the interval 150 to 180,


the stop loss premium at 180 = stop loss premium at 150 - (180 - 150)S(150).
Therefore, S(150) = (11.5 - 9.1)/(180 - 150) = 0.08. Therefore F(150) = 1 - 0.08 = 0.92.
Since there is no probability in the interval (140, 150], F(140) = F(150) = 0.92.

11.16. C. If the insurer paid for everything, then the expected cost = (47)(80) = $3760.
There is the equivalent of a (25%)(80)= $20 per day deductible for the first 10 days.
For those who stay more than 10 days this is $200. For those who stay 10 days or less, they
average: (1+10)/2 = 5.5 days, so the deductible is worth on average: (5.5)($20) = $110.
Weighting together the two cases, the deductible is worth: (60%)(110) + (40%)(200) = $146.
Thus the insurer expects to pay: 3760 - 146 = $3614.

11.17. A. Slippery Rockʼs expected losses are:


(1)(0.1) + (2)(0.2) + (5)(0.13) + (10)(0.06) + (25)(0.03) + (50)(0.01) = 3.
Thus it charges of a premium of: (1.1)(3) = 3.3.

11.18. C. The second moment of Slippery Rockʼs losses are:


(1)(0.1) + (4)(0.2) + (25)(0.13) + (100)(0.06) + (625)(0.03) + (2500)(0.01) = 53.9.
Therefore, the variance = 53.9 - 32 = 44.9.

11.19. E. (1)(0.1) + (2)(0.2) + (5)(0.13) + (8)(0.06) + (8)(0.03) + (8)(0.01) = 1.95.

11.20. D. After reinsurance, the second moment of Slippery Rockʼs losses are:
(1)(0.1) + (4)(0.2) + (25)(0.13) + (64)(0.06) +(64)(0.03) + (64)(0.01) = 10.55.
Therefore, the variance = 10.55 - 1.952 = 6.75.

11.21. A. Globalʼs expected payments are 3 - 1.95 = 1.05.


Therefore, Global charges Slippery Rock: (1.25)(1.05) = 1.31.

11.22. B. Cosmos pays Global when Slippery Rockʼs losses exceed: 8 + 12 = 20.
Thus Cosmosʼs expected losses are: (5)(0.03) + (30)(0.01) = 0.45.
Thus Globalʼs expected losses net of reinsurance are 1.05 - 0.45 = 0.60.

11.23. A. E[X] = exp(µ + σ2/2) = exp(9.902 + 1.4832 /2) = 59,973.

E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}.


E[X ∧ 100,000] = (59,973) Φ[(ln100000 - 9.902 - 1.4832 )/1.483]) +
(100000) {1 - Φ[(ln100000 - 9.902 )/1.483]}} =
59973Φ[-.40] + (100000)(1 - Φ[1.09]) = (59,973)(.3446) + (100,000)(.1379) = 34,457.
E[X] - E[X ∧ 100,000] = 59,973 - 34,457 = 25.5 thousand.

11.24. A. Let A be the aggregate losses.


The net stop loss premium at 25 is: E[A] - E[A ∧ 25] = 14.2.
Thus E[A ∧ 25] = E[A] - 14.2 = (3)(10) - 14.2 = 15.8.
E[25 - A]+ = 25 - E[A ∧ 25] = 25 - 15.8 = 9.2.
The dividend is 0.3[25 - A]+. Therefore, the expected dividend is: (0.3)(9.2) = 2.76.

11.25. D. Linearly interpolating: (0.6)(293) + (0.4)(141) = 232.

11.26. B. In order to get the aggregate layer from 250,000 to 250,000 + 750,000 = 1,000,000,
subtract the stop loss premiums: 750 - 141 = 609.

11.27. D. An aggregate loss of 1,833,333 results in a payment of:


(0.75)(1,833,333 - 500,000) = 1 million.
Thus the insurer pays 75% of the layer from 500,000 to 1.833 million.
In order to get the stop loss premium at 1.833 million, linearly interpolate:
(0.167)(141) + (0.833)(53) = 67.7.
In order to get the aggregate layer from 500,000 to 1,833,333, subtract the stop loss premiums:
293 - 67.7 = 225.3. The insurer pays 75% of this layer: (75%)(225.3) = 169.

11.28. A. The stop loss premium is: E[(A - D)+] = E[A] - E[A ∧ D].
The average amount by which aggregate losses are less than D is: E[(D - A)+] = D - E[A ∧ D].
The stop loss premium plus expected bonus is:
E[(A - D)+] + E[(D - A)+]/3 = E[A] - E[A ∧ D] + (D - E[A ∧ D])/3 = E[A] + D/3 - (4/3)E[A ∧ D].
Note that E[A ∧ D] is the integral of the survival function from 0 to D, and therefore,
d E[A ∧ D] / dD = S(D).
Setting equal to zero the derivative of the stop loss premium plus expected bonus:
1/3 - (4/3)S(D) = 0. ⇒ S(D) = 1/4.

11.29. A. Premium = 500,000. ⇒ 500,000 = 1.05(200,000 + 1.1A). ⇒


A = (500,000/1.05 - 200,000)/1.1 = 251,082.
So the minimum premiums are paid if A ≤ 251,082.
Premium = 2,500,000. ⇒ 2,500,000 = 1.05(200,000 + 1.1A). ⇒
A = (2,500,000/1.05 - 200,000)/1.1 = 1,982,684.
So the maximum premiums are paid if A ≥ 1,982,684.
If there were no maximum or minimum premium, then the average premium would be:
1.05(200,000 + 1.1E[A]).
If there were no minimum premium, then the average premium would be:
1.05(200,000 + 1.1E[A ∧ 1,982,684]).
Due to the minimum premium, we add to [A ∧ 1,982,684] the average amount by which losses are
less than 251,082, which is: 251,082 - E[A ∧ 251,082].
Thus the average premiums are:
1.05(200,000 + 1.1{E[A ∧ 1,982,684] + 251,082 - E[A ∧ 251,082]}) =
500,000 + (1.05)(1.1){E[A ∧ 1,982,684] - E[A ∧ 251,082]} =
Minimum Premium + (1.05)(1.1){Layer of Loss from 251,082 to 1,982,684}.
For the LogNormal, E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}.
E[X ∧ 1982684] = exp(13.5 + 0.752 /2)Φ[(ln1982684 - 13.5 - 0.752 )/.75] +
1982684 {1 - Φ[(ln1982684 - 13.5)/.75]} = 966320 Φ[.59] + 1982684 {1 - Φ[1.33]} =
(966,320)(.7224) + (1,982,684)(1 - 0.9082) = 880,080.
E[X ∧ 251082] = exp(13.5 + 0.752 /2)Φ[(ln251082 - 13.5 - 0.752 )/.75] +
251082 {1 - Φ[(ln251082 - 13.5)/.75]} = 966320 Φ[-2.17] + 251082 {1 - Φ[-1.42]} =
(966,320)(.0150) + (251,082)(.9222) = 246,048.
Therefore, the average premium is:
500,000 + (1.05)(1.1)(880,080 - 246,048) = 1.23 million.
Comment: A simplified example of Retrospective Rating.
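A sketch of this retrospective-rating calculation (not part of the original solution; with exact Normal distribution function values the average premium again comes out near 1.23 million, answer A):

from math import erf, exp, log, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_lev(x, mu, sigma):
    return (exp(mu + sigma**2 / 2) * Phi((log(x) - mu - sigma**2) / sigma)
            + x * (1.0 - Phi((log(x) - mu) / sigma)))

mu, sigma = 13.5, 0.75
min_prem, max_prem = 500000.0, 2500000.0
# aggregate losses at which the minimum and maximum premiums are first reached:
a_min = (min_prem / 1.05 - 200000) / 1.1    # about 251,082
a_max = (max_prem / 1.05 - 200000) / 1.1    # about 1,982,684
layer = lognormal_lev(a_max, mu, sigma) - lognormal_lev(a_min, mu, sigma)
print(min_prem + 1.05 * 1.1 * layer)        # average premium, about 1.23 million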

11.30. E. The mean aggregate loss is:


(10%)(0) + (20%)(10) + (30%)(20) + (20%)(30) + (10%)(50) + (10%)(100) = 29.
The stop loss premium for a deductible of zero is the mean of 29.
Then, E[(Agg - (j+1) ΔAgg)+] = E[(Agg - j ΔAgg)+] - ΔAgg S(j ΔAgg).
Deductible Survival Function Stop Loss Premium
0 0.9 29.00
10 0.7 20.00
20 0.4 13.00
30 0.2 9.00
40 0.2 7.00
50 0.1 5.00
100 0 0.00
The stop loss premiums at 20 and 30 are 13 and 9.
Since there is no probability between these two values, we can linearly interpolate,
and the stop loss premium is 10 for an aggregate deductible of 27.5.
Comment: E[(Agg - 27.5)+] = (20%)(30 - 27.5) + (10%)(50 - 27.5) + (10%)(100 - 27.5) = 10.

11.31. D. & 11.32. E. Let N be frequency, X be severity, and S be aggregate losses.


E[N] = 2.5. E[N2 ] = 55/6 = 9.1667. Var[N] = 9.1667 - 2.52 = 2.9167.
E[X] = 35. E[X2 ] = 9100/6 = 1516.67. Var[X] = 1516.67 - 352 = 291.67.
E[S] = (2.5)(35) = 87.5.
For S to be 0, there must be zero claims. Thus, Prob[S = 0] = 1/6.
For S to be 10, there must be one claim of size 10.
Thus, Prob[S = 10] = (1/6)(1/6) = 1/36.
For S to be 20, there must be either one claim of size 20 or two claims each of size 10.
Thus, Prob[S = 20] = (1/6)(1/6) + (1/6)(1/6)2 = 7/216.
For S to be 30, there must be either one claim of size 30, two claims of size 10 and 20 or 20 and
10, or 3 claims each of size 10.
Thus, Prob[S = 30] = (1/6)(1/6) + (2)(1/6)(1/6)2 + (1/6)(1/6)3 = 49/1296.
E[S ∧ 40] =
(1/6)(0) + (1/36)(10) + (7/216)(20) + (49/1296)(30) + (1 - 1/6 - 1/36 - 7/216 - 49/1296)(40) =
31.474.
E[(S-40)+] = E[S] - E[S ∧ 40] = 87.5 - 31.474 = 56.026.
Var[S] = (2.5)(291.67) + (352 )(2.9167) = 4302.1.
Thus, E[S2 ] = 4302.1 + 87.52 = 11,958.4.
E[(S ∧ 40)2 ] = (1/6)(0) + (1/36)(100) + (7/216)(400) + (49/1296)(900)
+ (1 - 1/6 - 1/36 - 7/216 - 49/1296)(1600) = 1226.3.
E[(S-40)+2 ] = E[S2 ] - E[(S ∧ 40)2 ] - (2)(40){E[S] - E[S ∧ 40]}
= 11,958.4 - 1226.3 - (80)(56.026) = 6250.0.
Thus, Var[(S-40)+] = 6250.0 - 56.0262 = 3111.
Comment: In general, for a distribution uniform and discrete on the integers from i to j inclusive:
Mean = (i + j)/2 and Variance = {(j + 1 - i)2 - 1}/12.
With no maximum covered loss, in other words with u = ∞ in the formula for the second moment of a
layer of loss: E[(X-d)+2 ] = E[X2 ] - E[(X ∧ d)2 ] - 2d {E[X] - E[X ∧ d]}.
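These two answers can also be checked by brute force (a sketch; it enumerates every combination of claim count and claim sizes, which is feasible here since there are at most 5 claims with 6 possible sizes each):

from itertools import product

freqs = [0, 1, 2, 3, 4, 5]                  # each claim count has probability 1/6
sevs = [10, 20, 30, 40, 50, 60]             # each claim size has probability 1/6

m1 = m2 = 0.0
for n in freqs:
    for sizes in product(sevs, repeat=n):   # every combination of n claim sizes
        p = (1.0 / 6.0) * (1.0 / 6.0) ** n
        payment = max(sum(sizes) - 40, 0)   # insurer payment, aggregate deductible of 40
        m1 += p * payment
        m2 += p * payment ** 2

print(m1, m2 - m1 ** 2)                     # about 56.0 and 3111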

11.33. B. Prob[Agg = 0] = e-3 = 0.0498. Prob[Agg = 10] = 3e-3(0.7) = 0.1046.


Prob[Agg = 20] = Prob[1 loss of size 20 or 2 losses of size 10]
= 3e-3(0.3) + (32 e-3/2)(0.72 ) = 0.1546.
E[Agg ∧ 30] = (0)(0.0498) + (10)(0.1046) + (20)(0.1546) + (30)(1 - 0.0498 - 0.1046 - 0.1546) =
24.868.
E[Agg] = (mean frequency)(mean severity) = (3)(13) = 39.
E[(Agg - 30)+] = E[Agg] - E[Agg ∧ 30] = 39 - 24.868 = 14.13.
Comment: Similar to SOA M, 5/05, Q.18 (2009 Sample Q.165).
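This answer can also be checked with the Panjer (recursive) algorithm mentioned earlier (a sketch; the severity is rescaled to units of 10 so that it sits on the positive integers):

from math import exp

def panjer_poisson(lam, severity, max_s):
    # aggregate probabilities f_S(0), ..., f_S(max_s) for a compound Poisson;
    # severity: dictionary mapping positive-integer claim size -> probability
    f = [0.0] * (max_s + 1)
    f[0] = exp(-lam)
    for s in range(1, max_s + 1):
        f[s] = (lam / s) * sum(j * severity.get(j, 0.0) * f[s - j]
                               for j in range(1, s + 1))
    return f

# severity of 10 or 20 rescaled to 1 or 2; the deductible of 30 becomes 3
f = panjer_poisson(3.0, {1: 0.7, 2: 0.3}, 3)
limited = 10 * sum(min(s, 3) * p for s, p in enumerate(f))
limited += 30 * (1.0 - sum(f))              # all the probability above 30 contributes 30
mean = 3.0 * 13.0                           # E[Agg] = (mean frequency)(mean severity)
print(mean - limited)                       # about 14.13, matching answer B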

11.34. D. E[(S - 200)+] = ∫[200, ∞] S(x) dx. E[(S - 300)+] = ∫[300, ∞] S(x) dx.
Since there is no probability of the aggregate loss being between 200 and 300, the survival
function of aggregate losses is constant on the interval 200 to 300.
E[(S - 200)+] - E[(S - 300)+] = 1000 - 930 = ∫[200, 300] S(x) dx = ∫[200, 300] S(200) dx = 100 S(200).

Thus, S(200) = (1000 - 930) / 100 = 0.7.


Comment: Similar to Exercise 9.16 in Loss Models.
If the distribution of aggregate losses were discrete with support at 0, 100, 200, 300, etc., then
E[(S- 300)+] = E[(S- 200)+] - 100 S(200).

11.35. D. For 600/c ≤ 1000, the expected amount paid under the policy is:
∫[600/c, 1000] (cx - 600) (1/1000) dx = c(1000²)/2000 - c(600/c)²/2000 - 0.6(1000 - 600/c)
= 500c - 600 + 180/c.
Setting this equal to 500/c: 500/c = 500c - 600 + 180/c. ⇒ 50c² - 60c - 32 = 0.
⇒ c = {60 ± √(60² - (4)(50)(-32))} / {(2)(50)} = (60 ± 100)/100 = -0.4 or 1.6.
Since c must be positive, c = 1.6.

Comment: Similar to Exercise 9.17 in Loss Models.

11.36. The layer for 1000 to 5000 is: E[(S- 1000)+] - E[(S- 5000)+].
The layer for 5000 to 25,000 is: E[(S- 5000)+] - E[(S - 25,000)+].
The layer excess of 25,000 is: E[(S - 25,000)+].
Thus the expected cost for this coverage is:
(0.5)(E[(S - 1000)+] - E[(S - 5000)+]) + (0.75)(E[(S - 5000)+] - E[(S - 25,000)+]) + E[(S - 25,000)+] =
0.5 E[(S - 1000)+] + 0.25 E[(S - 5000)+] + 0.25 E[(S - 25,000)+].
Comment: Similar to Exercise 9.27 in Loss Models.

11.37. D. Prob[Agg = 0] = 1/3. Prob[Agg = 10,000] = (2/9)(1/2) = 1/9.


Prob[Agg = 20,000] = Prob[1 claim of size 20,000] + Prob[2 claims each of size 10,000] =
(2/9)(1/2) + (4/27)(1/2)2 = 4/27.
Prob[Agg = 30,000] =
Prob[2 claims one of size 10,000 and one of size 20,000] + Prob[3 claims each of size 10,000] =
(4/27)(2)(1/2)2 + (8/81)(1/2)3 = 7/81.
E[Agg ∧ 40,000] = (0)(1/3) + (10,000)(1/9) + (20,000)(4/27) + (30,000)(7/81) +
(40,000)(1 - 1/3 - 1/9 - 4/27 - 7/81) = 19,506.
Expected retained losses plus premium for stop-loss = 19,506 + 13,500 = 33,006.
Comment: Similar to Exercise 9.24 in Loss Models.
E[Agg] = (mean frequency)(mean severity) = (2)(15,000) = 30,000.
E[(Agg - 40,000)+] = E[Agg] - E[Agg ∧ 40,000] = 30,000 - 19,506 = 10,494.
The insurer who sold the stop-loss coverage charges more than its expected costs.

11.38. A. The maximum payment of 2000, corresponds to an aggregate amount of:


5000 + 2000/0.8 = 7500.
Since there is no probability between 5000 and 8000, we can get E[(S - 7500)+] via linear
interpolation: (500/3000)(2000) + (2500/3000)(1400) = 1500.
The insurance pays for 80% of the layer from 5000 to 7500, with expected value:
(0.8) (E[(S - 5000)+] - E[(S - 7500)+]) = (0.8) (2000 - 1500) = 400.

11.39. C. The mean severity is: (3/4)(1) + (1/4)(2) = 5/4.


The mean aggregate loss is (5/4)(4) = 5.
The probability that the aggregate losses are zero is the probability of zero claims, which is: e-4.
The probability that the aggregate losses are one, is the probability that there is one claims and it is
of size one, which is: (3/4)4e-4 = 3e-4.
The stop loss premium at 0 is the mean aggregate loss of 5.
We can make use of the recursion: E[(X - (j+1)Δx)+] = E[(X - jΔx)+] - Δx S( jΔx).
Deductible Survival Function Stop Loss Premium
0 0.9817 5
1 0.9267 4.0183
2 3.0916
Alternately, the expected value of aggregate losses limited to 2 is:
(0)( e-4) + (1)( 3e-4) + (2)(1 - 4e-4) = 1.9084.
The expected value of aggregate losses unlimited is 5.
Thus the expected value of aggregate losses excess of 2 is: 5 - 1.9084 = 3.0916.

11.40. C. If the insurer has aggregate losses of 26.25, then the reinsurer pays: 0.8(26.25 - 20) =
5. If the insurer has losses greater than 26.25, then the reinsurer still pays 5. If the insurer has losses
less than 20, then the reinsurer pays nothing. Thus the reinsurerʼs payments are 80% of the layer of
aggregate losses from 20 to 26.25. The layer from 20 to 26.25 is the difference between the
aggregate losses excess of 20 and those excess of 26.25: I20 - I26.25.
By linear interpolation I26.25 = 2.68.
Thus the reinsurerʼs expected payments are: (0.8)(I20 - I26.25) = (0.8)(3.33 - 2.68) = 0.52.

11.41. D. For a uniform distribution on (0, 80), the expected loss is 40.
Thus the premium for complete insurance is (1+k)(40).
For a deductible of 10 the average payment per loss is:
∫[10, 80] (x - 10) (1/80) dx = (1/80)(x - 10)²/2, evaluated from x = 10 to x = 80, = 70²/160 = 30.625.

Thus with a deductible of 10, the premium is: 30.625 + 14.6 = 45.225.
Setting the two premiums equal: (1+k)(40) = 45.225, or k = 0.13.

11.42. C. The mean severity is: (2)(2/3) + (5)(1/3) = 3. The mean frequency is: (2)(7/3) = 14/3.
Thus the mean aggregate loss is: (14/3)(3) = 14.
The aggregate losses are at least 2 if there is a claim.
The chance of no claims is: 1/(1+β)r = 1/(10/3)2 = 0.09.
Thus the expected aggregate losses limited to 2, are: (0)(0.09) + (2)(1 - 0.09) = 1.82.
Thus the expected aggregate losses excess of 2 are: 14 - 1.82 = 12.18.

11.43. C. Since the stop loss premium at 3 is zero, S is never greater than 3.
Since S can only take on positive integer values, S can only be 1, 2, or 3.
Stop loss premium at zero = (1)f(1) + (2)f(2) + (3)f(3) = 5/3.
Stop loss premium at two = (0)f(1) + (0)f(2) + (1)f(3) = 1/6.
Therefore, f(3) = 1/6 and f(1) + 2f(2) = 5/3 - 3/6 = 7/6. Therefore, f(2) = (7/6 - f(1))/2.
Now f(1) + f(2) + f(3) = 1. Therefore, f(1) + (7/6 - f(1))/2 + 1/6 = 1. Solving, f(1) = 1/2.
Comment: For f(1) = 1/2, f(2) = 1/3, and f(3) =1/6,
the stop loss premium at 0 is the mean: (1)(1/2) + (2)(1/3) + (3)(1/6) = 5/3.
The stop loss premium at 2 is: (0)(1/2) + (0)(1/3) + (1/6)(3 - 2) = 1/6.

11.44. D. If the loss is (80%)(500) = 400 or more, then the insurer pays no dividend.
If the loss S is less than 400, then the insurer pays a dividend of 400 - S.
Thus the dividend is 400 - S when S ≤ 400, and is zero when S ≥ 400.
The net stop loss premium at 400 is: E [zero when S ≤ 400 and S - 400 when S≥ 400].
Dividend + S - 400 is zero when S ≤ 400, and S - 400 when S ≥ 400.
Therefore, E[dividend + S - 400] = net stop loss premium at 400.
E[dividend] + E[S] - 400 = net stop loss premium at 400.
E[dividend] = 400 + (net stop loss premium at 400) - E[S] = 400 + 100 - 200 = 300.
Comment: Somewhat similar to 3, 5/00, Q.30 and Course 151 Sample Exam #1, Q.16.
When the dividend is the excess of y over the aggregate loss, then the expected dividend is:
y + net stop loss premium at y - mean aggregate loss. In the following Lee Diagram, applied to
the distribution of aggregate losses, Area A is the average dividend, Area C is the net stop loss
premium at 400. Area B + C is the average aggregate loss. Therefore, Area B = Average
aggregate loss - stop loss premium at 400. Area A + Area B = 400.
Therefore, 400 = average dividend + average aggregate loss - stop loss premium at 400.

[Lee Diagram omitted: a horizontal line is drawn at 400; Areas A and B lie below the line, and Area C lies above it.]

11.45. The average severity is (1)(0.4) + (2)(0.3) + (3)(0.3) = 1.9.


The average aggregate losses are (1.9)(2) = 3.8.
The only way the aggregate losses can be zero is if there are no claims, which has probability
e-2 = 0.1353.
The only way the aggregate losses can be 1 is if there is one claim of size 1, which has probability:
(0.4)(2e-2) = 0.1083.
Thus E[A ∧ 2] = (0)(0.1353) + (1)(0.1083) + (2)(1 - 0.1353 - 0.1083) = 1.6212.
Thus the net stop-loss premium at 2 is: E[A] - E[A ∧ 2] = 3.8 - 1.62 = 2.18.

11.46. The expected loss ratio excess of 55% is:
∫_{55%}^{60%} (x - 55%) / 20% dx = (1/20%) (5%)²/2 = 0.625%.
The corresponding premium is: ($10.0 million)(0.625%) = $62,500.

11.47. We need to calculate the density of the aggregate losses at 0, 1, 2 and 3, then sum them
and subtract from unity.
The aggregate losses are 0, if there are no claims; f(0) = e-1.25. The aggregate losses are 1 if there
is a single claim of size 1; f(1) = (1/3) 1.25 e-1.25. The aggregate losses are 2 if either there is a
single loss of size 2 or there are two losses each of size 1;
f(2) = (1/3) 1.25 e-1.25 + (1/9) (1.252 /2)e-1.25. The aggregate losses are 3 if either there is a single
loss of size 3, there are two losses of sizes 1 and 2 or 2 and 1, or there are three losses each of size
1; f(3) = (1/3) 1.25 e-1.25 + (2/9) (1.252 /2) e-1.25 + (1/27) (1.253 /6) e-1.25.
Thus, f(0) + f(1) + f(2) + f(3) = e-1.25 { 1 + 1.25 + (1.252 /6) + (1.253 /162)} = 0.723.
Thus the chance that the aggregate losses are greater than 3 is: 1 - 0.723 = 0.277.
Alternately, one can compute the convolutions of the severity distribution and weight them together
using the Poisson probabilities of various numbers of claims.
For example (f*f*f) (7) = Σ (f*f)(7-x) f(x) = (1/9)(1/3) + (2/9)(1/3) + (3/9)(1/3) = 6/27 = 0.2222.
Note that I have shown more than is necessary in order to answer this question. One need only
calculate up to the f*f*f and only for values up to 3. I have not shown the aggregate distribution for
larger values, since that would require the calculation of higher convolutions.
Poisson Probability 0.2865 0.3581 0.2238 0.0933 0.0291
Number of Claims 0 1 2 3 4
Dollars of Loss    f*0    f    f*f    f*f*f    f*f*f*f    Aggregate Distribution
0 1.0000 0.0000 0.0000 0.0000 0 0.2865
1 0.3333 0.0000 0.0000 0 0.1194
2 0.3333 0.1111 0.0000 0 0.1442
3 0.3333 0.2222 0.0370 0 0.1726
4 0.3333 0.1111 0.0123 0.0853
5 0.2222 0.2222 0.0494 N.A.
Then the chance of aggregate losses of 0, 1, 2 or 3 is: 0.2865 + 0.1194 + 0.1442 + 0.1726 =
0.7227. Thus the chance that the aggregate losses are greater than 3 is: 1 - 0.723 = 0.277.
Alternately, we can use the Panjer Algorithm, since this is a compound Poisson Distribution. The
severity distribution is s(1) = s(2) = s(3) = 1/3.
The p.g.f of a Poisson is eλ(x-1). s(0) = severity distribution at zero = 0.

c(0) = Pf(s(0)) = p.g.f. of frequency dist. at (density of severity distribution at zero) = exp[1.25(0 - 1)] =
0.2865. For the Poisson Distribution, a = 0 and b = λ = 1.25.


c(x) = {1 / (1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = (1.25/x) Σ_{j=1}^{x} j s(j) c(x - j).

c(1) = (1.25/1) (1) s(1) c(1-1) = (1.25)(1/3)(0.2865) = 0.1194.


c(2) = (1.25/2) {(1)(1/3)(.1194) +(2)(1/3)(.2865)} = 0.1442.
c(3) = (1.25/3) {(1)(1/3)(.1442) +(2)(1/3)(.1194) +(3)(1/3)(.2865)} = 0.1726.
Then the chance of aggregate losses of 0, 1, 2 or 3 is: 0.2865 + 0.1194 + 0.1442 + 0.1726 =
0.7227. Thus the chance that the aggregate losses are > 3 is: 1 - 0.723 = 0.277.
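Here is a minimal Python sketch of the Panjer (recursive) algorithm used in the last alternative, under the same assumptions (Poisson with λ = 1.25 and a severity equally likely to be 1, 2, or 3); it is my own illustration and the variable names are mine:

import math

lam = 1.25
s = {1: 1/3, 2: 1/3, 3: 1/3}       # discrete severity; s(0) = 0
a, b = 0.0, lam                    # the Poisson is in the (a, b, 0) class with a = 0, b = lambda

c = [math.exp(lam * (s.get(0, 0.0) - 1.0))]   # c(0) = Pf(s(0)) = exp[lam (s(0) - 1)]
for x in range(1, 4):
    total = sum((a + j * b / x) * s.get(j, 0.0) * c[x - j] for j in range(1, x + 1))
    c.append(total / (1.0 - a * s.get(0, 0.0)))

print([round(v, 4) for v in c])    # about [0.2865, 0.1194, 0.1442, 0.1726]
print(round(1.0 - sum(c), 3))      # P[Agg > 3], about 0.277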

11.48. In the absence of a deductible, the mean aggregate losses are:


(average frequency)(average severity) = 1.25(2) = 2.5. In the previous solution, we calculated f(0)
= 0.2865, and f(1) = 0.1194. Therefore, the limited expected value at 1.6 of the aggregate losses
is: (0)f(0) + (1)f(1) + 1.6(1 - f(0) - f(1)) = 1.6 - 1.6 f(0) - 0.6 f(1) =
1.6 - (1.6)(0.2865) - (0.6)(0.1194) = 1.07. Thus the average aggregate losses with the deductible of
1.6 are: E[A] - E[A ∧ 1.6] = 2.5 - 1.07 = 1.43.

11.49. E. The mean severity is (0.3)(10) + (0.3)(20) + (0.4)(50) = 29. The mean frequency is 1.
Therefore, prior to a deductible the mean aggregate losses are: (1)(29) = 29.
The probability of no claims is: e-1 = 0.3679. The probability of one claim is: e-1 = 0.3679.
The probability of two claims is: e-1/2 = 0.1839. Therefore, the probability of no aggregate losses is
0.3679. Aggregate losses of 10 correspond to one power failure costing 10, with probability
(0.3)(0.3679) = 0.1104. Aggregate losses of 20 correspond to either one power failure costing 20,
or two power failures each costing 10, with probability: (0.3)(0.3679) + (0.32 )(0.1839) = 0.1269.
Thus the chance of aggregate losses of 30 or more is: 1 - (0.3679 + 0.1104 + 0.1269) = 0.3948.
Therefore, the limited expected value of aggregate losses at 30 is:
(0)(0.3679) + (10)(0.1104) + (20)(0.1269) + (30)(0.3948) = 15.49.
Thus the losses excess of 30 are: 29 - 15.49 = 13.5.
Alternately, one could use the Panjer Algorithm (Recursive Method) to get the distribution of
aggregate losses. Since the severity distribution has support 10, 20, 50, we let Δx = 10:

10 ⇔ 1, 20 ⇔ 2, 30 ⇔ 3, ... For the Poisson, a = 0, b = λ = 1, and P(z) = eλ(z-1).


c(0) = Pf(s(0)) = Pf(0) = e1(0-1) = 0.3679.
c(x) = {1 / (1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j) = (1/x) Σ_{j=1}^{x} j s(j) c(x - j).

c(1) = (1/1) (1) s(1) c(1-1) = {(.3)(.3679)} = 0.1104.


c(2) = (1/2){(1)s(1)c(1) + (2)s(2)c(0)} = (1/2) {(.3)(.1104) +(2)(.3)(.3679)} = 0.1269.
One can also calculate the distribution of aggregate losses using convolutions.
For the severity distribution, s* s(20) = 0.09, s* s(30) = 0.18, s* s(40) = 0.09, s* s(60) = 0.24,
s* s(70) = 0.24, and s* s(100) = 0.16.
Number of Losses 0 1 2
Poisson Frequency 0.3679 0.3679 0.1839
Aggregate Losses s s*s Aggregate Distribution
0 1 0 0.3679
10 0.3 0.1104
20 0.3 0.09 0.1269

Once one has the distribution of aggregate losses, one can use the recursion formula:
E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
Deductible Survival Function Stop Loss Premium
0 0.6321 29
10 0.5217 22.679
20 0.3948 17.462
30 13.514
Alternately, once one has the aggregate distribution, one can calculate the expected amount not paid
by the stop loss insurance as follows:
A: Aggregate Losses    B: Probability    C: Amount Not Paid by Stop Loss Insurance (Ded. of 30)    D: Product of Col. B & Col. C
0 0.3679 0 0
10 0.1104 10 1.104
20 0.1269 20 2.538
30 or more 0.3948 30 11.844
Sum 1 15.486
Since the mean aggregate loss is 29, the expected amount paid by the stop loss insurance is:
29 - 15.486 = 13.514.

11.50. A. The densities of the Poisson frequency are:


0 1 2 3 4
0.3679 0.3679 0.1839 0.0613 0.0153
Aggregate losses of 30 correspond to either two power failures costing 10 and 20, or three power
failures each costing 10, with probability: (2)(0.32 )(0.1839) + (0.33 )(0.0613) = 0.0348.
Aggregate losses of 40 correspond to either two power failures costing 20 each, three failures
costing 10, 10 and 20, or four power failures each costing 10, with probability:
(0.32 )(0.1839) + (3)(0.33 )(0.0613) + (0.34 )(0.0153) = 0.0216.
Thus the chance of aggregate losses of 50 or more is:
1 - (0.3679 + 0.1104 + 0.1269 + 0.0348 + 0.0216) = 0.3384.
Therefore, the limited expected value of aggregate losses at 50 is:
(0)(0.3679) + (10)(0.1104) + (20)(0.1269) + (30)(0.0348) + (40)(0.0216) + (50)(0.3384) =
22.47.
Thus the losses excess of 50 are: 29 - 22.47 = 6.5.
Alternately, one could use the Panjer Algorithm to get the distribution of aggregate losses.
Continuing from the previous solution:
c(3) = (1/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} =
(1/3) {(.3)(.1269) + (2)(.3)(.1104) + (3)(0)(.3679)} = 0.0348.
c(4) = (1/4){(1)s(1)c(3) + (2)s(2)c(2) + (3)s(3)c(1) + 4s(4)c(0)} =
(1/4) {(.3)(.0348) + (2)(.3)(.1269) + (3)(0)(.1104) + (4)(0)(.3679)} = 0.0216.
One can also calculate the distribution of aggregate losses using convolutions.
Number of Losses 0 1 2 3 4
Poisson Frequency 0.3679 0.3679 0.1839 0.0613 0.0153
Aggregate Losses s s*s s*s*s s*s*s*s Aggregate Distrib.
0 1 0 0.3679
10 0.3 0.1104
20 0.3 0.09 0.1269
30 0 0.18 0.027 0.0348
40 0 0.09 0.081 0.0081 0.0216
Once one has the distribution of aggregate losses, one can use the recursion formula:
E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
Deductible Survival Function Stop Loss Premium
0 0.6321 29.000
10 0.5217 22.679
20 0.3948 17.462
30 0.3600 13.514
40 0.3384 9.914
50 6.530

11.51. C. For each person the mean is: (0.4)(0) + (0.3)(1) + (0.2)(2) + (0.1)(3) = 1.
Therefore, the overall mean for 3 people is: (3)(1) = 3.
Prob(Aggregate loss = 0) = 0.43 = 0.064.
Therefore, the limited expected value at 1 is: (0.064)(0) + (1 - 0.064)(1) = 0.936.
Net Stop Loss Premium at 1 is: Mean - Limited Expected Value at 1 = 3 - 0.936 = 2.064.

11.52. D. The probability of zero loss: Prob(n = 1)Prob(x = 0) + Prob(n =2)Prob(x =0)2 =
(.8)(.2) + (.2)(.2)2 = 0.168. The probability of an aggregate loss of 100 is:
Prob(n = 1)Prob(x = 100) + Prob(n = 2)(2)Prob(x = 0)Prob(x = 100) =
(.8)(.7) + (.2)(2)(.2)(.7) = 0.616. Therefore, the probability that the aggregate losses are 200 or
more is: 1 - (0.168 + 0.616) = 0.216.
Therefore, E[A ∧ 200] = (.168)(0) + (.616)(100) + (.216)(200) = 104.8.
Mean frequency is: (.8)(1) + (.2)(2) = 1.2.
Mean severity is: (.2)(0) + (.7)(100) + (.1)(1000) = 170.
Mean aggregate loss is: (1.2)(170) = 204.
Stop loss premium is: E[A] - E[A ∧ 200] = 204 - 104.8 = 99.2.
With a relative security loading of 175%, the insurance costs: (1 + 1.75)(99.2) = 273.
Alternately, the probability of 2000 in aggregate loss is:
Prob(n = 2)Prob(x = 1000)2 = (0.2)(0.12 ) = 0.002.
The probability of 1100 in aggregate loss is:
Prob(n = 2)(2)Prob(x = 100)Prob(x = 1000) = (.2)(2)(0.7)(0.1) = 0.028.
The probability of 1000 in aggregate loss is:
Prob(n = 2)(2)Prob(x = 0)Prob(x = 1000) + Prob(n = 1)Prob(x = 1000) =
(.2)(2)(0.2)(0.1) + (0.8)(0.1) = 0.088.
These are the only possible aggregate values greater than 200.
Therefore, the expected aggregate loss excess of 200 is:
(2000 - 200)(.002) + (1100 - 200)(.028) + (1000 - 200)(0.088) = 99.2.
With a relative security loading of 175%, the insurance costs: (1 + 1.75)(99.2) = 273.

11.53. B. Prob[0 claims] = e-2. Prob[1 claim] = 2e-2. Prob[aggregate = 0] = Prob[0 claims] = e-2.
Prob[aggregate = 1] = Prob[1 claim] Prob[size = 1] = (2 e-2) (1/3) = 2e-2/3.
Limited Expected Value of Aggregate at 2 = (0)e-2 + (1)2e-2/3 + (2){1- (e-2 + 2e-2/3)} =
2 - 8e-2/3. Mean Severity = (1 + 2 + 3)/3 = 2. Mean Aggregate Loss = (2)(2) = 4.
Expected Excess of 2 = 4 - (2 - 8e-2/3) = 2 + 8e-2/3 = 2.36.
Alternately, let A = aggregate loss. E[A] = (2)(2) = 4.
E[(A-1)+] = E[A] - S(0) = 4 - (1 - e-2).

E[(A-2)+] = E[(A-1)+] - S(1) = 4 - (1 - e-2) - (1 - e-2 - 2e-2/3) = 2 + 8e-2/3 = 2.36.



11.54. B. From the previous solution, Prob[aggregate = 0] = Prob[0 claims] = e-2.


Prob[aggregate = 1] = Prob[1 claim] Prob[size = 1] = 2e-2/3.
Now the aggregate can be two if there are 2 claims of size 1 or 1 claim of size 2.
Prob[aggregate = 2] = (22 e-2 / 2)(1/3)2 + (2 e-2)(1/3) = 8e-2/9.
Thus the probability of a zero total payment by the insurer is: e-2 + 2e-2/3 + 8e-2/9 = 0.3459.
From the previous solution, expected claim payments are 2.36.
Thus the expected claim payments for this insurance policy when it is positive is:
2.36 / (1 - 0.3459) = 3.61.

11.55. C. For a geometric with β = 4: f(0) = 1/5 = 0.2, f(1) = 0.8f(0) = 0.16, f(2) = 0.8f(1) = 0.128.
E[S] = (4)(40) = 160. E[S ∧ 100] = 0f(0) + 40f(1) + 80f(2) + 100{1 - (f(0) + f(1) + f(2))} =
(40)(0.16) + (80)(0.128) + (100){1 - (0.2 + 0.16 + 0.128)} = 67.84.
E[(S - 100)+] = E[S] - E[S ∧ 100] = 160 - 67.84 = 92.16.

11.56. A. Prob[Agg = 0] = e-2 = 0.1353. Prob[Agg = 1] = 2e-2(0.6) = 0.1624.


Prob[Agg = 2] = Prob[1 loss of size 2 or 2 losses of size 1] = 2e-2(0.4) + (22 e-2/2)(0.62 ) = 0.2057.
E[A ∧ 3] = 0.1624 + (2)(0.2057) + (3)(1 - 0.1353 - 0.1624 - 0.2057) = 2.0636.
E[A] = (mean frequency)(mean severity) = (2)(1.4) = 2.8.
E[(A - 3)+] = E[A] - E[A ∧ 3] = 2.8 - 2.0636 = 0.7364.
Comment: The Exam Committee meant to say “subject to an aggregate deductible of 3.”

11.57. B. From the previous solution, Prob[Agg = 0] = e-2 = 0.1353,


Prob[Agg = 1] = 2e-2(0.6) = 0.1624,
Prob[Agg = 2] = Prob[1 loss of size 2 or 2 losses of size 1] = 2e-2(0.4) + (22 e-2/2)(.62 ) = 0.2057.
The aggregate can be three if: 3 claims of size 1, or one claim of size 1 and one claim of size 2.
Prob[Agg = 3] = (23 e-2/6)(0.63 ) + (22 e-2/2) {(2)(0.6)(0.4)} = 0.1689.
Thus the chance the insurer makes a positive payment is:
1 - (0.1353 + 0.1624 + 0.2057 + 0.1689) = 0.3277.
From the previous solution, the expected aggregate payment is 0.7364.
Thus the average of the positive aggregate payments is: 0.7364 / 0.3277 = 2.247.
Comment: In the exam question we are determining the average aggregate payment that the
insurer makes in a year, including those years in which the aggregate payment is zero. In contrast, in
this followup question we restrict our attention to only those years where the insurer makes a
positive payment.

11.58. B. For a Geometric Distribution with β = 2:


f(0) = 1/3, f(1) = (2/3)f(0) = 2/9, f(2) = (2/3)f(1) = 4/27.
The mean of the distribution of overtime hours is: (5)(.2) + (10)(.3) + (20)(.5) = 14.
The mean aggregate is: (2)(14) = 28.
Prob[Agg = 0] = Prob[0 projects] = 1/3.
Prob[Agg = 5] = Prob[1 project]Prob[5 overtime] = (2/9)(.2) = 0.04444.
Prob[Agg = 10] = Prob[1 project]Prob[10 overtime] + Prob[2 projects]Prob[5 overtime]2 =
(2/9)(0.3) + (4/27)(0.2)2 = 0.07259.
E[Agg ∧ 15] = (0)(1/3) + (5)(0.04444) + (10)(0.07259) + 15(1 - 1/3 - 0.04444 - 0.07259) = 9.19.
Expected overtime in excess of 15 is:
Mean[Agg] - E[Agg ∧ 15] = 28 - 9.19 = 18.81.
Alternately, one can use a recursive method, with steps of 5.
As above, E[A] = 28. Also get the first few values of the aggregate distribution as above.
E[(A - 5)+] = E[A] - 5SA(0) = 28 - (5)(1 - 1/3) = 24.667.
E[(A - 10)+] = E[(A - 5)+] - 5SA(5) = 24.667 - (5)(1 - 1/3 - 0.04444) = 21.556.
E[(A - 15)+] = E[(A - 10)+] - 5SA(10) = 21.556 - (5)(1 - 1/3 - 0.04444 - 0.07259) = 18.81.

11.59. D. Let A be the aggregate loss. E[A] = (5){(0.6)(5) + 0.4 k} = 15 + 2k.


Prob[A = 0] = Prob[0 claims] = e-5. Prob[A ≥ 5] = Prob[at least 1 claim] = 1 - e-5.
E[A ∧ 5] = (0)e-5 + 5(1 - e-5) = 5 - 5e-5.
28.03 = E[(A - 5)+] = E[A] - E[A ∧ 5] = 10 + 2k + 5e-5. ⇒ k = (18.03 - 5e-5)/2 = 9.
Comment: Given the output, solve for the missing input.

Section 12, Important Formulas and Ideas

Introduction (Section 1)

The Aggregate Loss is the total dollars of loss for an insured or a set of insureds.
Aggregate Losses = (Exposures) (Frequency) (Severity).

If one is not given the frequency per exposure, but rather just the frequency for the whole
group of exposures, whatever they are for the particular situation, then
Aggregate Losses = (Frequency) (Severity).

aggregate losses ⇔ S.

frequency ⇔ N.

severity ⇔ X.

The collective risk model adds up the individual losses. Frequency is independent of severity
and the sizes of loss are independent, identically distributed variables.
The individual risk model adds up the amount paid on each insurance policy.

Loss Models lists the following advantages of separately analyzing frequency and severity:


1. The number of claims changes as the volume of business changes.
2. The effects of inflation can be incorporated.
3. One can adjust the severity distribution for changes in deductibles, maximum covered loss, etc.
4. One can adjust frequency for changes in deductibles.
5. One can appropriately combine data from policies with different deductibles and
maximum covered losses into a single severity distribution.
6. One can create consistent models for the insurer, insured, and reinsurer.
7. One can analyze the tail of the aggregate losses by separately analyzing the tails of
the frequency and severity.

Convolutions (Section 2)

Convolution calculates the density or distribution function of the sum of two independent variables.
There are discrete and continuous cases.

(f*g)(z) = Σ_x f(x) g(z - x) = Σ_y f(z - y) g(y).

(F*G)(z) = Σ_x F(x) g(z - x) = Σ_y f(z - y) G(y) = Σ_x f(x) G(z - x) = Σ_y F(z - y) g(y).

(f*g)(z) = ∫ f(x) g(z - x) dx = ∫ f(z - y) g(y) dy .


(F*G)(z) = ∫ f(x)G(z - x)dx = ∫ F(z - y)g(y)dy = ∫ F(x)g(z - x)dx = ∫ f(z - y)G(y)dy .
The convolution operator is commutative and associative: f* g = g* f. (f* g)* h = f* (g* h).
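As a rough illustration of the discrete convolution formula above (my own sketch, not from Loss Models; the function name is mine):

def convolve(f, g):
    # Density of X + Y for independent discrete X and Y, each given as a {value: probability} dict.
    h = {}
    for x, px in f.items():
        for y, py in g.items():
            h[x + y] = h.get(x + y, 0.0) + px * py
    return h

f = {1: 1/3, 2: 1/3, 3: 1/3}   # for example, a severity uniform on {1, 2, 3}
ff = convolve(f, f)            # the density of the sum of two independent losses, f*f
print(ff[4])                   # (f*f)(4) = 3/9 = 0.3333...

Applying convolve repeatedly gives f*f*f, f*f*f*f, and so on.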

Using Convolutions (Section 3)

If frequency is N, if severity is X, frequency and severity are independent, and aggregate losses are
Agg then:

FAgg(x) = Σ_{n=0}^{∞} fN(n) FX*n(x).

fAgg(x) = Σ_{n=0}^{∞} fN(n) fX*n(x).

Generating Functions (Section 4)

Probability Generating Function: PX(t) = E[tX] = MX(ln(t))

Moment Generating Function: MX(t) = E[etX] = PX(et)

The Moment Generating Functions of severity distributions, when they exist, are given in
Appendix A of Loss Models. The Probability Generating Functions of frequency distributions are
given in Appendix B of Loss Models. M(t) = P(et).

For an Exponential, M(t) = 1 / (1 - θt), t < 1/θ.

For a Poisson, P(z) = exp[λ(z - 1)], and M(t) = exp[λ(et - 1)].

The moment generating function of the sum of two independent variables is the product
of their moment generating functions: MX+Y(t) = MX(t) MY(t).

The Moment Generating Function converts convolution into multiplication:


M f * g = Mf M g .

The sum of n independent identically distributed variables has the Moment Generating
Function taken to the power n.
The m.g.f. of f*n is the nth power of the m.g.f. of f.

M X+b (t) = ebt MX(t). M cX(t) = E[ecxt] = MX(ct).

M cX + b (t) = ebt MX(ct).

M cX + dY + b(t) = ebt MX(ct) MY(dt), for X and Y independent.

The Moment Generating Function of the average of n independent, identically distributed variables
is the nth power of the Moment Generating Function of t/n.

The moment generating function determines the distribution, and vice-versa. Therefore,
one can take limits of a distribution by instead taking limits of the Moment Generating Function.

M(0) = 1.    M′(0) = E[X].    M′′(0) = E[X2].    M′′′(0) = E[X3].    M(n)(0) = E[Xn].


MX(t) = Σ_{n=0}^{∞} (nth moment of X) tn / n!

Moment Generating Functions only exist for distributions all of whose moments exist. However the
converse is not true. For the LogNormal Distribution the Moment Generating Function fails to exist,
even though all of its moments exist.

d²ln[MX(t)]/dt², evaluated at t = 0, = Var[X].    d³ln[MX(t)]/dt³, evaluated at t = 0, = 3rd central moment of X.

Let Agg be Aggregate Losses, X be severity and N be frequency, then the probability generating
function of the Aggregate Losses can be written in terms of the p.g.f. of the frequency and p.g.f. of
the severity:
PAgg(t) = PN(PX(t)).

The Moment Generating Function of the Aggregate Losses can be written in terms of the p.g.f. of
the frequency and m.g.f. of the severity:

M A g g(t) = PN [MX(t)] = MN[ln(MX(t))].

For any Compound Poisson distribution, MAgg(t) = exp[λ(MX(t)-1)].

The Moment Generating Function of a mixture is a mixture of the Moment Generating Functions.
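As a sketch of how these relations can be used (my own illustration, assuming the sympy package; the parameter choices are only for concreteness), the compound Poisson relation MAgg(t) = exp[λ(MX(t) - 1)] can be differentiated symbolically to recover the mean and variance, here for λ = 2 and an Exponential severity with θ = 5:

import sympy

t = sympy.symbols('t')
lam, theta = 2, 5
M_X = 1 / (1 - theta * t)                  # Exponential m.g.f.
M_Agg = sympy.exp(lam * (M_X - 1))         # M_Agg(t) = exp[lam (M_X(t) - 1)]

mean = sympy.diff(M_Agg, t).subs(t, 0)         # M'(0) = E[Agg]
second = sympy.diff(M_Agg, t, 2).subs(t, 0)    # M''(0) = E[Agg^2]
print(mean, second - mean**2)              # 10 and 100, i.e. lam*theta and lam*(2*theta^2)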

Moments of Aggregate Losses (Section 5)

Mean Aggregate Loss = (Mean Frequency)(Mean Severity)

When frequency and severity are independent:


Process Variance of Aggregate Loss =
(Mean Freq.)(Variance of Severity) + (Mean Severity)2 (Variance of Freq.)

σ²Agg = µFreq σ²Sev + µ²Sev σ²Freq.

The variance of a Compound Poisson is: λ (2nd moment of severity).
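A quick Monte Carlo sketch of the process variance formula (my own illustration, assuming numpy; the Normal severity is chosen only for concreteness):

import numpy as np

rng = np.random.default_rng(0)
lam, mu_sev, sd_sev = 3.0, 10.0, 2.0       # Poisson frequency; severity mean 10, variance 4

counts = rng.poisson(lam, size=100_000)
agg = np.array([rng.normal(mu_sev, sd_sev, size=n).sum() for n in counts])

# (Mean Freq.)(Variance of Severity) + (Mean Severity)^2 (Variance of Freq.); for a Poisson, Variance of Freq. = lam.
print(agg.mean(), lam * mu_sev)                          # both near 30
print(agg.var(), lam * sd_sev**2 + mu_sev**2 * lam)      # both near 312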

The mathematics of Aggregate Distributions and Compound Frequency Distributions are


the same:
Aggregate Dist. Compound Frequency Dist.
Frequency ⇔ Primary (# of cabs)
Severity ⇔ Secondary (# of passengers)

One can approximate the distribution of aggregate losses using the Normal
Approximation. One could also approximate aggregate losses via a LogNormal Distribution by
matching the first two moments.

The Third Central Moment of a Compound Poisson Distribution is:


(mean frequency) (third moment of the severity).

Recursive Method / Panjer Algorithm (Sections 7 and 8)

The Panjer Algorithm (recursive method) can be used to compute the aggregate distribution when
the severity distribution is discrete and the frequency distribution is a member of the
(a, b, 0) class.

If the frequency distribution is a member of the (a, b, 0) class:

c(0) = Pf(s(0)).    c(x) = {1 / (1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j).

In situations in which there is a positive chance of a zero severity, it may be helpful to thin the
frequency distribution and work with the distribution of nonzero losses.

In the same manner, the Panjer Algorithm (recursive method) can be used to compute a compound
frequency distribution when the primary distribution is a member of the (a, b, 0) class.

If the frequency distribution, pk, is a member of the (a, b, 1) class:


c(0) = Pf(s(0)).    c(x) = s(x) {p1 - (a + b) p0} / {1 - a s(0)}  +  {1 / (1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j).

Discretization (Section 9)

For the method of rounding with span h, construct the discrete distribution g:
g(0) = F(h/2). g(ih) = F(h(i + 1/2)) - F(h(i - 1/2)).

For the method of rounding, the original and approximating Distribution Function match at all of the
points halfway between the support of the discretized distribution.

In order to instead have the means match, the approximating densities are:
g(0) = 1 - E[X ∧ h]/h. g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]} / h.
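A short Python sketch of the method of rounding (my own illustration, not from Loss Models), applied to an Exponential with θ = 10 and span h = 5:

import math

def exponential_cdf(x):
    return 1.0 - math.exp(-x / 10.0)

def discretize_by_rounding(cdf, h, n_points):
    # Method of rounding: g(0) = F(h/2); g(ih) = F(h(i + 1/2)) - F(h(i - 1/2)).
    g = [cdf(h / 2.0)]
    for i in range(1, n_points):
        g.append(cdf(h * (i + 0.5)) - cdf(h * (i - 0.5)))
    return g

g = discretize_by_rounding(exponential_cdf, h=5.0, n_points=4)
print([round(p, 4) for p in g])   # g(0) = F(2.5) = 0.2212, g(5) = F(7.5) - F(2.5) = 0.3064, ...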

Analytic Results (Section 10)

Assume we have two independent Compound Poisson Distributions.


The Poisson parameters are λ1 and λ2.
The severity distributions are F1 (x) and F2 (x).
Then the sum of these two Compound Poissons is another Compound Poisson,
with mean frequency of λ1 + λ2,
and severity distribution that is a mixture of the individual severity distributions:
F(x) = {λ1 / (λ1 + λ2)} F1(x) + {λ2 / (λ1 + λ2)} F2(x).

Stop Loss Premiums (Section 11)

The stop loss premium is the expected aggregate losses excess of an aggregate
deductible.

The stop loss premium at zero is the mean; the stop loss premium at infinity is zero.

expected losses excess of d = E[(Agg - d)+] = ∫_d^∞ (t - d) f(t) dt = ∫_d^∞ S(t) dt.

expected losses excess of d = E[(Agg - d)+] = Σ_{agg > d} (agg - d) f(agg).

expected aggregate losses excess of d = E[Agg] - E[Agg ∧ d].

When there is no probability for the aggregate losses in an interval, the stop loss premium for
deductibles in this interval can be gotten by linear interpolation.

If the distribution of aggregate losses is discrete with span ΔAgg:

E[(Agg - (j+1)ΔAgg)+] = E[(Agg - jΔAgg)+] - ΔAgg S(jΔAgg).


Mahlerʼs Guide to
Risk Measures
Exam C

prepared by
Howard C. Mahler, FCAS
Copyright 2016 by Howard C. Mahler.

Study Aid 2016-C-4

Howard Mahler
hmahler@mac.com
www.howardmahler.com/Teaching

Mahlerʼs Guide to Risk Measures

Copyright 2016 by Howard C. Mahler.

The Risk Measure concepts in Loss Models are discussed.1 2

Information in bold or sections whose title is in bold are more important for passing the exam.
Larger bold type indicates it is extremely important.
Information presented in italics (and sections whose titles are in italics) should not be needed to
directly answer exam questions and should be skipped on first reading. It is provided to aid the
readerʼs overall understanding of the subject, and to be useful in practical applications.

Solutions to the problems in each section are at the end of that section.3

Section # Pages Section Name


A 1 2-3 Introduction
2 4-12 Premium Principles
3 13-23 Value at Risk
4 24-57 Tail Value at Risk
B 5 58-77 Distortion Risk Measures
6 78-85 Coherence
7 86-95 Using Simulation
8 96-97 Important Ideas and Formulas

Exam 4/C Exam Questions by Section of this Study Aid4

Question 27 of the Spring 2007 exam, in my Section 5, was on the Proportional Hazard
Transform, no longer on the syllabus.
The 11/07 and subsequent exams were not released.

1
See Section 3.5 and Section 20.4.4 of Loss Models.
2
Prior to 11/09 this material was from “An Introduction to Risk Measures in Actuarial Applications” by Mary Hardy.
3
Note that problems include both some written by me and some from past exams. Since this material was added to
the syllabus for 2007, there are few past exam questions. Past exam questions are copyright by the Casualty
Actuarial Society and the Society of Actuaries and are reproduced here solely to aid students in studying for
exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no
responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is
intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work
extremely long and hard to produce quality exams. In some cases Iʼve rewritten past exam questions in order to
match the notation in the current Syllabus.
4
This topic was added to the syllabus in 2007.

Section 1, Introduction

Assume that aggregate annual losses (in millions of dollars) follow a LogNormal Distribution with
µ = 5 and σ = 1/2, with mean = exp[5 + (1/2)2 /2] = 168.174, second moment =
exp[(2)(5) + (2)(1/2)2 ] = 36,315.5, and variance = 36,315.5 - 168.1742 = 8033:
[Graph: the probability density of this LogNormal Distribution, for x from 0 to 600 (in millions).]

Assume instead that aggregate annual losses (in millions of dollars) follow a LogNormal Distribution
with µ = 4.625 and σ = 1, with mean = exp[4.625 + 12 /2] = 168.174, second moment =
exp[(2)(4.625) + (2)(12 )] = 76,879.9, and variance = 76,879.9 - 168.1742 = 48,597:
[Graph: the probability density of this second LogNormal Distribution, for x from 0 to 600 (in millions).]

While the two portfolios have the same mean loss, the second portfolio has a much bigger
variance. The second portfolio has a larger probability of an extremely bad year. Therefore, we
would consider the second portfolio “riskier” to insure than the first portfolio.

We will discuss various means to quantify the amount of risk, so-called risk measures.

There are three main uses of risk measures in insurance:5


1. Helping to determine the premium to charge.
2. Determining the appropriate amount of policyholder surplus (capital).
3. Helping to determine an appropriate amount for loss reserves.

We would expect that all other things being equal, an insurer would charge more to insure the riskier
second portfolio, than the less risky first portfolio.

We would expect that all other things being equal, an insurer should have more policyholder
surplus if insuring the riskier second portfolio, than the less risky first portfolio.6

Definition of a Risk Measure:

A risk measure is defined as a functional mapping of an aggregate loss distribution to


the real numbers.

ρ(X) is the notation used for the risk measure.

Given a specific choice of risk measure, a number is associated with each loss distribution
(distribution of aggregate losses), which encapsulates the risk associated with that loss distribution.

Exercise: Let the risk measure be: the mean + two standard deviations.7
In other words, ρ(X) = E[X] + 2 StdDev[X].
Determine the risk of the two portfolios discussed previously.
[Solution: For the first portfolio: 168.2 + 2√8033 = 347.4.
For the second portfolio: 168.2 + 2√48,597 = 609.1
Comment: As expected, using this measure, the second portfolio has a larger risk than the first.]
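A few lines of Python (my own illustration, not part of the original exercise) carrying out this calculation for the two LogNormal portfolios:

import math

def lognormal_mean_sd(mu, sigma):
    mean = math.exp(mu + sigma**2 / 2)
    second_moment = math.exp(2 * mu + 2 * sigma**2)
    return mean, math.sqrt(second_moment - mean**2)

for mu, sigma in [(5.0, 0.5), (4.625, 1.0)]:
    mean, sd = lognormal_mean_sd(mu, sigma)
    print(round(mean + 2 * sd, 1))        # rho(X) = E[X] + 2 StdDev[X]: about 347.4 and 609.1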

5
These same ideas can be applied with appropriate modification to banking.
6
What is an appropriate amount of surplus might be determined by an insurance regulator or by the market effects
of the possible ratings given to the insurer by a rating agency.
7
This is an example of the standard deviation premium principle, to be discussed in the next section.

Section 2, Premium Principles

Three simple premium principles will be discussed:


1. The Expected Value Premium Principle
2. The Standard Deviation Premium Principle
3. The Variance Premium Principle

Each premium principle generates a premium which is bigger than the expected loss.
The difference between the premium and the mean loss is the premium loading, which acts as a
cushion against adverse experience.

For a given loss distribution, different choices of risk measure result in different premiums.
As elsewhere on the syllabus of this exam, we ignore expenses, investment income, etc., unless
specifically stated otherwise.

The Expected Value Premium Principle:

For example, let the premium be 110% of the expected losses.8

More generally, ρ(X) = (1 + k)E[X], k > 0.

In the above example, k = 10%.

The Standard Deviation Premium Principle:9

For example, let the premium be the expected losses plus 1.645 times the standard deviation.

More generally, ρ(X) = E[X] + k √Var[X] , k > 0.10

In the above example, k = 1.645.


Using the Normal Approximation, since Φ[1.645] = 95%, E[X] + 1.645 √Var[X] is approximately
the 95th percentile of the aggregate distribution.11 Thus we would expect that the aggregate loss
would exceed the premium approximately 5% of the time.

8
On the exam, you would be given the 110%; you would not be responsible for selecting it.
9
See Example 3.13 in Loss Models.
10
While I have used the same letter k in the different risk measures, k does not have the same meaning.
11
The Normal Approximation is one common way to approximate an aggregate distribution, but not the only
method. See “Mahlerʼs Guide to Aggregate Distributions.”

The Variance Premium Principle:

σ² = E[(X - E[X])²].

For example, let the premium be the expected losses plus 20% times the variance.12

More generally, ρ(X) = E[X] + k Var[X], k > 0.

In the above example, k = 0.2.
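As a sketch (my own illustration; the function names are mine), the three premium principles are simple functions of the mean and variance of the aggregate loss distribution. The numbers shown correspond to Problem 2.3 later in this section:

import math

def expected_value_premium(mean, k):
    return (1 + k) * mean

def standard_deviation_premium(mean, var, k):
    return mean + k * math.sqrt(var)

def variance_premium(mean, var, k):
    return mean + k * var

mean, var = 800.0, 2800.0                           # aggregate mean and variance, as in Problem 2.3
print(expected_value_premium(mean, 0.15))           # 920.0
print(standard_deviation_premium(mean, var, 2.0))   # about 905.8
print(variance_premium(mean, var, 0.05))            # 940.0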

Further Reading:

There have been many discussions of different possible methods of calculating risk
loads along the lines of these simple premium principles.13

Other risk measures have been developed more recently and have certain nice mathematical
properties.14

12
On the exam, you would be given the 20%; you would not be responsible for selecting it.
13
See for example, “Reinsurer Risk Loads from Marginal Surplus Requirements” by Rodney Kreps, PCAS 1990
and discussion by Daniel F. Gogol, PCAS 1992; “Risk Loads for Insurers,” by Sholom Feldblum, PCAS 1990,
discussion by Steve Philbrick PCAS 1991, Authorʼs reply PCAS 1993, discussion by Todd R. Bault, PCAS 1995;
“The Competitive Market Equilibrium Risk Load Formula,” by Glenn G. Meyers, PCAS 1991, discussion by Ira
Robbin PCAS 1992, Authorʼs reply PCAS 1993; “Balancing Transaction Costs and Risk Load in Risk Sharing
Arrangements,” by Clive L. Keatinge, PCAS 1995; “Pricing to Optimize an Insurerʼs Risk-Return Relationship,”
Daniel F. Gogol,
14
In the CAS literature, there have continued to be many papers on this subject. See for example, “An Application
of a Game Theory: Property Catastrophe Risk Load,” PCAS 1998, “Capital Consumption: An Alternative
Methodology for Pricing Reinsurance”, by Donald Mango, Winter 2003 CAS Forum,
“Implementation of PH-Transforms in Ratemaking” by Gary Venter, PCAS 1998,
and The Dynamic Financial Analysis Call Papers in the Spring 2001 CAS Forum.

Problems:

2.1 (2 points) Suppose S is a compound Poisson distribution of aggregate claims with a mean
number of claims = 500 and with individual claim amounts distributed as an Exponential with mean
1000. The insurer wishes to collect a premium equal to the mean plus one standard deviation of
the aggregate claims distribution. Calculate the required premium. Ignore expenses.
(A) 500,000 (B) 510,000 (C) 520,000 (D) 530,000 (E) 540,000

2.2 (2 points) For an insured portfolio, you are given:


(i) the number of claims has a Geometric distribution with β = 12.
(ii) individual claim amounts can take values of 1, 5, or 25 with equal probability.
(iii) the number of claims and claim amounts are independent.
(iv) the premium charged equals expected aggregate claims
plus 2% of the variance of aggregate claims.
Determine the premium charged.
(A) 200 (B) 300 (C) 400 (D) 500 (E) 600

2.3 (2 points) For aggregate claims S = X1 + X2 + ...+ XN:


(i) N has a Poisson distribution with mean 400.
(ii) X1 , X2 . . . have mean 2 and variance 3.
(iii) N, X1 , X2 . . . are mutually independent.
Three actuaries each propose premiums based on different premium principles.
Wallace proposes using the expected value premium principle with k = 15%.
Yasmin proposes using the standard deviation premium principle with k = 200%.
Zachary proposes using the variance premium principle with k = 5%.
Rank the three proposed premiums from smallest to largest.
(A) Wallace, Yasmin, Zachary
(B) Wallace, Zachary, Yasmin
(C) Yasmin, Zachary, Wallace
(D) Zachary, Yasmin, Wallace
(E) None of A, B, C, or D

Use the following information for the next two questions:


• An insurer has a portfolio of 1000 insured properties as shown below.
Property Value Number of Properties
$50,000 300
$100,000 500
$200,000 200
• The annual probability of a claim for each of the insured properties is .03.
• Each property is independent of the others.
• Assume only total losses are possible.

2.4 (2 points) Insurance premiums are set at the mean loss plus one standard deviation.
Determine the premium.
(A) Less than 3.5 million
(B) At least 3.5 million, but less than 3.6 million
(C) At least 3.6 million, but less than 3.7 million
(D) At least 3.7 million, but less than 3.8 million
(E) At least 3.8 million

2.5 (2 points) The insurer buys reinsurance with a retention of $75,000 on each property.
(For example, in the case of a loss of $200,000, the insurer would pay $75,000, while the
reinsurer would pay $125,000.)
The annual reinsurance premium is set at 110% of the expected annual excess claims.
Insurance premiums are set at the reinsurance premiums plus mean annual retained loss plus one
standard deviation of the annual retained loss.
Determine the premium.
(A) Less than 3.5 million
(B) At least 3.5 million, but less than 3.6 million
(C) At least 3.6 million, but less than 3.7 million
(D) At least 3.7 million, but less than 3.8 million
(E) At least 3.8 million

2.6 (2 points) Annual aggregate losses have the following distribution:


Annual Aggregate Losses Probability
0 50%
10 30%
20 10%
50 5%
100 5%
Determine the variance premium principle with k = 1%.
A. 16 B. 18 C. 20 D. 22 E. 24

2.7 (2 points) Aggregate losses have the following distribution:


Aggregate Losses Probability
0 70%
100 20%
500 9%
1000 1%
Determine the standard deviation premium principle with k = 1.645.
A. 325 B. 350 C. 375 D. 400 E. 425

2.8 (Course 151 Sample Exam #1, Q.20) (2.5 points)


For aggregate claims S = X1 + X2 + ...+ XN:
(i) N has a Poisson distribution with mean 0.5
(ii) X1 , X2 . . . have mean 100 and variance 100
(iii) N, X1 , X2 . . . are mutually independent.
For a portfolio of insurance policies, the loss ratio during a premium period is the ratio of aggregate
claims to aggregate premiums collected during the period.
The relative security loading, (premiums / expected losses) - 1, is 0.1.
Using the normal approximation to the compound Poisson distribution, calculate the probability that
the loss ratio exceeds 0.75 during a particular period.
(A) 0.43 (B) 0.45 (C) 0.50 (D) 0.55 (E) 0.57

2.9 (Course 151 Sample Exam #2, Q.12) (1.7 points) An insurer provides life insurance for the
following group of independent lives:
Number Death Probability
of Lives Benefit of Death
100 1 0.01
200 2 0.02
300 3 0.03
The insurer purchases reinsurance with a retention of 2 on each life.
The reinsurer charges a premium H equal to its expected claims plus the standard deviation of its
claims.
The insurer charges a premium G equal to expected retained claims plus the standard deviation of
retained claims plus H.
Determine G.
(A) 44 (B) 46 (C) 70 (D) 94 (E) 96

2.10 (Course 151 Sample Exam #3, Q.3) (0.8 points) A company buys insurance to cover
medical claims in excess of 50 for each of its three employees. You are given:
(i) claims per employee are independent with the following distribution:
x p(x)
0 0.4
50 0.4
100 0.2
(ii) the insurer's relative security loading, (premiums / expected losses) - 1, is 50%.
Determine the premium for this insurance.
(A) 30 (B) 35 (C) 40 (D) 45 (E) 50

2.11 (5A, 11/94, Q.34) (2 points) You are the actuary for Abnormal Insurance Company.
You are assigned the task of setting the initial surplus such that the probability of losses less
premiums collected exceeding this surplus at the end of the year is 2%.
Company premiums were set equal to 120% of expected losses.
Assume that the aggregate losses are distributed according to the information below:
Prob(Aggregate Losses < L) = 1 - [10,000,000 / (L + 10,000,000)]2 .
What is the lowest value of the initial surplus that will satisfy the requirements described above?

2.12 (5A, 5/95, Q.35) (1 point) Suppose S is a compound Poisson distribution of aggregate
claims with a mean number of claims = 2 and with individual claim amounts distributed as
exponential with E(X) = 5 and VAR(X) = 25.
The insurer wishes to collect a premium equal to the mean plus one standard deviation of the
aggregate claims distribution.
Calculate the required premium. Ignore expenses.

Solutions to Problems:

2.1. D. E[S] = λθ = (500)(1000) = 500,000. Var[S] = λ(2θ²) = (500)(2)(1000²) = 1,000,000,000.


E[S] + √Var[S] = 500,000 + √1,000,000,000 = 531,622.

2.2. D. E[N] = 12. Var[N] = (12)(12 + 1) = 156.


E[X] = (1 + 5 + 25)/3 = 10.333.
E[X2 ] = (12 + 52 + 252 )/3 = 217.
Var[X] = 217 - 10.3332 = 110.2.
The aggregate has mean: (12)(10.333) = 124.
The aggregate has variance: (12)(110.2) + (10.3332 )(156) = 17,979.
E[S] + (0.02)Var[S] = 124 + (2%)(17,979) = 484.

2.3. E. Mean of aggregate is: (400)(2) = 800.


Variance of aggregate is: λ(second moment of severity) = (400)(3 + 22 ) = 2800.
Wallaceʼs proposed premium is: (1.15)(800) = 920.
Yasminʼs proposed premium is: 800 + 2√2800 = 905.8.
Zacharyʼs proposed premium is: 800 + (.05)(2800) = 940
From smallest to largest: Yasmin, Wallace, Zachary.

2.4. D. Frequency is Binomial with m = 1000 and q = .03.


Mean frequency is: (1000)(.03) = 30. Variance of Frequency is: (1000)(.03)(.97) = 29.1.
Mean severity is: (30%)(50000) + (50%)(100000) + (20%)(200000) = 105,000.
Variance of severity is:
(30%)(50000 - 105000)2 + (50%)(100000 - 105000)2 + (20%)(200000 - 105000)2 =
2725 million.
Mean aggregate is: (30)(105,000) = 3.15 million.
Variance of aggregate is: (30)(2725 million) + (105,000)2 (29.1) = 402,577.5 million.
Premium is: 3.15 million + √(402,577.5 million) = 3.15 million + 0.63 million = 3.78 million.

2.5. C. For a $50,000 loss, all $50,000 is retained. For a loss of either $100,000 or $200,000,
$75,000 is retained. The mean retained severity is: (30%)(50000) + (70%)(75000) = 67,500.
The mean aggregate retained is: (30)(67,500) = 2.025 million.
Therefore the mean aggregate excess is: 3.15 million - 2.025 million = 1.125 million.
The reinsurance premium is: (110%)(1.125 million) = 1.238 million.
Variance of retained severity is: (30%)(50000 - 67,500)2 + (70%)(75000 - 67,500)2 =
131.25 million.
Variance of aggregate retained is: (30)(131.25 million) + (67,500)2 (29.1) = 136,524.4 million.
Premium is: 1.238 million + 2.025 million + √(136,524.4 million) = 3.63 million.
Comment: Purchasing reinsurance has reduced the risk of the insurer.

2.6. B. E[X] = (0)(50%) + (30%)(10) + (10%)(20) + (5%)(50) + (5%)(100) = 12.5.


σ2 = (0 - 12.5)2 (50%) + (30%)(10 - 12.5)2 + (10%)(20 - 12.5)2 + (5%)(50 - 12.5)2
+ (5%)(100 - 12.5)2 = 538.75.
E[X] + (1%)σ2 = 12.5 + (0.01)( 538.75) = 17.9.

2.7. B. The mean is: (0.7)(0) + (0.2)(100) + (0.09)(500) + (0.01)(1000) = 75.


The second moment is: (0.7)(02 ) + (0.2)(1002 ) + (0.09)(5002 ) + (0.01)(10002 ) = 34,500.
The standard deviation is: √(34,500 - 75²) = 169.93.
Mean plus 1.645 standard deviations is: 75 + (1.645)(169.93) = 355.

2.8. D. The mean aggregate loss is: (100)(0.5) = 50.


The premiums are: (1.1)(50) = 55.
Since frequency is Poisson, the variance of the aggregate loss is:
(mean frequency)(second moment of the severity) = (0.5)(100 + 1002 ) = 5050.
The loss ratio is 75% if the loss is: (55)(.75) = 41.25. Thus the loss ratio exceeds 75% if the loss
exceeds 41.25. Thus using the Normal approximation, the probability that the loss ratio exceeds
75% is: 1 - Φ[(41.25 - 50)/√5050 ] = 1 - Φ(-0.12) = Φ(0.12) = 0.5478.

2.9. B. For the insurer, the mean payment is:


(100)(0.01)(1) + (200)(0.02)(2) +(300)(0.03)(2) = 1 + 8 + 18 = 27.
For the insurer, the variance of payments is :
(100)(0.01)(0.99)(12 ) + (200)(0.02)(0.98)(22 ) + (300)(0.03)(0.97)(22 ) = 51.59.
For the reinsurer, the mean payment is :
(100)(0.01)(0) + (200)(0.02)(0) + (300)(0.03)(1) = 9.
For the reinsurer, the variance of payments is :
(100)(0.01)(0.99)(02 ) + (200)(0.02)(0.98)(02 ) + (300)(0.03)(0.97)(12 ) = 8.73.
Reinsurerʼs premium = 9 + √8.73 = 11.955.
Insurerʼs premium = 27 + √51.59 + 11.955 = 46.14.

2.10. D. The expected payment per employee is: (0)(0.4) + (0)(0.4) + (100 - 50)(0.2) = 10.
The expected aggregate payments are: (3)(10) = 30. The premiums = (1.5)(30) = 45.

2.11. The distribution of L is a Pareto Distribution with α = 2 and θ = 10 million.


Therefore, E[L] = θ/(α-1) = $10 million.
Premiums are (1.2)E(L) = (1.2)($10 million) = $12 million.
The 98th percentile of the distribution of aggregate losses is such that
0.02 = [10,000,000 / (L + 10,000,000)]2 . Therefore 98th percentile of L = 60.71 million.
Therefore, we require that: 60.71 million = initial surplus + $12 million.
initial surplus = $48.71 million.
Comment: Use the 98th percentile of the given Pareto Distribution, rather than the Normal
Approximation to the Pareto Distribution.

2.12. The mean of the aggregate losses = (2)(5) =10.


The variance of aggregate losses = (2)(25) + (2)(52 ) = 100. Mean + Stddev = 10 + 10 = 20.

Section 3, Value at Risk15 16

In this section, another risk measure will be discussed:


Value at Risk ⇔ VaR ⇔ Quantile Risk Measure ⇔ Quantile Premium Principle.

Percentiles:

Exercise: Assume that aggregate annual losses (in millions of dollars) follow a LogNormal
Distribution with µ = 5 and σ = 1/2. Determine the 95th percentile of this distribution.

[Solution: 0.95 = F(x) = Φ[(lnx - 5)/(1/2)]. ⇒ (2)(lnx - 5) = 1.645.

⇒ x = exp[5 + (1/2)(1.645)] = 337.8.


Comment: Find the 95th percentile of the underlying Normal and exponentiate.]

In other words, for this portfolio, there is a 95% chance that the aggregate loss is less than 337.8.

π p is the 100pth percentile.


For this portfolio, π95% = 337.8.

Quantiles:

The 95th percentile is also referred to as Q0.95, the 95% quantile.


For this portfolio, the 95% quantile is 337.8.

90th percentile ⇔ Q0.90 ⇔ 90% quantile.

99th percentile ⇔ Q0.99 ⇔ 99% quantile.

median ⇔ Q0.50 ⇔ 50% quantile.

15
See Section 3.5.3 of Loss Models.
16
Value at Risk is also discussed in Chapter 25 of Derivative Markets by McDonald, not on the syllabus.

Definition of the Value at Risk:

The Value at Risk, VaRp , is defined as the 100pth percentile.


p is sometimes called the security level.
VaRp (X) = π p .

If aggregate annual losses follow a LogNormal Distribution with µ = 5 and σ = 1/2, then
VaR95% is the 95th percentile, or 337.8.

For this LogNormal Distribution with µ = 5 and σ = 1/2, here is a graph of VaRp as a function of p:
[Graph: VaRp as a function of p, for p from 0.2 to 0.999; VaRp rises from roughly 100 to roughly 700.]

Exercise: If annual aggregate losses follow a Weibull Distribution with θ = 10 and τ = 3,


determine VaR90%.

[Solution: 0.90 = 1 - exp[-(x/10)3 ]. ⇒ x = 13.205.


Comment: We have determined the 90th percentile of this Weibull Distribution.
As shown in Appendix A: VaRp (X) = θ {-ln(1-p)}1/τ ].
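A tiny Python sketch (my own, not part of the exercise) of computing VaR by inverting the distribution function, for the Weibull in this exercise:

import math

def weibull_var(theta, tau, p):
    # Invert p = 1 - exp[-(x / theta)^tau] to get the 100p-th percentile.
    return theta * (-math.log(1.0 - p)) ** (1.0 / tau)

print(round(weibull_var(10.0, 3.0, 0.90), 3))   # 13.205, as in the exercise above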

In Appendix A of the Tables attached to the exam, there are formulas for VaRp (X) for
many of the distributions.17
17
This will also help in finding percentiles and in performing simulation by inversion.

Distribution VaRp (X)

Exponential -θ ln(1-p)

Pareto θ {(1-p)-1/α - 1}

Weibull θ {-ln(1-p)}1/τ

Single Parameter Pareto θ (1- p) - 1/ α

Loglogistic θ {p-1 - 1}-1/γ

Inverse Pareto θ {p-1/τ - 1}-1

Inverse Weibull θ {-ln(p)}−1/τ

Burr θ {(1-p)-1/α - 1}1/γ

Inverse Burr θ {p-1/τ - 1}-1/γ

Inverse Exponential θ {-ln(p)}-1

Paralogistic θ {(1-p)-1/α - 1}1/α

Inverse Paralogistic θ {p-1/τ - 1}-1/τ

Normal18 µ + σ zp .

18
Not shown in Appendix A attached to the exam.
See Example 3.15 in Loss Models.
zp is the pth percentile of the Standard Normal.

Problems:

3.1 (1 point) Losses are Normal with µ = 1000 and σ = 25.


Determine the VaR80%.
A. 1010 B. 1020 C. 1030 D. 1040 E. 1050

3.2 (2 points) Losses follow a LogNormal Distribution with µ = 8 and σ = 0.7.


Premiums are 110% of expected losses.
Determine the amount of policyholder surplus the insurer must have so that there is a 10% chance
that the losses will exceed the premium plus surplus.
(A) Less than 2500
(B) At least 2500, but less than 3000
(C) At least 3000, but less than 3500
(D) At least 3500, but less than 4000
(E) At least 4000

3.3 (1 point) If annual aggregate losses follow a Pareto Distribution with α = 4 and θ = 100,
determine VaR95%.
A. 80 B. 90 C. 100 D. 110 E. 120

3.4 (2 points) A group medical insurance policy covers the medical expenses incurred by 2000
mutually independent lives.
The annual loss amount, X, incurred by each life is distributed as follows:
x Pr(X=x)
0 0.40
100 0.40
1000 0.15
5000 0.05
The premium is equal to the 99th percentile of the normal distribution which approximates the
distribution of total claims. Determine the premium per life.
(A) Less than 470
(B) At least 470, but less than 480
(C) At least 480, but less than 490
(D) At least 490, but less than 500
(E) At least 500

3.5 (3 points) Annual Losses for the Rocky Insurance Company are Normal with mean 20 and
standard deviation 3.
Annual Losses for the Bullwinkle Insurance Company are Normal with mean 30 and standard
deviation 4.

The annual losses for the Rocky and Bullwinkle companies have a correlation of 60%.
(i) Determine the VaR90% for the Rocky Insurance Company.
(ii) Determine the VaR90% for the Bullwinkle Insurance Company.
(iii) The Rocky and Bullwinkle companies merge.
Determine the VaR90% for the merged company.

3.6 (1 point) Losses follow a Weibull Distribution with θ = 10 and τ = 0.3.
For a 99% security level, determine the Value at Risk.
(A) Less than 1500
(B) At least 1500, but less than 2000
(C) At least 2000, but less than 2500
(D) At least 2500, but less than 3000
(E) At least 3000

3.7 (1 point) Annual aggregate losses have the following distribution:


Annual Aggregate Losses Probability
0 50%
10 30%
20 10%
50 4%
100 2%
200 2%
500 1%
1000 1%
Determine the 95% Value at Risk.
A. 60 B. 70 C. 80 D. 90 E. 100

3.8 (Course 151 Sample Exam #2, Q.11) (1.7 points) A group medical insurance policy
covers the medical expenses incurred by 100,000 mutually independent lives.
The annual loss amount, X, incurred by each life is distributed as follows:
x Pr(X=x)
0 0.30
50 0.10
200 0.10
500 0.20
1,000 0.20
10,000 0.10
The policy pays 80% of the annual losses for each life.
The premium is equal to the 95th percentile of the normal distribution which approximates the
distribution of total claims.
Determine the difference between the premium and the expected aggregate payments.
(A) 1,213,000 (B) 1,356,000 (C) 1,446,000 (D) 1,516,000 (E) 1,624,000

3.9 (5A, 11/94, Q.36) (2 points) An auto insurer has 2 classes of insureds with the following claim
probabilities and distribution of claim amounts:
Number of Probability of Claim
Class Insureds One Claim Severity
1 400 0.10 3,000
2 600 0.05 2,000
An insured will have either no claims or exactly one claim.
The size of claim for each class is constant.
The insurer wants to collect a total dollar amount such that the probability of total claims dollars
exceeding that amount is 5%. Using the normal approximation and ignoring expenses, how much
should the insurer collect?

3.10 (5A, 11/95, Q.35) (2 points) An insurance company has two classes of insureds with the
following claim probabilities and distribution of claim amounts:
Number of Probability Claim
Class Insureds of 1 claim Severity
1 1,000 0.15 $600
2 5,000 0.05 $800
The probability of an insured having more than one loss is zero. The company wants to collect an
amount equal to the 95th percentile of the distribution of aggregate losses.
Determine the total premium.

3.11 (5A, 11/98, Q.35) (2 points) You are a pricing actuary offering a new coverage and you
have analyzed the distribution of losses capped at various limits shown below:
Capped Limit Expected Value Variance
30,000 500 100,000
25,000 450 50,000
20,000 400 40,000
15,000 350 28,000
10,000 250 14,000
5,000 200 9,000
Your chief actuary requires that the premiums be at the 95th percentile of the distribution of losses.
The general manager requires that the difference between the premiums and the expected losses
be no greater than $200. What is the highest limit of the new coverage that can be written
consistent with these requirements?

3.12 (5A, 11/99, Q.37) (2 points) An insurer issues 1-year warranty coverage policies to two
different types of insureds. Group 1 insureds have a probability of having a claim of .05 and
Group 2 insureds have a probability of having a claim of 0.10. There are two possible claim
amounts of $500 and $1,000. The following table shows the number of insureds in each class.
Class Prob. of Claim Claim Amount # of Insureds
1 0.05 $500 200
2 0.10 $500 200
3 0.05 $1000 300
4 0.10 $1000 250
Using the Normal Approximation, how much premium should the insurer collect such that the
collected premium equals the 95th percentile of the distribution of total claims?

3.13 (8, 5/09, Q.28) (2.25 points) Given the following information about Portfolios A and B:
• The returns on a stock are Normally distributed.
• The volatility is the standard deviation of the returns on a stock.
• If you buy stocks, then the loss is the difference between the initial cost of the portfolio
and the current value of the portfolio.

• The value of Portfolio A is $15 million and consists only of Company A stock.
• The daily volatility of Portfolio A is 3%.
• The value of Portfolio B is $7 million and consists only of Company B stock.
• The daily volatility of Portfolio B is 2%.
• The correlation coefficient between Company A and Company B stock prices is 0.40.
a. (0.75 point) Calculate the 10-day 99% Value-at-Risk (VaR) for Portfolio A.
b. (0.75 point) Calculate the 10-day 99% VaR for a portfolio consisting of Portfolios A and B.
Note: I have revised this past exam question.

Solutions to Problems:

3.1. B. .80 = F(x) = Φ[(x - 1000)/25]. ⇒ (x - 1000)/25 = 0.842

⇒ x = 1000 + (25)(0.842) = 1021.

3.2. C. E[X] = exp[8 + .72 /2] = 3808. Premium is: (1.1)(3808) = 4189.
.90 = F(x) = Φ[(lnx - 8)/0.7]. ⇒ (lnx - 8)/0.7 = 1.282.

⇒ x = exp[8 + (0.7)(1.282)] = 7313. 90th percentile of the LogNormal is 7313.


Required surplus is: 7313 - 4189 = 3124.

3.3. D. 0.95 = 1 - {100/(100 + x)}4 . ⇒ 20 = (1 + x/100)4 . ⇒ x = 111.5.


As shown in Appendix A, for a Pareto Distribution with parameters α and θ, α > 1:

VaRp (X) = θ [(1-p)-1/α - 1]. VaR0.95 = (100) {(0.05)-1/4 - 1} = 111.5.

3.4. D. E[X] = (0)(0.4) + (100)(0.4) + (1000)(0.15) + (5000)(0.05) = 440.


E[X2 ] = (02 )(0.4) + (1002 )(0.4) + (10002 )(0.15) + (50002 )(0.05) = 1,404,000.
Var[X] = 1,404,000 - 4402 = 1,210,400.
The aggregate has mean: (2000)(440) and variance: (2000)(1,210,400).
Φ[2.326] = 0.99. Total premium is: (2000)(440) + (2.326)√[(2000)(1,210,400)].
Premium per life is: 440 + (2.326)√(1,210,400 / 2000) = 497.2.

3.5. (i) For Rocky, VaR90% is: 20 + (1.282)(3) = 23.846.


(ii) For Bullwinkle, VaR90% is: 30 + (1.282)(4) = 35.128.
(iii) Annual losses for Rocky plus Bullwinkle are Normal with mean: 20 + 30 = 50,
and variance: 32 + 42 + (2)(.6)(3)(4) = 39.4.
For Rocky plus Bullwinkle, VaR90% is: 50 + (1.282)√39.4 = 58.047.
Comment: 58.047 < 58.974 = 23.846 + 35.128. Merging has reduced the risk measure, an
example of the advantage of diversification. As will be discussed with respect to coherent risk
measures, this property is called subadditivity. While Value at Risk is usually subadditive, it is not
always subadditive.
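A short Python check of this solution (my own illustration), using VaRp = mean + zp (standard deviation) for a Normal:

import math

z90 = 1.282                                     # 90th percentile of the standard Normal

def normal_var(mean, sd, z):
    return mean + z * sd

sd_merged = math.sqrt(3**2 + 4**2 + 2 * 0.60 * 3 * 4)   # standard deviation of the correlated sum
print(round(normal_var(20, 3, z90), 3))         # 23.846 for Rocky
print(round(normal_var(30, 4, z90), 3))         # 35.128 for Bullwinkle
print(round(normal_var(50, sd_merged, z90), 3)) # about 58.047 for the merged company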

3.6. B. .99 = 1 - exp[-(x/10)0.3]. ⇒ 100 = exp[(x/10)0.3]. ⇒ x = 1625.

As shown in Appendix A: VaRp (X) = θ [ -ln(1-p) ]1/τ.

VaR0.99(X) = (10) { -ln(0.01) ]1/0.3 = 1625.

3.7. E. F(50) = 94% < 95%. F(100) = 96% ≥ 95%. Thus 100 is the 95% VaR.

3.8. A. The variance of the severity is: 10,254,250 - 13252 = 8,498,625.


first second
x density moment moment
0 0.3 0 0
50 0.1 5 250
200 0.1 20 4,000
500 0.2 100 50,000
1000 0.2 200 200,000
10000 0.1 1000 10,000,000
1325 10,254,250
The mean aggregate payment by the insurer is: (100000)(.8)(1325) = 106 million.
The variance of the insurerʼs aggregate payment is: (.82 )(100000)(8,498,625).
The standard deviation is: 737,503. For the 95th percentile, one adds 1.645 standard deviations
to the mean. Thus the premium is: 106,000,000 + (1.645)(737,503).
Premiums - expected aggregate payments = (1.645)(737,503) = 1,213,192.

3.9. Mean Aggregate Loss = (400)(0.10)(3000) + (600)(0.05)(2000) = 180,000.
Variance of Aggregate Losses = (400)(0.10)(0.9)(3000²) + (600)(0.05)(0.95)(2000²) = 438,000,000.
Since the 95th percentile of the Unit Normal Distribution is 1.645, we want to collect:
Mean + 1.645 Standard Deviations = 180,000 + 1.645 √438,000,000 = 214,427.

3.10. The mean loss is: (0.15)(600)(1000) + (0.05)(800)(5000) = 290,000.
The variance of aggregate losses is:
(0.15)(0.85)(600²)(1000) + (0.05)(0.95)(800²)(5000) = 197,900,000.
The 95th percentile of aggregate losses is approximately:
290,000 + (1.645)√197,900,000 = 290,000 + 23,141 = 313,141.
Comment: The relative security loading is: 23,141/290,000 = 8.0%.
2016-C-4, Risk Measures §3 Value at Risk, HCM 10/21/15, Page 23

3.11. Using the Normal Approximation, the 95th percentile is approximately:
mean + 1.645 (Standard Deviation).
The difference between the premiums and the expected losses is: 1.645 (Standard Deviation).
Therefore, we require 1.645 (Standard Deviation) < 200. ⇒ Variance < 14,782. The highest limit of
the new coverage that can be written consistent with these requirements is $10,000.

3.12. With severity s, Bernoulli parameter q, and n insureds:
mean of aggregate losses = nqs, variance of aggregate losses = nq(1-q)s².
Class   Frequency   Severity   # of Insureds      Mean      Variance
1          0.05        500          200           5,000     2,375,000
2          0.10        500          200          10,000     4,500,000
3          0.05       1000          300          15,000    14,250,000
4          0.10       1000          250          25,000    22,500,000
Overall                                          55,000    43,625,000
Approximate the distribution of aggregate losses by the Normal Distribution with the same mean
and variance. The 95th percentile ≅ 55,000 + 1.645 √43,625,000 = 65,865.

3.13. a. Φ[2.326] = 99%.
Assuming the returns on different days are independent, the variances add; variances are
multiplied by N, while standard deviations are multiplied by √N.
The volatility over ten days is: 0.03√10.
One standard deviation of movement in value is: ($15 million)(0.03√10).
The 1% worst outcomes are when the value declines by 2.326 standard deviations or more.
VaR0.99 = ($15 million)(2.326)(0.03√10) = 3.31 million.
b. The standard deviation of the daily change in the value of the portfolio is:
√[(15²)(0.03²) + (7²)(0.02²) + (2)(0.4)(15)(0.03)(7)(0.02)] = 0.522 million.
VaR0.99 = (0.522 million)(2.326)√10 = 3.84 million.
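Comment: as a cross-check of parts (a) and (b), here is a short Python sketch (mine, not part of the original solution) that redoes the arithmetic under the stated assumptions; statistics.NormalDist supplies the 99th percentile of the Standard Normal.

from statistics import NormalDist

z99 = NormalDist().inv_cdf(0.99)       # 2.326, the 99th percentile of the Standard Normal

# Part (a): Portfolio A alone ($ million, daily volatility 3%).
value_A, vol_A = 15.0, 0.03
print(round(value_A * z99 * vol_A * 10 ** 0.5, 2))        # 10-day 99% VaR, about 3.31

# Part (b): Portfolios A and B combined.
value_B, vol_B, rho = 7.0, 0.02, 0.40
# Daily standard deviation of the change in the combined value ($ million):
sd_daily = ((value_A * vol_A) ** 2 + (value_B * vol_B) ** 2
            + 2 * rho * (value_A * vol_A) * (value_B * vol_B)) ** 0.5
print(round(z99 * sd_daily * 10 ** 0.5, 2))               # 10-day 99% VaR, about 3.84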


2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 24

Section 4, Tail Value at Risk19 20 21

In this section, another risk measure will be discussed:


Tail Value at Risk ⇔ TVaR ⇔ Conditional Tail Expectation ⇔ CTE ⇔
Tail Conditional Expectation ⇔ TCE ⇔ Expected Shortfall ⇔ Expected Tail Loss.

Definition of the Tail Value at Risk:

For a given value of p, the security level, the Tail Value at Risk of a loss distribution is defined as
the average of the 1 - p worst outcomes: TVaRp (X) ≡ E[X | X > πp ].

The corresponding risk measure is: ρ(X) = TVaRp (X).

Exercise: The aggregate losses are uniform from 0 to 100. Determine TVaR0.80 and TVaR0.90.
[Solution: TVaR0.80 = (100 + 80)/2 = 90. TVaR0.90 = (100 + 90)/2 = 95.]

As with the Value at Risk, for larger choices of p, the Tail Value at Risk is larger, all other things
being equal.

TVaRp = average size of those losses of size greater than the pth percentile, πp.
TVaRp = ∫_πp^∞ x f(x) dx / ∫_πp^∞ f(x) dx = ∫_πp^∞ x f(x) dx / (1−p).

The average size of those losses of size between a and b is:22


E[X | b > X > a] = ({E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}) / {F(b) - F(a)}.

Letting a = πp and b = ∞:
TVaRp = ({E[X] - 0} - {E[X ∧ πp ] - πp S(πp )} ) / {1 - F(πp )}

= {E[X] - E[X ∧ πp ] + πp (1 - p)}/(1 - p) = πp + (E[X] - E[X ∧ πp ])/(1 - p).

TVaRp (X) = π p + (E[X] - E[X ∧ πp ]) / (1 - p).

19
See Section 3.5.4 of Loss Models.
20
For an example of an application, see “DFA Insurance Company Case Study, Part 2 Capital Adequacy and
Capital Allocation,” by Stephen W. Philbrick and Robert A. Painter, in the Spring 2001 CAS Forum. 
21
This is also discussed in Section 25.2 of Derivative Markets by McDonald, not on the syllabus.
22
See “Mahlerʼs Guide to Loss Distributions.”
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 25

Exercise: Losses follow a Pareto Distribution with α = 3 and θ = 20.


Determine TVaR0.90.

[Solution: Set 0.90 = F(π0.90) = 1 - {20/(π0.90 + 20)}3 . ⇒ π0.90 = 23.09.

E[X] = θ/(α -1) = 20/(3 - 1) = 10.


E[X ∧ 23.09] = {θ/(α -1)}{1 - {20/(23.09 + 20)}2 } = 7.846.
TVaR0.90 = π0.90 + (E[X] - E[X ∧ π0.90])/(1 - 0.90) = 23.09 + (10 - 7.846)/0.1 = 44.63.

Alternately, X truncated and shifted from below at 23.09 is Pareto with α = 3 and θ = 20 + 23.09 =
43.09, with mean 43.09/(3 - 1) = 21.54.
TVaR0.90 = E[X | X > π0.90] = 23.09 + 21.54 = 44.63.
Comment: As shown in Appendix A of the Tables attached to the exam:
TVaRp = θ[(1-p)^(-1/α) - 1] + θ(1-p)^(-1/α)/(α - 1), for α > 1.]
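As a sanity check (not from the text), the following short Python sketch reproduces this exercise using the limited expected value of the Pareto and the Appendix A closed form.

# Pareto with alpha = 3 and theta = 20; reproduce pi_0.90 and TVaR_0.90.
alpha, theta, p = 3.0, 20.0, 0.90

pi_p = theta * ((1 - p) ** (-1 / alpha) - 1)                        # 23.09
mean = theta / (alpha - 1)                                          # E[X] = 10
lim_ev = mean * (1 - (theta / (theta + pi_p)) ** (alpha - 1))       # E[X ∧ pi_p] = 7.846

tvar = pi_p + (mean - lim_ev) / (1 - p)
tvar_appendix = theta * ((1 - p) ** (-1 / alpha) - 1) + theta * (1 - p) ** (-1 / alpha) / (alpha - 1)
print(round(pi_p, 2), round(tvar, 2), round(tvar_appendix, 2))      # 23.09 44.63 44.63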

For this Pareto Distribution with α = 3 and θ = 20, here is a graph of TVaRp as a function of p:

[Graph: TVaRp as a function of p; vertical axis TVaR from 0 to 250, horizontal axis p from 0.2 to 0.999; TVaRp increases with p, slowly at first and then steeply as p approaches 1.]

TVaR0 (X) = E[X | over the worst 100% of outcomes] = E[X].


For a loss distribution with a maximum, TVaR1 (X) = Max[X].
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 26

In Appendix A, there are formulas for TVaRp (X) for a few of the distributions:
Exponential, Pareto, Single Parameter Pareto.

Distribution                 TVaRp(X)
Exponential                  -θ ln(1-p) + θ
Pareto                       θ{(1-p)^(-1/α) - 1} + θ(1-p)^(-1/α)/(α - 1),   α > 1
Single Parameter Pareto      αθ(1-p)^(-1/α)/(α - 1),   α > 1
Normal23                     µ + σ φ[zp] / (1 - p)

23
Not shown in Appendix A attached to the exam.
See Example 3.15 in Loss Models.
zp is the pth percentile of the Standard Normal, and φ is the density of the Standard Normal.
For example, z0.975 = 1.960.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 27

Relationship to the Mean Excess Loss:

The mean excess loss, e(x) = E[X - x | X > x] = E[X | X > x] - x.24
Therefore, E[X | X > x] = x + e(x).
Therefore, TVaRp (X) = E[X | X > πp ] = πp + e(πp ).

This matches a previous formula, since e(πp ) = (E[X] - E[X ∧ πp ])/S(πp ) =

(E[X] - E[X ∧ πp ])/(1 - p). This form of the formula for the TVaR can be useful in those cases where

one remembers the form of the mean residual life.

For example, for a Pareto Distribution with α = 3 and θ = 20, as determined previously,
π 0.90 = 23.09. The mean excess loss for a Pareto is e(x) = (x + θ )/(α - 1).
Therefore, e(23.09) = (23.09 + 20)/(3 - 1) = 21.54.
TVaR0.90 = 23.09 + 21.54 = 44.63, matching the previous result.

For a Pareto Distribution with parameters α and θ, α > 1:
πp = θ{(1 - p)^(-1/α) - 1}.
e(πp) = (πp + θ)/(α - 1) = θ(1 - p)^(-1/α)/(α - 1).
TVaRp = πp + e(πp) = θ{(1 - p)^(-1/α) α/(α - 1) - 1}.25
For the above example, TVaR0.90 = 20{(0.1^(-1/3))(3/2) - 1} = 44.63, matching the previous result.

Exercise: For an Exponential Distribution with mean 600, determine TVaR0.99.

[Solution: Set 0.99 = 1 - exp[-π0.99/600]. ⇒ π0.99 = 2763.

For the Exponential, e(x) = θ = 600. ⇒ TVaR0.99 = π0.99 + e(π0.99) = 2763 + 600 = 3363.]

For an Exponential Distribution with mean θ:


π p = -θ ln[1 - p]. e(πp ) = θ.

TVaRp = πp + e(πp ) = θ(1 - ln[1 - p]).26


For the above example, TVaR0.99 = 600(1 - ln[.01]) = 3363, matching the previous result.
24
See “Mahlerʼs Guide to Loss Distributions.”
25
I would not memorize this formula.
26
I would not memorize this formula.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 28

An Example with a Discrete Distribution:

Let us assume that the aggregate distribution is:


10 50%
50 30%
100 10%
500 8%
1000 2%

Then E[L | L ≥ 500] = {(500)(8%) + (1000)(2%)}/10% = 600.


In contrast, E[L | L > 500] = 1000.

Neither 600 nor 1000 is the average of the 5% worst outcomes. Thus neither is used for TVaR0.95.
Rather we compute TVaR0.95 by averaging the 5% worst possible outcomes:27
TVaR0.95 = {(500)(3%) + (1000)(2%)} / 5% = 700.

In general, in order to calculate TVaRp :28


(1) Take the 1 - p worst outcomes.
(2) Average over these worst outcomes.
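Below is a small Python sketch of this two-step recipe (my own illustration, not from the text); tvar_discrete is a hypothetical helper name, and the example distribution is the one tabulated above.

def tvar_discrete(outcomes, p):
    """outcomes: list of (amount, probability) pairs; returns TVaR_p."""
    tail = 1.0 - p                      # total probability to average over
    remaining, total = tail, 0.0
    for amount, prob in sorted(outcomes, reverse=True):     # worst outcomes first
        weight = min(prob, remaining)   # use only as much of this outcome as needed
        total += amount * weight
        remaining -= weight
        if remaining <= 1e-12:
            break
    return total / tail

dist = [(10, 0.50), (50, 0.30), (100, 0.10), (500, 0.08), (1000, 0.02)]
print(round(tvar_discrete(dist, 0.95)))     # 700, matching the calculation above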

27
The worst 2% is 1000. The next worst 3% is 500. 
Although there is a total of 8% probability on 500, we only wish to have 3% + 2% = 5% worst outcomes.
28
This is equivalent to what had been done in the case of a continuous aggregate distribution.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 29

TVaR Versus VaR:

Exercise: The aggregate losses are uniform from 0 to 100. Determine VaR95% and TVaR95%.
[Solution: VaR95% = π0.95 = 95. TVaR95% = (100 + 95)/2 = 97.5.]

Since TVaRp (X) ≡ E[X | X > πp ], TVaRp (X) ≥ VaRp (X). 29


Unlike VaRp, TVaRp is affected by the behavior in the extreme righthand tail of the distribution.

Exercise: The aggregate losses are a two-component splice between a uniform from 0 to 95, and
a uniform from 95 to 200, with 95% weight to the first component of the splice.
Determine VaR95% and TVaR95%.
[Solution: VaR95% = π0.95 = 95. TVaR95% = (200 + 95)/2 = 147.5.]

For a heavier-tailed distribution, TVaRp can be much larger than VaRp . 30

Exercise: The aggregate losses are a two-component splice between a uniform from 0 to 95, and
above 95 a density proportional to a Pareto with α = 3 and θ = 300, with 95% weight to the first
component of the splice. Determine VaR95% and TVaR95%.
[Solution: VaR95% = π0.95 = 95. Above 95 the density of the splice is proportional to a Pareto
Distribution, let us say c fPareto(x). e(95) = ∫_95^∞ (x - 95) c fPareto(x) dx / {c SPareto(95)} = ePareto(95).
For a Pareto with α = 3 and θ = 300, e(x) = (x + 300)/(3 - 1). e(95) = 395/2 = 197.5.
TVaR95% = 95 + e(95) = 95 + 197.5 = 292.5.
Comment: In this and the previous exercise, the 95% Values at Risk are the same, even though
the distribution in this exercise has a larger probability of extremely bad outcomes such as 300.]

TVaR95% is the average of the worst 5% of the outcomes. For a heavier-tailed distribution such as
a Pareto with an increasing mean excess loss, the average of the worst 5% of outcomes will be
significantly bigger than the 97.5th percentile, or VaR97.5%.31

29
Only in very unusual situations would the two be equal.
30
A heavier-tailed distribution has f(x) go to zero more slowly as x approaches infinity. The Pareto and LogNormal
are examples of heavier-tailed distributions. See “Mahlerʼs Guide to Loss Distributions.”
31
For a uniform distribution, TVaR95% would equal VaR97.5%.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 30

Exercise: For a Pareto Distribution with α = 2 and θ = 1000, compute TVaR95% and VaR97.5%.

[Solution: VaR95% = θ{(1-p)^(-1/α) - 1} = (1000){(1 - 0.95)^(-1/2) - 1} = 3472.
TVaR95% = VaR95% + θ(1-p)^(-1/α)/(α - 1) = 3472 + (1000)(1 - 0.95)^(-1/2)/(2 - 1) = 7944.
VaR97.5% = (1000){(1 - 0.975)^(-1/2) - 1} = 5324.
Comment: As expected, TVaR95% is significantly more than VaR97.5%.
F(7944) = 1 - (1000/8944)² = 98.7%. Thus in this case, TVaR95% = VaR98.7%.]

Exercise: For a LogNormal Distribution with µ = 6 and σ = 1, compute TVaR95% and VaR97.5%.

[Solution: VaR95% = Exp[6 + (1.645)(1)] = 2090. E[X] = exp[6 + 1²/2] = 665.
E[X ∧ 2090] = (665) Φ[(ln2090 - 6 - 1²)/1] + (2090) {1 - Φ[(ln2090 - 6)/1]}
= (665)Φ[0.65] + (2090){1 - Φ[1.645]} = (665)(0.7422) + (2090)(0.05) = 598.
TVaR95% = VaR95% + e(VaR95%) = 2090 + (665 - 598) / 0.05 = 3430.
VaR97.5% = Exp[6 + (1.960)(1)] = 2864.
Comment: As expected, TVaR95% is significantly more than VaR97.5%.
F(3430) = Φ[(ln3430 - 6)/1] = Φ[2.1] = 98.3%. Thus in this case, TVaR95% = VaR98.3%.]
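For readers who like a numerical cross-check, here is a short Python sketch (mine, not part of the text) of this LogNormal exercise; it uses exact Normal values rather than rounded table values, so TVaR95% comes out near 3452 rather than 3430.

from math import exp, log
from statistics import NormalDist

N = NormalDist()
mu, sigma, p = 6.0, 1.0, 0.95

var_p = exp(mu + sigma * N.inv_cdf(p))                 # 95th percentile, about 2090
mean = exp(mu + sigma ** 2 / 2)                        # E[X], about 665
# Limited expected value of the LogNormal at var_p; the last term uses S(var_p) = 1 - p.
lim_ev = mean * N.cdf((log(var_p) - mu - sigma ** 2) / sigma) + var_p * (1 - p)
tvar_p = var_p + (mean - lim_ev) / (1 - p)
var_975 = exp(mu + sigma * N.inv_cdf(0.975))

print(round(var_p), round(tvar_p), round(var_975))     # about 2090, 3452, 2864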

Here is a comparison of VaR and TVaR for this LogNormal Distribution with µ = 6 and σ = 1:

[Graph: VaR and TVaR as functions of p, for p from 0.90 to 0.98; both increase with p, TVaR lies above VaR throughout, with values roughly between 2000 and 7000.]
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 31

Expected Policyholder Deficit:32

Define the Expected Policyholder Deficit as: (1-p) (TVaRp - VaRp ) = (1-p) e(πp ).

Looking in Appendix A of the Tables attached to the exam, for the Pareto Distribution:
TVaRp = VaRp + θ(1-p)^(-1/α)/(α - 1), α > 1.
Therefore, the Expected Policyholder Deficit is: {θ/(α - 1)}(1 - p)^(1 - 1/α), α > 1.

Exercise: For a Pareto Distribution with α = 2 and θ = 1000,


determine the Expected Policyholder Deficit for p = 95%.
[Solution: {1000/(2-1)} (0.05)^(1-1/2) = 224.
Alternately, from a previous exercise: VaR95% = 3472, and TVaR95% = 7944.
Thus the Expected Policyholder Deficit is: (1 - 0.95) (7944 - 3472) = 224.]

Derivative of TVaR:

Let G(x) = E[X | X > x] = x + e(x) = x + ∫_x^∞ S(t) dt / S(x).
Then dG/dx = 1 - S(x)/S(x) + f(x) ∫_x^∞ S(t) dt / S(x)² = {f(x)/S(x)} ∫_x^∞ S(t) dt / S(x) = h(x) e(x).

d E[X | X > x] / dx = h(x) e(x).33
d E[X | X > x] / dx > 0, and as expected E[X | X > x] is an increasing function of x.

For example, for a Pareto with parameters α and θ, e(x) = (x + θ)/(α - 1), and h(x) = α / (θ+x).
Therefore, for a Pareto, d E[X | X > x] / dx = h(x) e(x) = α/(α - 1), for α > 1.34

32
Not on the syllabus of this exam. See for example the syllabus of CAS Exam 7.
33
See Exercise 3.37 in Loss Models. h(x) is the hazard rate.
34
For the Pareto: E[X | X > x] = x + e(x) = x + (x + θ)/(α - 1).
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 32

Exercise: For an Exponential Distribution, determine d E[X | X > x] / dx.
[Solution: e(x) = θ, and h(x) = 1/θ. d E[X | X > x] / dx = h(x) e(x) = 1.
Comment: For the Exponential: E[X | X > x] = x + e(x) = x + θ.]

TVaRp (X) = E[X | X > πp ] = G(πp ).

Therefore, by the Chain Rule, d TVaRp/dp = h(πp) e(πp) dπp/dp.

For example, for a Pareto with parameters α and θ,
p = F(πp) = 1 - {θ/(πp + θ)}^α. ⇒ πp = θ{(1 - p)^(-1/α) - 1}.
Therefore, for a Pareto with shape parameter α > 1,
d TVaRp/dp = h(πp) e(πp) dπp/dp = {α/(α - 1)}(θ/α)(1 - p)^(-(1 + 1/α)) = {θ/(α - 1)}(1 - p)^(-(1 + 1/α)).35

Exercise: For an Exponential Distribution, determine d TVaRp/dp.
[Solution: p = F(πp) = 1 - exp[-πp/θ]. ⇒ πp = -θ ln[1 - p].
d TVaRp/dp = h(πp) e(πp) dπp/dp = (1) θ/(1 - p) = θ/(1 - p).
Comment: For the Exponential: TVaRp = πp + e(πp) = -θ ln[1 - p] + θ.]

Since πp is an increasing function of p, d πp /dp > 0.

Thus, d TVaRp/dp = h(πp) e(πp) dπp/dp > 0, and as expected TVaRp is an increasing function of p.

35
As discussed previously, for the Pareto: TVaRp = θ{(1 - p)−1/α α/(α - 1) - 1}.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 33

Normal Distribution:36

For a Normal Distribution, the pth percentile is: µ + σ zp ,

where zp is the pth percentile of the Standard Normal.

Exercise: For a Normal Distribution with µ = 100 and σ = 20, determine VaR0.95[X].
[Solution: The 95th percentile of the Standard Normal is 1.645.
VaR0.95[X] = 100 + (20)(1.645) = 132.9.]

As derived below, TVaRp [X] = µ + σ φ[zp ] / (1 - p).

Exercise: For a Normal Distribution with µ = 100 and σ = 20, determine TVaR0.95[X].

[Solution: φ[zp] = φ[1.645] = exp[-1.645²/2] / √(2π) = 0.10311.
TVaR0.95[X] = 100 + (20)(0.10311)/(1 - 0.95) = 141.24.
Comment: Note that TVaR0.95[X] > VaR0.95[X].]
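The closed form can also be checked by simulation; the following Python sketch (my own, not from the text) compares the empirical average of the worst 5% of simulated outcomes with µ + σ φ[zp] / (1 - p).

import random
from statistics import NormalDist

random.seed(1)
mu, sigma, p = 100.0, 20.0, 0.95
n = 200_000
sample = sorted(random.gauss(mu, sigma) for _ in range(n))

worst = sample[int(n * p):]                     # the worst 5% of simulated outcomes
print(round(sum(worst) / len(worst), 1))        # simulated TVaR, near 141.2

z = NormalDist().inv_cdf(p)
print(round(mu + sigma * NormalDist().pdf(z) / (1 - p), 1))     # closed form, 141.2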

For the Standard Normal:
∫_x^∞ t φ(t) dt = ∫_x^∞ t exp[-t²/2] / √(2π) dt = -exp[-t²/2] / √(2π) ]_x^∞ = exp[-x²/2] / √(2π) = φ(x).

For the nonstandard Normal, substituting y = (t - µ)/σ:
∫_x^∞ t f(t) dt = ∫_x^∞ t φ[(t - µ)/σ] / σ dt = ∫_{(x-µ)/σ}^∞ (σy + µ) φ[y] dy
= σ ∫_{(x-µ)/σ}^∞ y φ[y] dy + µ ∫_{(x-µ)/σ}^∞ φ[y] dy = σ φ[(x-µ)/σ] + µ(1 - Φ[(x-µ)/σ]).

TVaRp[X] = ∫_πp^∞ x f(x) dx / (1 - p) = σ φ[(πp-µ)/σ] / (1 - p) + µ {1 - Φ[(πp-µ)/σ]} / (1 - p)
= σ φ[zp] / (1 - p) + µ(1 - Φ[zp]) / (1 - p) = σ φ[zp] / (1 - p) + µ(1 - p)/(1 - p) = µ + σ φ[zp] / (1 - p).


36
See Example 3.15 in Loss Models.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 34

TVaR with a Limit:

If there is a limit of u on the amount paid, then the distribution is censored from above at u.37
Then for x ≤ u, the losses excess of x are: E[X ∧ u] - E[X ∧ x].
Thus the mean excess loss for x ≤ u is: e(x) = (E[X ∧ u] - E[X ∧ x]) / S(x).

Exercise: Unlimited losses follow an Exponential with mean 10,000.


There is a 25,000 policy limit. Determine VAR90%.
[Solution: -θ ln(1-p) = (-10,000) ln[1 - 0.90] = 23,026.
Comment: Note that 23,026 is less than the policy limit of 25,000.]

Exercise: Unlimited losses follow an Exponential with mean 10,000.


There is a 25,000 policy limit. Determine TVAR90%.
[Solution: E[X ∧ 23,026] = (10,000)(1 - e^(-23,026/10,000)) = 9000.
E[X ∧ 25,000] = (10,000)(1 - e^(-25,000/10,000)) = 9179.
e(23,026) = (9179 - 9000) / 0.1 = 1790.
TVAR90% = π90% + e(π90%) = 23,026 + 1790 = 24,816.]

Exercise: Unlimited losses follow an Exponential with mean 10,000.


There is a 25,000 policy limit. Determine TVAR95%.
[Solution: -θ ln(1-p) = (-10,000) ln[1 - 0.95] = 29,957 > 25,000.
Thus after censoring from above, all of the 5% worst outcomes are 25,000.
Thus the average of the 5% worst outcomes is 25,000.
TVAR95% = 25,000.]
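The logic of these three exercises can be summarized in a short Python sketch (mine, not from the text); tvar_with_limit is a hypothetical helper, and the small difference from the 24,816 above reflects intermediate rounding there.

from math import exp, log

theta, u = 10_000.0, 25_000.0       # Exponential mean and the policy limit

def tvar_with_limit(p):
    pi_p = -theta * log(1 - p)                      # percentile of the unlimited distribution
    if pi_p >= u:                                   # all of the worst 1 - p outcomes pay the limit
        return u
    lev = lambda x: theta * (1 - exp(-x / theta))   # E[X ∧ x] for the Exponential
    return pi_p + (lev(u) - lev(pi_p)) / (1 - p)    # pi_p + e(pi_p), with censoring at u

print(round(tvar_with_limit(0.90)))     # about 24,817
print(round(tvar_with_limit(0.95)))     # 25,000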

37
For example, the aggregate annual losses an insurer pays may be limited due to a reinsurance treaty.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 35

Generalizations of TVaR:38

Let us assume for example, we model the aggregate annual losses from hurricanes for the
Southeastern Insurance Company. Then the one-in-20 year event is the estimate of the 95th
percentile of the distribution of aggregate losses. Similarly the one-in-100 year event is the 99th
percentile.

The 95th percentile is the 95%-Value at Risk.


The Value at Risk, VaRp , is defined as the 100pth percentile.

one-in-100 year event ⇔ VaR99%.

one-in-200 year event ⇔ VaR99.5%.

Assume the following output from a hurricane model for the Southeastern Insurance Company:39

Aggregate Annual Losses ($ million) Probability Distribution Function


Less than 250 98.9% 98.9%
250 0.1% 99.0%
260 0.1% 99.1%
270 0.1% 99.2%
280 0.1% 99.3%
290 0.1% 99.4%
300 0.1% 99.5%
320 0.1% 99.6%
350 0.1% 99.7%
400 0.1% 99.8%
500 0.1% 99.9%
550 0.02% 99.92%
600 0.02% 99.94%
650 0.02% 99.96%
700 0.02% 99.98%
750 0.02% 100%

Then VaR99% = $250 million, VaR99.5% = $300 million, and VaR99.9% = $500 million.
A key problem with using the Value at Risk is that for the selected percentile it does not depend
on the heaviness of the righthand tail beyond that percentile.

38
Not on the syllabus of this exam.
39
I have assumed these discrete possibilities for simplicity.
If for example one simulated 50,000 years, then probabilities would be in increments of 0.002%.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 36

The Tail Value at Risk overcomes this shortcoming of Value at Risk.40


TVaRp (X) ≡ E[X | X > πp ], where πp is the pth percentile.
When working with a discrete distribution as here, it is common and more convenient to define
TVaR with a greater than or equal sign: TVaRp (X) ≡ E[X | X ≥ πp ].

TVaR99.5% = E[X | X ≥ 99.5th percentile] =


{(0.1%)(300 + 320 + 350 + 400 + 500) + (0.02%)(550 + 600 + 650 + 700 + 750)} / 0.6% =
$420 million.

Exercise: Determine TVaR99.9%.


[Solution: E[X | X ≥ 99.9th percentile] =
{(0.1%)(500) +(0.02%) (550 + 600 + 650 + 700 + 750)} / 0.2% = $575 million.]

“While TVaR is an improvement over the VaR measure, it may not be necessary to take into
account the few events that make up the remote tail of the loss distribution for reinsurance pricing
decisions. Most insurers do not find it economically feasible to protect against such events and as a
result do not purchase catastrophe reinsurance for these extreme disaster scenarios.

Another risk measure to consider is called window value at risk, WVaR. This measure takes the
probability-weighted average of a distribution of losses within a range of practical bounds, or
percentiles, and can be mathematically described by the following formula:
WVaRp,q(X) = E[X | πp ≤ X ≤ πq ].
WVaR is less sensitive to changes in the tail resulting from updates to the underlying model or
exposure data and generally exhibits less volatility than other risk measures as a result. Most
importantly, risk managers can apply practical bounds that represent the appropriate range of
losses to support their risk management needs. This eliminates the need to reflect extreme losses
at the tail of the distribution in decision-making processes.”41

WVaR99%,99.5% = (250 + 260 + 270 + 280 + 290 + 300)/6 = $275 million.

Exercise: Determine WVaR99.5%,99.9%.


[Solution: E[X | 99.9th percentile ≥ X ≥ 99.5th percentile] =
(300 + 320 + 350 + 400 + 500) / 5 = $374 million.]

40
Also as discussed in a subsequent section, TVaR is a coherent risk measure, while VaR is not.
41
Quoted from “Modeling Fundamentals: Evaluating Risk Measures,” AIR Worldwide, by David Lalonde and Alissa
Legenza. When working with discrete distributions as here, it is more convenient to use less than or equal to
rather than strict inequality in the definition.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 37

“While window value at risk does improve over tail value at risk by ignoring the extreme loss
behavior that exists in the tail of the distribution, is this the ideal approach? Perhaps the ultimate
technique for measuring exposure to catastrophic risk could be accomplished by restricting, rather
than ignoring, the tail domain of the loss distribution. Simulated event losses that fall outside of the
range of what might be considered economically feasible to protect against could be censored, or
capped, prior to applying the same conditional expectation methods used to calculate the tail value
at risk. This special case is a censored tail value at risk, or CenTVaR, and can be derived by the
following formula:
CenTVaRp,q(X) = E[Min[X, πq] | πp ≤ X]
= {E[X | πp ≤ X ≤ πq] Prob[πp ≤ X ≤ πq] + πq Prob[X > πq]} / Prob[X ≥ πp],

where πq = censored event loss value.”42

Note that as q approaches 100%, CenTVaR approaches TVaR.

CenTVaR99%,99.5% = {(0.1%)(250 + 260 + 270 + 280 + 290 + 300) + (0.5%)(300)} / 1.1%


= $286.4 million.

Exercise: Determine CenTVaR99.5%,99.9%.


[Solution: E[ Min[X , 99.9th percentile] | X ≥ 99.5th percentile] =
{(0.1%)(300 + 320 + 350 + 400 + 500) + (0.1%)(500)} / 0.6% = $395 million.]
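The four measures above can all be computed mechanically from the discrete model output; the following Python sketch (my own illustration, not from the text) does so using the "greater than or equal to the percentile" convention adopted here. The function and variable names are hypothetical.

# Losses in $ million; the remaining 98.9% of the probability sits below 250.
pts = [(250, 0.001), (260, 0.001), (270, 0.001), (280, 0.001), (290, 0.001),
       (300, 0.001), (320, 0.001), (350, 0.001), (400, 0.001), (500, 0.001),
       (550, 0.0002), (600, 0.0002), (650, 0.0002), (700, 0.0002), (750, 0.0002)]

def percentile(p):
    # smallest listed loss whose distribution function is at least p
    cum = 0.989
    for x, pr in pts:
        cum += pr
        if cum >= p - 1e-12:
            return x

def cond_mean(lo, hi=float("inf"), cap=float("inf")):
    # E[min(X, cap) | lo <= X <= hi], taken over the listed points
    sel = [(min(x, cap), pr) for x, pr in pts if lo <= x <= hi]
    return sum(x * pr for x, pr in sel) / sum(pr for _, pr in sel)

tvar = lambda p: cond_mean(percentile(p))
wvar = lambda p, q: cond_mean(percentile(p), percentile(q))
centvar = lambda p, q: cond_mean(percentile(p), cap=percentile(q))

print(round(tvar(0.995)), round(tvar(0.999)))    # 420, 575
print(round(wvar(0.99, 0.995)))                  # 275
print(round(centvar(0.995, 0.999)))              # 395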

42
Quoted from “Modeling Fundamentals: Evaluating Risk Measures,” by David Lalonde and Alissa Legenza
I have corrected their formula for CenTVaR.
q > p.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 38

Problems:

4.1 (2 points) What is the TVaR0.95 for an Exponential Distribution with mean 100?
A. 400 B. 425 C. 450 D. 475 E. 500

4.2 (3 points) Losses are Normal with µ = 300 and σ = 10.


Determine the 90% Tail Value at Risk.
Hint: For the Normal Distribution,
E[X ∧ x] = µ Φ[(x−µ)/σ] - σ φ[(x−µ)/σ] + x {1 - Φ[(x−µ)/σ]}.
A. less than 315
B. at least 315 but less than 320
C. at least 320 but less than 325
D. at least 325 but less than 330
E. at least 330

4.3 (3 points) F(x) = 1 - {θ/(θ + x)}⁴.


Calculate the Tail Value at Risk at a security level of 99%.
A. 2.6θ B. 2.8θ C. 3.0θ D. 3.2θ E. 3.4θ

4.4 (2 points) For an Exponential Distribution with mean θ, determine TVaRp - VaRp .

A. θ B. -θ ln(1 - p) C. θ - θ ln(1 - p) D. θ + θ ln(1/p) E. None of A, B, C, or D

4.5 (3 points) Losses follow a LogNormal Distribution with µ = 7 and σ = 0.8.


Determine TVaR0.995.
A. less than 12,000
B. at least 12,000 but less than 13,000
C. at least 13,000 but less than 14,000
D. at least 14,000 but less than 15,000
E. at least 15,000
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 39

Use the following information for the next two questions:


Annual aggregate losses have the following distribution:
Annual Aggregate Losses Probability
0 50%
10 30%
20 10%
50 4%
100 2%
200 2%
500 1%
1000 1%

4.6 (1 point) Determine the 90% Tail Value at Risk.


A. 200 B. 210 C. 220 D. 230 E. 240

4.7 (1 point) Determine the 95% Tail Value at Risk.


A. 375 B. 400 C. 425 D. 450 E. 475

Use the following information for the next 2 questions:


Losses follows a Single Parameter Pareto Distribution, with α = 6 and θ = 1000.

4.8 (1 point) Determine the 98% Value at Risk.


A. 1800 B. 1900 C. 2000 D. 2100 E. 2200

4.9 (2 points) Determine the 98% Tail Value at Risk.


A. 1900 B. 2000 C. 2100 D. 2200 E. 2300

4.10 (3 points) You are given the following information:


• Frequency is Binomial with m = 500 and q = 0.3.
• Severity is LogNormal with µ = 8 and σ = 0.6.
• Frequency and severity are independent.
Using the Normal Approximation, determine the 99% Tail Value at Risk for Aggregate Losses.
Hint: For the Normal Distribution, TVaRp (X) = µ + σ φ[Φ-1(p)] / (1 - p).
A. 655,000 B. 660,000 C. 665,000 D. 670,000 E. 675,000
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 40

Use the following information for the next 4 questions:


For the aggregate losses, VaR0.9 is 1,000,000.

4.11 (2 points) John believes that the aggregate losses follow an Exponential Distribution.
Determine Johnʼs estimate of TVaR0.9.

4.12 (4 points) Paul believes that the aggregate losses follow a LogNormal Distribution
with σ = 0.6. Determine Paulʼs estimate of TVaR0.9.

4.13 (4 points) George believes that the aggregate losses follow a LogNormal Distribution
with σ = 1.2. Determine Georgeʼs estimate of TVaR0.9.

4.14 (3 points) Ringo believes that the aggregate losses follow a Pareto Distribution with α = 3.
Determine Ringoʼs estimate of TVaR0.9.

Use the following information for the next 2 questions:


In the state of Windiana, a State Fund pays for losses due to hurricanes.
The worst possible annual amounts to be paid by the State Fund in millions of dollars are:
Amount Probability
100 3.00%
200 1.00%
300 0.50%
400 0.25%
500 0.10%
600 0.05%
700 0.04%
800 0.03%
900 0.02%
1000 0.01%

4.15 (2 points) Determine TVaR0.95 in millions of dollars.


A. 120 B. 140 C. 160 D. 180 E. 200

4.16 (2 points) Determine TVaR0.99 in millions of dollars.


A. 400 B. 450 C. 500 D. 550 E. 600
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 41

Use the following information for the next 2 questions:


Losses follow a mixture of two Exponential Distributions,
with means of 1000 and 2000, and with weights of 60% and 40% respectively.

4.17 (2 points) Determine the 95% Value at Risk.


A. 3000 B. 3500 C. 4000 D. 4500 E. 5000

4.18 (3 points) Determine the 95% Tail Value at Risk.


A. 5700 B. 6000 C. 6300 D. 6600 E. 6900

Use the following information for the next 2 questions:


f(x) = 0.050 for 0 ≤ x ≤ 10, f(x) = 0.010 for 10 < x ≤ 50, and f(x) = 0.002 for 50 < x ≤ 100.

4.19 (1 point) Determine the 80% Value at Risk.


A. 35 B. 40 C. 45 D. 50 E. 55

4.20 (2 points) Determine the 80% Tail Value at Risk.


A. 50 B. 55 C. 60 D. 65 E. 70

4.21 (2 points) F(x) = (x/10)⁴, 0 ≤ x ≤ 10.


Determine TVaR0.90.
A. less than 9.70
B. at least 9.70 but less than 9.75
C. at least 9.75 but less than 9.80
D. at least 9.80 but less than 9.85
E. at least 9.85

4.22 (2 points) For a Normal Distribution with µ = 10 and σ = 3, determine TVaR95%.


A. less than 14
B. at least 14 but less than 15
C. at least 15 but less than 16
D. at least 16 but less than 17
E. at least 17

4.23 (3 points) f(x) = 0.0008 for x ≤ 1000, and f(x) = 0.0004 exp[2 - x/500] for x > 1000.
Determine the 95% Tail Value at Risk.
A. 2200 B. 2300 C. 2400 D. 2500 E. 2600
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 42

4.24 (2 points) For each of the following densities, determine VaR0.90 and TVaR0.90.
(a) f(x) = 0.09 for 0 ≤ x ≤ 10, f(x) = 0.02 for 10 < x ≤ 15.
(b) f(x) = 0.09 for 0 ≤ x ≤ 10, f(x) = 0.0005 for 10 < x ≤ 210.

4.25 (3 points) Unlimited aggregate annual losses follow a Pareto Distribution with α = 3 and
θ = 60,000.
However, due to reinsurance, the insurer will never pay more than 100,000 in a year.
Determine TVaR90%.
A. 88,000 B. 89,000 C. 90,000 D. 91,000 E. 92,000

Use the following information for the next two questions:


The annual continuously compounded returns on a stock index fund are Normal with mean 10%
and standard deviation 25%.

4.26 (2 points) Determine the average of the 1% worst annual returns for an investor.
A. -57% B. -59% C. -61% D. -63% E. -65%

4.27 (3 points) Jessica invests $10,000 in this stock index fund.


The amount Jessica has after one year is: 10,000 e^return.
Jessica is worried about the 10% worst outcomes over the next year.
Determine Jessicaʼs average loss for these bad outcomes.
A. 2600 B. 2800 C. 3000 D. 3200 E. 3400

4.28 (2 points) For an Exponential Distribution, solve for q such that: TVaRp = VaRq .

4.29 (4 points) Annual aggregate losses for ABC insurer follow a LogNormal Distribution with
µ = 19 and σ = 1.4.
ABC insurer buys stop loss insurance from XYZ reinsurer with an aggregate deductible of
1000 million. XYZ will pay ABC: (annual aggregate - 1000 million)+.
Determine the 95% Tail Value at Risk for XYZ reinsurer.
A. 2200 million B.2400 million C. 2600 million D. 2800 million E. 3000 million
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 43

4.30 (3 points) Ground-up unlimited aggregate annual losses follow a Pareto Distribution with
α = 2.25 and θ = 150 million.
Let X be the annual aggregate loss.
Then a reinsurer will pay:
0, if X < 20 million;
X - 20 million, if 20 million ≤ X ≤ 300 million;
280 million, if X > 300 million.

For the reinsurer, determine TVaR80%.


A. 220 million B. 225 million C. 230 million D. 235 million E. 240 million

4.31 (3 points) For 1 > q > p > 0, define the censored tail value at risk, or CenTVaR, as follows:
CenTVaRp,q(X) = E[Min[X, πq] | πp ≤ X]
= {E[X | πp ≤ X ≤ πq] Prob[πp ≤ X ≤ πq] + πq Prob[X > πq]} / Prob[X ≥ πp].

Derive the form of CenTVaR for an Exponential Distribution with mean θ.

4.32 (IOA CT8, 9/09, Q. 3) (9 points) A small bank wishes to improve the performance of its
investments by investing £1m in high returning assets. An investment bank has offered the bank
two possible investments:
Investment A: A diversified portfolio of shares and derivatives which can be assumed to produce
a return of £R1 million where R1 = 0.1 + N, where N is a normal N(1,1) random variable.
Investment B: An over-the-counter derivative which will produce a return of £R2 million where the
investment bank estimates: R2 = 1.5 with probability 0.99, and R2 = -5.0 with probability 0.01.
The chief executive of the bank says that if one investment has a better expected return and a
lower variance than the other then it is the best choice.
(i) (4.5 points)
(a) Calculate the expected return and variance of each investment A and B.
(b) Discuss the chief executiveʼs comments in the light of your calculations.
(ii) (1.5 points) Calculate the following risk measures for each of the two investments A and B:
(a) probability of the returns falling below 0.
(b) probability of the returns falling below -2.
(iii) (3 points)
(a) Define other suitable risk measures that could be calculated.
(b) Discuss what these risk measures would show.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 44

4.33 (CAS7, 5/13, Q.19a&f) (2 points)


Given the following information for an insurance company:
Losses for earthquake ($ million) Probability
0 91%
200 7%
400 2%

Losses for coastal property ($ million) Probability


0 80%
250 20%

Losses for these two perils are independent.


Calculate the Tail value-at-risk (TVaR) at the 95% level for earthquake.
Calculate the Tail value-at-risk (TVaR) at the 95% level for coastal property.
Calculate the Tail value-at-risk (TVaR) at the 95% level for both perils combined.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 45

Solutions to Problems:

4.1. A. Set 0.95 = 1 - exp[-π0.95/100]. ⇒ π0.95 = -(100)ln(.05) = 299.6.

For the Exponential, e(x) = θ = 100. TVaR0.95 = π.95 + e(π0.95) = 299.6 + 100 = 399.6.

As shown in Appendix A: TVaRp (X) = -θ ln(1-p) + θ = -(100)ln(.05) + 100 = 399.6.


Comment: See Example 3.16 in Loss Models.

4.2. B. 0.90 = F(x) = Φ[(x - 300)/10]. ⇒ (x - 300)/10 = 1.282
⇒ x = 300 + (10)(1.282) = 312.82.
φ[(312.82 - µ)/σ] = φ[1.282] = exp[-1.282²/2] / √(2π) = 0.1754.
Φ[(312.82 - µ)/σ] = 0.9.
E[X ∧ x] = µ Φ[(x−µ)/σ] - σ φ[(x−µ)/σ] + x {1 - Φ[(x−µ)/σ]}.
E[X ∧ 312.82] = (300)(0.9) - (10)(0.1754) + (312.82)(1 - 0.9) = 299.53.
e(312.82) = (E[X] - E[X ∧ 312.82]) / (1 - 0.9) = (300 - 299.53)/0.1 = 4.7.
TVaRp = πp + e(πp) = 312.82 + 4.7 = 317.5.
Alternately, for a Normal Distribution, TVaRp[X] = µ + σ φ[zp] / (1 - p) = 300 + (10)φ[1.282]/0.1
= 300 + (10) exp[-1.282²/2] / √(2π) / 0.1 = 317.5.


Comment: See Example 3.15 in Loss Models.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 46

4.3. D. For the Pareto Distribution, 0.99 = 1 - {θ/(θ + π0.99)}⁴. ⇒ π0.99 = θ(100^0.25 - 1) = 2.1623θ.
TVaRp = πp + (E[X] - E[X ∧ πp])/(1 - p) = πp + e(πp).
TVaR0.99 = 2.1623θ + e(2.1623θ) = 2.1623θ + (2.1623θ + θ)/(4 - 1) = 3.2164θ.
As shown in Appendix A, for a Pareto Distribution with parameters α and θ, α > 1:
TVaRp(X) = VaRp(X) + θ(1 - p)^(-1/α)/(α - 1) = θ[(1-p)^(-1/α) - 1] + θ(1 - p)^(-1/α)/(α - 1)
= θ{(1 - p)^(-1/α) α/(α - 1) - 1}.
With α = 4, TVaR0.99(X) = θ[(1%)^(-0.25) - 1] + θ(1%)^(-0.25)/3 = 2.1623θ + 1.0541θ = 3.2164θ.


Comment: See Example 3.17 in Loss Models.
For the Pareto Distribution, e(x) = (x + θ)/(α - 1), α > 1.
[Graph: TVaRp, the Tail Value at Risk, as a function of p (from 0.8 to 0.99) for F(x) = 1 - {θ/(θ + x)}⁴; it increases with p, steeply as p approaches 1; vertical axis from 0 to about 600.]

4.4. A. TVaRp = πp + e(πp). TVaRp − πp = e(πp) = θ.

As shown in Appendix A: VaRp (X) = -θ ln(1-p). TVaRp (X) = -θ ln(1-p) + θ.

TVaRp (X) - VaRp (X) = θ.

Comment: For an Exponential Distribution, e(x) = θ.


2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 47

4.5. A. E[X] = exp[7 + 0.8²/2] = 1510.
0.995 = F(x) = Φ[(lnx - 7)/0.8]. ⇒ (lnx - 7)/0.8 = 2.576.
⇒ x = exp[7 + (0.8)(2.576)] = 8611. The 99.5th percentile of the LogNormal is 8611.
E[X ∧ 8611] = (1510)Φ[(ln8611 - 7 - 0.8²)/0.8] + (8611){1 - Φ[(ln8611 - 7)/0.8]}
= (1510)Φ[1.78] + (8611){1 - Φ[2.58]} = (1510)(0.9625) + (8611)(0.0049) = 1496.
TVaR0.995 = π0.995 + (E[X] - E[X ∧ π0.995])/(1 - 0.995) = 8611 + (1510 - 1496)/0.0050 = 11,411.

4.6. D. Average the 10% worst possible outcomes:


TVaR0.90 = {(4%)(50) + (2%)(100) + (2%)(200) + (1%)(500) + (1%)(1000)}/10% = 230.

4.7. B. Average the 5% worst possible outcomes:


TVaR0.95 = {(1%)(100) + (2%)(200) + (1%)(500) + (1%)(1000)}/5% = 400.

4.8. B. F(x) = 1 - (θ/x)^α. 0.98 = 1 - (1000/π0.98)⁶. ⇒ π0.98 = 1919.
As shown in Appendix A: VaRp = θ(1-p)^(-1/α).
VaR0.98 = (1000)(0.02)^(-1/6) = 1919.

4.9. E. E[X ∧ x] = θ{α - (θ/x)^(α−1)} / (α - 1).
E[X ∧ 1919] = (1000){6 - (1000/1919)⁵}/(6 - 1) = 1192.315.
E[X] = θα/(α - 1) = (1000)(6/5) = 1200.
TVaR0.98 = π0.98 + (E[X] - E[X ∧ π0.98])/(1 - 0.98) = 1919 + (1200 - 1192.315)/0.02 = 2303.
Alternately, f(x) = 6 × 10^18 / x⁷. TVaR0.98 = ∫_1919^∞ x f(x) dx / 0.02 = 2303.
As shown in Appendix A: TVaRp = αθ(1-p)^(-1/α)/(α - 1), for α > 1.
TVaR0.98 = (6)(1000)(0.02)^(-1/6) / 5 = 2303.
Comment: For a Single Parameter Pareto, with parameters α and θ, πp = θ/(1 - p)^(1/α).
E[X] - E[X ∧ πp] = θ(θ/πp)^(α−1) / (α - 1) = θ{(1 - p)^(1/α)}^(α−1) / (α - 1) = θ(1 - p)^(1 - 1/α) / (α - 1).
TVaRp = πp + (E[X] - E[X ∧ πp])/(1 - p) = {θ/(1 - p)^(1/α)} {α/(α - 1)} = {α/(α - 1)} πp.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 48

4.10. B. The mean severity is: exp[8 + 0.6²/2] = 3569.
The second moment of severity is: exp[(2)(8) + (2)(0.6²)] = 18,255,921.
The variance of severity is: 18,255,921 - 3569² = 5,518,160.
The mean frequency is: (500)(0.3) = 150.
The variance of frequency is: (500)(0.3)(0.7) = 105.
The mean aggregate loss is: (150)(3569) = 535,350.
The variance of aggregate loss is: (150)(5,518,160) + (3569²)(105) = 2,165,188,905.
Approximate by a Normal Distribution with µ = 535,350 and σ = √2,165,188,905 = 46,532.
φ[Φ-1(p)] = φ[Φ-1(99%)] = φ[2.326] = exp[-2.326²/2] / √(2π) = 0.02667.
TVaR99%(X) = 535,350 + (46,532)(0.02667) / 0.01 = 659,451.
Comment: VaR0.99 = 535,350 + (2.326)(46,532) = 643,583.
The formula for TVaR for the Normal Distribution is given in Example 3.15 in Loss Models.

4.11. For the Exponential Distribution, VaRp = -θ ln(1-p).

Thus 1,000,000 = -θ ln(1-0.9). ⇒ θ = 434,294.


For the Exponential Distribution, TVaRp = VaRp + θ = 1,000,000 + 434,294 = 1,434,294.

4.12. For the LogNormal, the distribution function at 1,000,000 is 0.9.
0.9 = Φ[{ln[1,000,000] - µ}/0.6]. ⇒ 1.282 = {ln[1,000,000] - µ}/0.6. ⇒ µ = 13.0463.
For this LogNormal, E[X] = exp[13.0463 + 0.6²/2] = 554,765.
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 1,000,000] = 554,765 Φ[0.682] + (1,000,000) {1 - Φ[1.282]}
= (554,765)(0.7517) + (1,000,000)(0.10) = 517,067.
e(1 million) = (554,765 - 517,067)/0.1 = 376,980.
TVaR0.9 = 1,000,000 + 376,980 = 1,376,980.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 49

4.13. For the LogNormal, the distribution function at 1,000,000 is 0.9.
0.9 = Φ[{ln[1,000,000] - µ}/1.2]. ⇒ 1.282 = {ln[1,000,000] - µ}/1.2. ⇒ µ = 12.2771.
For this LogNormal, E[X] = exp[12.2771 + 1.2²/2] = 441,132.
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 1,000,000] = 441,132 Φ[0.082] + (1,000,000) {1 - Φ[1.282]}
= (441,132)(0.5319) + (1,000,000)(0.10) = 334,638.
e(1 million) = (441,132 - 334,638)/0.1 = 1,064,940.
TVaR0.9 = 1,000,000 + 1,064,940 = 2,064,940.
Comment: The Tail Value at Risk depends on which form of distribution one assumes.
Even assuming a LogNormal Distribution, the Tail Value at Risk depends on σ:

[Graph: TVaR0.9 in $ million as a function of σ (from 0.5 to 2.0), increasing from about 1.5 to 4.0.]
The bigger σ, the heavier the righthand tail and thus the larger TVaR, all else being equal.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 50

4.14. For the Pareto Distribution, VaRp = θ{(1-p)^(-1/α) - 1}.
Thus 1,000,000 = θ{(1-0.9)^(-1/3) - 1}. ⇒ θ = 866,225.
For the Pareto Distribution, TVaRp = VaRp + θ(1-p)^(-1/α)/(α - 1)
= 1,000,000 + (866,225)(1-0.9)^(-1/3) / (3 - 1) = 1,933,113.
Comment: The Tail Value at Risk as a function of α:
[Graph: TVaR0.9 in $ million as a function of α (from 1.5 to 5.0), decreasing from about 3.5 toward 2.0.]
The smaller α, the heavier the righthand tail and thus the larger TVaR, all else being equal.

4.15. D. What is shown here is the 5% worst outcomes. Their average is:
{(100)(3%) + (200)(1.00%) + (300)(0.50%) + (400)(0.25%) + (500)(0.10%) + (600)(0.05%) +
(700)(0.04%) + (800)(0.03%) + (900)(0.02%) + (1000)(0.01%)}/5% = 182 million.

4.16. A. The average of the worst 1% of outcomes is:


{(300)(0.50%) + (400)(0.25%) + (500)(0.10%) + (600)(0.05%) +
(700)(0.04%) + (800)(0.03%) + (900)(0.02%) + (1000)(0.01%)}/1% = 410 million.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 51

4.17. D. We wish to find where the survival function is 5%.


0.05 = 0.6 exp[-x/1000] + 0.4 exp[-x/2000].
5 exp[x/1000] - 60 - 40 exp[x/2000] = 0.
Let y = exp[x/2000]. Then y² - 8y - 12 = 0.
⇒ y = (8 + √(64 + 48)) / 2 = 9.292, taking the positive root.

⇒ exp[x/2000] = 9.292. ⇒ x = 4458.

4.18. C. The mean of the mixture is: (60%)(1000) + (40%)(2000) = 1400.


The limited expected value of the mixture at 4458 is:
(60%)(1000)(1 - e^(-4458/1000)) + (40%)(2000)(1 - e^(-4458/2000)) = 1306.94.
e(4458) = (1400 - 1306.94)/0.05 = 1861. TVaR95% = 4458 + e(4458) = 4458 + 1861 = 6319.

4.19. B. & 4.20. C. The density is constant on each interval.


Integrate the density over the first interval from 0 to 10, and get F(10) = 0.5.
Integrate the density from 10 to 50 and get 0.4. This implies that F(50) = 0.5 + 0.4 = 0.9.
By integrating the density on the 2nd interval:
F(x) = 0.5 + (x-10)(0.010) = 0.4 +  0.01x, 10 ≤ x ≤ 50.
Set F(x) = 0.8. ⇒ 0.8 = 0.4 + 0.01x. ⇒ x = 40. The 80th percentile is 40.
The 80% Tail Value at Risk is: E[X | X > 40] = { ∫_40^50 x (0.01) dx + ∫_50^100 x (0.002) dx } / 0.2
= (4.5 + 7.5) / 0.2 = 60.
Alternately, e(40) = E[X - 40 | X > 40] = { ∫_40^50 (x - 40)(0.01) dx + ∫_50^100 (x - 40)(0.002) dx } / 0.2
= (0.5 + 3.5) / 0.2 = 20. ⇒ The 80% Tail Value at Risk is: 40 + e(40) = 60.

4.21. E. 0.90 = (x/10)⁴. ⇒ 90th percentile = 9.74.
f(x) = 4x³/10,000, 0 ≤ x ≤ 10.
⇒ TVaR0.90 = ∫_9.74^10 x (4x³/10,000) dx / 0.1 = 9.873.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 52

4.22. D. For the Normal Distribution: TVaRp[X] = µ + σ φ[zp] / (1 - p).
The 95th percentile of the Standard Normal Distribution is 1.645.
φ[1.645] = exp[-1.645²/2] / √(2π) = 0.10311.
TVaR95% = 10 + (3)φ[1.645] / 0.05 = 10 + (60)(0.10311) = 16.19.

4.23. A. f(x) = 0.0008 for x ≤ 1000. ⇒ F(1000) = 0.8.
To find the 95th percentile:
0.95 = 0.8 + ∫_1000^x 0.0004 exp[2 - t/500] dt = 0.8 + (0.2)(1 - exp[2 - x/500]).
⇒ 0.25 = exp[2 - x/500]. ⇒ x = 1693.
⇒ TVaR95%[X] = ∫_1693^∞ 0.0004 exp[2 - t/500] t dt / 0.05 = 0.008 e² ∫_1693^∞ t e^(-t/500) dt
= 0.008 e² (-500 t e^(-t/500) - 500² e^(-t/500)) ]_{t=1693}^{t=∞} = (0.008) e² (37,110) = 2194.

4.24. (a) F(10) = 0.9. Thus VaR0.90 = 10. TVaR0.90 = (10+15)/2 = 12.5.
(b) F(10) = 0.9. Thus VaR0.90 = 10. TVaR0.90 = (10+210)/2 = 110.
Comment: Even though the values at risk are the same, the tail values of risk are significantly
different. Unlike Value at Risk, Tail Value at Risk takes into account how heavy the extreme
righthand tail is.

4.25. E. VaR90% = θ{(1 - p)^(-1/α) - 1} = (60,000)(0.1^(-1/3) - 1) = 69,266.
E[X ∧ 69,266] = {60,000/(3-1)} {1 - (60,000/(60,000 + 69,266))^(3-1)} = 23,537.
E[X ∧ 100,000] = {60,000/(3-1)} {1 - (60,000/(60,000 + 100,000))^(3-1)} = 25,781.
e(69,266) = (25,781 - 23,537) / 0.1 = 22,440.
TVaR90% = π90% + e(π90%) = 69,266 + 22,440 = 91,706.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 53

4.26. A. If we work with -1 times the returns, we want TVaR99%.
The minus returns have mean -10% and standard deviation 25%. z0.99 = 2.326.
TVaRp = µ + σ φ[zp] / (1 - p) = -10% + 25% φ(2.326) / 0.01
= -0.1 + 25 exp[-2.326²/2] / √(2π) = 0.567.
Thus the average of the 1% worst returns for an investor is: -56.7%.
Alternately, for the original Normal Distribution:
TVaR99% = 0.1 + 25 exp[-2.326²/2] / √(2π) = 0.767.
Since the Normal is symmetric around its mean,
the average of the 1% worst returns for an investor is: 0.1 - (0.767 - 0.1) = -56.7%.

4.27. B. The natural log of the amount Jessica has after one year is:
ln[10,000] + return, which is Normal with µ = ln[10,000] + 0.1, and σ = 0.25.
Thus the amount she has after one year is LogNormal, with µ = 9.31034 and σ = 0.25.
The 10th percentile is: exp[9.31034 - (1.282)(0.25)] = 8021.
The average of her 10% worst amounts after one year is: ∫_0^8021 x f(x) dx / 0.1.
∫_0^8021 x f(x) dx = E[X ∧ 8021] - 8021 S(8021).
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}.
Thus, E[X ∧ x] - x S(x) = exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ].
E[X ∧ 8021] - 8021 S(8021) = exp[9.31034 + 0.25²/2] Φ[{ln(8021) - 9.31034 - 0.25²}/0.25]
= (11,403) Φ[-1.53] = (11,403)(0.0630) = 718.4.
Thus the average of her worst 10% amounts after one year is: 718.4/0.1 = 7184.
Her average loss in these situations is: 10,000 - 7184 = $2816.
Comment: Looking at the dollar loss in value for an investor results in a different result for TVaR
than looking at percentage returns.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 54

4.28. -θ ln(1-p) + θ = -θ ln(1-q). ⇒ ln(1-q) = ln(1-p) - 1. ⇒ 1 - q = (1-p)/e. ⇒ q = 1- (1-p)/e.


Comment: For example, for p = 90%: q = 1 - 0.1/2.71828 = 96.3%. TVaR90% = VaR96.3%.
TVaR90% is the average of the 10% worst outcomes.
For the uniform distribution, TVaR90% = VaR95%.
However, for the Exponential Distribution, the average of the 10% worst outcomes is greater than
the 95th percentile: TVaR90% > VaR95%.

4.29. D. The 95th percentile of ABCʼs unlimited distribution is:
exp[19 + (1.645)(1.4)] = 1786 million.
So we wish the average amount paid by XYZ when ABCʼs aggregate is more than 1786 million.
E[X] = exp[19 + 1.4²/2] = 475.6 million.
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}.
E[X ∧ 1786 million] = (475.6 million) Φ[{ln(1786 million) - 19 - 1.4²}/1.4] + (1786 million)(0.05)
= (475.6 million) Φ[0.25] + (1786 million)(0.05)
= (475.6 million)(0.5987) + (1786 million)(0.05) = 374.0 million.
The average size of aggregate losses greater than 1786 million is:
{E[X] - (E[X ∧ 1786 million] - (1786 million) S(1786 million))} / S(1786 million)
= (E[X] - E[X ∧ 1786 million]) / 0.05 + 1786 million
= (475.6 million - 374.0 million) / 0.05 + 1786 million = 3818 million.
However, for each large aggregate loss, XYZ reinsurer pays 1000 million less than the amount of
the aggregate losses of ABC.
Thus, TVaR95% = 3818 million - 1000 million = 2818 million.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 55

4.30. C. The 80th percentile of the Pareto is: θ{(1 - p)^(-1/α) - 1} = (150)(0.2^(-1/2.25) - 1) = 156.722.
For 156.722 ≤ X ≤ 300, the payment is X - 20.
For X > 300, the payment is 280.
For the Pareto Distribution,
E[X ∧ x] = {θ/(α - 1)} {1 - (θ/(θ + x))^(α−1)} = 120 {1 - (150/(150 + x))^1.25}.

E[X ∧ 156.722] = 70.92. E[X ∧ 300] = 89.61.


The average value of X for 156.722 ≤ X ≤ 300 is:
{E[X ∧ 300] - 300 S(300) - (E[X ∧ 156.722] - 156.722 S(156.722))} / {F(300) - F(156.722)}.
The contribution to TVaR80% from X for 156.722 ≤ X ≤ 300 is:
E[X - 20 | 156.722 ≤ X ≤ 300] Prob[156.722 ≤ X ≤ 300] =
E[X | 156.722 ≤ X ≤ 300] Prob[156.722 ≤ X ≤ 300] - 20Prob[156.722 ≤ X ≤ 300] =
E[X ∧ 300] - 300 S(300) - E[X ∧ 156.722] + 156.722 S(156.722) - 20{S(156.722) - S(300)}
= E[X ∧ 300] - E[X ∧ 156.722] + 136.722 S(156.722) - 280 S(300).
The contribution to TVaR80% from X ≥ 300 is: 280 S(300).
Thus TVaR80% is: {E[X ∧ 300] - E[X ∧ 156.722] + 136.722 S(156.722)} / 0.2 =
{89.61 - 70.92 + (136.722)(0.2)} / 0.2 = 230.2 million.
Comment: Mathematically this is like a deductible of 20 and a maximum covered loss of 300.

4.31. For the Exponential, πp = -θ ln(1-p). πq = -θ ln(1-q).
∫_πp^πq x f(x) dx = ∫_πp^πq x e^(-x/θ)/θ dx = [-x e^(-x/θ) - θ e^(-x/θ)]_{x=πp}^{x=πq}
= (1-p)(πp + θ) - (1-q)(πq + θ) = (q-p)θ + (1-p)πp - (1-q)πq.
CenTVaRp,q(X) = {∫_πp^πq x f(x) dx + (1-q)πq} / (1 - p) = θ(q-p)/(1-p) + πp = θ(q-p)/(1-p) - θ ln(1-p).

Comment: For the Exponential, TVaRp (X) = θ - θ ln(1-p).


As q approaches 1, CenTVaRp,q(X) approaches TVaRp (X).
CenTVaR is not on the syllabus of this exam.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 56

4.32. (i) Investment A


Expected return = E[0.1 + N] = 0.1 + 1 = 1.1
Variance = 1
Investment B
Expected return = (1.5)(0.99) + (-5.0)(0.01) = 1.435.
Variance = (0.99)(1.435 - 1.5)² + (0.01){1.435 - (-5)}² = 0.418275.
Investment B has both a higher expected return and lower variance so would be preferred on this
basis. However there is an issue with the possibility of very bad returns on Investment B.
Also there might be an issue with the estimated probabilities of investment B being somewhat
unreliable as they are probably derived from the heavy righthand tail of a distribution. Thus it might
be wise to take this calculation with a grain of salt.
(ii) a. Investment A.
Probability of return below 0 is probability of the return from N(1, 1) being below -0.1:
Φ[(-0.1 - 1)/1] = Φ[-1.1] = 0.1357.
Investment B Probability of return below 0 is 0.01.
b. Investment A.
Probability of return below -2 is probability of the return from N(1,1) being below -2.1:
Φ[(-2.1 - 1)/1] = Φ[-3.1] = 0.0010.
Investment B. Probability of return below -2 is 0.01.
(iii) One could instead use the Value at Risk or Tail Value at Risk.
For example, 95%-VaR is the 95th percentile of the distribution of losses, which in this case would
be the 5th percentile of the returns.
For Investment A, 95%-VaR is: 0.1 + (1 - 1.645) = -0.545.
For Investment B, 95%-VaR is: 1.5.
For Investment A, 99%-VaR is: 0.1 + (1 - 2.326) = -1.226.
For Investment B, 99%-VaR is: -5.0.
The 95%-TVaR is the average of those losses greater than or equal to 95%-VaR, or in this case
the average of the returns less than or equal to 95%-VaR.
For the Normal Distribution, TVaRp [X] = µ + σ φ[zp ] / (1 - p).

Thus for Investment A, 95%-TVaR is: 1.1 - (1) exp[-1.645²/2]/√(2π) / 0.05 = -0.9622.
For Investment B, 95%-TVaR is: (0.8)(1.5) + (0.2)(-5.0) = 0.2.
For Investment A, 99%-TVaR is: 1.1 - (1) exp[-2.326²/2]/√(2π) / 0.01 = -1.5674.
For Investment B, 99%-TVaR is: -5.0.
Comment: There is no one “correct” measure of risk.
Different measures of risk give different orderings in this case.
2016-C-4, Risk Measures §4 Tail Value at Risk, HCM 10/21/15, Page 57

4.33. For earthquake, TVaR95% = {(3%)(200) + (2%)(400)} / 5% = 280.


For coastal property, TVaR95% = 250.
For both perils: Prob[650] = (2%)(20%) = 0.4%, Prob[450] = (7%)(20%) = 1.4%,
Prob[400] = (2%)(80%) = 1.6%, Prob[250] = (91%)(20%) = 18.2%.
For both perils, TVaR95% = {(0.4%)(650) + (1.4%)(450) + (1.6%)(400) + (1.6%)(250)} / 5% = 386.
Comment: Notice that as will be discussed with respect to coherence, the subadditivity property
holds: 280 + 250 > 386.
2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 58

Section 5, Distortion Risk Measures43

A distortion function, g, maps [0, 1] to [0, 1] such that g(0) = 0, g(1) = 1, and g is increasing.

A distortion risk measure is obtained by taking the integral of g[S(x)]:44
H(X) = ∫_0^∞ g[S(x)] dx.

Examples of distortion risk measures are:

PH Transform45             g(y) = y^(1/κ)
Wang Transform             g(y) = Φ[Φ-1[y] + κ]
Dual Power Transform       g(y) = 1 - (1 - y)^κ

It is less obvious, but the Value-at-Risk and Tail-Value-at-Risk risk measures can also be put in this
form and are thus distortion risk measures.

Proportional Hazard (PH) Transform:

Define the Proportional Hazard (PH) Transform to be:


g(S(x)) = S(x)^(1/κ), κ ≥ 1.

Exercise: What is the PH transform of an Exponential Distribution?


[Solution: For the Exponential, S(x) = exp[-x/θ]. S(x)^(1/κ) = exp[-x/θ]^(1/κ) = exp[-x/(κθ)].
Thus the PH transform is also an Exponential, but with θ replaced by κθ.]


Recall that E[X] = ∫_0^∞ S(x) dx.46

The above integral computes the expected value of the losses. If instead we raised the survival
function to some power less than one, we would get a larger integral, since
S(x) < S(x)^(1/κ) for κ > 1 and S(x) < 1.
43
No longer on the syllabus of this exam. However, it is on the syllabus of CAS Exam 7.
44
The integral is taken over the domain of X, which is most commonly 0 to ∞.
45
The PH Transform is a special case of a Beta Transform, where g is a Beta Distribution with θ = 1.
46
See “Mahlerʼs Guide to Loss Distributions.”
2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 59

The Proportional Hazard Transform risk measure is, for κ ≥ 1:
H(X) = ∫_0^∞ S(x)^(1/κ) dx.47
For κ = 1, the PH Transform is the mean. As the selected κ increases, so does the PH Transform.
The more averse to risk one is, the higher the selected κ should be, resulting in a higher level of
security.

For a Pareto Distribution with α = 4 and θ = 240, E[X] = 240/(4 - 1) = 80.
S(x) = {240/(240 + x)}⁴. S(x)^(1/κ) = {240/(240 + x)}^(4/κ), a Pareto with α = 4/κ and θ = 240.
Exercise: For κ = 1.2, what is the PH Transform risk measure for this situation?
[Solution: The transformed distribution is a Pareto with α = 4/1.2 = 3.33 and θ = 240.
Therefore ∫_0^∞ S(x)^(1/κ) dx = mean of this transformed Pareto = 240/(3.33 - 1) = 103.]

In the case of the Pareto Distribution, the PH Transform is also a Pareto, but with α replaced by
α/κ. Thus the PH Transform has reduced the Pareto's shape parameter, resulting in a distribution
with a heavier tail.48 The PH Transform risk measure is: θ/(α/κ - 1), for κ < α.
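As a numerical check (not from the text), the following Python sketch integrates S(x)^(1/κ) for this Pareto by a simple trapezoid rule over a long finite range and compares the result to the closed form θ/(α/κ - 1).

alpha, theta, kappa = 4.0, 240.0, 1.2

def S(x):                                   # Pareto survival function
    return (theta / (theta + x)) ** alpha

# Trapezoid rule for the integral of S(x)**(1/kappa) from 0 to a large cutoff.
upper, n = 1_000_000.0, 1_000_000
h = upper / n
total = 0.5 * (S(0.0) ** (1 / kappa) + S(upper) ** (1 / kappa))
total += sum(S(i * h) ** (1 / kappa) for i in range(1, n))

print(round(total * h, 1))                      # about 102.9
print(round(theta / (alpha / kappa - 1), 1))    # 102.9; the text rounds this to 103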

47
The integral is taken over the domain of X, which is most commonly 0 to ∞.
48
Heavier-tailed distributions are sometimes referred to as more risky.
2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 60

Here is a graph of the PH Transform Risk Measure, for a Pareto Distribution with α = 4 and
θ = 240, as a function of κ:
[Graph: the PH Transform risk measure as a function of κ (from 1 to 3.5); it starts at E[X] = 80 for κ = 1 and increases steeply, reaching roughly 1500 as κ approaches α = 4.]

Exercise: For this situation, what value of κ corresponds to a relative security loading of 50%?
[Solution: θ/(α/κ - 1) = 1.5E[X] = 1.5 θ/(α - 1). ⇒ α - 1 = 1.5α/κ - 1.5.

⇒ κ = 1.5α/(α + .5) = (1.5)(4)/(4 + .5) = 6/4.5 = 1.33.]

Exercise: Losses follow an Exponential Distribution with θ = 1000.


Determine the PH Transform risk measure for κ = 1.6.
[Solution: S(x) = exp[-x/1000]. S(x)^(1/1.6) = exp[-x/1600].
The PH Transform risk measure is the mean of the new Exponential, 1600.
Comment: For the Exponential Distribution, the PH Transform risk measure is κθ.]

Wangʼs Transform:

Wangʼs Transform produces another risk measure, which is useful for working with Normal or
LogNormal losses.

Let X be LogNormal with µ = 6 and σ = 2. Then S(x) = 1 - Φ[(lnx - 6)/2].

Φ-1 is the inverse function of Φ. Φ-1[0.95] = 1.645. ⇔ Φ[1.645] = 0.95.

Φ-1[1 - 0.95] = -Φ-1[0.95] = -1.645. ⇔ Φ[-1.645] = 0.05.


2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 61

Φ-1[S(x)] = Φ-1[1 - Φ[(lnx - 6)/2]] = - Φ-1[Φ[(lnx - 6)/2]] = -(lnx - 6)/2.

Φ[Φ-1[S(x)] + 0.7] = Φ[0.7 - (lnx - 6)/2] = 1 - Φ[(lnx - 6)/2 - 0.7] = 1 - Φ[(lnx - {6 + (0.7)(2)})/2].

Thus Φ[Φ-1[S(x)] + 0.7] is the survival function of a LogNormal with µ = 6 + (0.7)(2) and σ = 2.

Define Wangʼs Transform to be: g(S(x)) = Φ[Φ-1[S(x)] + κ], κ ≥ 0.

As shown in the above example, if X is LogNormal with parameters µ and σ, then the Wang
Transform is also LogNormal but with parameters µ + κσ and σ.

The Wang Transform risk measure is, for κ ≥ 0:
H(X) = ∫_0^∞ Φ[Φ-1[S(x)] + κ] dx.49

Exercise: X is LogNormal with µ = 6 and σ = 2.


Determine the Wang Transform risk measure for κ = 0.7.
[Solution: The Wang Transform is LogNormal with µ = 6 + (0.7)(2) = 7.4 and σ = 2.
The Wang Transform risk measure is the mean of that LogNormal: exp[7.4 + 2²/2] = 12,088.]
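Here is a short Python sketch (mine, not from the text) confirming both the LogNormal-to-LogNormal claim and the risk measure in this exercise; statistics.NormalDist supplies Φ and its inverse.

from math import exp, log
from statistics import NormalDist

N = NormalDist()
mu, sigma, kappa = 6.0, 2.0, 0.7

S = lambda x: 1 - N.cdf((log(x) - mu) / sigma)              # original survival function
wang = lambda x: N.cdf(N.inv_cdf(S(x)) + kappa)             # g[S(x)]
S_shifted = lambda x: 1 - N.cdf((log(x) - (mu + kappa * sigma)) / sigma)

for x in (100, 1_000, 10_000, 100_000):
    print(round(wang(x), 6), round(S_shifted(x), 6))        # the two columns agree

print(round(exp(mu + kappa * sigma + sigma ** 2 / 2)))      # risk measure, 12,088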

Dual Power Transform:

g(S(x)) = 1 - {1 - S(x)}^κ = 1 - F(x)^κ, κ ≥ 1.

Exercise: X is uniform from 0 to 1. Determine the Dual Power Transform with κ = 3.


[Solution: F(x) = x. 1 - F(x)³ = 1 - x³, 0 ≤ x ≤ 1.
Comment: This is the survival function of a Beta Distribution with a = 3, b = 1, and θ = 1.
The corresponding density is: 3x², 0 ≤ x ≤ 1.]

Prob[Maximum of a sample of size N ≤ x] = Prob[X ≤ x]^N = F(x)^N.
The distribution function of the maximum of a sample of size N is F(x)^N.
Therefore, 1 - F(x)^N is the survival function of the maximum of a sample of size N.

49
The integral is taken over the domain of X, which is most commonly 0 to ∞.
2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 62

Therefore, if κ is an integer, the Dual Power Transform is the survival function of the maximum of a
sample of size κ.

The Dual Power Transform measure of risk is:
H(X) = ∫_0^∞ {1 - F(x)^κ} dx, κ ≥ 1.50
Thus if κ = N, then the Dual Power Transform risk measure is the expected value of the maximum
of a sample of size N.

Exercise: X is uniform from 0 to 1. Determine the Dual Power Transform risk measure with κ = 3.
[Solution: F(x) = x. 1 - F(x)³ = 1 - x³, 0 ≤ x ≤ 1. ∫_0^1 (1 - x³) dx = 1 - 1/4 = 3/4.
Comment: The mean of a Beta Distribution with a = 3, b = 1, and θ = 1 is: (1)(3)/(3 + 1) = 3/4.
The expected value of the maximum of a sample of size N from a uniform distribution on (0, ω) is:
ωN/(N + 1). See “Mahlerʼs Guide to Statistics”, covering material on the syllabus of CAS3.]
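A quick simulation sketch (my own, not from the text) of the statement above: for κ = N = 3 and X uniform on (0, 1), the Dual Power Transform risk measure should match the expected maximum of a sample of size 3.

import random

random.seed(7)
N, trials = 3, 200_000
sim = sum(max(random.random() for _ in range(N)) for _ in range(trials)) / trials
print(round(sim, 3))    # near 0.750, the expected maximum of a sample of size 3

# Closed form from the exercise: the integral of 1 - x^3 over (0, 1) is 3/4.
print(1 - 1 / 4)        # 0.75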

Value at Risk:

For the VaRp risk measure:
g(y) = 0 if 0 ≤ y ≤ 1 - p, and g(y) = 1 if 1 - p < y ≤ 1.
Using the above distortion function, g[S(x)] is one when S(x) > 1 - p, and otherwise zero.
S(x) > 1 - p when x < πp. Thus ∫_0^∞ g[S(x)] dx = ∫_0^πp 1 dx = πp.
Exercise: What is the distortion function for VaR0.95?
[Solution: g(y) = 0 if 0 ≤ y ≤ 0.05, and g(y) = 1 if 0.05 < y ≤ 1.]

50
The integral is taken over the domain of X, which is most commonly 0 to ∞.
2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 63

Tail-Value-at-Risk:

For the TVaRp risk measure:
g(y) = y/(1-p) if 0 ≤ y ≤ 1 - p, and g(y) = 1 if 1 - p < y ≤ 1.
Then g[S(x)] is S(x)/(1 - p) when S(x) ≤ 1 - p, in other words when x ≥ πp, and otherwise 1. Thus
∫_0^∞ g[S(x)] dx = ∫_0^πp 1 dx + ∫_πp^∞ S(x)/(1 - p) dx = πp + e(πp) = TVaRp.
Exercise: What is the distortion function for TVaR0.95?
[Solution: g(y) = 20y if 0 ≤ y ≤ 0.05, and g(y) = 1 if 0.05 < y ≤ 1.]
Then g[S(x)] is 20 S(x) when S(x) ≤ 0.05, in other words when x ≥ π0.95, and otherwise 1.
Thus ∫_0^∞ g[S(x)] dx = ∫_0^π0.95 1 dx + ∫_π0.95^∞ 20 S(x) dx = π0.95 + (layer from π0.95 to ∞)/0.05
= π0.95 + e(π0.95) = TVaR0.95.


2016-C-4, Risk Measures §5 Distortion Risk Measures, HCM 10/21/15, Page 64

Problems:

5.1 (1 point) What is the PH Transform risk measure with κ = 1.5 for an Exponential Distribution
with mean of 100?
A. 150 B. 160 C. 170 D. 180 E. 190

5.2 Which of the following distributions are not preserved under a PH Transform?
A. Single Parameter Pareto
B. Weibull
C. Burr
D. Gompertzʼs Law, F(x) = 1 - exp[-B(c^x - 1)/ln(c)]
E. LogNormal

5.3 (2 points) F(x) = 1 - {300/(300 + x)}⁵.


Determine the Proportional Hazard Transform risk measure with κ = 2.
A. 200 B. 220 C. 240 D. 260 E. 280

5.4 (2 points) Losses follow a Uniform Distribution from 0 to 100.


Determine the Proportional Hazard Transform risk measure with κ = 1.3.
A. less than 55
B. at least 55 but less than 60
C. at least 60 but less than 65
D. at least 65 but less than 70
E. at least 70

5.5 (3 points) Aggregate losses follow a Single Parameter Pareto Distribution with α = 3 and
θ = 10. However, a reinsurance contract caps the insurerʼs payments at 30.
Determine the Proportional Hazard Transform risk measure of the insurerʼs payments with κ = 1.2.
A. less than 16
B. at least 16 but less than 17
C. at least 17 but less than 18
D. at least 18 but less than 19
E. at least 19

5.6 (3 points) Losses follow a Weibull Distribution with θ = 1000 and τ = 0.4.
Determine the Proportional Hazard Transform risk measure with κ = 1.8. Hint: Γ(1/2) = √π.
A. 14,000 B. 14,500 C. 15,000 D. 15,500 E. 16,000

5.7 (2 points) Annual aggregate losses have the following distribution:


Annual Aggregate Losses Probability
100 60%
500 30%
1000 10%
Determine the Proportional Hazard Transform risk measure with κ = 2.
A. less than 350
B. at least 350 but less than 400
C. at least 400 but less than 450
D. at least 450 but less than 500
E. at least 500

Use the following information for the next two questions:


The premium will be set equal to the proportional hazard transform of the distribution of aggregate
annual losses retained by the insurer or reinsurer, with κ = 1.2.
The relative security loading, η, is such that: Premiums = (1 + η) (Expected Losses).

5.8 (6 points) Annual aggregate losses follow an Exponential Distribution with mean 100.
Determine the relative security loading for the following situations:
(a) The insurer retains all losses.
(b) The insurer retains only the layer from 0 to 50.
(c) A reinsurer retains only the layer from 50 to 100.
(d) A reinsurer retains only the layer above 100.

5.9 (6 points) Annual aggregate losses follow a Pareto Distribution with α = 3 and θ = 200.
Determine the relative security loading for the following situations:
(a) The insurer retains all losses.
(b) The insurer retains only the layer from 0 to 50.
(c) A reinsurer retains only the layer from 50 to 100.
(d) A reinsurer retains only the layer above 100.

5.10 (2 points) Losses are LogNormal with µ = 4 and σ = 0.6.


Determine the Wang Transform risk measure for κ = 0.3.
A. less than 75
B. at least 75 but less than 80
C. at least 80 but less than 85
D. at least 85 but less than 90
E. at least 90

5.11 (2 points) Losses are Normal with µ = 7000 and σ = 500.


Determine the Wang Transform risk measure for κ = 0.8.
A. 7000 B. 7100 C. 7200 D. 7300 E. 7400

5.12 (3 points) Annual aggregate losses have the following distribution:


Annual Aggregate Losses Probability
100 60%
500 30%
1000 10%
Determine the Wang Transform risk measure with κ = 0.5.
A. 350 B. 400 C. 450 D. 500 E. 550

5.13 (4 points) Y+ is defined as 0 if Y ≤ 0, and Y if Y > 0.


X is the value of a put option.
X = (220 - P)+, where P follows a LogNormal Distribution with µ = 5.5 and σ = 0.2.

Determine the Wang Transform risk measure with κ = 0.6.


A. 17 B. 19 C. 21 D. 23 E. 25

5.14 (4 points) Y+ is defined as 0 if Y ≤ 0, and Y if Y > 0.


X is the value of a call option.
X = (P - 300)+, where P follows a LogNormal Distribution with µ = 5.5 and σ = 0.2.

Determine the Wang Transform risk measure with κ = 0.4.


A. 8 B. 9 C. 10 D. 11 E. 12

5.15 (2 points) What is the Dual Power Transform risk measure with κ = 3 for an Exponential
Distribution with mean 100?
A. 140 B. 160 C. 180 D. 200 E. 220

5.16 (2 points) Losses are uniform from 0 to 100.


Determine the Dual Power Transform risk measure with κ = 1.4.
A. 52 B. 54 C. 56 D. 58 E. 60

5.17 (2 points) Losses follow a Pareto Distribution with α = 5 and θ = 10.


Determine the Dual Power Transform risk measure with κ = 2.
A. 3.3 B. 3.5 C. 3.7 D. 3.9 E. 4.1

5.18 (2 points) Annual aggregate losses have the following distribution:


Annual Aggregate Losses Probability
100 60%
500 30%
1000 10%
Determine the Dual Power Transform risk measure with κ = 1.5.
A. less than 350
B. at least 350 but less than 400
C. at least 400 but less than 450
D. at least 450 but less than 500
E. at least 500

5.19 (1 point) Graph the distortion functions corresponding to VaR0.9.

5.20 (1 point) Graph the distortion function corresponding to TVaR0.9

5.21 (1 point) Graph the distortion function corresponding to the PH Transform with κ = 2.

5.22 (1 point) Graph the distortion function corresponding to the Dual Power Transform with κ = 2.

5.23 (3 points) Graph the distortion function corresponding to the Wang Transform with κ = 0.3.

5.24 (3 points) A distortion risk measure has:


g(y) = 10y for 0 ≤ y ≤ 0.1, and g(y) = 1 for 0.1 < y ≤ 1.

Determine the risk measure for a Pareto Distribution with α = 3 and θ = 200.
A. less than 500
B. at least 500 but less than 600
C. at least 600 but less than 700
D. at least 700 but less than 800
E. at least 800

5.25 (1 point) Which of the following are distortion risk measures?


1. PH (Proportional Hazard) Transform
2. Dual Power Transform
3. Expected Value Premium Principle
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, and 3 E. Not A, B, C, or D

5.26 (2 points) Which of the following is the distortion function for the VaR risk measure for
p = 90%?
A. g(y) = 0 for 0 ≤ y ≤ 0.10, and g(y) = 1 for 0.10 < y ≤ 1
B. g(y) = 0 for 0 ≤ y ≤ 0.90, and g(y) = 1 for 0.90 < y ≤ 1
C. g(y) = 10y for 0 ≤ y ≤ 0.10, and g(y) = 1 for 0.10 < y ≤ 1
D. g(y) = 10y for 0 ≤ y ≤ 0.90, and g(y) = 1 for 0.90 < y ≤ 1
E. None of A, B, C, or D

5.27 (2 points) Which of the following is the distortion function for the TVaR risk measure for
p = 90%?
A. g(y) = 0 for 0 ≤ y ≤ 0.10, and g(y) = 1 for 0.10 < y ≤ 1
B. g(y) = 0 for 0 ≤ y ≤ 0.90, and g(y) = 1 for 0.90 < y ≤ 1
C. g(y) = 10y for 0 ≤ y ≤ 0.10, and g(y) = 1 for 0.10 < y ≤ 1
D. g(y) = 10y for 0 ≤ y ≤ 0.90, and g(y) = 1 for 0.90 < y ≤ 1
E. None of A, B, C, or D

5.28 (4, 5/07, Q.27) (2.5 points) You are given the distortion function:
g(x) = √x.
Calculate the distortion risk measure for losses that follow the Pareto distribution with θ = 1000 and
α = 4.
(A) Less than 300
(B) At least 300, but less than 600
(C) At least 600, but less than 900
(D) At least 900, but less than 1200
(E) At least 1200

Solution to Problems:

5.1. A. S(x) = exp[-x/100]. S(x)^(1/κ) = exp[-x/100]^(1/1.5) = exp[-x/150].


Thus the PH transform is also an Exponential, but with mean 150.

5.2. E. For the Single Parameter Pareto, S(x) = (θ/x)^α. S(x)^(1/κ) = (θ/x)^(α/κ),
another Single Parameter Pareto, but with α replaced by α/κ.
For the Weibull, S(x) = exp[-(x/θ)^τ].
S(x)^(1/κ) = exp[-(x/θ)^τ]^(1/κ) = exp[-(x/θ)^(τ/κ)] = exp[-{x/(θκ^(1/τ))}^τ].
Thus the PH transform is also a Weibull distribution, but with θ replaced by θκ^(1/τ).
For the Burr, S(x) = {θ/(θ + x^γ)}^α. S(x)^(1/κ) = {θ/(θ + x^γ)}^(α/κ).
Thus the PH transform is also a Burr Distribution, but with α replaced by α/κ.
For Gompertzʼs Law, S(x) = exp[-B(c^x - 1)/ln(c)]. S(x)^(1/κ) = exp[-(B/κ)(c^x - 1)/ln(c)].
Thus the PH transform is also Gompertzʼs Law, but with B replaced by B/κ.
For the LogNormal, S(x) = 1 - Φ[(ln x - µ)/σ]. S(x)^(1/κ) is not of the same form.

5.3. A. S(x) = {300/(300 + x)}^5. S(x)^(1/2) = {300/(300 + x)}^2.5, a Pareto Distribution with α = 2.5
and θ = 300. The risk measure is the mean of this second Pareto Distribution, 300/(2.5 - 1) = 200.

5.4. B. S(x) = 1 - x/100, x ≤ 100. S(x)^(1/1.3) = (1 - x/100)^(1/1.3).
∫_0^100 (1 - x/100)^(1/1.3) dx = -100(1 - x/100)^(1 + 1/1.3)/(1 + 1/1.3), evaluated from x = 0 to x = 100,
= 100/(1 + 1/1.3) = 56.52.

5.5. A. S(x) = (10/x)^3, x < 30. S(x)^(1/1.2) = (10/x)^(3/1.2) = (10/x)^2.5, x < 30, a Single Parameter
Pareto Distribution with α = 3/1.2 = 2.5 and θ = 10, capped at 30.
The risk measure is the limited expected value at 30 of this transformed Single Parameter Pareto.
E[X ∧ x] = θ{α - (θ/x)^(α-1)}/(α - 1). E[X ∧ 30] = 10{2.5 - (10/30)^1.5}/1.5 = 15.38.

5.6. B. For the Weibull, S(x) = exp[-(x/θ)^τ].
S(x)^(1/κ) = exp[-(x/θ)^τ]^(1/κ) = exp[-(x/θ)^(τ/κ)] = exp[-{x/(θκ^(1/τ))}^τ].
Thus the PH transform is also a Weibull distribution, but with θ replaced by θκ^(1/τ).
Therefore, the risk measure is the mean of a Weibull with θ = (1000)(1.8^(1/0.4)) = 4347 and τ = 0.4:
4347 Γ(1 + 1/0.4) = 4347 Γ(3.5) = (4347)(2.5)(1.5)(0.5)Γ(1/2) = 8151√π = 14,447.

5.7. E. For the original distribution: S(x) = 1 for x < 100, 0.4 for 100 ≤ x < 500,
0.1 for 500 ≤ x < 1000, 0 for x ≥ 1000.
For the PH Transform, S(x) = 1 for x < 100, 0.4^(1/2) = 0.6325 for 100 ≤ x < 500,
0.1^(1/2) = 0.3162 for 500 ≤ x < 1000, 0 for x ≥ 1000.
The integral of the Survival Function of the PH Transform is:
(100)(1) + (500 - 100)(0.6325) + (1000 - 500)(0.3162) = 511.
Comment: The mean of the original distribution is:
(100)(1) + (500 - 100)(0.4) + (1000 - 500)(0.1) = 310
= (60%)(100) + (30%)(500) + (10%)(1000).

5.8. For an Exponential with mean θ, S(x) = e^(-x/θ). S(x)^(1/κ) = e^(-x/(θκ)).


∫_d^u S(x) dx = θ(e^(-d/θ) - e^(-u/θ)).

Therefore for the layer from d to u, E[X] = θ(e^(-d/θ) - e^(-u/θ)), and H(X) = κθ{e^(-d/(κθ)) - e^(-u/(κθ))}.
H(X)/E[X] = κ{e^(-d/(κθ)) - e^(-u/(κθ))}/(e^(-d/θ) - e^(-u/θ)) = 1.2(e^(-d/120) - e^(-u/120))/(e^(-d/100) - e^(-u/100)).
(a) For d = 0 and u = ∞, H(X)/E[X] = 1.2. η = 20.0%.
(b) For d = 0 and u = 50, H(X)/E[X] = 1.2(1 - e^(-50/120))/(1 - e^(-50/100)) = 1.039. η = 3.9%.
(c) For d = 50 and u = 100, H(X)/E[X] = 1.2(e^(-50/120) - e^(-100/120))/(e^(-50/100) - e^(-100/100)) = 1.130.
η = 13.0%.
(d) For d = 100 and u = ∞, H(X)/E[X] = 1.2(e^(-100/120) - 0)/(e^(-100/100) - 0) = 1.418. η = 41.8%.
Comment: The lowest layer gets the smallest relative security loading, while the highest layer gets
the highest relative security loading.

5.9. For the Pareto, S(x) = θ^α/(x + θ)^α. S(x)^(1/κ) = θ^(α/κ)/(x + θ)^(α/κ).
∫_d^u S(x) dx = θ^α {1/(θ + d)^(α-1) - 1/(θ + u)^(α-1)}/(α - 1).
Therefore for the layer from d to u, E[X] = θ^α {1/(θ + d)^(α-1) - 1/(θ + u)^(α-1)}/(α - 1) =
200^3 {1/(200 + d)^2 - 1/(200 + u)^2}/2, and H(X) = θ^(α/κ) {1/(θ + d)^(α/κ - 1) - 1/(θ + u)^(α/κ - 1)}/(α/κ - 1) =
200^2.5 {1/(200 + d)^1.5 - 1/(200 + u)^1.5}/1.5.
H(X)/E[X] = (4/3) 200^(-0.5) {1/(200 + d)^1.5 - 1/(200 + u)^1.5}/{1/(200 + d)^2 - 1/(200 + u)^2}.
(a) For d = 0 and u = ∞, H(X)/E[X] = 4/3. η = 1/3 = 33.3%.
(b) For d = 0 and u = 50, H(X)/E[X] = (4/3) 200^(-0.5) {1/200^1.5 - 1/250^1.5}/{1/200^2 - 1/250^2} = 1.054.
η = 5.4%.
(c) For d = 50 and u = 100, H(X)/E[X] = (4/3) 200^(-0.5) {1/250^1.5 - 1/300^1.5}/{1/250^2 - 1/300^2} = 1.167.
η = 16.7%.
(d) For d = 100 and u = ∞, H(X)/E[X] = (4/3) 200^(-0.5) {1/300^1.5}/{1/300^2} = 1.633. η = 63.3%.

5.10. B. The Wang Transform is LogNormal with µ = 4 + (0.3)(0.6) = 4.18 and σ = 0.6.
The Wang Transform risk measure is the mean of that LogNormal: exp[4.18 + 0.6^2/2] = 78.26.

5.11. E. Φ⁻¹[S(x)] = Φ⁻¹[1 - Φ[(x - µ)/σ]] = -Φ⁻¹[Φ[(x - µ)/σ]] = -(x - µ)/σ.
Φ[Φ⁻¹[S(x)] + κ] = Φ[κ - (x - µ)/σ] = 1 - Φ[(x - µ)/σ - κ] = 1 - Φ[(x - {µ + κσ})/σ].
This is the survival function of a Normal with mean µ + κσ and the same σ.
The Wang Transform risk measure is the mean of that Normal: µ + κσ.
In this case, µ + κσ = 7000 + (0.8)(500) = 7400.
Comment: As applied to the Normal Distribution, the Wang Transform is equivalent to the
Standard Deviation Premium Principle, with k = κ.
The Wang Transform Risk Measure for κ > 0 is greater than the mean, eliminating choice A.

5.12. C. For the original distribution: S(x) = 1 for x < 100, 0.4 for 100 ≤ x < 500,
0.1 for 500 ≤ x < 1000, 0 for x ≥ 1000.
Φ⁻¹[S(x)] is: ∞ for x < 100, Φ⁻¹[0.4] = -0.253 for 100 ≤ x < 500,
Φ⁻¹[0.1] = -1.282 for 500 ≤ x < 1000, -∞ for x ≥ 1000.
Φ⁻¹[S(x)] + κ is: ∞ for x < 100, 0.247 for 100 ≤ x < 500, -0.782 for 500 ≤ x < 1000,
-∞ for x ≥ 1000.
Φ[Φ⁻¹[S(x)] + κ] is: 1 for x < 100, Φ[0.25] = 0.5987 for 100 ≤ x < 500,
Φ[-0.78] = 0.2177 for 500 ≤ x < 1000, 0 for x ≥ 1000.
The integral of the Survival Function of the Wang Transform is:
(100)(1) + (500 - 100)(0.5987) + (1000 - 500)(0.2177) = 448.

5.13. A. SX(x) = Prob[X > x] = Prob[220 - P > x] = Prob[P < 220 - x] = FP(220 - x) =
Φ[{ln(220 - x) - 5.5}/0.2], for x ≤ 220.
Φ-1[S(x)] is: {ln(220 - x) - 5.5}/0.2, for x ≤ 220.
Φ-1[S(x)] + 0.6 = {ln(220 - x) - 5.38}/0.2, for x ≤ 220.
Φ[Φ-1[S(x)] + 0.6] = Φ[{ln(220 - x) - 5.38}/0.2], for x ≤ 220.
Let p = 220 - x, then Φ[Φ-1[S(x)] + 0.6] = Φ[{ln(p) - 5.38}/0.2], for p ≤ 220.
The integral of Φ[Φ-1[S(x)] + 0.6] is the integral of a LogNormal Distribution Function with µ = 5.38
and σ = 0.2, from 0 to 220.
∫_0^220 F(x) dx = ∫_0^220 {1 - S(x)} dx = 220 - ∫_0^220 S(x) dx = 220 - E[X ∧ 220].

For the LogNormal with µ = 5.38 and σ = 0.2, E[X ∧ 220] =


exp[5.38 + 0.2^2/2] Φ[(ln220 - 5.38 - 0.2^2)/0.2] + 220 {1 - Φ[(ln220 - 5.38)/0.2]}
= (221.406)Φ[-0.13] + 220 {1 - Φ[0.07]} = (221.406)(0.4483) + (220)(0.4721) = 203.1.
The integral of Φ[Φ-1[S(x)] + 0.6] is: 220 - 203.1 = 16.9.

5.14. D. SX(x) = Prob[X > x] = Prob[p - 300 > x] = Prob[p > 300 + x] = SP(300 + x) =
1 - Φ[{ln(300 + x) - 5.5}/0.2] = Φ[{5.5 - ln(300 + x)}/0.2], for x > 0.
Φ-1[S(x)] is: {5.5 - ln(300 + x)}/0.2, for x > 0.
Φ-1[S(x)] + 0.4 = {5.58 - ln(300 + x)}/0.2, for x > 0.
Φ[Φ-1[S(x)] + 0.4] = Φ[{5.58 - ln(300 + x)}/0.2] = 1 - Φ[{ln(300 + x) - 5.58}/0.2] for x > 0.
Let p = 300 + x, then Φ[Φ-1[S(x)] + 0.4] = 1 - Φ[{ln(p) - 5.58}/0.2], for p > 300
The integral of Φ[Φ-1[S(x)] + 0.4] is the integral of a LogNormal Survival Function with µ = 5.58 and
σ = 0.2, from 300 to ∞, which is for that LogNormal: E[X] - E[X ∧ 300].
For the LogNormal with µ = 5.58 and σ = 0.2, E[X] = exp[5.58 + 0.2^2/2] = 270.426. E[X ∧ 300] =
exp[5.58 + 0.2^2/2] Φ[(ln300 - 5.58 - 0.2^2)/0.2] + 300 {1 - Φ[(ln300 - 5.58)/0.2]}
= (270.426)Φ[0.42] + 300 {1 - Φ[0.62]} = (270.426)(0.6628) + (300)(0.2676) = 259.5.
The integral of Φ[Φ-1[S(x)] + 0.4] is: 270.426 - 259.5 = 10.9.

5.15. C. F(x) = 1 - exp[-x/100]. F(x)^3 = 1 - 3exp[-x/100] + 3exp[-2x/100] - exp[-3x/100].
∫_0^∞ {1 - F(x)^3} dx = 3(100) - (3)(100/2) + 100/3 = 183.3.
Comment: The expected value of the maximum of a sample of size N from an Exponential with mean θ is:
θ(1 + 1/2 + ... + 1/N). See “Mahlerʼs Guide to Statistics”, covering material on the syllabus of CAS Exam ST.

5.16. D. F(x) = x/100. F(x)^1.4 = x^1.4/100^1.4.
∫_0^100 {1 - F(x)^1.4} dx = 100 - 100/2.4 = 58.33.
Comment: For a uniform distribution on (0, ω), the Dual Power Transform risk measure is: ωκ/(κ + 1).

5.17. D. F(x) = 1 - {10/(10 + x)}^5. F(x)^2 = 1 - 2{10/(10 + x)}^5 + {10/(10 + x)}^10.
∫_0^∞ {1 - F(x)^2} dx = 10{2/4 - 1/9} = 3.89.



5.18. B. For the original distribution: F(x) = 0 for x < 100, 0.6 for 100 ≤ x < 500,
0.9 for 500 ≤ x < 1000, 1 for x ≥ 1000.
1 - F(x)^κ is: 1 for x < 100, 1 - 0.6^1.5 = 0.5352 for 100 ≤ x < 500,
1 - 0.9^1.5 = 0.1462 for 500 ≤ x < 1000, 0 for x ≥ 1000.
The integral of the Survival Function of the Dual Power Transform is:
(100)(1) + (500 - 100)(0.5352) + (1000 - 500)(0.1462) = 387.

5.19. 90%-VaR. g(y) = 0 for 0 ≤ y ≤ 10%, and 1 for 10% < y ≤ 1.
[Graph of g(y) against y: g(y) is 0 up to y = 0.1, then jumps to 1 and stays at 1 for 0.1 < y ≤ 1.]

5.20. TVaR90%. g(y) = y/0.1 = 10y for 0 ≤ y ≤ 10%, and 1 for 10% < y ≤ 1.
[Graph of g(y) against y: g(y) rises linearly from 0 to 1 as y goes from 0 to 0.1, then stays at 1.]

5.21. PH Transform with κ = 2. g(y) = y^(1/2) = √y.
[Graph of g(y) = √y against y: concave, increasing from 0 at y = 0 to 1 at y = 1.]

5.22. Dual Power Transform with κ = 2. g(y) = 1 - (1 - y)^2.
[Graph of g(y) = 1 - (1 - y)^2 against y: concave, increasing from 0 at y = 0 to 1 at y = 1.]

5.23. Wang Transform with κ = 0.3. g(y) = Φ[Φ⁻¹[y] + 0.3].
[Graph of g(y) = Φ[Φ⁻¹[y] + 0.3] against y: concave, increasing from 0 at y = 0 to 1 at y = 1,
lying above the 45° line.]
For example, without rounding,
g(0.05) = Φ[Φ⁻¹[0.05] + 0.3] = Φ[-1.645 + 0.3] = Φ[-1.345] = 0.0893.
g(0.7) = Φ[Φ⁻¹[0.7] + 0.3] = Φ[0.524 + 0.3] = Φ[0.824] = 0.795.

5.24. A. This is the distortion function for TVaR0.90.
H(X) = ∫_0^∞ g[S(x)] dx = ∫_0^Q0.90 1 dx + 10 ∫_Q0.90^∞ S(x) dx = Q0.90 + (E[X] - E[X ∧ Q0.90])/(1 - 0.9)
= TVaR0.90.
E[X] = 200/(3 - 1) = 100.
0.9 = F(x) = 1 - {200/(200 + x)}^3. ⇒ Q0.90 = 230.89.
E[X ∧ 230.9] = {200/(3 - 1)}{1 - (200/(200 + 230.9))^(3-1)} = 78.46.
TVaR0.90 = Q0.90 + (E[X] - E[X ∧ Q0.90])/(1 - 0.9) = 230.89 + (100 - 78.46)/0.1 = 446.
Comment: y = S(x) in the definition of the distortion function g(y).
0 ≤ y ≤ 0.1 ⇔ 0 ≤ S(x) ≤ 0.1 ⇔ 0.9 ≤ F(x) ⇔ Q0.90 ≤ x.
When S(x) is small, x is large, while when S(x) is large, x is small.
0.1 < y ⇔ 0.1 < S(x) ⇔ F(x) < 0.9 ⇔ x < Q0.90.

5.25. A. The Expected Value Premium Principle is not a distortion risk measure.

5.26. A. For the α-VaR risk measure: g(y) = 0 for 0 ≤ y ≤ 1 - α, and g(y) = 1 for 1 - α < y ≤ 1.
For 90%-VaR: g(y) = 0 for 0 ≤ y ≤ 0.10, and g(y) = 1 for 0.10 < y ≤ 1.
Comment: g(S(x)) = 0 for S(x) ≤ 0.10, and g(S(x)) = 1 for S(x) > 0.10.
In other words, g(S(x)) = 0 for x ≥ Q0.90, and g(S(x)) = 1 for x < Q0.90.
Therefore, ∫_0^∞ g(S(x)) dx = ∫_0^Q0.90 1 dx = Q0.90.

5.27. C. For the α-TVaR risk measure: g(y) = y/(1 - α) for 0 ≤ y ≤ 1 - α, and g(y) = 1 for 1 - α < y ≤ 1.
For 90%-TVaR: g(y) = 10y for 0 ≤ y ≤ 0.10, and g(y) = 1 for 0.10 < y ≤ 1.

Comment: The distortion function for α-TVaR involves 1/(1 - α).
g(S(x)) = 10 S(x) for S(x) ≤ 0.10, and g(S(x)) = 1 for S(x) > 0.10.
In other words, g(S(x)) = 10 S(x) for x ≥ Q0.90, and g(S(x)) = 1 for x < Q0.90.
Therefore, ∫_0^∞ g(S(x)) dx = ∫_0^Q0.90 1 dx + ∫_Q0.90^∞ 10 S(x) dx = Q0.90 + 10 E[(X - Q0.90)+]
= Q0.90 + E[(X - Q0.90)+]/0.1 = Q0.90 + e(Q0.90) = 90%-TVaR.

5.28. D. For this Pareto, S(x) = {1000/(1000 + x)}^4.
g(S(x)) = S(x)^(1/2) = {1000/(1000 + x)}^2, the Survival Function of another Pareto with θ = 1000 and α = 2.
H(X) is the integral of g(S(x)), the mean of this second Pareto: 1000/(2 - 1) = 1000.
Alternately, ∫_0^∞ g(S(x)) dx = ∫_0^∞ 1000^2/(1000 + x)^2 dx = -1,000,000/(1000 + x), evaluated from 0 to ∞,
= 1000.
Comment: PH transform with κ = 2.
For a Pareto Distribution, the PH Transform risk measure is: θ/(α/κ - 1), α/κ > 1.

Section 6, Coherence51

There are various desirable properties for a risk measure to satisfy.

A risk measure is coherent if it has the following four properties:


1. Translation Invariance
2. Positive Homogeneity
3. Subadditivity
4. Monotonicity

Translation Invariance:

ρ(X + c) = ρ(X) + c, for any constant c.

In other words, a risk measure is translation invariant if adding a constant to the loss variable, adds
that same constant to the risk measure.

Letting X = 0, Translation Invariance ⇒ ρ(c) = c.


In other words, if the outcome is certain, the risk measure is equal to the loss.
For example, if the loss is always 1000, then the risk measure is 1000.

Positive Homogeneity:

ρ(cX) = c ρ(X), for any constant c > 0.

In other words, a risk measure is positive homogeneous if multiplying the loss variable by a
positive constant, multiplies the risk measure by the same constant.

Positive Homogeneity ⇒ If the loss variable is converted to a different currency at a fixed rate of
exchange, then so is the risk measure.

Positive Homogeneity ⇒ If the exposure to loss is doubled, then so is the risk measure.

Subadditivity:

ρ(X + Y) ≤ ρ(X) + ρ(Y).


51
See Section 3.5.2 of Loss Models, in particular Definition 3.11. See also “Setting Capital Requirements With
Coherent Measures of Risk”, by Glenn G. Meyers, August 2002 and November 2002 Actuarial Reviews.

In other words, a risk measure satisfies subadditivity, if the merging of two portfolios can not
increase the total risk compared to the sum of their individual risks, but may decrease the total risk.
It should not be possible to reduce the appropriate premium or the required surplus by splitting a
portfolio into its constituent parts.52

Exercise: Determine whether VaR90% satisfies subadditivity.


[Solution: For example, take the following joint distribution for X and Y:
X = 0 and Y = 0, with probability 88%
X = 0 and Y = 1, with probability 4%
X = 1 and Y = 0, with probability 4%
X = 1 and Y = 1, with probability 4%
Then for X, Prob[X = 0] = 92%, Prob[X = 1] = 8%, π0.90 = 0.

For Y, Prob[Y = 0] = 92%, Prob[Y = 1] = 8%, π0.90 = 0.

For X + Y, Prob[X + Y = 0] = 88%, Prob[X + Y = 1] = 8%, Prob[X + Y = 2] = 4%, π0.90 = 1.

ρ(X + Y) = 1 > 0 = 0 + 0 = ρ(X) + ρ(Y).


Thus VaR90% does not have the subadditivity property.
Comment: See Example 3.14 in Loss Models.
X and Y are not independent.
In order to have the subadditivity property, one must have that H(X + Y) ≤ H(X) + H(Y), for all
possible distributions of losses X and Y.]

Since it does not satisfy subadditivity, Value at Risk (VaR) is not coherent.53
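A small computational check of this counterexample (my own sketch, not from the text) confirms the three
90th percentiles directly from the discrete distributions above:

    def percentile_90(dist):
        # smallest value whose cumulative probability is at least 0.90
        total = 0.0
        for value, prob in sorted(dist):
            total += prob
            if total >= 0.90:
                return value

    x_dist = [(0, 0.92), (1, 0.08)]                 # marginal distribution of X
    y_dist = [(0, 0.92), (1, 0.08)]                 # marginal distribution of Y
    xy_dist = [(0, 0.88), (1, 0.08), (2, 0.04)]     # distribution of X + Y
    print(percentile_90(x_dist), percentile_90(y_dist), percentile_90(xy_dist))   # prints 0 0 1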

Monotonicity:

If Prob[X ≤ Y] = 1, then ρ(X) ≤ ρ(Y).54

In other words, a risk measure satisfies monotonicity if, whenever X is never greater than Y, the risk
associated with X is not greater than the risk associated with Y.

For example, let X = (180 - P)+ and Y = (200 - P)+.55 Then X ≤ Y.


Therefore, for any risk measure that satisfies monotonicity, ρ(X) ≤ ρ(Y).

52
Mergers do not increase risk. Diversification does not increase risk.
53
Nevertheless, VaR is still commonly used, particularly in banking. In most practical applications, VaR is
subadditive. Also in some circumstances it may be valuable to disaggregate risks.
54
Technically, we are allowing X > Y on a set of probability zero, something of interest to mathematicians but not
most actuaries.
55
X and Y are two put options on the price of the same stock P, with different strike prices.

Risk Measures:

The Tail Value at Risk is a coherent measure of risk.

The Standard Deviation Premium Principle and the Value at Risk are not coherent
measures of risk.

Risk Measure                           Translation   Positive      Sub-         Monotonicity   Coherence
                                       Invariance    Homogeneity   additivity
Expected Value Premium Principle       No (see 56)   Yes           Yes          Yes            No
Standard Deviation Premium Principle   Yes           Yes           Yes          No             No
Variance Premium Principle             Yes           No            No           No             No
Value at Risk                          Yes           Yes           No           Yes            No
Tail Value at Risk                     Yes           Yes           Yes          Yes            Yes

A measure of risk is coherent if and only if it can be expressed as the supremum of the
expected losses taken over a class of probability measures on a finite set of scenarios.57

A distortion measure is coherent if and only if the distortion function is concave.58


From this it follows that the PH Transform Risk Measure, the Dual Power Transform, and
the Wang Transform are each coherent.

It can be shown that for a coherent risk measure: E[X] ≤ ρ(X) ≤ Max[X].59

56
For (1+k)E[X] with k > 0. For E[X] translation invariance holds.
57
“Coherent Measures of Risk” by Philippe Artzner, Freddy Delbaen, Jean-Marc Eber, and David Heath,
Mathematical Finance 9 (1999), No. 3.
58
“Distortion Risk Measures: Coherence and Stochastic Dominance” by Julia L. Wirch and Mary R. Hardy,
presented at the 6th International Congress on Insurance: Mathematics and Economics.
59
TVaR0 = E[X] and TVaR1 = Max[X].

Problems:

6.1 (3 points) List and briefly define the properties that make a risk measure, ρ(X), coherent.

6.2 (3 points) Briefly discuss whether the Expected Value Premium Principle is a coherent risk
measure. Which of the properties does it satisfy?

6.3 (3 points) Briefly discuss whether the Standard Deviation Premium Principle is a coherent risk
measure. Which of the properties does it satisfy?

6.4 (3 points) Briefly discuss whether the Variance Premium Principle is a coherent risk measure.
Which of the properties does it satisfy?

6.5 (3 points) Briefly discuss whether the Value at Risk is a coherent risk measure.
Which of the properties does it satisfy?

6.6 (2 points) Briefly discuss whether the Tail Value at Risk is a coherent risk measure.
Which of the properties does it satisfy?

6.7 (3 points) Briefly discuss whether ρ(X) = E[X] is a coherent risk measure.
Which of the properties does it satisfy?

6.8 (3 points) Define ρ(X) = Maximum[X], for loss distributions for which Maximum[X] < ∞.
Briefly discuss whether this is a coherent risk measure.
Which of the properties does it satisfy?

6.9 (3 points) The Exponential Premium Principle has ρ(X) = ln[E[e^(αX)]]/α, α > 0.
Briefly discuss whether it is a coherent risk measure. Which of the properties does it satisfy?

Solution to Problems:

6.1. 1. Translation Invariance. ρ(X + c) = ρ(X) + c.


2. Positive Homogeneity. ρ(cX) = c ρ(X), for any constant c > 0.
3. Subadditivity. ρ(X + Y) ≤ ρ(X) + ρ(Y).
4. Monotonicity. If Prob[X ≤ Y] = 1, then ρ(X) ≤ ρ(Y).

6.2. ρ(X) = (1 + k)E[X], k > 0.


1. ρ(X + c) = (1 + k)E[X + c] = (1 + k)E[X] + (1 + k)c
= ρ(X) + (1 + k)c ≠ ρ(X) + c. Translation Invariance does not hold.
2. ρ(cX) = (1 + k)E[cX] = c(1 + k)E[X] = c ρ(X). Positive Homogeneity does hold.
3. ρ(X + Y) = (1 + k)E[X + Y] = (1 + k)E[X] + (1 + k)E[Y] = ρ(X) + ρ(Y) ≤ ρ(X) + ρ(Y).
Subadditivity does hold.
4. If Prob[X ≤ Y] = 1, then ρ(X) = (1 + k)E[X] ≤ (1 + k)E[Y] = ρ(Y). Monotonicity does hold.
The Expected Value Premium Principle is not coherent since #1 does not hold.

6.3. ρ(X) = E[X] + kStdDev[X], k > 0.


1. ρ(X + c) = E[X + c] + kStdDev[X+c] = E[X] + c + kStdDev[X] = ρ(X) + c.
Translation Invariance does hold
2. ρ(cX) = E[cX] + kStdDev[cX] = c E[X] + k c StdDev[X] = c ρ(X).
Positive Homogeneity does hold.
3. ρ(X + Y) = E[X + Y] + kStdDev[X + Y] = E[X] + E[Y] + kStdDev[X + Y].

Now Var[X + Y] = σX^2 + σY^2 + 2 σX σY Corr[X, Y] ≤ σX^2 + σY^2 + 2 σX σY, since Corr[X, Y] ≤ 1.
Var[X + Y] ≤ (σX + σY)^2. ⇒ StdDev[X + Y] ≤ σX + σY.
⇒ ρ(X + Y) ≤ E[X] + E[Y] + k StdDev[X] + k StdDev[Y] = ρ(X) + ρ(Y).


Subadditivity does hold.
4. Let X be uniform from 0 to 1. Let Y be constant at 2.
Let k = 10. Then ρ(X) = 0.5 + 10√(1/12) = 3.39. ρ(Y) = 2 + (10)(0) = 2.
Prob[X ≤ Y] = 1, yet ρ(X) > ρ(Y). Monotonicity does not hold.
The Standard Deviation Premium Principle is not coherent since #4 does not hold.

6.4. ρ(X) = E[X] + kVar[X], k > 0.


1. ρ(X + c) = E[X + c] + kVar[X+c] = E[X] + c + kVar[X] = ρ(X) + c.
Translation Invariance does hold
2. ρ(cX) = E[cX] + kVar[cX] = c E[X] + k c^2 Var[X] ≠ c ρ(X).
Positive Homogeneity does not hold.
3. ρ(X + Y) = E[X + Y] + kVar[X + Y] = E[X] + E[Y] + kVar[X + Y].

Now Var[X + Y] = σX^2 + σY^2 + 2 σX σY Corr[X, Y].

If Corr[X, Y] > 0, then Var[X + Y] > Var[X] + Var[Y], and ρ(X + Y) > ρ(X) + ρ(Y).
Subadditivity does not hold.
4. Let X be uniform from 0 to 1. Let Y be constant at 1.
Let k = 10. Then ρ(X) = 0.5 + 10(1/12) = 1.333. ρ(Y) = 1 + (10)(0) = 1.
Prob[X ≤ Y] = 1, yet ρ(X) > ρ(Y). Monotonicity does not hold.
The Variance Premium Principle is not coherent.

6.5. ρ(X) = πp , the pth percentile


1. Adding a constant to a variable adds a constant to each percentile.
ρ(X + c) = ρ(X) + c. Translation Invariance does hold
2. Multiplying a variable by a constant multiplies each percentile by that constant.
ρ(cX) = c ρ(X). Positive Homogeneity does hold.
3. For example, take the following joint distribution for X and Y:
X = 0 and Y = 0, with probability 88%
X = 0 and Y = 1, with probability 4%
X = 1 and Y = 0, with probability 4%
X = 1 and Y = 1, with probability 4%
Then for X, Prob[X = 0] = 92%, Prob[X = 1] = 8%, π.90 = 0.

For Y, Prob[Y = 0] = 92%, Prob[Y = 1] = 8%, π.90 = 0.

For X + Y, Prob[X + Y = 0] = 88%, Prob[X + Y = 1] = 8%, Prob[X + Y = 2] = 4%, π.90 = 1.

Let p = 90%. ρ(X + Y) = 1 > 0 = 0 + 0 = ρ(X) + ρ(Y).


Subadditivity does not hold.
4. If Prob[X ≤ Y] = 1, then the pth percentile of X is ≤ the pth percentile of Y.
ρ(X) ≤ ρ(Y). Monotonicity does hold.
Value at Risk is not coherent since #3 does not hold.

6.6. ρ(X) = E[X | X > πp ].


1. Adding a constant to a variable adds a constant to each percentile.
ρ(X + c) = E[X + c | X + c > πp + c ] = E[X + c | X > πp ] = E[X | X > πp ] + c = ρ(X) + c.
Translation Invariance does hold
2. Multiplying a variable by a constant multiplies each quantile by that constant.
ρ(cX) = E[cX | cX > cπp ] = E[cX | X > πp ] = c E[X | X > πp ] = c ρ(X).
Positive Homogeneity does hold.
3. E[X | worst 1 - p fraction of the outcomes for X] ≥ E[X | worst 1 - p fraction of the outcomes for X + Y].
ρ(X + Y) = E[X + Y | worst 1 - p fraction of the outcomes for X + Y] =
E[X | worst 1 - p fraction of the outcomes for X + Y] + E[Y | worst 1 - p fraction of the outcomes for X + Y]
≤ E[X | worst 1 - p fraction of the outcomes for X] + E[Y | worst 1 - p fraction of the outcomes for Y]
= ρ(X) + ρ(Y). Subadditivity does hold.
4. If Prob[X ≤ Y] = 1, then the pth percentile of X is ≤ the pth percentile of Y.
Therefore ρ(X) = E[X | worst 1 - p fraction of the outcomes for X] ≤ E[Y | worst 1 - p fraction of the outcomes for X]
≤ E[Y | worst 1 - p fraction of the outcomes for Y] = ρ(Y). Monotonicity does hold.
The Tail Value at Risk is coherent.

6.7. ρ(X) = E[X].


1. ρ(X + c) = E[X + c] = E[X] + c = ρ(X) + c. Translation Invariance does hold
2. ρ(cX) = E[cX] = c E[X] = c ρ(X). Positive Homogeneity does hold.
3. ρ(X + Y) = E[X + Y] = E[X] + E[Y] = ρ(X) + ρ(Y) ≤ ρ(X) + ρ(Y). Subadditivity does hold.
4. If Prob[X ≤ Y] = 1, then ρ(X) = E[X] ≤ E[Y] = ρ(Y). Monotonicity does hold.
This risk measure is coherent.

6.8. ρ(X) = Max[X].


1. ρ(X + c) = Max[X + c] = Max[X] + c = ρ(X) + c. Translation Invariance does hold
2. ρ(cX) = Max[cX] = c Max[X] = c ρ(X). Positive Homogeneity does hold.
3. ρ(X + Y) = Max[X + Y] ≤ Max[X] + Max[Y] ≤ ρ(X) + ρ(Y). Subadditivity does hold.
4. If Prob[X ≤ Y] = 1, then ρ(X) = Max[X] ≤ Max[Y] = ρ(Y). Monotonicity does hold.
This risk measure is coherent.

6.9. ρ(X) = ln[E[e^(αX)]]/α.

1. ρ(X + c) = ln[E[eα(X+c)]]/α = ln[eαcE[eαX]]/α = {αc + ln[E[eαX]]}/α = ρ(X) + c.


Translation Invariance does hold
2. If X is Normal, ρ(X) = µ + ασ^2/2. cX is Normal with parameters cµ and cσ.
ρ(cX) = cµ + α(cσ)^2/2 ≠ c ρ(X). Positive Homogeneity does not hold.

3. ρ(X + Y) ≤ ρ(X) + ρ(Y). ⇔ ln[E[eα(X+Y)]] ≤ ln[E[eαX]] + ln[E[eαY]].

⇔ E[eαX eαY] ≤ E[eαX] E[eαY]. ⇔ Cov[eαX, eαY] ≤ 0. However, this covariance can be positive.
For example, take the following joint distribution for X and Y:
X = 0 and Y = 0, with probability 88%
X = 0 and Y = 1, with probability 4%
X = 1 and Y = 0, with probability 4%
X = 1 and Y = 1, with probability 4%
Then for X, Prob[X = 0] = 92%, Prob[X = 1] = 8%. ρ(X) = ln(0.92 + 0.08e^α)/α.
For Y, Prob[Y = 0] = 92%, Prob[Y = 1] = 8%. ρ(Y) = ln(0.92 + 0.08e^α)/α.


For X + Y, Prob[X + Y = 0] = 88%, Prob[X + Y = 1] = 8%, Prob[X + Y = 2] = 4%.
ρ(X + Y) = ln(0.88 + 0.08e^α + 0.04e^(2α))/α.
For example, for α = 2, ρ(X) = ρ(Y) = ln(0.92 + 0.08e2 )/2 = 0.206.
ρ(X + Y) = ln(0.88 + 0.08e2 + 0.04e4 )/2 = 0.648.
ρ(X + Y) = 0.648 > 0.412 = ρ(X) + ρ(Y). Subadditivity does not hold.

4. If Prob[X ≤ Y] = 1, then for α > 0, eαX ≤ eαY. ⇒ E[eαX] ≤ E[eαY]. ⇒ ln[E[eαX]] ≤ ln[E[eαY]].

⇒ ρ(X) ≤ ρ(Y). Monotonicity does hold.


The Exponential Premium Principle is not coherent.

Section 7, Using Simulation60

For a general discussion of simulation see “Mahlerʼs Guide to Simulation.” Here I will discuss using
the results of a simulation of aggregate losses to estimate risk measures.

Simulating aggregate losses could be relatively simple, for example if one assumes that
aggregate losses are LogNormal. On the other hand, it could involve a very complicated simulation
model of a property/casualty insurer with many different lines of insurance whose results are not
independent, with complicated reinsurance arrangements, etc.61

Here we will not worry about how the simulation was performed. Rather we will be given a large
simulated sample. For example, let us assume we have simulated from the distribution of
aggregate losses the following sample of size 100, arranged from smallest to largest:62

13, 19, 20, 25, 25, 31, 35, 35, 37, 39, 43, 48, 49, 51, 53, 55, 65, 68, 69, 75, 75, 79, 81, 84, 86,
87, 88, 90, 90, 94, 97, 112, 121, 128, 129, 132, 133, 133, 134, 137, 137, 138, 141, 142, 143,
144, 145, 145, 150, 150, 161, 166, 171, 186, 187, 191, 191, 206, 212, 212, 222, 226, 228,
228, 239, 250, 252, 270, 272, 274, 303, 315, 317, 319, 321, 322, 326, 340, 352, 356, 362,
365, 373, 388, 415, 434, 455, 456, 459, 516, 560, 638, 691, 762, 906, 1031, 1456, 1467,
1525, 2034.

Mean and Standard Deviation:

The sum of this sample is 27,305. X = 27,305/100 = 273.05.


The sum of the squares of the sample is 18,722,291. The estimated 2nd moment is 187,223.
Therefore, the sample variance is: (187,223 - 273.05^2)(100/99) = 113,805.

Exercise: Estimate the appropriate premium using the Standard Deviation Premium Principle with k = 0.5.
[Solution: 273.05 + (0.5)√113,805 = 442.]
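A brief computational sketch of the arithmetic above (my own illustration; it uses the sums already given
rather than the full simulated sample):

    n = 100
    sum_x, sum_x2 = 27_305, 18_722_291                        # sums given above
    mean = sum_x / n                                          # 273.05
    second_moment = sum_x2 / n                                # 187,223 (rounded)
    sample_var = (second_moment - mean**2) * n / (n - 1)      # about 113,805
    premium = mean + 0.5 * sample_var**0.5                    # Standard Deviation Premium Principle, k = 0.5
    print(round(mean, 2), round(sample_var), round(premium))  # 273.05, 113805, 442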

60
See Section 20.4.4 of Loss Models.
61
See for example, The Dynamic Financial Analysis Call Papers in the Spring 2001 CAS Forum.
62
In practical applications, one would usually simulate a bigger sample, such as size 1000 or 10,000.

Estimating Value at Risk:

Here, in order to estimate the pth percentile, Loss Models takes the value in the sample
corresponding to: 1 + the largest integer in Np.63

For a sample of size 100, VaR0.90 is estimated as:


[(100)(0.9)] + 1 = 91st value from smallest to largest.

Exercise: For the previous sample of size 100, estimate VaR0.80.


[Solution: Take 1 + the largest integer in: (0.80)(100) = 80.
So we take the 81st element in the sample from smallest to largest: 362.]

In general, let [x] be the greatest integer contained in x.


[7.2] = 7. [7.6] = 7. [8.0] = 8.

VaRp is estimated as the [Np] + 1 value from smallest to largest.


Estimated VaRp = L([Np] + 1), the ([Np] + 1)th value in the sample from smallest to largest.
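This estimator is easy to code. The following sketch (my own illustration, not from the text) returns the
[Np] + 1 smallest value of a simulated sample:

    def estimate_var(sample, p):
        # [Np] + 1 smallest value; with 0-based indexing that is index [Np] of the sorted sample
        s = sorted(sample)
        k = int(len(s) * p + 1e-9)   # [Np]; the tiny epsilon guards against floating-point error when Np is an integer
        return s[k]

    # For the sample of 100 simulated aggregate losses above, estimate_var(sample, 0.80) returns 362.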

Using a Series of Simulations: 64

Loss Models does not discuss how to estimate the variance of this estimate of VaR0.80.
One way would be through a series of simulations.

One could repeat the simulation that resulted in the previous sample of 100, and get a new
sample of size 100. Using the original sample the estimate of VaR0.80 was 362. Using the new
sample, the estimate of VaR0.80 would be somewhat different. Then we could proceed to
simulate a third sample, and get a third estimate of VaR0.80.

We could produce for example 500 different samples and get 500 corresponding estimates of
VaR0.80. Then the mean of these 500 estimates of VaR0.80, would be a good estimate of
VaR0.80. The sample variance of these 500 estimates of VaR0.80, would be an estimate of the
variance of any of the individual estimates of VaR0.80. However, the variance of the average of
these 500 estimates of VaR0.80 would be the sample variance divided by 500.65

63
This differs from the smoothed empirical estimate of πp which is the p(N+1) loss from smallest to largest, linearly
interpolating between two loss amounts if necessary. See “Mahlerʼs Guide to Fitting Loss Distributions.”
64
“Mahlerʼs Guide to Simulation” has many examples of simulation experiments.
See especially the section on Estimating the p-value via Simulation.
65
The variance of an average is the variance of a single draw, divided by the number of items being averaged.

Estimating Tail Value at Risk:

One can estimate TVaRp as an average of the worst outcomes of a simulated sample.

For a sample of size 100, VaR0.90 is estimated as:


[(100)(0.9)] + 1 = 91st value from smallest to largest.

For a sample of size 100, in order to estimate TVaR0.90, take an average of the 10 largest values.

Average the values starting at the 91st.

For the previous sample of size 100, the 91st value is 560, the estimate of π0.90.
We could estimate TVaR90% as the average of the 10 largest values in the sample:
(560 + 638 + 691 + 762 + 906 + 1031 + 1456 + 1467 + 1525 + 2034)/10 = 1107.

In general, let [x] be the greatest integer contained in x.


TVaRp is estimated as the average of the largest values in the sample,
starting with the [Np] + 1 value from smallest to largest.66

Exercise: For the previous sample of size 100, estimate TVaR95%.


[Solution: [(100)(0.95)] + 1 = 96. (1031 + 1456 + 1467 + 1525 + 2034)/5 = 1502.6.
Comment: For a small sample such as this, and a large p such as 95%, the estimate of the
TVaR95% is subject to a lot of random fluctuation.]
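Continuing the sketch above (again my own illustration, not from the text), the corresponding TVaR
estimator averages the upper tail of the sorted sample:

    def estimate_tvar(sample, p):
        # average of the worst N - [Np] outcomes, i.e. the values starting with the [Np] + 1 smallest
        s = sorted(sample)
        k = int(len(s) * p + 1e-9)   # [Np], as in estimate_var above
        tail = s[k:]
        return sum(tail) / len(tail)

    # For the sample of 100 above, estimate_tvar(sample, 0.90) = 1107 and estimate_tvar(sample, 0.95) = 1502.6.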

66
There are other similar estimators that would also be reasonable.

Variance of the Estimate of the Tail Value at Risk:

In general a variance can be divided into two pieces.67


Conditioning the estimate of TVaRp on π^p , the estimator of πp :
^ ^ ^
Var[ TVaRp ] = E[Var[ TVaRp | π^p ]] + Var[E[ TVaRp | π^p ]].

This leads to the following estimate of the variance of the estimate of TVaRp :68
^
{sp 2 + p( TVaRp - π^p ) 2 }/{N - [Np]},

where sp 2 is the sample variance of the worst outcomes used to estimate TVaRp .

For the previous sample of size 100, TVaR90% was estimated as an average of the 10 largest
values in the sample:
(560 + 638 + 691 + 762 + 906 + 1031 + 1456 + 1467 + 1525 + 2034)/10 = 1107.

The sample variance of these 10 worst outcomes is:
{(560 - 1107)^2 + (638 - 1107)^2 + (691 - 1107)^2 + (762 - 1107)^2 + (906 - 1107)^2 +
(1031 - 1107)^2 + (1456 - 1107)^2 + (1467 - 1107)^2 + (1525 - 1107)^2 + (2034 - 1107)^2}/9 = 238,098.

Thus the estimate of the variance of this estimate of TVaR90% is:
{238,098 + (0.9)(1107 - 560)^2}/(100 - 90) = 50,739.
Thus the estimate of the standard deviation of this estimate of TVaR90% is: √50,739 ≈ 225.
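Putting the pieces together in code (an illustrative sketch of my own, using only the 10 worst simulated
values listed above):

    tail = [560, 638, 691, 762, 906, 1031, 1456, 1467, 1525, 2034]      # worst 10 of the 100 simulated values
    n, p = 100, 0.90
    pi_hat = tail[0]                                                    # 560, the 91st value, estimate of the 90th percentile
    tvar_hat = sum(tail) / len(tail)                                    # 1107
    s2 = sum((x - tvar_hat)**2 for x in tail) / (len(tail) - 1)         # sample variance of the tail, about 238,098
    var_tvar = (s2 + p * (tvar_hat - pi_hat)**2) / (n - int(n * p + 1e-9))   # about 50,739
    print(tvar_hat, round(s2), round(var_tvar), round(var_tvar**0.5))   # 1107.0, 238098, 50739, 225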

67
As discussed in “Mahlerʼs Guide to Buhlmann Credibility.”
68
The first term is the EPV, while the second term is the VHM.
The first term dominates for a heavier-tailed distribution, while the second term is significant for a lighter-tailed
distribution.
Although Loss Models does not explain how to derive the second term, it does cite “Variance of the CTE
Estimator”, by B. John Manistre and Geoffrey H. Hancock, April 2005 NAAJ. The derivation in Manistre and
Hancock uses the information matrix and the delta method, which are discussed in “Mahlerʼs Guide to Fitting Loss
Distributions.”

Problems:

For the next six questions, you simulate the following 35 random values from a distribution:
6 7 11 14 15 17 18 19 25 29
30 34 38 40 41 48 49 53 60 63
78 103 124 140 192 198 227 330 361 421
514 546 750 864 1638

7.1 (1 point) What is the estimate of VaR0.9?


A. 546 B. 750 C. 864 D. 1638 E. None of A, B, C, or D

7.2 (1 point) What is the estimate of VaR0.7?


A. 140 B. 192 C. 198 D. 227 E. 330

7.3 (2 points) What is the estimate of TVaR0.9?


A. 900 B. 950 C. 1000 D. 1050 E. 1100

7.4 (3 points) What is the variance of the estimate in the previous question?
A. 88,000 B. 90,000 C. 92,000 D. 94,000 E. 96,000

7.5 (2 points) What is the estimate of TVaR0.7?


A. 400 B. 450 C. 500 D. 550 E. 600

7.6 (3 points) What is the variance of the estimate in the previous question?
A. 22,000 B. 24,000 C. 26,000 D. 28,000 E. 30,000

Use the following information for the next 3 questions:


One hundred values of the annual earthquake losses in the state of Allshookup for the
Presley Insurance Company have been simulated, and ranked from smallest to largest:
57, 72, 98, 128, 151, 160, 163, 171, 203, 218,
257, 262, 267, 301, 323, 327, 337, 372, 397, 401,
441, 447, 454, 464, 491, 498, 500, 509, 512, 520,
522, 523, 526, 530, 531, 553, 554, 565, 620, 632,
633, 637, 641, 648, 660, 666, 678, 685, 695, 708,
709, 728, 732, 782, 810, 826, 851, 858, 862, 871,
890, 903, 942, 947, 955, 976, 984, 992, 1016, 1023,
1024, 1027, 1041, 1047, 1048, 1050, 1055, 1055, 1057, 1062,
1076, 1081, 1088, 1117, 1131, 1148, 1192, 1220, 1253, 1270,
1329, 1398, 1406, 1537, 1578, 1658, 1814, 1909, 2431, 2702.

7.7 (1 point) Estimate VaR0.9.


A. 1192 B. 1220 C. 1253 D. 1270 E. 1329

7.8 (2 points) Estimate TVaR0.95.


A. 1800 B. 1900 C. 2000 D. 2100 E. 2200

7.9 (3 points) Estimate the standard deviation of the estimate made in the previous question.
A. 240 B. 260 C. 280 D. 300 E. 320

7.10 (1 point) XYZ Insurance Company wrote a portfolio of medical professional liability
insurance. 100 scenarios were simulated to model the aggregate losses.
The 10 worst results of these 100 scenarios are (in $ million):
104, 132, 132, 143, 152, 183, 131, 126, 191, 117.
Estimate the 95% Tail Value at Risk.

Use the following information for the next 4 questions:


One thousand values of aggregate annual losses net of reinsurance have been simulated.
They have been ranked from smallest to largest, and here are the largest 100:
3985, 4239, 4521, 4705, 4875, 5220, 5239, 5294, 5384, 5503,
5514, 5581, 5601, 5630, 5735, 5756, 5823, 5872, 5902, 5909,
5945, 6004, 6038, 6085, 6204, 6249, 6265, 6270, 6326, 6338,
6371, 6378, 6398, 6402, 6457, 6533, 6548, 6667, 6679, 6688,
6822, 6920, 6994, 7004, 7039, 7050, 7100, 7126, 7126, 7128,
7129, 7133, 7208, 7250, 7317, 7317, 7352, 7361, 7377, 7466,
7467, 7468, 7470, 7472, 7527, 7534, 7538, 7544, 7547, 7547,
7578, 7607, 7613, 7651, 7663, 7712, 7757, 7771, 7785, 7823,
7849, 7865, 7878, 7880, 7906, 7923, 7928, 7941, 7955, 7963,
7976, 7979, 8011, 8021, 8032, 8034, 8052, 8065, 8089, 8116.

7.11 (1 point) Estimate VaR0.95.


A. 7128 B. 7129 C. 7133 D. 7208 E. 7250

7.12 (1 point) Estimate VaR0.90.


A. 3985 B. 4239 C. 4521 D. 4705 E. 4875

7.13 (2 points) Estimate TVaR0.99.


A. 7600 B. 7700 C. 7800 D. 7900 E. 8000

7.14 (3 points) Estimate the standard deviation of the estimate made in the previous question.
A. 20 B. 22 C. 24 D. 26 E. 28

Solution to Problems:

7.1. A. VaR0.9 is estimated as the [(0.90)(35)] + 1 = 32nd value from smallest to largest: 546.

7.2. B. VaR0.7 is estimated as the [(0.70)(35)] + 1 = 25th value from smallest to largest: 192.

7.3. B. [(0.90)(35)] + 1 = 32.


Estimate TVaR0.9 as the average of the worst outcomes starting with the 32nd value.
(546 + 750 + 864 + 1638)/4 = 949.5.

7.4. D. [Np] + 1 = [(0.90)(35)] + 1 = 32.
The 32nd element from smallest to largest is 546, the estimate of π0.9.
sp^2 is the sample variance of the worst outcomes used to estimate TVaR0.9:
{(546 - 949.5)^2 + (750 - 949.5)^2 + (864 - 949.5)^2 + (1638 - 949.5)^2}/3 = 227,985.
The variance of the estimate of TVaRp: {sp^2 + p(TVaRp - πp)^2}/{N - [Np]}
= {227,985 + (0.9)(949.5 - 546)^2}/(35 - 31) = 93,629.

7.5. D. [(0.70)(35)] + 1 = 25.


Estimate TVaR0.7 as the average of the worst outcomes starting with the 25th value.
(192 + 198 + 227 + 330 + 361 + 421 + 514 + 546 + 750 + 864 + 1638)/11 = 549.2.

7.6. B. [Np] + 1 = [(0.70)(35)] + 1 = 25.
The 25th element from smallest to largest is 192, the estimate of π0.7.
sp^2 is the sample variance of the worst outcomes used to estimate TVaR0.7:
{(192 - 549.2)^2 + (198 - 549.2)^2 + (227 - 549.2)^2 + (330 - 549.2)^2 + (361 - 549.2)^2 +
(421 - 549.2)^2 + (514 - 549.2)^2 + (546 - 549.2)^2 + (750 - 549.2)^2 + (864 - 549.2)^2
+ (1638 - 549.2)^2}/10 = 178,080.
The variance of the estimate of TVaRp: {sp^2 + p(TVaRp - πp)^2}/{N - [Np]}
= {178,080 + (0.7)(549.2 - 192)^2}/(35 - 24) = 24,309.


Comment: The variance of the estimate of TVaR0.7 is much smaller than the variance of the
estimate of TVaR0.9. It is easier to estimate the Tail Value at Risk at a smaller value of p than a
larger value of p; it is hard to estimate what is going on in the extreme righthand tail.

7.7. E. [Np] + 1 = [(100)(0.90)] + 1 = 91.


The 91st element from smallest to largest is: 1329.

7.8. D. [Np] + 1 = [(100)(0.95)] + 1 = 96.


Average the 96th to 100th values:
(1658 + 1814 + 1909 + 2431 + 2702)/5 = 2102.8.

7.9. C. [Np] + 1 = [(100)(0.95)] + 1 = 96.
The 96th element from smallest to largest is 1658, the estimate of π0.95.
sp^2 is the sample variance of the worst outcomes used to estimate TVaR0.95: {(1658 - 2102.8)^2
+ (1814 - 2102.8)^2 + (1909 - 2102.8)^2 + (2431 - 2102.8)^2 + (2702 - 2102.8)^2}/4 = 196,392.
The variance of the estimate of TVaRp: {sp^2 + p(TVaRp - πp)^2} / {N - [Np]}
= {196,392 + (0.95)(2102.8 - 1658)^2} / (100 - 95) = 76,869.
The standard deviation is: √76,869 = 277.

7.10. [Np] + 1 = [(100)(0.95)] + 1 = 96.


We average the 96th, 97th, 98th, 99th, 100th values:
(132 + 143 + 152 + 183 + 191) / 5 = $160.2 million.
Comment: TVaRp is estimated as the average of the largest values in the sample,
starting with the [Np] + 1 value from smallest to largest.

7.11. B. [Np] + 1 = [(1000)(0.95)] + 1 = 951.


The 951st value is: 7129.

7.12. A. [Np] + 1 = [(1000)(0.90)] + 1 = 901.


The 901st value is: 3985.

7.13. E. [Np] + 1 = [(1000)(0.99)] + 1 = 991.


Average the 991st to the 1000th values:
(7976 + 7979 + 8011 + 8021 + 8032 + 8034 + 8052 + 8065 + 8089 + 8116)/10 = 8037.5.

7.14. C. [Np] + 1 = [(1000)(0.99)] + 1 = 991.
The 991st element from smallest to largest is 7976, the estimate of π0.99.
sp^2 is the sample variance of the worst outcomes used to estimate TVaR0.99:
{(7976 - 8037.5)^2 + (7979 - 8037.5)^2 + (8011 - 8037.5)^2 + (8021 - 8037.5)^2 +
(8032 - 8037.5)^2 + (8034 - 8037.5)^2 + (8052 - 8037.5)^2 + (8065 - 8037.5)^2
+ (8089 - 8037.5)^2 + (8116 - 8037.5)^2}/9 = 2000.
The variance of the estimate of TVaRp: {sp^2 + p(TVaRp - πp)^2}/{N - [Np]}
= {2000 + (0.99)(8037.5 - 7976)^2}/(1000 - 990) = 574.4.
The standard deviation is: √574.4 = 24.0.

Section 8, Important Ideas and Formulas

Introduction (Section 1):

A risk measure is defined as a functional mapping of a loss distribution to the real numbers.
ρ(X) is the notation used for the risk measure.

Premium Principles (Section 2):

Expected Value Premium Principle: ρ(X) = (1 + k)E[X], k > 0.

Standard Deviation Premium Principle: ρ(X) = E[X] + k √Var[X], k > 0.

Variance Premium Principle: ρ(X) = E[X] + k Var[X], k > 0.

Value at Risk (Section 3):

Value at Risk, VaRp(X), is defined as the 100pth percentile.

VaRp(X) = πp.

In Appendix A of the Tables attached to the exam, there are formulas for VaRp(X) for
many of the distributions: Exponential, Pareto, Single Parameter Pareto, Inverse Pareto,
Inverse Weibull, Burr, Inverse Burr, Inverse Exponential, Paralogistic, Inverse Paralogistic.

Tail Value at Risk (Section 4):

TVaRp (X) ≡ E[X | X > πp ] = πp + e(πp ) = πp + (E[X] - E[X ∧ πp ]) / (1 - p).

The corresponding risk measure is: ρ(X) = TVaRp (X).

TVaRp (X) ≥ VaRp (X). TVaR0 (X) = E[X]. TVaR1 (X) = Max[X].

In Appendix A, there are formulas for TVaRp (X) for a few of the distributions:
Exponential, Pareto, Single Parameter Pareto.

For the Normal Distribution: TVaRp (X) = µ + σ φ[zp ] / (1 - p).

Coherence (Section 6):

A risk measure is coherent if it has the following four properties:


1. Translation Invariance ρ(X + c) = ρ(X) + c, for any constant c.
2. Positive Homogeneity ρ(cX) = c ρ(X), for any constant c > 0.
3. Subadditivity ρ(X + Y) ≤ ρ(X) + ρ(Y).
4. Monotonicity If Prob[X ≤ Y] = 1, then ρ(X) ≤ ρ(Y).

The Tail Value at Risk is a coherent measure of risk.

The Standard Deviation Premium Principle and the Value at Risk


are not coherent measures of risk.

Using Simulation (Section 7):

Let [x] be the greatest integer contained in x.

VaRp is estimated as the [Np] + 1 value from smallest to largest.

TVaRp is estimated as the average of the largest values in the sample,


starting with the [Np] + 1 value from smallest to largest.

Estimate of the variance of the estimate of TVaRp:

{sp^2 + p(estimate of TVaRp - estimate of πp)^2} / {N - [Np]},

where sp^2 is the sample variance of the worst outcomes used to estimate TVaRp.
Mahlerʼs Guide to
Fitting Frequency Distributions
Exam C

prepared by
Howard C. Mahler, FCAS
Copyright 2016 by Howard C. Mahler.

Study Aid 2016-C-5

Howard Mahler
hmahler@mac.com
www.howardmahler.com/Teaching

Mahlerʼs Guide to Fitting Frequency Distributions


Copyright 2016 by Howard C. Mahler.

This Study Aid will review what a student needs to know about fitting frequency distributions in
Loss Models.1

Information in bold or sections whose title is in bold are more important for passing the exam. Larger
bold type indicates it is extremely important. Information presented in italics (or sections whose title
is in italics) should not be needed to directly answer exam questions and should be skipped on first
reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in
practical applications.

Highly Recommended problems are double underlined.


Recommended problems are underlined.

Solutions to the problems in each section are at the end of that section.
Note that problems include both some written by me and some from past exams.2 The latter are
copyright by the Casualty Actuarial Society and the SOA and are reproduced here solely to aid
students in studying for exams.3

Section # Pages Section Name


1 3-5 Introduction
2 6-28 Method of Moments
3 29-88 Method of Maximum Likelihood
4 89-153 Chi-Square Test
5 154-172 Likelihood Ratio Test
6 173-188 Fitting to the (a, b, 1) class
7 189-191 Important Formulas and Ideas

1
CAS Exam 3 and SOA Exam M used to include material preliminary to that on joint Exam 4/C, which is summarized
in the Introduction to this study guide and covered in more detail in “Mahlerʼs Guide to Frequency Distributions.”
2
In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. In some cases
the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam
questions, but it will not be specifically tested.
3
The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for
their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you
in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to
produce quality exams.

Course 4 Exam Questions by Section of this Study Aid4

Section Sample 5/00 11/00 5/01 11/01 11/02 11/03 11/04

1
2 8
3 3 6

4 29 19 25 28 16

5 20
6

Section 5/05 11/05 11/06 5/07

1
2 40
3 29 12 15 18

4 10

5
6

The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07, 5/08, 11/08, and 5/09 exams.

Fitting to the (a, b, 1) distributions was added to the syllabus for the Fall 2009 exam.

4
Excluding any questions that are no longer on the syllabus.

Section 1, Introduction

One can be asked many of the same questions about fitting Frequency Distributions as about
fitting size of loss distributions. However, there are many fewer questions asked on the exam
about fitting Frequency Distributions than about fitting Loss Distributions.

The two methods of fitting one needs to know how to apply to Frequency Distributions are the
Method of Moments and the Method of Maximum Likelihood. In addition one has to know how to
apply the Chi-Square statistic to test a fit and how to use the loglikelihood to compare fits.
All of these topics are covered in more detail in “Mahlerʼs Guide to Fitting Loss Distributions.”

The members of the (a, b, 0) class of frequency distributions are: Binomial, Poisson, and Negative
Binomial. This includes special cases such as the Bernoulli (Binomial with m = 1),
Binomial with m fixed, Geometric (Negative Binomial with r = 1), and Negative Binomial for r fixed.

Here is a summary of each of the three common frequency distributions.5

Binomial Distribution

Support: x = 0, 1, 2, 3..., m. Parameters: 1 > q > 0, m ≥ 1. m = 1 is a Bernoulli Distribution.

P.d.f.: f(x) = m! q^x (1 - q)^(m-x) / {x! (m - x)!} = (m choose x) q^x (1 - q)^(m-x).

Mean = mq Variance = mq(1-q)


Mode = largest integer in mq + q (if mq + q is an integer, then f(mq + q) = f(mq + q- 1)
and both mq + q and mq + q - 1 are modes.)

Probability Generating Function: P(z) = {1 + q(z-1)}^m.

f(0) = (1 - q)^m. f(x+1)/f(x) = a + b/(x + 1), with a = -q/(1 - q) and b = (m + 1)q/(1 - q).

The sum of m independent Bernoullis with the same q is a Binomial with parameters
m and q. The sum of two independent Binomials with parameters (m1 , q) and (m2 , q) is also
Binomial with parameters m1 + m2 , and q.

5
See “Mahlerʼs Guide to Frequency Distributions”, or Loss Models.

Poisson Distribution

Support: x = 0, 1, 2, 3... Parameters: λ > 0

P.d.f.: f(x) = λ^x e^(-λ) / x!

Mean = λ Variance = λ

Mode = largest integer in λ (if λ is an integer then f(λ) = f(λ−1) and both λ and λ-1 are modes.)

Probability Generating Function: P(z) = e^(λ(z-1)), λ > 0.

f(0) = e^(-λ). f(x+1)/f(x) = a + b/(x + 1), with a = 0 and b = λ.

The sum of two independent variables each of which is Poisson with parameters λ1 and
λ 2 is also Poisson, with parameter λ1 + λ2 .

If frequency is given by a Poisson and severity is independent of frequency, then the


number of claims above a certain amount (in constant dollars) is also a Poisson.
This is an example of thinning a Poisson Distribution. If claims are from a Poisson
Process, and one divides these claims into subsets in a manner independent of the
frequency process, then the claims in each subset are independent Poisson Processes.

Negative Binomial Distribution

Support: x = 0, 1, 2, 3... Parameters: β > 0, r ≥ 0. r = 1 is a Geometric Distribution.

P.d.f.: f(x) = r(r + 1)...(r + x - 1) β^x / {x! (1 + β)^(x+r)} = (x + r - 1 choose x) β^x / (1 + β)^(x+r).

Mean = rβ Variance = rβ(1+β)

Mode = largest integer in (r-1)β. (if (r-1)β is an integer, then both (r-1)β and (r-1)β - 1 are modes.)

Probability Generating Function: P(z) = {1 - β(z-1)}^(-r).

f(0) = 1/(1 + β)^r. f(x+1)/f(x) = a + b/(x + 1), with a = β/(1 + β) and b = (r - 1)β/(1 + β).

The sum of r independent Geometric Distributions with the same β, is a Negative Binomial
Distribution with parameters r and β.
If X is Negative Binomial(r1, β) and Y is Negative Binomial(r2, β), with X and Y independent,
then X + Y is Negative Binomial(r1 + r2, β).



Section 2, Method of Moments

One can fit a type of distribution to data via the Method of Moments, by finding that set of
parameters such that the moments (about the origin) match the observed moments. If one has a
single parameter, such as in the case of the Poisson Distribution, then one matches the observed
mean to the theoretical mean of the distribution. In the case of two parameters, one matches the first
two moments, or equivalently one matches the mean and variance.

Poisson Distribution:

Fitting via the method of moments is easy for the Poisson; one merely sets the single parameter
λ equal to the observed mean: fitted λ = X, the sample mean.

Exercise: Assume one has observed insureds and gotten the following distribution of insureds by
number of claims:
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Insureds 17649 4829 1106 229 44 9 4 1 1 23872
Fit this data to a Poisson Distribution via the Method of Moments.
[Solution: By taking the average value of the number of claims, one can calculate that the first moment is:
X = {(0)(17,649) + (1)(4829) + (2)(1106) + (3)(229) + (4)(44) + (5)(9) + (6)(4) + (7)(1) + (8)(1)} / 23,872
= 0.3346. Thus the fitted Poisson has λ = 0.3346.]

As discussed in the following section, for the Poisson the Method of Maximum Likelihood
equals the Method of Moments.
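A small computational sketch of this fit (my own illustration, using the counts from the exercise above):

    counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}
    n = sum(counts.values())                                  # 23,872 insureds
    mean = sum(k * c for k, c in counts.items()) / n          # 0.3346
    fitted_lambda = mean                                      # method of moments (and maximum likelihood) estimate
    print(n, round(fitted_lambda, 4))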

Binomial Distribution, m fixed:

If, as is common when fitting a Binomial, m is taken as fixed, then there is only one parameter q, and
solving via the method of moments for the Binomial is easy: fitted q = X / m.

As discussed in the following section, for m fixed, this is also the Method of Maximum Likelihood
solution for q.

Exercise: Assume one has observed insureds and got the following distribution of insureds by
number of claims:
Number of Claims 0 1 2 3 4 5 6 7&+ All
Number of Insureds 208 357 274 126 31 3 1 0 1000
Fit this data to a Binomial Distribution with m = 6 via the Method of Moments.
[Solution: By taking the average value of the number of claims, one can calculate that the first
moment is: X = 1428/1000 = 1.428. Then the fitted q = 1.428/6 = 0.238.
Comment: If this data is from a Binomial Distribution, then we know that m ≥ 6, since for the Binomial
Distribution x ≤ m.]

Binomial Distribution:

Fitting via the method of moments for the Binomial Distribution consists of writing two equations for
the two parameters m and q in order to match the mean and variance.
Mean = X̄ = mq.
Variance = E[X^2] - X̄^2 = mq(1-q).


Solving for m and q gives:
m = X̄^2 / (X̄^2 + X̄ - E[X^2]).
q = X̄ / m.
Since m has to be integer, one usually needs to round the fitted m to the nearest integer and then
take q = X / m. (After rounding m, the observed and fitted second moments will no longer be
equal.)

Exercise: Assume one has observed insureds and got the following distribution of insureds by
number of claims:
Number of Claims 0 1 2 3 4 5 6 7&+ All
Number of Insureds 208 357 274 126 31 3 1 0 1000
Fit this data to a Binomial Distribution via the Method of Moments.
[Solution: By taking the average value of the number of claims, one can calculate that the first
moment is: 1428/1000 = 1.428. By taking the average value of the square of the number of claims,
one can calculate that the 2nd moment is: 3194/1000 = 3.194.
Thus the fitted Binomial has m = X̄^2 / (X̄^2 + X̄ - E[X^2]) = 1.428^2 / (1.428^2 + 1.428 - 3.194) = 7.465.
Rounding m to the nearest integer, m = 7. Then q = 1.428/7 = 0.204.]

One can compare the fitted via method of moments Binomial distribution with m = 7 and
q = 0.204 to the observed distribution:

Number of Claims     Observed     Method of Moments Binomial
0 208 202.5
1 357 363.3
2 274 279.3
3 126 119.3
4 31 30.6
5 3 4.7
6 1 0.4
7 0 0.01
Sum 1000.0 1000.0

For example, f(1) = (7)(0.204)(1 - 0.204)^6 = 0.36325. 1000 f(1) = 363.3.

The fit via method of moments seems to be a reasonable first approximation.


How to test the fit will be discussed in a subsequent section.
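
The fit and the comparison table above can be reproduced numerically. Here is a brief Python sketch
(my own illustration, with my own variable names), which fits the Binomial by the method of moments
and computes the fitted counts:

    # Method of moments fit of a Binomial (both m and q unknown) to the data above.
    from math import comb

    counts = {0: 208, 1: 357, 2: 274, 3: 126, 4: 31, 5: 3, 6: 1}
    n = sum(counts.values())
    mean = sum(x * nx for x, nx in counts.items()) / n            # 1.428
    second = sum(x * x * nx for x, nx in counts.items()) / n      # 3.194
    m_exact = mean**2 / (mean**2 + mean - second)                 # 7.465
    m = round(m_exact)                                            # round to an integer: 7
    q = mean / m                                                  # 0.204
    fitted = [n * comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]
    print(round(m_exact, 3), m, round(q, 3))
    print([round(f, 1) for f in fitted])    # compare to the observed column above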

Negative Binomial, r fixed:

If one takes r as fixed in the Negative Binomial, then solving via the method of moments is
straightforward: X̄ = rβ ⇒ β̂ = X̄ / r.

Exercise: For r = 1.5, fit the following data to a Negative Binomial Distribution via the method of
moments.
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Insureds 17649 4829 1106 229 44 9 4 1 1 23872

[Solution: X = 0.3346. The fitted β = X /r = 0.3346 / 1.5 = 0.223.]

As discussed in the following section, for r fixed, this is also the Method of Maximum Likelihood
solution for β.

Negative Binomial:

Assume one has the following distribution of insureds by number of claims:


Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Insureds 17649 4829 1106 229 44 9 4 1 1 23872

By taking the average of the number of claims, as was calculated, the first moment is 0.3346.
By taking the average of the square of the number of claims observed for each insured, one can
calculate that the second moment is:
{(0^2)(17,649) + (1^2)(4829) + (2^2)(1106) + (3^2)(229) + (4^2)(44) + (5^2)(9) + (6^2)(4) + (7^2)(1) + (8^2)(1)} / 23,872
= 0.5236. Thus the estimated variance is: 0.5236 - 0.3346^2 = 0.4116.

Since the estimated variance is greater than the estimated mean, it might make sense to fit a
Negative Binomial Distribution to this data. Using the method of moments one would try to match
the first two moments by fitting the two parameters of the Negative Binomial Distribution r and β.

Exercise: Fit the above data to a Negative Binomial Distribution via the Method of Moments.
[Solution: One can write down two equations in two unknowns, by matching the mean and the
variance: X̄ = rβ. Variance = rβ(1+β). ⇒ 1 + β = Variance / X̄ = 0.4116/0.3346 = 1.230.
⇒ β = 0.230. ⇒ r = X̄ / β = 0.3346/0.230 = 1.455.]



In general for the Negative Binomial, the method of moments consists of writing two equations for
the two parameters r and β by matching the first two moments, or equivalently, one can match the
mean and variance: X̄ = rβ. E[X^2] - X̄^2 = rβ(1+β).

The solution is: β̂ = (E[X^2] - X̄^2 - X̄) / X̄.   r̂ = X̄^2 / (E[X^2] - X̄^2 - X̄) = X̄ / β̂.

One can compare the Negative Binomial distribution fitted via method of moments with
β = 0.230 and r = 1.455, to the observed distribution:

Number of Claims     Observed     Method of Moments Negative Binomial
0 17,649 17,663.5
1 4,829 4,805.8
2 1,106 1,103.1
3 229 237.6
4 44 49.5
5 9 10.1
6 4 2.0
7 1 0.4
8 1 0.1
9 0 0.0
10 0 0.0
Sum 23,872.0 23,872.0

The fit via method of moments seems to be a reasonable first approximation.


How to test the fit will be discussed in a subsequent section.
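
Here is a similar Python sketch (again my own illustration, not from the syllabus) for the Negative
Binomial fit above:

    # Method of moments fit of a Negative Binomial to the claim count data above.
    counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}
    n = sum(counts.values())
    mean = sum(x * nx for x, nx in counts.items()) / n
    second = sum(x * x * nx for x, nx in counts.items()) / n
    variance = second - mean**2
    beta = variance / mean - 1       # about 0.230
    r = mean / beta                  # about 1.455
    print(round(mean, 4), round(variance, 4), round(beta, 3), round(r, 3))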

Variance of an Estimated Mean:

Let us assume we have X1 , X2 and X3 , three independent, identically distributed variables.


Since they are independent, their variances add, and since they are identical they each have the
same variance, Var[X]:
Var[X1 + X2 + X3 ] = Var[X1 ] + Var[X2 ] + Var[X3 ] = 3Var[X].

Let the estimated mean be: X̄ = (X1 + X2 + X3)/3.

Var[X̄] = Var[(X1 + X2 + X3)/3] = Var[X1 + X2 + X3] / 3^2 = 3 Var[X] / 3^2 = Var[X]/3.

Exercise: We generate four independent random variables, each from a Poisson with λ = 5.6.
What is the variance of their average?
[Solution: Var[X̄] = Var[(X1 + X2 + X3 + X4)/4] = Var[X1 + X2 + X3 + X4] / 4^2 = 4 Var[X] / 4^2 =
Var[X]/4 = 5.6/4 = 1.4.]

For n independent, identically distributed variables:


Var[X̄] = Var[(1/n) Σ Xi] = Var[Σ Xi] / n^2 = Σ Var[Xi] / n^2 = n Var[X] / n^2 = Var[X] / n.

Var[X̄] = Var[X] / n.

The variance of an average declines as 1/(the number of data points).



Variance of Estimated Parameters:

Previously, we fit a Poisson Distribution to 23,872 observations via the Method of Moments and
obtained a point estimate λ = 0.3346. Then one might ask how reliable this estimate is, assuming this
data actually came from a Poisson Distribution. In other words, what is the variance one would
expect in this estimate solely due to random fluctuation in the data set?

Using the Method of Moments, we set λ̂ = X̄. Therefore Var[λ̂] = Var[X̄].

Thus in this case we have reduced the problem of calculating the variance of the estimated λ to that
of estimating the variance of the estimated mean.
For X from a Poisson Distribution, with λ = 0.3346: Var(X) = λ = 0.3346.
Therefore, Var[λ̂] = Var[X̄] = Var[X]/n = λ/n = 0.3346 / 23,872 = 0.00001402.

The standard deviation of the estimated λ is thus 0.00374. Using plus or minus 1.96 standard
deviations, an approximate 95% confidence interval for λ is: 0.3346 ± 0.0073.⁶

The larger the data set, the larger n, and the smaller the variance of the estimate of λ.
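
As a quick check of the interval above, here is a small Python sketch (my own; variable names are mine):

    # Approximate 95% confidence interval for the fitted Poisson lambda, using the numbers above.
    from math import sqrt
    lam_hat, n = 0.3346, 23872
    var_lam = lam_hat / n                 # Var[lambda-hat] = Var[X-bar] = lambda / n
    half_width = 1.96 * sqrt(var_lam)
    print(round(var_lam, 8), round(half_width, 4))   # 0.00001402 and about 0.0073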

Exercise: A Binomial Distribution with fixed m = 6 has been fit to 1000 observations via the Method
of Moments. The resulting estimate is q̂ = 0.238. What is an approximate 95% confidence interval
for q?
[Solution: q̂ = X̄/6. Var[q̂] = Var[X̄]/36 = (Var[X]/1000)/36 = Var[X]/36,000 =
6(q)(1-q)/36,000 = (0.238)(1 - 0.238)/6000 = 0.00003023. The Standard Deviation is:
√0.00003023 = 0.0055. Thus using plus or minus 1.96 standard deviations, an approximate 95%
confidence interval for q is: 0.238 ± 0.011.]

6
Since for a Poisson, the Method of Moments is equal to the Method of Maximum Likelihood, this is also the interval
estimate from the Method of Maximum Likelihood.

Exercise: Assume we have fit 23,872 observations via the Method of Moments to a Negative
Binomial Distribution with fixed r = 1.5 and obtained a point estimate β̂ = 0.223.
What is an approximate 95% confidence interval for β?
[Solution: β̂ = X̄/1.5. Var[β̂] = Var[X̄]/2.25 = (Var[X]/23,872)/2.25 = 1.5 β(1+β) / {(2.25)(23,872)}
= (0.223)(1.223) / {(1.5)(23,872)} = 0.00000762. Standard Deviation is: √0.00000762 = 0.00276.
Thus an approximate 95% confidence interval for β is: 0.223 ± 0.005.
Comment: The confidence interval is so narrow due to the large amount of data. Its width goes down
as the inverse of the square root of the amount of data.]

Variance of Functions of the Parameters:

Exercise: Assume we have fit a Negative Binomial Distribution with fixed r = 1.5 to 23,872
observations via the Method of Moments and obtained a point estimate β̂ = 0.223.
What is an estimate of the variance of the frequency process?
[Solution: rβ(1+β) = (1.5)(0.223)(1.223) = 0.409.]

Assume we have a function h(θ) of the estimated parameter θ̂. Then Δh(θ) ≅ (∂h/∂θ) (Δθ).
Therefore, Var[h] ≅ (Δh(θ))^2 ≅ (∂h/∂θ)^2 (Δθ)^2 ≅ (∂h/∂θ)^2 Var[θ̂]. Thus, Var[h(θ)] ≅ (∂h/∂θ)^2 Var[θ̂].⁷

Exercise: Assume we have fit a Negative Binomial Distribution with fixed r = 1.5 to 23,872
observations via the Method of Moments and obtained a point estimate β̂ = 0.223.
What is an approximate 95% confidence interval for the variance of the frequency process?
[Solution: h(β) = rβ(1 + β) = (1.5)(β + β^2). ∂h/∂β = 1.5(1 + 2β) = (1.5){1 + (2)(0.223)} = 2.169.
Thus Var[h(β)] ≅ (∂h/∂β)^2 Var[β̂] = (2.169^2)(0.00000762).
Standard Deviation = (2.169)(0.00276) = 0.00599. Thus an approximate 95% confidence interval
for the variance of the frequency process is: 0.409 ± 0.012.]

7
This is a special case of the delta method. See “Mahlerʼs Guide to Fitting Loss Distributions.”
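
The delta-method calculation in the last exercise can be checked with a few lines of Python (my own
sketch; names are mine):

    # Delta method for h(beta) = r*beta*(1+beta), with r = 1.5 fixed, fit to 23,872 observations.
    from math import sqrt
    r, beta, n = 1.5, 0.223, 23872
    var_beta = beta * (1 + beta) / (r * n)     # Var[beta-hat] when r is fixed
    h = r * beta * (1 + beta)                  # point estimate of the variance of the frequency process
    dh_dbeta = r * (1 + 2 * beta)              # dh/dbeta
    var_h = dh_dbeta**2 * var_beta             # delta method
    print(round(h, 3), round(1.96 * sqrt(var_h), 3))   # about 0.409 and 0.012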

Problems:

Use the following information for the next two questions.


Over one year, the following claim frequency observations were made for a group of 1,000 policies:
# of Claims # of Policies
0 800
1 180
2 19
3 1

2.1 (2 points) You fit a Poisson Distribution via the method of moments.
Estimate the probability that a policy chosen at random will have more than 1 claim next year.
(A) Less than 1.6%
(B) At least 1.6%, but less than 1.8%
(C) At least 1.8%, but less than 2.0%
(D) At least 2.0%, but less than 2.2%
(E) At least 2.2%

2.2 (2 points) You fit a Binomial Distribution with m = 3 via the method of moments.
Estimate the probability that a policy chosen at random will have more than 1 claim next year.
(A) Less than 1.6%
(B) At least 1.6%, but less than 1.8%
(C) At least 1.8%, but less than 2.0%
(D) At least 2.0%, but less than 2.2%
(E) At least 2.2%

2.3 (3 points) You have the following data from 10,000 insureds for one year.
Xi is the number of claims from the ith insured.

∑ Xi = 431. ∑ Xi^2 = 634.


The number of claims of a given insured during the year is assumed to be Poisson
distributed with an unknown mean that varies by insured via a Gamma Distribution.
Of these 10,000 insureds, how many should one expect to observe with 3 claims next year?
A. 4 B. 6 C. 8 D. 10 E. 12

Use the following information for the next three questions:


One has observed the following distribution of insureds by number of claims:
Number of Claims 0 1 2 3 4 5&+ All
Number of Insureds 490 711 572 212 15 0 2000

2.4 (2 points) A Binomial Distribution with m = 4 is fit via the Method of Moments.
Which of the following is an approximate 95% confidence interval for q?
A. [0.317, 0.321] B. [0.315, 0.323] C. [0.313, 0.325]
D. [0.311, 0.327] E. [0.309, 0.329]

2.5 (2 points) A Poisson Distribution is fit via the Method of Moments.


Which of the following is an approximate 95% confidence interval for λ?
A. [1.251, 1.301] B. [1.226, 1.326] C. [1.201, 1.351]
D. [1.176, 1.376] E. [1.151, 1.401]

2.6 (2 points) A Negative Binomial Distribution with r = 2 is fit via the Method of Moments.
Which of the following is an approximate 95% confidence interval for β?
A. [0.636, 0.640] B. [0.626, 0.650] C. [0.606, 0.670]
D. [0.576, 0.700] E. [0.526, 0.675]

Use the following information for the next two questions.


Over one year, the following claim frequency observations were made for a group of 13,000
policies, where ni is the number of claims observed for policy i:

Σ ni = 671. Σ ni^2 = 822.

2.7 (2 points) You fit a Poisson Distribution via the method of moments.
Estimate the probability that a policy chosen at random will have at least one claim next year.
A. 4.4% B. 4.6% C. 4.8% D. 5.0% E. 5.2%

2.8 (3 points) You fit a Negative Binomial Distribution via the method of moments.
Estimate the probability that a policy chosen at random will have at least one claim next year.
A. 4.4% B. 4.6% C. 4.8% D. 5.0% E. 5.2%

Use the following information for the next 10 questions:


During a year, 10,000 insureds have a total of 4200 claims.

2.9 (1 point) You fit a Binomial Distribution with m = 10 via the method of moments.
What is the fitted q?
(A) Less than 0.04
(B) At least 0.04, but less than 0.05
(C) At least 0.05, but less than 0.06
(D) At least 0.06, but less than 0.07
(E) At least 0.07

2.10 (2 points) What is the standard deviation of q̂?


(A) Less than 0.0004
(B) At least 0.0004, but less than 0.0005
(C) At least 0.0005, but less than 0.0006
(D) At least 0.0006, but less than 0.0007
(E) At least 0.0007

2.11 (1 point) You fit a Poisson Distribution via the method of moments. What is the fitted λ?
(A) Less than 0.4
(B) At least 0.4, but less than 0.5
(C) At least 0.5, but less than 0.6
(D) At least 0.6, but less than 0.7
(E) At least 0.7

2.12 (1 point) What is the standard deviation of λ̂?
(A) Less than 0.007
(B) At least 0.007, but less than 0.008
(C) At least 0.008, but less than 0.009
(D) At least 0.009, but less than 0.010
(E) At least 0.010

2.13 (1 point) Using the fitted Poisson, what is the probability that an insured picked at random will
have at least one claim next year?
(A) Less than 25%
(B) At least 25%, but less than 30%
(C) At least 30%, but less than 35%
(D) At least 35%, but less than 40%
(E) At least 40%

2.14 (2 points) What is the standard deviation of the estimate in the previous question?
(A) 0.0005 (B) 0.001 (C) 0.002 (D) 0.003 (E) 0.004

2.15 (1 point) You fit a Negative Binomial Distribution with r = 3 via the method of moments.
What is the fitted β?
(A) Less than 0.2
(B) At least 0.2, but less than 0.3
(C) At least 0.3, but less than 0.4
(D) At least 0.4, but less than 0.5
(E) At least 0.5

2.16 (2 points) What is the standard deviation of β̂?

(A) Less than 0.001


(B) At least 0.001, but less than 0.002
(C) At least 0.002, but less than 0.003
(D) At least 0.003, but less than 0.004
(E) At least 0.004

2.17 (1 point) Using the fitted Negative Binomial, what is the probability that an insured picked at
random will have one claim next year?
(A) 10% (B) 15% (C) 20% (D) 25% (E) 30%

2.18 (2 points) What is the standard deviation of the estimate in the previous question?
(A) 0.0001 (B) 0.0002 (C) 0.0005 (D) 0.0010 (E) 0.0020

2.19 (2 points) You are given the following information on a block of similar Homeowners policies:
Number of Policies
Renewed 90
Not renewed 10
Using the normal approximation, determine the lower bound of the symmetric 95% confidence
interval for the probability that such a Homeowners policy will be renewed.
(A) 0.84 (B) 0.85 (C) 0.86 (D) 0.87 (E) 0.88

2.20 (2 points) You observe 39 successes in 1000 trials.


Using the normal approximation, determine the upper bound of the symmetric 95% confidence
interval for the probability of a success.
(A) 0.050 (B) 0.051 (C) 0.052 (D) 0.053 (E) 0.054

2.21 (3 points) You are given the following:


• A discrete random variable X has the density function:
f(x) = 0.2 β1^x / (1+β1)^(x+1) + 0.8 β2^x / (1+β2)^(x+1), x = 0, 1, 2, ..., 0 < β1 < β2.

• A random sample taken of the random variable X has mean 0.14 and variance 0.16.
Determine the method of moments estimate of β1.
A. 0.11 B. 0.13 C. 0.15 D. 0.17 E. 0.19

2.22 (3 points) The number of claims follows a Negative Binomial distribution.


A random sample of 9237 policyholders produced the following distribution of the number of claims
reported:
Claims Number of Policyholders
0 9000
1 200
2 30
3 5
4 2
Use the sample data and the method of moments in order to estimate the probability that a
policyholder chosen at random will have 4 claims next year.
A. Less than 0.005%
B. At least 0.005%, but less than 0.010%
C. At least 0.010%, but less than 0.015%
D. At least 0.015%, but less than 0.020%
E. At least 0.020%

2.23 (4, 5/89, Q.45) (2 points) The number of claims X that a randomly selected policyholder has
in a year follows a negative binomial distribution, as per Loss Models.
The following data was compiled for 10,000 policyholders in one year:
Number of Number of
Claims Policyholders
0 8,200
1 1,000
2 600
3 200
What is the method of moments estimate for the parameter β?
A. β < 0.35 B. 0.35 ≤ β < 0.40 C. 0.40 ≤ β < 0.45 D. 0.45 ≤ β < 0.50 E. 0.50 ≤ β

2.24 (4, 5/90, Q.24) (1 point) Assume that the number of claims for an insured has a Poisson
distribution: p(n) = e^(-θ) θ^n / n!.
Using the observations 3, 1, 2, 1, taken from a random sample, what is the method of moments
estimate, θ̃, of θ?
A. θ̃ < 1.40
B. 1.40 ≤ θ̃ < 1.50
C. 1.50 ≤ θ̃ < 1.60
D. 1.60 ≤ θ̃ < 1.70
E. 1.70 ≤ θ̃

2.25 (4B, 5/92, Q.5) (2 points) You are given the following information:
• Number of large claims follows a Poisson distribution.
• Exposures are constant and there are no inflationary effects.
• In the past 5 years, the following number of large
claims has occurred: 12, 15, 19, 11, 18
Estimate the probability that more than 25 large claims occur in one year.
(The Poisson distribution should be approximated by the normal distribution.)
A. Less than 0.002
B. At least 0.002 but less than 0.003
C. At least 0.003 but less than 0.004
D. At least 0.004 but less than 0.005
E. At least 0.005

2.26 (4B, 11/92, Q.13) (2 points) You are given the following information:
• The occurrence of hurricanes in a given year has a Poisson distribution.
• For the last 10 years, the following number of hurricanes has occurred:
2, 4, 3, 8, 2, 7, 6, 3, 5, 2
Using the normal approximation to the Poisson, determine the probability of more than 10
hurricanes occurring in a single year.
A. Less than 0.0005
B. At least 0.0005 but less than 0.0025
C. At least 0.0025 but less than 0.0045
D. At least 0.0045 but less than 0.0065
E. At least 0.0065

2.27 (CAS9, 11/94, Q.32) (2 points)


You are given the following accident frequency distribution for a class:
Number of Accidents Number of Insureds
0 767
1 190
2 35
3 3
4 1
You are also given that:
• This distribution can be fit well by a negative binomial distribution.
• Each insured has a Poisson frequency of the number of accidents.
• The underlying distribution of hypothetical means has a Gamma distribution.
A class should be considered sufficiently homogeneous if the coefficient of variation of the underlying
distribution of hypothetical means is less than 0.500.
Determine whether or not the class whose data is shown is sufficiently homogeneous.

2.28 (2, 2/96, Q.26) (1.7 points) Let X1 ,..., Xn be a random sample from a discrete distribution
with probability function: p(1) = θ, p(2) = θ, and p(3) = 1 - 2θ, where 0 < θ < 1/2.
Determine the method of moments estimator of θ.
A. (3 - X̄)/3   B. (X̄ - 1)/4   C. (2X̄ - 3)/6   D. X̄   E. X̄/2

2.29 (4, 11/04, Q.8 & 2009 Sample Q.138) (2.5 points)
You are given the following sample of claim counts:
0 0 1 2 2
You fit a binomial(m, q) model with the following requirements:
(i) The mean of the fitted model equals the sample mean.
(ii) The 33rd percentile of the fitted model equals the smoothed empirical 33rd percentile
of the sample.
Determine the smallest estimate of m that satisfies these requirements.
(A) 2 (B) 3 (C) 4 (D) 5 (E) 6

2.30 (CAS3, 5/06, Q.1) (2.5 points)


The number of goals scored in a soccer game follows a Negative Binomial distribution.
A random sample of 20 games produced the following distribution of the number of goals scored:
Goals Scored Frequency Goals Scored Frequency
0 1 5 2
1 3 6 1
2 4 7 0
3 5 8 1
4 3
Use the sample data and the method of moments to estimate the parameter β of the Negative
Binomial distribution.
A. Less than 0.25
B. At least 0.25, but less than 0.50
C. At least 0.50, but less than 0.75
D. At least 0.75, but less than 1.00
E. At least 1.00

2.31 (4, 5/07, Q.40) (2.5 points) You are given:


Loss Experience Number of Policies
0 claims 1600
1 or more claims 400
Using the normal approximation, determine the upper bound of the symmetric 95% confidence
interval for the probability that a single policy has 1 or more claims.
(A) 0.200 (B) 0.208 (C) 0.215 (D) 0.218 (E) 0.223

2.32 (CAS3L, 11/09, Q.17) (2.5 points) You are creating a model to describe exam progress.
You are given the following information:
• Let X be the number of exams passed in a given year.
• The probability mass function is defined as follows:
P(X = 0) = 1 - p - q
P(X = 1) = p
P(X = 2) = q
• Over the last 5 years, you observe the following values of X:
0 0 1 2 2
Calculate the method of moments estimate of p.
A. Less than 0.15
B. At least 0.15, but less than 0.21
C. At least 0.21, but less than 0.27
D. At least 0.27, but less than 0.33
E. At least 0.33

Solutions to Problems:

2.1. D. λ = mean = ((0)(800) + (1)(180) + (2)(19) + (3)(1)) /1000 = 0.221.


1 - {f(0) + f(1)} = 1 - e^(-0.221) (1 + 0.221) = 2.11%.

2.2. A. 3q = mean = 0.221. q = 0.07367.


f(2) + f(3) = (3)(0.07367^2)(1 - 0.07367) + 0.07367^3 = 1.55%.

2.3. D. The mixed distribution for a Gamma-Poisson is a Negative Binomial.


Fit a Negative Binomial via Method of Moments.
second moment = 634/10000 = 0.0634. mean = 0.0431 = rβ.
variance = 0.0634 - 0.0431^2 = 0.06154 = rβ(1+β).
(1+β) = 0.06154/0.0431 = 1.4278. ⇒ β = 0.4278. r = 0.1007.
f(3) = {r(r+1)(r+2)/3!} β^3 / (1+β)^(r+3) = {(0.1007)(1.1007)(2.1007)/6} (0.4278^3) / (1.4278^3.1007) =
0.1007%. 10,000 f(3) = 10.1.

2.4. E. X̄ = {(0)(490) + (1)(711) + (2)(572) + (3)(212) + (4)(15)}/2000 = 1.2755.
q̂ = X̄/m = 1.2755/4 = 0.3189. Var[q̂] = Var[X̄]/16 = (Var[X]/2000)/16 = 4(q)(1-q)/32,000 =
(0.3189)(1 - 0.3189)/8000 = 0.00002715. StdDev[q̂] = √0.00002715 = 0.0052.
Thus an approximate 95% confidence interval for q is: 0.319 ± (1.96)(0.0052) = 0.319 ± 0.010.

2.5. B. λ̂ = X̄ = 1.2755.
Var[λ̂] = Var[X̄] = Var[X]/2000 = λ/2000 = 1.2755/2000 = 0.0006378.
Standard Deviation is: √0.0006378 = 0.0253.
Thus an approximate 95% confidence interval for λ is: 1.276 ± (1.96)(0.0253) = 1.276 ± 0.050.

2.6. C. β̂ = X̄/r = 1.2755/2 = 0.6378. Var[β̂] = Var[X̄]/2^2 = Var[X]/{(4)(2000)} = 2β(1+β)/8000 =
(0.6378)(1.6378)/4000 = 0.0002611. Standard Deviation is: √0.0002611 = 0.0162.
Thus an approximate 95% confidence interval for β is: 0.638 ± 0.032.

2.7. D. λ = mean = 671/13,000 = 0.05162. 1 - f(0) = 1 - e^(-0.05162) = 1 - 95.0% = 5.0%.



2.8. B. second moment = 822/13,000 = 0.06323. mean = 0.05162 = rβ.
variance = 0.06323 - 0.05162^2 = 0.06057 = rβ(1 + β).
(1 + β) = 0.06057/0.05162 = 1.1734. ⇒ β = 0.173. ⇒ r = 0.2977.
1 - f(0) = 1 - 1/1.1734^0.2977 = 1 - 95.35% = 4.65%.

2.9. B. X̄ = 4200/10,000 = 0.42. q̂ = X̄/m = 0.42/10 = 0.042.

2.10. D. Var[q̂] = Var[X̄/m] = Var[X̄/10] = Var[X̄]/100 = (Var[X]/n)/100 =
{10q(1-q)/10,000}/100 = (0.042)(1 - 0.042)/100,000 = 0.000000402.
StdDev[q̂] = √0.000000402 = 0.000634.

2.11. B. λ̂ = X̄ = 0.42.

2.12. A. Var[λ̂] = Var[X̄] = Var[X]/n = λ/10,000 = 0.42/10,000.
StdDev[λ̂] = √0.42 / 100 = 0.00648.

2.13. C. Prob[at least one claim] = 1 - e^(-λ) = 1 - e^(-0.42) = 34.30%.

2.14. E. h(λ) = 1 - e^(-λ). ∂h/∂λ = e^(-λ) = e^(-0.42) = 0.6570.
Var[h(λ)] ≅ (∂h/∂λ)^2 Var[λ̂] = (0.6570^2)(0.000042). StdDev[h(λ)] = 0.6570 √0.000042 = 0.00426.
Comment: Thus an approximate 95% confidence interval for the probability of at least one claim is:
34.30% ± (1.96)(0.426%) ≅ 34.3% ± 0.8%.

2.15. A. β̂ = X̄/r = 0.42/3 = 0.14.

2.16. C. Var[β̂] = Var[X̄/3] = Var[X̄]/9 = (Var[X]/n)/9 = {3β(1+β)/10,000}/9 = (0.14)(1.14)/30,000 =
0.00000532. StdDev[β̂] = √0.00000532 = 0.00231.

2.17. D. f(1) = rβ / (1+β)^(1+r) = (3)(0.14)/(1.14)^4 = 24.87%.

2.18. E. h(β) = 3β / (1+β)^4. ∂h/∂β = 3/(1+β)^4 - 12β/(1+β)^5 = 3/1.14^4 - 12(0.14)/1.14^5 = 0.9037.
Var[h(β)] ≅ (∂h/∂β)^2 Var[β̂] = (0.9037^2)(0.00000532).
StdDev[h(β)] = 0.9037 √0.00000532 = 0.00208.
Comment: Thus an approximate 95% confidence interval for f(1) is:
24.87% ± (1.96)(0.208%) ≅ 24.9% ± 0.4%.

2.19. A. Treat this as a yes/no situation or Bernoulli, with q the probability of renewal.
We fit a Bernoulli using the method of moments: q̂ = X̄ = 90/100 = 0.9.
Var[q̂] = Var[X̄] = Var[X]/n = q(1 - q)/n = (0.9)(0.1)/100 = 0.0009.
An approximate 95% confidence interval for q is:
0.9 ± 1.960 √0.0009 = 0.9 ± 0.0588 = [0.8412, 0.9588].

Comment: Similar to 4, 5/07, Q.40.


One would get the same result using maximum likelihood.

2.20. B. We fit a Bernoulli using the method of moments: q̂ = X̄ = 39/1000 = 0.039.
Var[q̂] = Var[X̄] = Var[X]/n = q(1 - q)/n = (0.039)(1 - 0.039)/1000 = 0.0000375.
An approximate 95% confidence interval for q is:
0.0390 ± 1.960 √0.0000375 = 0.0390 ± 0.0120 = [0.0270, 0.0510].
Comment: Similar to 4, 5/07, Q. 40.

2.21. A. This is a mixture of two Geometric Distributions, with weights of 20% and 80%.
E[X] = 0.2 β1 + 0.8 β2.
A Geometric Distribution has a mean of β, and a variance of: β(1+β) = β + β^2.
Therefore, a Geometric Distribution has a second moment of: β + β^2 + β^2 = β + 2β^2.
Therefore, the mixture has a second moment of:
(0.2)(β1 + 2β1^2) + (0.8)(β2 + 2β2^2) = 0.2 β1 + 0.4 β1^2 + 0.8 β2 + 1.6 β2^2.
Setting the theoretical and observed moments equal, gives 2 equations in 2 unknowns:
0.2 β1 + 0.8 β2 = 0.14. ⇒ β2 = 0.175 - 0.25 β1.
0.2 β1 + 0.4 β1^2 + 0.8 β2 + 1.6 β2^2 = 0.16 + 0.14^2 = 0.1796. ⇒
0.2 β1 + 0.4 β1^2 + (0.8)(0.175 - 0.25 β1) + (1.6)(0.175 - 0.25 β1)^2 = 0.1796. ⇒
0.5 β1^2 - 0.14 β1 + 0.0094 = 0. ⇒
β1 = {0.14 ± √[(-0.14)^2 - (4)(0.5)(0.0094)]} / {(2)(0.5)} = 0.1117 or 0.1683.
If β1 = 0.1117, then β2 = 0.175 - (0.25)(0.1117) = 0.1471 > 0.1117. OK.
If β1 = 0.1683, then β2 = 0.175 - (0.25)(0.1683) = 0.1329 < 0.1683. Not OK.


Comment: Similar to 4, 11/01, Q.23 in “Mahlerʼs Guide to Fitting Loss Distributions.”
We are told that β1 < β2.

2.22. C. Mean is: (200 + 60 + 15 + 8)/9237 = 0.03064.
2nd moment is: (200 + 120 + 45 + 32)/9237 = 0.04298.
Variance is: 0.04298 - 0.03064^2 = 0.04204.
Matching the mean and the variance: rβ = 0.03064, and rβ(1 + β) = 0.04204.
⇒ 1 + β = 0.04204/0.03064 = 1.3721. ⇒ β = 0.3721. ⇒ r = 0.03064/0.3721 = 0.08234.
f(4) = {(0.08234)(1.08234)(2.08234)(3.08234)/4!} (0.3721^4) / (1.3721^4.08234) = 0.0126%.

2.23. E. X̄ = {(8200)(0) + (1000)(1) + (600)(2) + (200)(3)} / 10,000 = 0.28.
The estimated 2nd moment is: {(8200)(0^2) + (1000)(1^2) + (600)(2^2) + (200)(3^2)} / 10,000 = 0.52.
Thus the estimated variance is: 0.52 - 0.28^2 = 0.4416. Setting the estimated mean and variance
equal to the theoretical mean and variance for the Negative Binomial:
Variance/Mean = rβ(1 + β) / (rβ) = 1 + β = 0.4416/0.28 = 1.577. ⇒ β = 0.577.
Comment: One can use the value of β to then solve for r = 0.28/0.577 = 0.485. For β = 0.577 and
r = 0.485, one would expect 10000 policyholders to be distributed as follows:
Number of Claims 0 1 2 3 4 5 6 7
Fitted Number of Policyholders 8017 1423 387 117 37 12 4 1
Observed # of Policyholders 8200 1000 600 200 0 0 0 0
Thus the Negative Binomial is not a good fit to this data.

2.24. E. Since the Poisson has one parameter θ, one sets X equal to the mean of the Poisson, θ.
X = (3+1+2+1)/4 = 1.75. Thus θ = 1.75.

2.25. C. The average number of large claims observed per year is: (12+15+19+11+18)/5 = 15.
Thus we estimate that the Poisson has a mean of 15 and thus a variance of 15. We wish to estimate
the probability of 26 large claims or more; using the continuity correction, we wish to standardize
25.5 by subtracting the mean of 15 and dividing by the standard deviation of √15.
Thus Prob(N > 25) ≅ 1 - Φ[(25.5 - 15)/√15] = 1 - Φ(2.71) ≅ 1 - 0.9966 = 0.0034.

2.26. B. The observed mean is 42 / 10 = 4.2. Assume a Poisson with mean of 4.2 and therefore
variance of 4.2. Using the “continuity correction”,
P(N > 10) ≅ 1 - Φ[(10.5 - 4.2)/√4.2] = 1 - Φ[3.07] = 1 - 0.9989 = 0.0011.

2.27. Fit a Negative Binomial to the given data via the Method of Moments.
Mean is: 273/996 = 0.2741. Second moment is: 373/996 = 0.3745.
Variance is: 0.3745 - 0.2741^2 = 0.2994.
rβ = 0.2741. rβ(1+β) = 0.2994. ⇒ β = 0.0923. ⇒ r = 2.97.
Thus the distribution of hypothetical means has a Gamma distribution with α = 2.97 and θ = 0.0923.
The coefficient of variation of this Gamma is: √(αθ^2) / (αθ) = 1/√α = 1/√2.97 = 0.580 > 0.5.
Thus this class is not sufficiently homogeneous.
Comment: While one could fit a Negative Binomial to this data via maximum likelihood, it would
require a computer. The maximum likelihood Negative Binomial has r = 2.901 and β = 0.09448.

2.28. A. E[X] = (1)θ + (2)θ + (3)(1 - 2θ) = 3 - 3θ. Set X = E[X]. X = 3 - 3θ. ⇒ θ = (3 - X )/3.

2.29. E. Sample Mean = (0 + 0 + 1 + 2 + 2)/5 = 1.


(0.33)(5 + 1) = 1.98. The 1.98th value is 0, the smoothed empirical 33rd percentile.
Thus for the Binomial we want mq = 1 and the 33rd percentile to be 0.
⇒ q = 1/m, and F(0) ≥ 0.33. F(0) = f(0) = (1 - q)^m = (1 - 1/m)^m. ⇒ (1 - 1/m)^m ≥ 0.33.
Try values of m. For m = 5, (4/5)^5 = 0.3277. For m = 6, (5/6)^6 = 0.3349.
Comments: A mixture of method of moments and percentile matching, not a typical real world
application. Percentile matching is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.” For a
discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p. In this case, the
33rd percentile of the Binomial Distribution is the smallest x such that F(0) ≥ 0.33. For m = 5 and
q = 1/5, the 33rd percentile is 1. For m = 6 and q = 1/6, the 33rd percentile is 0.

2.30. A. Mean is:


{(1)(0) + (3)(1) + (4)(2) + (5)(3) + (3)(4) + (2)(5) + (1)(6) + (0)(7) + (1)(8)} / 20 = 3.1
2nd moment is:
{(1)(0^2) + (3)(1^2) + (4)(2^2) + (5)(3^2) + (3)(4^2) + (2)(5^2) + (1)(6^2) + (0)(7^2) + (1)(8^2)} / 20 =
262/20 = 13.1. Variance is: 13.1 - 3.1^2 = 3.49.
Matching the mean and the variance: rβ = 3.1, and rβ(1 + β) = 3.49.

⇒ 1 + β = 3.49/3.1 = 1.1258. ⇒ β = 0.1258. ⇒ r = 3.1/0.1258 = 24.64.

2.31. D. Treat this as a yes/no situation or Bernoulli.
We fit a Bernoulli using the method of moments: q̂ = X̄ = 400/2000 = 1/5.
Var[q̂] = Var[X̄] = Var[X]/n = q(1 - q)/n = (1/5)(4/5)/2000 = 0.00008.
An approximate 95% confidence interval for q is:
0.2 ± 1.960 √0.00008 = 0.2 ± 0.0175 = [0.1825, 0.2175].
Comment: q here is the probability of one or more claim.
One would get the same result using maximum likelihood.

2.32. B. In order to fit the two parameters, p and q, we match the first two moments:
(0)(1 - p - q) + (1)(p) + (2)(q) = (0 + 0 + 1 + 2 + 2)/5. ⇒ p + 2q = 1.
(0^2)(1 - p - q) + (1^2)(p) + (2^2)(q) = (0^2 + 0^2 + 1^2 + 2^2 + 2^2)/5. ⇒ p + 4q = 9/5.


Solving the two equations in two unknowns: p = 1/5, and q = 2/5.
Comment: In general this intuitive result holds: p̂ = the observed proportion of ones, and
q̂ = the observed proportion of twos.
Let n be the total number of observations, with n1 ones, n2 twos, and n - n1 - n2 zeros.
Then the two equations are: p + 2q = (n1 + 2n2)/n. p + 4q = (n1 + 4n2)/n.
The solution of these two equations is: p̂ = n1/n, and q̂ = n2/n.

Section 3, Method of Maximum Likelihood

For ungrouped data {x1 , x2 , ... , xn } define:


Likelihood = Π f(xi) . Loglikelihood = Σ ln f(xi) .

During a year, 5 policies have 0 claims each, 3 policies have 1 claim each, and 2 policies have 2
claims each. Then the likelihood is: f(0)f(0)f(0)f(0)f(0)f(1)f(1)f(1)f(2)f(2) = f(0)^5 f(1)^3 f(2)^2.
The loglikelihood is: lnf(0) + lnf(0) + lnf(0) + lnf(0) + lnf(0) + lnf(1) + lnf(1) + lnf(1) + lnf(2) + lnf(2) =
5 lnf(0) + 3 lnf(1) + 2 lnf(2).

For a given data set and type of distribution, the likelihood and the loglikelihood are functions of the
parameter(s) of the distribution. In order to fit a chosen type of distribution by maximum likelihood, you
maximize the likelihood or equivalently maximize the loglikelihood. In other words, for ungrouped
data, you find the set of parameters such that either Π f(xi) or Σ ln f(xi) is maximized.

For single parameter frequency distributions one can solve for the parameter value by taking the
derivative of the loglikelihood and setting it equal to zero.8

Poisson:

Exercise: For the above data, what is the loglikelihood for a Poisson distribution?
[Solution: For the Poisson: f(x) = λ^x e^(-λ) / x!. ln f(0) = ln(e^(-λ)) = -λ. ln f(1) = ln(λe^(-λ)) = lnλ - λ.
ln f(2) = ln(λ^2 e^(-λ)/2) = 2lnλ - λ - ln(2). loglikelihood = 5 lnf(0) + 3 lnf(1) + 2 lnf(2) =
5{-λ} + 3{lnλ - λ} + 2{2lnλ - λ - ln(2)} = 7lnλ - 10λ - 2ln(2).]

8
For two parameters, one can apply numerical techniques. See for example Appendix F of Loss Models. Many
computer programs such as Excel or Mathematica come with algorithms to maximize or minimize functions, which
can be used to maximize the loglikelihood or minimize the negative loglikelihood.

The loglikelihood as a function of the parameter λ:

[Graph: the loglikelihood rises to its maximum near λ = 0.7 and then declines, plotted for λ up to 2.]

Exercise: For the above example, for what value of λ is the loglikelihood maximized?
[Solution: loglikelihood = 7lnλ − 10λ - 2ln(2).

Set the partial derivative with respect to λ equal to 0: 0 = 7/λ - 10. ⇒ λ = 7/10 = 0.7.]

Exercise: For the above example, using the method of moments what is the fitted Poisson?
[Solution: λ = X = {(5)(0) + (3)(1) + (2)(2)}/10 = 7/10 = 0.7.]

Maximum likelihood and the method of moments each result in a Poisson with λ = 0.7.

For the Poisson: f(x) = λ^x e^(-λ) / x!.
Thus the loglikelihood for the observations x1, x2, ..., xN is:
Σ ln f(xi) = Σ {xi lnλ - λ - ln(xi!)}, with the sums running from i = 1 to N.
Taking the partial derivative with respect to λ and setting it equal to zero gives:
Σ {(xi/λ) - 1} = 0. Therefore, λ = Σ xi / N = X̄.

For the Poisson Distribution the Method of Maximum Likelihood equals the Method of
Moments, provided the frequency data has not been grouped into intervals.
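
One can confirm this numerically. The following Python sketch (my own illustration) evaluates the
Poisson loglikelihood for the small data set above over a grid of λ values; the maximum occurs at the
sample mean of 0.7:

    # Numerical check that the Poisson loglikelihood is maximized at lambda = X-bar.
    from math import log, lgamma

    data = [0]*5 + [1]*3 + [2]*2      # 5 policies with 0 claims, 3 with 1, 2 with 2

    def loglik(lam):
        return sum(x * log(lam) - lam - lgamma(x + 1) for x in data)

    grid = [i / 1000 for i in range(1, 2001)]      # lambda from 0.001 to 2.000
    best = max(grid, key=loglik)
    print(best, sum(data) / len(data))             # both are 0.7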

Binomial, m fixed:

Exercise: Assume the following distribution of insureds by number of claims:


Number of Claims 0 1 2 3 4 5 6 7&+ All
Number of Insureds 208 357 274 126 31 3 1 0 1000
What is the loglikelihood for a Binomial Distribution with fixed m = 6 and q unknown?
[Solution: For the Binomial with m = 6, f(x) = {6! / [(6-x)! x!]} q^x (1-q)^(6-x).
ln f(x) = x ln (q) + (6-x) ln(1-q) + ln(6!) - ln[(6-x)!] - ln(x!). The loglikelihood is the sum of the
loglikelihood at each point; alternately each number of claims contributes the product of its number of
insureds times the loglikelihood for those number of claims.
208 ln f(0) + 357 ln f(1) + 274 ln f(2) + 126 ln f(3) + 31 ln f(4) + 3 ln f(5) + 1 ln f(6) =
208 {6 ln(1-q)} + 357 {ln (q) + 5 ln(1-q) + ln(6!) - ln(5!)} +
274 {2 ln (q) + 4 ln(1-q) + ln(6!) - ln(4!) - ln(2!)} +
126 {3 ln (q) + 3 ln(1-q) + ln(6!) - ln(3!) - ln(3!)} +
31{4 ln (q) + 2 ln(1-q) + ln(6!) - ln(2!) - ln(4!)} + 3 {5 ln (q) + ln(1-q) + ln(6!) - ln(5!)} + 1{6 ln q}.]

Exercise: Fit the above data to a Binomial with m = 6 via the Method of Maximum Likelihood.
[Solution: Set equal to zero the partial derivative of the loglikelihood with respect to the single
parameter q.
0 = {(1)(357) + (2)(274) +(3)(126) + (4)(31) + (5)(3) + (6)(1)}/q -
{(6)(208) + (5)(357) + (4)(274) + (3)(126) + (2)(31) + (1)(3)}/(1-q). ⇒

1428/q = 4572/(1-q). ⇒ q = 1428/(4572 + 1428) = 1428/6000 = 0.238.


Comment: This is the same solution as fitting by the Method of Moments.
X = 1428/1000 = 1.428. Then q = X /m = 1.428/6 = 0.238. ]

For a Binomial Distribution with m fixed, the Method of Maximum Likelihood solution for
q is the same as the Method of Moments, provided the frequency data has not been grouped
into intervals.

Binomial, m not fixed:

Assume we want to fit the above data via Maximum Likelihood to Binomial (with m not fixed.) Then
the first step is to maximize the likelihood for various fixed values of m,9 and then let m vary. We
note that m is always an integer10 greater than or equal to the largest number of claims observed. For
this data, m ≥ 6 and integer. For each value of m, q = X /m = 1.428/m. Then one can compute the
loglikelihood for these values of q and m.
9
As discussed above, for m fixed, the maximum likelihood estimate of q is observed mean / m. This is the same as
the method of moments estimate of q for m fixed.
10
In this respect the Binomial differs from the Negative Binomial, in which r can be any positive number, integer or
not.

Exercise: What is the loglikelihood for a Binomial Distribution with m = 6 and q = 0.238?
[Solution: 208 { 6 ln(0.762) } + 357 { ln(0.238) + 5 ln(0.762) + ln(6!) - ln (5!) } +
274 {2 ln(0.238) + 4 ln(0.762) + ln(6!) - ln(4!) - ln(2!)} +
126 {3 ln(0.238) + 3 ln(0.762) + ln(6!) - ln(3!) - ln(3!)} +
31{4 ln(0.238) + 2 ln(0.762) + ln(6!) - ln(2!) - ln(4!)} + 3 {5 ln(0.238) + ln(0.762) + ln(6!) - ln(5!)} +
1{6 ln0.238} = (208)(-1.63085) + (357)(-1.00277) + (274)(-1.25015) + (126)(-2.12615) +
(31)(-3.57751) + (3)(-5.65747) + (1)(-8.61291) = -1444.13.]

For this data set, here are the loglikelihoods for m = 6, 7, 8 and 9, with q = 1.428/m:
m q loglikelihood
6 0.2380 -1444.13
7 0.2040 -1443.13
8 0.1785 -1443.20
9 0.1587 -1443.63

[Graph: the loglikelihood as a function of m, with q = 1.428/m; it is largest at m = 7 and declines for larger m.]

The largest loglikelihood occurs when m = 7.¹¹


Thus the maximum likelihood fit is at m = 7 and q = 0.2040.

11
The largest loglikelihood corresponds to the smallest negative loglikelihood. Above m = 7 the loglikelihood gets
smaller, so in this case the loglikelihood has been computed for enough values of m. If not, one would do the
computation at larger values of m, until one found the maximum likelihood.
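
The table and graph above can be reproduced with a short Python sketch (my own, not part of the
syllabus), profiling the loglikelihood over m with q = X̄/m:

    # Profile the Binomial loglikelihood over m, with q = X-bar / m for each m.
    from math import log, comb

    counts = {0: 208, 1: 357, 2: 274, 3: 126, 4: 31, 5: 3, 6: 1}
    n = sum(counts.values())
    mean = sum(x * nx for x, nx in counts.items()) / n          # 1.428

    def loglik(m, q):
        return sum(nx * log(comb(m, x) * q**x * (1 - q)**(m - x)) for x, nx in counts.items())

    for m in range(6, 12):
        q = mean / m
        print(m, round(q, 4), round(loglik(m, q), 2))           # largest loglikelihood at m = 7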

Negative Binomial, r fixed:

Exercise: During a year, 5 policies have 0 claims each, 3 policies have 1 claim each, and 2 policies
have 2 claims each. For r = 3, what is the maximum likelihood Negative Binomial?
[Solution: For the Negative Binomial: f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r).
f(0) = 1/(1+β)^3. f(1) = 3β/(1+β)^4. f(2) = 6β^2/(1+β)^5.
ln f(0) = -3ln(1+β). ln f(1) = ln(3) + ln(β) - 4ln(1+β). ln f(2) = ln(6) + 2ln(β) - 5ln(1+β).
loglikelihood = 5 lnf(0) + 3 lnf(1) + 2 lnf(2) =
5{-3ln(1+β)} + 3{ln(3) + ln(β) - 4ln(1+β)} + 2{ln(6) + 2ln(β) - 5ln(1+β)} =
3ln(3) + 2ln(6) + 7ln(β) - 37ln(1+β).
Setting the partial derivative of the loglikelihood with respect to β equal to zero:
7/β - 37/(1+β) = 0. ⇒ 37β = 7(1+β). ⇒ β = 7/30.]

Exercise: During a year, 5 policies have 0 claims each, 3 policies have 1 claim each, and 2 policies
have 2 claims each. For r = 3, what is the method of moments Negative Binomial?
[Solution: X̄ = {(5)(0) + (3)(1) + (2)(2)}/10 = 7/10. rβ = X̄. ⇒ β = X̄/r = (7/10)/3 = 7/30.]

If one takes r as fixed in the Negative Binomial, then the method of maximum likelihood
equals the method of moments, provided the frequency data has not been grouped into
intervals.

Negative Binomial:

When r and β are both allowed to vary, then one must maximize the likelihood or loglikelihood via
numerical methods.

For example, assume the data to which we had previously fit a Negative Binomial via Method of
Moments:12
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Insureds 17649 4829 1106 229 44 9 4 1 1 23872

For given values of r and β, the loglikelihood is: 17649 ln f(0) + 4829 ln f(1) +
1106 ln f(2) + 229 ln f(3) + 44 ln f(4) + 9 ln f(5) + 4 ln f(6) + ln f(7) + ln f(8).
For given values of r and β, one can calculate the densities of the Negative Binomial Densities and
thus the loglikelihood.
12
The Method of Moments Negative Binomial had parameters r = 1.455 and β = 0.230.

A table of such loglikelihoods is:


r = 1.40 r = 1.45 r = 1.50 r = 1.55 r = 1.60
β = 0.210 -17,968.4 -17,943.8 -17,927.1 -17,918.1 -17,916.1
β = 0.215 -17,951.2 -17,931.5 -17,919.8 -17,915.6 -17,918.6
β = 0.220 -17,937.6 -17,922.8 -17,916.0 -17,916.8 -17,924.6
β = 0.225 -17,927.5 -17,917.5 -17,915.6 -17,921.2 -17,934.0
β = 0.230 -17,920.6 -17,915.5 -17,918.4 -17,928.9 -17,946.5
β = 0.235 -17,916.8 -17,916.5 -17,924.3 -17,939.7 -17,962.0
β = 0.240 -17,915.9 -17,920.5 -17,933.1 -17,953.3 -17,980.5

Based on this table, the maximum likelihood fit is approximately β = 0.23 and r = 1.45.¹³
It turns out that the maximum likelihood Negative Binomial has parameters: β = 0.2249 and
r = 1.4876.¹⁴ This maximum likelihood Negative Binomial seems to be a reasonable first
approximation to the observed data. In this case the maximum likelihood and method of moments
distributions are relatively similar.

Number of Claims     Observed     Method of Moments Negative Binomial     Maximum Likelihood Negative Binomial
0 17,649 17,663.5 17,653.5
1 4,829 4,805.8 4,821.8
2 1,106 1,103.1 1,101.1
3 229 237.6 235.0
4 44 49.5 48.4
5 9 10.1 9.8
6 4 2.0 1.9
7 1 0.4 0.4
8 1 0.1 0.1
9 0 0.0 0.0
10 0 0.0 0.0
Sum 23872.0 23872.0 23872.0

13
Note that the loglikelihood is also large for β = 0.225 and r = 1.50, and β = 0.215 and r =1.55. One would need to
refine the grid in order to achieve more accuracy for β and r.
14
For the Negative Binomial one solves via numerical methods. I used the Nelder-Mead algorithm, the “Simplex
Method” described in Appendix F of Loss Models, not on the syllabus.
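
As an illustration of such numerical maximization, here is a Python sketch of mine using scipy's
Nelder-Mead routine (assuming numpy and scipy are available; this is not the author's actual code):

    # Direct numerical maximization of the Negative Binomial loglikelihood via Nelder-Mead.
    import numpy as np
    from scipy.special import gammaln
    from scipy.optimize import minimize

    xs = np.arange(9)
    nx = np.array([17649, 4829, 1106, 229, 44, 9, 4, 1, 1])

    def negloglik(params):
        r, beta = params
        if r <= 0 or beta <= 0:
            return np.inf
        logf = (gammaln(r + xs) - gammaln(r) - gammaln(xs + 1)
                + xs * np.log(beta) - (xs + r) * np.log(1 + beta))
        return -np.dot(nx, logf)

    result = minimize(negloglik, x0=[1.455, 0.230], method="Nelder-Mead")
    print(result.x)     # approximately r = 1.4876 and beta = 0.2249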

Equations for Maximum Likelihood Fit of Negative Binomial Distribution:¹⁵

Rather than directly maximizing the loglikelihood by numerical techniques, one can set the partial
derivatives of the loglikelihood equal to zero.

For the Negative Binomial, f(x) = {(r+x-1)...(r+1)(r) / x!} β^x / (1+β)^(x+r).
ln f(x) = ln(r+x-1) + ... + ln(r+1) + ln(r) + x ln(β) - (x+r) ln(1+β) - ln(x!).

∂ ln[f(x)] / ∂β = x/β - (x+r)/(1+β) = x{1/β - 1/(1+β)} - r/(1+β).

∂ ln[f(x)] / ∂r = 1/(r+x-1) + ... + 1/(r+1) + 1/r - ln(1+β) = Σ 1/(r+i) - ln(1+β),
where the sum runs from i = 0 to x-1, and the summation is zero for x = 0.

For example, for the Negative Binomial and the previously given data set the loglikelihood is:
17,649 ln f(0) + 4829 ln f(1) + 1106 ln f(2) + 229 ln f(3) + 44 ln f(4) + 9 ln f(5) + 4 ln f(6) + ln f(7) +
ln f(8).

∂(loglikelihood)/∂β = {4829 + (2)(1106) + (3)(229) + (4)(44) + (5)(9) + (6)(4) + (7)(1) + (8)(1)} {1/β - 1/(1+β)}
- 23,872 r/(1+β) = 7988 / {β(1+β)} - 23,872 r/(1+β).

∂(loglikelihood)/∂r = 4829/r + 1106{1/r + 1/(r+1)} + 229{1/r + 1/(r+1) + 1/(r+2)} +
44{1/r + 1/(r+1) + 1/(r+2) + 1/(r+3)} + 9{1/r + 1/(r+1) + 1/(r+2) + 1/(r+3) + 1/(r+4)} +
4{1/r + 1/(r+1) + 1/(r+2) + 1/(r+3) + 1/(r+4) + 1/(r+5)} +
{1/r + 1/(r+1) + 1/(r+2) + 1/(r+3) + 1/(r+4) + 1/(r+5) + 1/(r+6)} +
{1/r + 1/(r+1) + 1/(r+2) + 1/(r+3) + 1/(r+4) + 1/(r+5) + 1/(r+6) + 1/(r+7)} - 23,872 ln(1+β).

One could set these two partial derivatives equal to zero and solve numerically.

The first equation becomes: 0 = 7988 / {β(1+β)} - 23,872 r/(1+β). ⇒
7988/23,872 = rβ. ⇔ Observed mean = theoretical mean.


15
Beyond what you should be asked about on your exam.

More generally, let nx be the number of observations of x claims, with Σ nx = n, and
X̄ = observed mean = Σ x nx / n. Then:

∂(loglikelihood)/∂β = Σ x nx / {β(1+β)} - n r/(1+β) = {n / (1+β)} (X̄/β - r).

∂(loglikelihood)/∂r = Σ nx Σ 1/(r+i) - n ln(1+β),
where the outer sum is over x and the inner sum runs from i = 0 to x-1.

Then setting the two partial derivatives equal to zero gives the following two equations:¹⁶
rβ = X̄.
ln(1+β) = (1/n) Σ nx Σ 1/(r+i), with the inner sum from i = 0 to x-1.

One could substitute β from the first equation into the second equation:¹⁷
ln(1 + X̄/r) = (1/n) Σ nx Σ 1/(r+i).

Then one could solve this equation for r via numerical means. Alternately, one can just directly
maximize the loglikelihood via numerical techniques, such as the Nelder-Mead Simplex method or
the Scoring Method.

16
See Equations 14.5 and 14.6 in Loss Models.
17
See Equation 14.7 in Loss Models.

Exercise: For the following data and a Negative Binomial Distribution with parameters
β = 0.22494 and r = 1.48759, verify that rβ = observed mean = X̄, and that
ln(1 + X̄/r) = (1/n) Σ nx Σ 1/(r+i), where the inner sum runs from i = 0 to x-1.
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Insureds 17649 4829 1106 229 44 9 4 1 1 23872

[Solution: The observed mean = 7988/23,872 = 0.33462. (0.22494)(1.48759) = 0.33462.
Thus rβ = observed mean.
ln(1 + X̄/r) = ln(1 + 0.33462/1.48759) = 0.20289.
(r = 1.48759)
Number of Claims   Number Observed   1/(r+x)   Sum of 1/(r+i), i = 0 to x-1   Observations times Sum of 1/(r+i)
0 17,649 0.67223
1 4,829 0.40200 0.67223 3246.190
2 1,106 0.28673 1.07422 1188.091
3 229 0.22284 1.36095 311.659
4 44 0.18223 1.58379 69.687
5 9 0.15414 1.76602 15.894
6 4 0.13355 1.92016 7.681
7 1 0.11782 2.05372 2.054
8 1 0.10540 2.17153 2.172
Sum 23,872 4843.427

Therefore, (1/n) Σ nx Σ 1/(r+i) = 4843.427/23,872 = 0.20289 = ln(1 + X̄/r).]

Thus we have shown that the maximum likelihood Negative Binomial has parameters:
β = 0.22494 and r = 1.48759.
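
The check in the exercise above is easy to reproduce in Python (my own sketch; names are mine):

    # Verify the two maximum likelihood equations at beta = 0.22494, r = 1.48759.
    from math import log

    counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}
    n = sum(counts.values())
    mean = sum(x * nx for x, nx in counts.items()) / n
    r, beta = 1.48759, 0.22494

    lhs = log(1 + mean / r)
    rhs = sum(nx * sum(1 / (r + i) for i in range(x)) for x, nx in counts.items()) / n
    print(round(r * beta, 5), round(mean, 5))   # both approximately 0.3346
    print(round(lhs, 5), round(rhs, 5))         # both approximately 0.2029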

Reparameterizing Distributions:

As pointed out in Loss Models, there are many different ways to parameterize a distribution. For
example, for a Negative Binomial, one could take µ = rβ and β as the parameters, rather than r and β.
In that case µ would be the mean of the Negative Binomial Distribution, and rβ(1+β) = µ(1+β) would
be the variance.

Exercise: For a Negative Binomial with µ = 2.6 and β = 1.3, what is the density at 4?
[Solution: Take r = µ/β = 2.6/1.3 = 2. f(4) = {r(r+1)(r+2)(r+3)/4!} β^4 / (1+β)^(4+r) =
{(2)(3)(4)(5)/24} (1.3^4)/(2.3^6) = (5)(2.8561)/148.04 = 0.09647.]

Thus we see that the density at 4 is the same for either µ = 2.6 and β = 1.3 or
r = 2.6/1.3 = 2 and β = 1.3. Thus changing the way we parameterize the distribution can not affect
which distribution one gets via fitting by Maximum Likelihood.
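
A brief check in Python (my own sketch) makes the point concrete: converting (µ, β) back to (r, β) and
evaluating the density gives the same value either way.

    # The (mu, beta) and (r, beta) parameterizations give the same Negative Binomial density.
    from math import lgamma, exp, log

    def nb_pmf(x, r, beta):
        return exp(lgamma(r + x) - lgamma(r) - lgamma(x + 1)
                   + x * log(beta) - (x + r) * log(1 + beta))

    mu, beta = 2.6, 1.3
    r = mu / beta                          # convert back to the Loss Models parameterization
    print(round(nb_pmf(4, r, beta), 5))    # 0.09647, matching the exercise above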

Exercise: Assume the Maximum Likelihood Negative Binomial has µ = 6 (mean of 6) and
β = 1.7 (variance of (2.7)(6).) Then what is the Maximum Likelihood Negative Binomial
parameterized as in Loss Models, fit to this same data?
[Solution: r = µ/β = 6/1.7 and β = 1.7.]

The manner in which we parameterize the distribution has no effect on the result of applying the
Method of Maximum Likelihood. However, in some circumstances, certain forms of parameterization
may enhance the ability of numerical methods to quickly and easily find the Maximum Likelihood fit.18

Linear Exponential Families:¹⁹

For Linear Exponential Families, the Method of Maximum Likelihood and the Method of Moments
produce the same result when applied to ungrouped data.²⁰ Thus there are many cases where one
can apply the method of maximum likelihood to ungrouped data by instead performing the simpler
method of moments: Exponential, Poisson, Normal for fixed σ, Binomial for m fixed (including the
special case of the Bernoulli), Negative Binomial for r fixed (including the special case of the
Geometric), and the Gamma for α fixed.
18
Also the manner of parametrizing the distribution will affect the form of the Information Matrix. See “Mahlerʼs Guide
to Fitting Loss Distributions.” For a large class of distributions, if the mean is taken as one parameter, then for the
Method of Maximum Likelihood, the estimate of the mean will be asymptotically independent of the estimate of the
other parameter(s).
19
See "Mahler's Guide to Conjugate Priors".
20
This useful fact is demonstrated in "Mahler's Guide to Conjugate Priors".

Variance of Estimated Single Parameters:

Assuming the form of the distribution and the other parameters are fixed, the approximate variance
of the estimate of a single parameter using the method of maximum likelihood is given by negative
the inverse of the product of the number of points times the expected value of the second partial
derivative of the loglikelihood:²¹

Variance of θ̂ ≅ -1 / {n E[∂^2 ln f(x) / ∂θ^2]}.²²

Exercise: A Poisson Distribution, f(x) = e^(-λ) λ^x / x!, is fit to 23,872 observations via the Method of
Maximum Likelihood, and one obtained a point estimate of λ = 0.3346.
What is the variance of this estimate of λ?
[Solution: ln f(x) = x lnλ - λ - ln(x!). ∂ ln[f(x)] / ∂λ = x/λ - 1. ∂^2 ln[f(x)] / ∂λ^2 = -x/λ^2.
E[∂^2 ln[f(x)] / ∂λ^2] = E[-x/λ^2] = -E[x]/λ^2 = -λ/λ^2 = -1/λ.
Thus Var[λ̂] ≅ -1 / {n E[∂^2 ln[f(x)] / ∂λ^2]} = λ/n = 0.3346 / 23,872 = 0.00001402.]

Since for the Poisson the Method of Moments equals the method of maximum
likelihood, we get the same result as was obtained for the Method of Moments in the
previous section.

As discussed, one can hold all but one parameter in a distribution fixed, and estimate the remaining
parameter by maximum likelihood. For example, assume we have a Negative Binomial with r = 1.5.
The second partial derivative of the loglikelihood with respect to β is obtained as follows:
f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r) = {(1.5)(2.5)...(0.5 + x)/x!} β^x / (1+β)^(x+1.5).
ln f(x) = x ln(β) - (x+1.5) ln(1+β) + ln[(1.5)(2.5)...(0.5 + x)/x!].
∂ ln[f(x)] / ∂β = x/β - (x+1.5)/(1+β). ∂^2 ln[f(x)] / ∂β^2 = (x+1.5)/(1+β)^2 - x/β^2.

21
This is the Cramer-Rao (Rao-Cramer) lower bound.
22
This is a special case of the 2 parameter case, as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”

Exercise: Assume we have fit a Negative Binomial Distribution with r = 1.5 to 23,872 observations
via the Method of Maximum Likelihood and obtained a point estimate β̂ = 0.223.
What is an approximate 95% confidence interval for β?
[Solution: E[∂^2 ln[f(x)] / ∂β^2] = E[(x+1.5)/(1+β)^2 - x/β^2] = (E[x]+1.5)/(1+β)^2 - E[x]/β^2 =
(1.5β+1.5)/(1+β)^2 - 1.5β/β^2 = 1.5{1/(1+β) - 1/β} = -1.5 / {β(1+β)}.
Thus Var[β̂] ≅ -1 / {n E[∂^2 ln[f(x)] / ∂β^2]} = β(1+β)/(1.5n) = (0.223)(1.223) / {(1.5)(23,872)} =
0.00000762. Standard Deviation is: √0.00000762 = 0.00276. (1.96)(0.00276) = 0.005.
Thus an approximate 95% confidence interval for β is: 0.223 ± 0.005.
Comment: Since for the Negative Binomial with r fixed the Method of Moments equals the method
of maximum likelihood, we get the same result as was obtained for the Method of Moments in the
previous section.]

In this case as well as in general, the variance of the estimate is inversely proportional to the
number of data points used to fit the distribution. This can be a useful final check of your
solution.

Here is the approximate variance of the estimated parameter for the following single parameter
frequency distributions:

Distribution                      Parameter     Approximate Variance
Bernoulli                         q             q(1-q) / n
Binomial (m fixed)                q             q(1-q) / (mn)
Poisson                           λ             λ / n
Geometric                         β             β(1+β) / n
Negative Binomial (r fixed)       β             β(1+β) / (rn)
Negative Binomial (β fixed)       r             1 / [Σ nx Σ 1/(r+i)^2], outer sum over x ≥ 1, inner sum from i = 0 to x-1

For two-parameter distributions, one would make use of the “Information Matrix”, as discussed in
“Mahlerʼs Guide to Fitting Loss Distributions.”

Fisherʼs Information:

For one-parameter distributions, the Information or Fisherʼs Information is: -n E[∂^2 ln f(x) / ∂θ^2].
Fisherʼs Information is the reciprocal of the Cramer-Rao lower bound.

For the Poisson, method of moments is equal to maximum likelihood.²³
Therefore, Var[λ̂] = Var[X̄] = Var[X]/n = λ/n.
However, Var[λ̂] = 1 / (Fisherʼs Information). ⇒ Fisherʼs Information = n/λ.

Alternately, ln f(x) = x lnλ - λ - ln(x!). ∂ ln[f(x)] / ∂λ = x/λ - 1. ∂^2 ln[f(x)] / ∂λ^2 = -x/λ^2.
E[∂^2 ln[f(x)] / ∂λ^2] = E[-x/λ^2] = -E[x]/λ^2 = -λ/λ^2 = -1/λ.
Fisherʼs Information = -n E[∂^2 ln f(x) / ∂θ^2] = n/λ.

Frequency Distribution               Fisherʼs Information
Binomial, m fixed²⁴                  mn / {q(1-q)}
Poisson                              n / λ
Negative Binomial, r fixed²⁵         rn / {β(1+β)}

23
In the absence of grouping, truncation, or censoring.
24
m = 1 is a Bernoulli.
25
r = 1 is a Geometric.

Variance of Functions of Estimated Parameters:

When there is only one parameter θ, the variance of the estimate of a function of θ, h(θ), is:²⁶

Var[h(θ)] ≅ (∂h/∂θ)^2 Var[θ̂].

For example, assume one has a Poisson Distribution. Assume we wish to estimate the chance of
having a single claim. Then h(λ) = λe^(-λ). Then ∂h/∂λ = e^(-λ) - λe^(-λ) = e^(-λ) (1 - λ).
As shown above, Var[λ̂] ≅ λ/n. Thus, Var[h] ≅ (∂h/∂λ)^2 Var[λ̂] = e^(-2λ) (1 - λ)^2 λ/n.

Exercise: A Poisson Distribution, f(x) = e^(-λ) λ^x / x!, has been fit to 23,872 observations via the
Method of Maximum Likelihood and one obtained a point estimate λ = 0.3346.
What is the variance of the resulting estimate of the density function at 1?
[Solution: Var[h] ≅ e^(-2λ) (1 - λ)^2 λ/n = e^(-(2)(0.3346)) (1 - 0.3346)^2 (0.3346) / 23,872 = 0.00000318.
Comment: A one-dimensional example of the delta method, as discussed in “Mahlerʼs Guide to
Fitting Loss Distributions.”]

Exercise: A Poisson Distribution, f(x) = e^(-λ) λ^x / x!, has been fit to 23,872 observations via the
Method of Maximum Likelihood and one obtained a point estimate λ = 0.3346.
What is an approximate 95% confidence interval for the chance of 1 claim?
[Solution: Var[h] ≅ 0.00000318. Standard Deviation is 0.0018.
The point estimate is: f(1) = λe^(-λ) = 0.3346 e^(-0.3346) = 0.239.
Thus an approximate 95% confidence interval is: 0.239 ± 0.004.]

26
This is the same formula used in the section on Method of Moments. It is a special case of the delta method,
discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”

Data Grouped into Intervals:

Sometimes frequency data will be grouped into intervals.²⁷

For grouped data, the likelihood for an interval [ai, bi] with ni observations is: {F(bi) - F(ai)}^ni.
The loglikelihood is: ni ln[F(bi) - F(ai)].
For grouped data, the loglikelihood is a sum of terms over the intervals:
(number of observations in the interval) × ln(probability covered by the interval).

For example, assume we have the following data:²⁸


Number of Claims 0 1 2 3 4 5 or more All
Number of Insureds 17649 4829 1106 229 44 15 23872

Then the loglikelihood is:


17649 ln f(0) + 4829 ln f(1) + 1106 ln f(2) + 229 ln f(3) + 44 ln f(4) + 15 ln[1 - F(4)].
Where rather than use the density for the final interval, we use the sum of the densities for 5 or more,
that is 1 - F(4) = 1 - {f(0) + f(1) + f(2) + f(3) + f(4)}.

Exercise: What is the loglikelihood for the following data?


Number of Claims 0 1 2 to 4 5 or more All
Number of Insureds 17649 4829 1379 15 23872

[Solution: The loglikelihood is: 17649 ln f(0) + 4829 ln f(1) +


1379ln(f(2) + f(3) + f(4)) + 15 ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}].]

27
This is more common for severity data. The most common grouping of frequency data is to have a final interval,
such as 10 or more claims.
28
Note that this is a previous data set, with a final interval of 5 or more claims. This grouping has removed some
information, compared to the previous version of the data set.

Exercise: What is the loglikelihood for the following data and a Negative Binomial Distribution with
r = 1.5 and β = 0.2?
Number of Claims 0 1 2 to 4 5 or more All
Number of Insureds 17649 4829 1379 15 23872
[Solution: The densities of the Negative Binomial are:
x f(x)
0 0.7607258
1 0.1901814
2 0.0396211
3 0.0077041
4 0.0014445
5 0.0002648
6 0.0000478
ln[f(0)] = -0.273482. ln[f(1)] = -1.659777.
ln[f(2) + f(3) + f(4)] = ln(0.0487697) = -3.02065.
ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}] = ln(0.0003231) = -8.038.
The loglikelihood is: 17,649 ln f(0) + 4829 ln f(1) +
1379 ln[f(2) + f(3) + f(4)] + 15 ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}] =
(17,649)(-0.273482) + (4829)(-1.659777) + (1379)(-3.02065) + (15)(-8.038) = -17,127.8.]
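As a check, here is a minimal Python sketch of this grouped loglikelihood, assuming scipy is available; note that scipy parameterizes the Negative Binomial by r and the success probability p = 1/(1+β). The names below are illustrative only.

import math
from scipy.stats import nbinom

r, beta = 1.5, 0.2
p = 1 / (1 + beta)                              # scipy's success-probability parameter
f = [nbinom.pmf(k, r, p) for k in range(5)]     # f(0), ..., f(4)

loglik = (17649 * math.log(f[0]) + 4829 * math.log(f[1])
          + 1379 * math.log(f[2] + f[3] + f[4])
          + 15 * math.log(1 - sum(f)))
print(loglik)                                   # roughly -17,127.8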

Given a particular form of the density and a set of data, the loglikelihood is a function of the
parameters. For example for a Poisson distribution and the following data
Number of Claims 0 1 2 to 4 5 or more All
Number of Insureds 17649 4829 1379 15 23872
the loglikelihood is: 17,649 ln f(0) + 4829 ln f(1) +
1379 ln[f(2) + f(3) + f(4)] + 15 ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}] =
17,649 ln(e−λ) + 4829 ln(λe−λ) + 1379 ln(λ2e−λ/2 + λ3e−λ/6 + λ4e−λ/24) +

15 ln(1 - e−λ - λe−λ - λ2e−λ/2 - λ3e−λ/6 - λ4e−λ/24).



Here is a graph of the loglikelihood as a function of lambda:

[Graph omitted: the loglikelihood (vertical axis, roughly -30,000 to -20,000) plotted against λ (horizontal axis, 0.2 to 1), with a single maximum near λ ≈ 0.33.]

To fit a distribution via maximum likelihood to grouped data you find the set of
parameters such that either Π {F(bi) - F(ai)}^ni or Σ ni ln[F(bi) - F(ai)] is maximized.
While in this case one cannot solve for the maximum likelihood in closed form, using a computer
one can determine that the maximum loglikelihood is -17,243.8, for λ = 0.327562.
For this example, the maximum likelihood Poisson has λ̂ = 0.328.
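The numerical search can be sketched in Python, for example with scipy; the following minimal sketch (illustrative names, not from the text) should reproduce approximately the values quoted above.

import math
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def negative_loglik(lam):
    ll = 17649 * poisson.logpmf(0, lam) + 4829 * poisson.logpmf(1, lam)
    ll += 1379 * math.log(poisson.cdf(4, lam) - poisson.cdf(1, lam))   # 2 to 4 claims
    ll += 15 * math.log(1 - poisson.cdf(4, lam))                       # 5 or more claims
    return -ll

result = minimize_scalar(negative_loglik, bounds=(0.01, 2.0), method='bounded')
print(result.x, -result.fun)            # roughly 0.3276 and -17,243.8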

Years of Data:29

Sometimes one will only have limited information, the number of exposures and number of claims,
from each of several years. For example:

Year Exposures Claims


1995 1257 16
1996 1025 12
1997 1452 18
1998 1311 17

Total 5045 63

Note that the mean observed claim frequency is: 63/5045 = 0.01249.

Assume that each exposure has a Poisson frequency process, and that each exposure in every
year has the same expected claim frequency, λ. Assume that each Poisson frequency process is
independent across exposures and years.

Then in 1995, the number of claims is a Poisson frequency process with mean
1257λ. Similarly, in 1998, the number of claims is a Poisson frequency process with mean 1311λ.

In 1998, the likelihood is: f(17) = e− 1311λ (1311 λ)17 / 17!.


In 1998, the loglikelihood is: ln[f(17)] = 17 ln(λ) + 17 ln(1311) - 1311 λ - ln(17!).

Thus the sum of the loglikelihoods is: (16+12+18+17) ln(λ) + 16 ln(1257) + 12 ln(1025) +
18 ln(1452) + 17 ln(1311) - (1257 + 1025 + 1452 + 1311)λ - ln(16!) - ln(12!) - ln(18!) - ln(17!) =
63ln(λ) + 16ln(1257) + 12ln(1025) + 18ln(1452) + 17ln(1311) - 5045λ - ln(16!) - ln(12!) - ln(18!) -
ln(17!).

Setting the partial derivative of the loglikelihood equal to zero, one obtains:
0 = 63/λ - 5045. ⇒ λ̂ = 63/5045 = X̄.

In general, when applied to years of data, the Method of Maximum Likelihood applied to
the Poisson, produces the same result as the Method of Moments.

For either the Binomial with m fixed, or the Negative Binomial with r fixed, when applied to years of
data, the Method of Maximum Likelihood produces the same result as the Method of Moments.
29
See Example 14.9 in Loss Models.

Exercise: Assume that each exposure in every year has a Negative Binomial frequency
process, with the same parameters β and r. Assume that each Negative Binomial frequency
process is independent across exposures and years.
What is the loglikelihood for the following data:
Year Exposures Claims
1995 1257 16
1996 1025 12
1997 1452 18
1998 1311 17

Total 5045 63
[Solution: In 1998, the number of claims is the sum of 1311 independent Negative Binomial
Distributions each with the same parameters β and r, which is a Negative Binomial frequency
process with parameters β and 1311r. In 1998 the likelihood is:

f(17) = {(1311r + 16)! / (17! (1311r - 1)!)} β^17 / (1+β)^(17+1311r).
ln f(17) = ln[(1311r + 16)!] - ln(17!) - ln[(1311r - 1)!] + 17 ln(β) - (17 + 1311r) ln(1+β).
Thus the sum of the loglikelihoods is:
ln[(1257r + 15)!] + ln[(1025r + 11)!] + ln[(1452r + 17)!] + ln[(1311r + 16)!] - ln(16!) - ln(12!) -
ln(18!) - ln(17!) - ln[(1257r - 1)!] - ln[(1025r - 1)!] - ln[(1452r - 1)!] - ln[(1311r - 1)!] + 63 ln(β) -
(63 + 5045r) ln(1+β).
Comment: One could use the form of the Negative Binomial density shown in the Appendix B
attached to the exam; I found the alternate form involving factorials easier to work with here.]

Then one could maximize this loglikelihood via numerical techniques, in order to fit the
maximum likelihood Negative Binomial to this data.
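For example, a minimal Python sketch of that loglikelihood, assuming scipy is available; the names (negative_loglik, exposures, etc.) are illustrative only. The function could then be handed to a numerical optimizer such as scipy.optimize.minimize.

import numpy as np
from scipy.special import gammaln

exposures = np.array([1257, 1025, 1452, 1311])
claims = np.array([16, 12, 18, 17])

def negative_loglik(params):
    r, beta = np.exp(params)       # work on the log scale so that r > 0 and beta > 0
    R = exposures * r              # year i is Negative Binomial with parameters (R_i, beta)
    ll = np.sum(gammaln(R + claims) - gammaln(claims + 1) - gammaln(R)
                + claims * np.log(beta) - (claims + R) * np.log(1 + beta))
    return -ll

# loglikelihood at a trial point with r = 1 and beta = 0.0125 (so r * beta roughly matches 63/5045);
# negative_loglik could then be passed to scipy.optimize.minimize to locate the maximum.
print(-negative_loglik(np.log([1.0, 0.0125])))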

Note that when data is given for years in this manner, there is no way to estimate the second
moment. Thus one could not fit a Negative Binomial (r not fixed) via method of moments.

Restricted Maximum Likelihood:30

Assume the following numbers of claims:


Year 1st quarter 2nd quarter 3rd quarter 4th quarter Total
1 26 22 25 27 100
2 22 25 23 22 92
Assume that claims the first year are Poisson with parameter λ1.

Assume that claims the second year are Poisson with parameter λ2.
30
See 4, 11/00, Q. 34, involving exponential severities, in “Mahlerʼs Guide to Fitting Loss Distributions.”

Exercise: Use maximum likelihood to estimate λ1.

[Solution: It is the same as the method of moments. Estimated λ1 = 100.

Alternately, each quarter is Poisson with parameter λ1/4.


The loglikelihood is: ln f(26) + ln f(22) + ln f(25) + ln f(27) =
ln[e^(-λ1/4) (λ1/4)^26/26!] + ln[e^(-λ1/4) (λ1/4)^22/22!] + ln[e^(-λ1/4) (λ1/4)^25/25!] + ln[e^(-λ1/4) (λ1/4)^27/27!] =
-λ1 + 100 ln(λ1) - 100 ln(4) - ln(26!) - ln(22!) - ln(25!) - ln(27!).

Setting the partial derivative with respect to λ1 equal to zero: 0 = -1 + 100/λ1. ⇒ λ1 = 100.]

Similarly, applying maximum likelihood to the data for Year 2: λ̂2 = 92.

Instead of separately estimating λ1 and λ2, one can assume some sort of relationship between

them. For example, let us assume λ2 = 0.9λ1.31

For a Poisson Distribution, f(x) = e−λλ x/x!. ln f(x) = -λ + xln(λ) - ln(x!).


Year 1 Loglikelihood is: -λ1 + 100ln(λ1) - ln(100!).

Assuming λ2 = 0.9λ1, Year 2 Loglikelihood is: -λ2 + 92ln(λ2) - ln(92!) =

-0.9λ1 + 92ln(0.9λ1) - ln(92!) = -0.9λ1 + 92ln(λ1) + 92ln(0.9) - ln(92!).

Total Loglikelihood = -λ1 + 100ln(λ1 ) - ln(100!) - 0.9λ1 + 92ln(λ1) + 92ln(0.9) - ln(92!) =

-1.9λ1 + 192ln(λ1) - ln(100!) + 92ln(0.9) - ln(92!).

Setting the partial derivative with respect to λ1 equal to zero:

0 = -1.9 + 192/λ1. ⇒ λ1 = 192/1.9 = 101.05.

Alternately, Year 2 is expected to produce the same number of claims as 0.9 of Year 1.
Therefore, in total we have 1.9 exposures on the Year 1 level.
For the Poisson, maximum likelihood = method of moments.32 λ1 = 192/1.9 = 101.05.

31
In practical applications, this type of assumption may come from many places. For example, it may come from
examining a much larger similar set of data.
32
This applies here due to the very special properties of the Poisson Distribution.
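A minimal Python sketch checking the restricted fit numerically, assuming scipy is available; the names are illustrative only.

import math
from scipy.optimize import minimize_scalar

def negative_loglik(lam1):
    # Year 1 is Poisson with mean lam1 (100 claims observed);
    # Year 2 is Poisson with mean 0.9 * lam1 (92 claims observed).
    ll = (-lam1 + 100 * math.log(lam1)) + (-0.9 * lam1 + 92 * math.log(0.9 * lam1))
    return -ll          # constants such as ln(100!) and ln(92!) are omitted

print(minimize_scalar(negative_loglik, bounds=(50, 150), method='bounded').x)   # about 101.05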

Problems:

Use the following information for the next three questions:

Number of Claims Number of Insureds


0 301
1 361
2 217
3 87
4 27
5 6
6 1

1000

3.1 (1 point) The data given above is fit to a Poisson distribution using the method of maximum
likelihood. What is the mean number of claims for this fitted Poisson distribution?
A. less than 1.15
B. at least 1.15 but less than 1.25
C. at least 1.25 but less than 1.35
D. at least 1.35 but less than 1.45
E. at least 1.45

3.2 (2 points) What is the upper end of an approximate 95% confidence interval for the estimate in
the previous question?
A. less than 1.15
B. at least 1.15 but less than 1.25
C. at least 1.25 but less than 1.35
D. at least 1.35 but less than 1.45
E. at least 1.45

3.3 (3 points) What is an approximate 95% confidence interval for the chance of having 2 or more
claims?
A. [0.332, 0.342]
B. [0.327, 0.347]
C. [0.322, 0.352]
D. [0.317, 0.357]
E. [0.312, 0.362]

3.4 (2 points) You are given the following accident data from 1000 insurance policies:
Number of accidents Number of policies
0 100
1 267
2 311
3 208
4 87
5 23
6 4
7+ 0
You fit a Binomial Distribution with m = 7 to this data via Maximum Likelihood.
What is the fitted value of q?
A. less than 0.25
B. at least 0.25 but less than 0.30
C. at least 0.30 but less than 0.35
D. at least 0.35 but less than 0.40
E. at least 0.40

3.5 (2 points) A fleet of cars has had the following experience for the last three years:
Year Cars Number of Claims
1 1500 60
2 1700 80
3 2000 100
Using maximum likelihood, estimate the annual Poisson parameter for a single car.
A. 4.6% B. 4.8% C. 5.0% D. 5.2% E. 5.4%

3.6 (2 points) You are given the following:


• The number of claims per year for a given risk follows a distribution with
probability function p(n) = λn e−λ / n! , n = 0, 1,..., λ > 0.
• Five claims were observed for this risk during Year 1, nine claims were
observed for this risk during Year 2, and three claims were observed for this risk during
Year 3.
If λ is known to be an integer, determine the maximum likelihood estimate of λ.
A. 4 B. 5 C. 6 D. 7 E. 8

3.7 (1 point) A data set has an empirical mean of 1.5 and no individual with more than 4 claims.
For a Binomial Distribution and this data set:
m q loglikelihood
4 0.3750 -1633.20
5 0.3000 -1631.08
6 0.2500 -1629.87
7 0.2143 -1629.41
8 0.1875 -1630.63
Using the maximum likelihood Binomial Distribution, what is the probability of zero claims?
A. 15.3% B. 16.8% C. 17.8% D. 18.5% E. 19.0%

3.8 (3 points) You observe the following grouped data on 1000 insureds:
Number of Claims Number of Insureds
0-5 372
6-10 549
11-15 78
16-20 1
Which of the following expressions should be maximized in order to fit a Poisson Distribution with
parameter λ to the above data via the method of maximum likelihood?
A. e^(-1000λ) (Σ_{i=0..5} λ^i/i!)^372 (Σ_{i=6..10} λ^i/i!)^549 (Σ_{i=11..15} λ^i/i!)^78 (Σ_{i=16..20} λ^i/i!)

B. e^(-1000λ) (Σ_{i=0..5} λ^i/i!)^372 (Σ_{i=6..10} λ^i/i!)^921 (Σ_{i=11..15} λ^i/i!)^999 (Σ_{i=16..20} λ^i/i!)^1000

C. e^(-1000λ) (Σ_{i=0..5} λ^i/i!)^628 (Σ_{i=6..10} λ^i/i!)^79 (Σ_{i=11..15} λ^i/i!)

D. -1000λ + ln(Σ_{i=0..5} λ^i/i!) + ln(Σ_{i=6..10} λ^i/i!) + ln(Σ_{i=11..15} λ^i/i!) + ln(Σ_{i=16..20} λ^i/i!)

E. None of the above.

3.9 (3 points) You observe 100 insureds.


85 of them have no claims and 15 of them have at least one claim.
Fit a Poisson Distribution with parameter λ to this data via the method of maximum likelihood.
What is the fitted value of λ?
A. 0.15 B. 0.16 C. 0.17 D. 0.18 E. 0.19

3.10 (3 points) You have the following data from the state of East Carolina:
Region Number of Claims Number of Exposures Claim Frequency
Rural 5000 250,000 2.0%
Urban 10,000 312,500 3.2%
You assume that the distribution of numbers of claims is Poisson.
Based on data from other states, you assume that the mean claim frequency (expected number of
claims per exposure) for urban insureds is 1.5 times that for rural insureds.
Via the method of maximum likelihood applied to all the data, estimate the expected number of
claims for the rural region of East Carolina next year, if there are again 250,000 exposures.
A. 5000 B. 5050 C. 5100 D. 5150 E. 5200

3.11 (3 points) Claim counts follow a Negative Binomial distribution.


Determine the likelihood for the following four independent observations of claim counts: 2, 0, 4, 3.
A. r³ (r + 1) (r + 2) (r + 3) β⁹ / {288 (1+β)^(r + 9)}

B. r³ (r + 1)² (r + 2)² (r + 3) β⁹ / {288 (1+β)^(r + 9)}

C. r³ (r + 1)³ (r + 2)² (r + 3) β⁹ / {288 (1+β)^(4r + 9)}

D. r³ (r + 1)³ (r + 2)² (r + 3)² β⁹ / {288 (1+β)^(4r + 9)}
E. None of the above

3.12 (3 points) You observe the following data on 1000 policies:


Number of Accidents Number of Policies
0 100
1 267
2 311
3 208
4 or more 114
Which of the following expressions should be maximized in order to fit a Poisson Distribution with
parameter λ to the above data via the method of maximum likelihood?

A. -1000λ + 1513 ln(λ) + 114 ln(e^λ - 1 - λ - λ²/2 - λ³/6) - 519 ln2 - 208 ln3.

B. -1000λ + 886 ln(λ) + 114 ln(e^λ - 1 - λ - λ²/2 - λ³/6) - 519 ln2 - 208 ln3.

C. -886λ + 1513 ln(λ) + 114 ln(e^λ - 1 - λ - λ²/2 - λ³/6) - 519 ln2 - 208 ln3.

D. -886λ + 886 ln(λ) + 114 ln(e^λ - 1 - λ - λ²/2 - λ³/6) - 519 ln2 - 208 ln3.
E. None of the above.

Use the following information for the next two questions:


One has observed the following distribution of insureds by number of claims:
Number of Claims 0 1 2 3 4 5&+ All
Number of Insureds 390 324 201 77 8 0 1000

3.13 (2 points) A Poisson Distribution is fit via the Method of Maximum Likelihood.
Which of the following is the variance of λ̂?
A. less than 0.0008
B. at least 0.0008 but less than 0.0009
C. at least 0.0009 but less than 0.0010
D. at least 0.0010 but less than 0.0011
E. at least 0.0011

3.14 (2 points) A Negative Binomial Distribution with r = 3 is fit via the Method of Maximum
Likelihood. Which of the following is the variance of β̂?
A. less than 0.00012
B. at least 0.00012 but less than 0.00013
C. at least 0.00013 but less than 0.00014
D. at least 0.00014 but less than 0.00015
E. at least 0.00015

3.15 (2 points) Claim counts follow a Poisson distribution. Determine the likelihood for the following
four independent observations of claim counts: 5, 1, 2, 4.
A. λ4 e−λ / 5760

B. λ12 e−λ / 5760

C. λ e−4λ / 5760

D. λ4 e−4λ / 5760
E. None of the above

3.16 (3 points) You are given:


(i) An insurance policy has experienced the following numbers of claims over a 5-year period:
10 2 4 0 6
(ii) In the sixth year this insurance policy had at most one claim.
(iii) Numbers of claims are independent from year to year.
(iv) You use the method of maximum likelihood to fit a Poisson model.
Determine the estimated Poisson parameter.
(A) 3.60 (B) 3.65 (C) 3.70 (D) 3.75 (E) 3.80

3.17 (3 points)
You are given the following data for the number of claims during a one-year period:
Number of Claims Number of Policies
0 9000
1 800
2 180
3 20
4+ 0
Total 10,000
A Poisson distribution is fitted to the data using maximum likelihood estimation.
Let P = probability of at least one claim using the fitted Poisson model.
A Negative Binomial distribution is fitted to the data using the method of moments.
Let Q = probability of at least one claim using the fitted Negative Binomial model.
Calculate |P - Q|.
(A) 0.00 (B) 0.01 (C) 0.02 (D) 0.03 (E) 0.04

Use the following information for the next four questions.


Over one year, the following claim frequency observations were made for a group of 4000 policies,
where ni is the number of claims observed for policy i:

Σ ni = 372. Σ ni2 = 403.

3.18 (2 points) You fit a Binomial Distribution with m = 2 via maximum likelihood.
Estimate the number of these 4000 policies that will have exactly two claims next year.
A. 9 or less B. 10 C. 11 D. 12 E. 13 or more

3.19 (2 points) You fit a Poisson Distribution via maximum likelihood.


Estimate the number of these 4000 policies that will have exactly two claims next year.
A. 14 or less B. 15 C. 16 D. 17 E. 18 or more

3.20 (2 points) You fit a Geometric Distribution via maximum likelihood.


Estimate the number of these 4000 policies that will have exactly two claims next year.
A. 22 or less B. 23 C. 24 D. 25 E. 26 or more

3.21 (2 points) You fit a Negative Binomial Distribution with r = 0.5 via maximum likelihood. Estimate
the number of these 4000 policies that will have exactly two claims next year.
A. 31 or less B. 32 C. 33 D. 34 E. 35 or more

Use for the next four questions, the following data for three years:
Year Exposures Number of Claims
1 632,121 16,363
2 594,380 15,745
3 625,274 16,009

3.22 (2 points) You assume each exposure each year has the same Poisson Distribution.
Fit a Poisson Distribution via maximum likelihood. What is λ̂?
A. 2.6% B. 2.8% C. 3.0% D. 3.2% E. 3.4%

3.23 (2 points) You assume each exposure each year has the same Binomial Distribution with
m = 2. Fit a Binomial Distribution via maximum likelihood. What is q̂?

3.24 (2 points) You assume each exposure each year has the same Geometric Distribution.
Fit a Geometric Distribution via maximum likelihood. What is β̂?

3.25 (2 points) You assume each exposure each year has the same Negative Binomial Distribution
with r = 1.5. Fit a Negative Binomial Distribution via maximum likelihood. What is β̂?

3.26 (2 points) In baseball, a pitcherʼs earned run average (ERA) is the number of earned runs he
allows per 9 innings. Chris Cross is a pitcher who has allowed 40 earned runs in 100 innings.
Assume that the number of earned runs a pitcher allows per inning is Poisson.
Determine a 90% confidence interval for the underlying mean ERA of Chris Cross.

3.27 (2 points) Data has been collected on 1000 insurance contracts, and a distribution has been fit
by maximum likelihood. Determine the corresponding loglikelihood.
Number of claims per contract     Observed number     Fitted number
0                                             852                   851
1                                             113                   117
2                                              28                    24
3                                               7                     6
4 and more                                      0                     2
A. -600 B. -580 C. -560 D. -540 E. -520  

3.28 (3 points) You are given the following data on the annual number of claims:
Number of Claims Number of Policies
0 70
1 25
2 5
3 or more 0
You assume a Negative Binomial Distribution with β = 0.2.
Fit a Negative Binomial Distribution via maximum likelihood. What is r̂?


A. Less than 1.70
B. At least 1.70, but less than 1.75
C. At least 1.75, but less than 1.80
D. At least 1.80, but less than 1.85
E. 1.85 or more

3.29 (3 points) Claim counts follow a Poisson distribution with mean lambda.
You observe five years of data: 3, 0, 2, at most 1, and 1.
Determine the maximum likelihood estimate of lambda.
A. 1.29 B. 1.31 C. 1.33 D. 1.35 E. 1.37

3.30 (2 points) For a data set of size 700 from a Negative Binomial Distribution with r = 3, determine
Fisherʼs Information.

3.31 (3 points) You observe 100 insureds.


20 of them have fewer than 2 claims.
Fit a Binomial Distribution with parameter m = 10 to this data via the method of maximum likelihood.
What is the fitted value of q?
A. 0.23 B. 0.25 C. 0.27 D. 0.29 E. 0.31

3.32 (4 points) You have fit via maximum likelihood a Geometric Distribution to 80,000 data points.
The resulting estimate of β is 5.
Determine an approximate 90% confidence interval for the probability of three claims next year.

3.33 (3 points) You are given the following data on 1000 policies:
Number of Claims per Policy Number of Policies
0 579
1 332
2 79
3 9
4 1
Fit a binomial model with m = 4 using the method of maximum likelihood.
What is the maximum loglikelihood?
A. -936 B. -933 C. -930 D. -927 E. -924

3.34 (2 points) You are given the following data for the number of claims during a one-year period:
Number of Claims Number of Policies
0 104
1 37
2 11
3 6
4 2
5+ 0
Total 160
A Geometric distribution is fitted to the data using the method of moments.
Let G = probability of two claims using the fitted Geometric model.
A Poisson distribution is fitted to the data using maximum likelihood estimation.
Let P = probability of two claims using the fitted Poisson model.
Calculate 10,000 |G - P|.
(A) Less than 40
(B) At least 40, but less than 50
(C) At least 50, but less than 60
(D) At least 60, but less than 70
(E) At least 70

3.35 (2 points) For a data set of size 100 from a Binomial Distribution with m = 5,
determine Fisherʼs Information.

3.36 (4 points) You observe the following data on 15 insureds:


Number of Claims Number of Insureds
0 or 1 5
2 or 3 7
more than 3 3
You fit a Geometric Distribution via the method of maximum likelihood. What is the fitted β?
A. 1.8 B. 2.0 C. 2.2 D. 2.4 E. 2.6

3.37 (2 points) You are given the following information:


• Annual claim counts for policies follow a Poisson distribution with mean λ.
• 70 out of 200 policies have zero claims.
Calculate the maximum likelihood estimate of λ.
A. 0.90 B. 0.95 C. 1.00 D. 1.05 E. 1.10

3.38 (4 points) You have fit via maximum likelihood a Binomial with m = 6 to 2000 data points.
The estimate of q is 0.15.
Determine an approximate 95% confidence interval for the probability of no claims next year.

3.39 (2 points) For a data set of size 40 from a Poisson Distribution, what is the information?

3.40 (2 points) Last year there were 31 claims from Region 1 and 50 claims from Region 2.
You assume that the distribution of the annual number of claims from Region 1 is Geometric.
You assume that the distribution of the annual number of claims from Region 2 is
Negative Binomial with the same β as Region 1 but twice the mean.
Estimate β via the method of maximum likelihood.
A. 27 B. 28 C. 29 D. 30 E. 31

3.41 (3 points) X1 , X2 , ... X200 are independent, identically distributed variables.


Xi is 1 with probability q and zero with probability 1-q.
If Σ_{i=1..200} Xi = 40, determine the information associated with the maximum likelihood estimator of q.

A. 500 B. 750 C. 1000 D. 1250 E. 1500



3.42 (2, 5/85, Q.20) (1.5 points) Let X1 , X2 , X3 , and X4 be a random sample from the discrete

distribution X such that P[X = x] = θ2x exp[-θ2] / x!, for x = 0, 1, 2, . . . , where θ > 0.
If the data are 17, 10, 32, and 5, what is the maximum likelihood estimate of θ?
A. 4 B. 8 C. 16 D. 32 E. 64

3.43 (2, 5/88, Q.42) (1.5 points) Let X be the number of customers contacted in a given week
before the first sale is made. The random variable X is assumed to have probability function
f(x) = p(1 - p)x-1 for x = 1, 2, . . . , where p is the probability of a sale on any contact.
For three independently selected weeks, the values of X were 7, 9, and 2.
What is the maximum likelihood estimate of p?
A. 1/18 B. 1/15 C. 1/6 D. 1/5 E. 6

3.44 (2, 5/90, Q.25) (1.7 points) Let X1 , X2 , X3 be independent Poisson random variables with
means θ, 2θ, and 3θ, respectively. What is the maximum likelihood estimator of θ?
A. X̄/2 B. X̄ C. (X1 + 2X2 + 3X3)/6
D. (3X1 + 2X2 + X3)/6 E. (6X1 + 3X2 + 2X3)/11

3.45 (165, 5/96, Q.3) (1.9 points) A coin is believed by Mark to be biased such that the
probability of heads is 0.9. Mark tosses the coin 100 times and 50 heads are observed.
Mark decides to revise his estimate of the probability of heads by taking a weighted average of his
a priori estimate and the maximum likelihood estimate from his experiment.
The weight given to his a priori estimate is the a priori standard deviation and the weight given to his
maximum likelihood estimate is the estimated standard deviation.
Determine the revised estimate of the probability of heads.
(A) 0.61 (B) 0.65 (C) 0.70 (D) 0.75 (E) 0.79

3.46 (4B, 11/98, Q.17) (2 points) You are given the following:
• The number of claims per year for a given risk follows a distribution with
probability function p(n) = λn e-λ / n! , n = 0, 1,..., λ > 0 .
• Two claims were observed for this risk during Year 1 and one claim was
observed for this risk during Year 2.
If λ is known to be an integer, determine the maximum likelihood estimate of λ .
A. 1 B. 2 C. 3 D. 4 E. 5

3.47 (Course 4 Sample Exam 2000, Q.3)


A fleet of cars has had the following experience for the last three years:
Earned Car Years Number of Claims
500 70
750 60
1000 100
The Poisson distribution is used to model this process. Determine the maximum likelihood estimate
of the Poisson parameter for a single car year.

3.48 (4, 11/02, Q.6 & 2009 Sample Q. 34) (2.5 points) The number of claims follows a negative
binomial distribution with parameters β and r, where β is unknown and r is known. You wish to
estimate β based on n observations, where x̄ is the mean of these observations.
Determine the maximum likelihood estimate of β.
(A) x̄/r²  (B) x̄/r  (C) x̄  (D) r x̄  (E) r² x̄

3.49 (CAS3, 5/05, Q.20) (2.5 points) Blue Sky Insurance Company insures a portfolio of 100
automobiles against physical damage. The annual number of claims follows a binomial distribution
with m = 100. For the last 5 years, the number of claims in each year has been:
Year 1: 5 Year 2: 4 Year 3: 4 Year 4: 9 Year 5: 3
Two methods for estimating the variance in the annual claim count are:
Method 1: Unbiased Sample Variance
Method 2: Maximum Likelihood Estimation
Use each method to calculate an estimate of the variance.
What is the difference between the two estimates?
A. Less than 0.50
B. At least 0.50, but less than 0.60
C. At least 0.60, but less than 0.70
D. At least 0.70, but less than 0.80
E. 0.80 or more

3.50 (CAS3, 11/05, Q.4) (2.5 points) When Mr. Jones visits his local race track, he places three
independent bets. In his last 20 visits, he lost all of his bets 10 times, won one bet 7 times, and won
two bets 3 times. He has never won all three of his bets.
Calculate the maximum likelihood estimate of the probability that Mr. Jones wins an individual bet.
A. 13/60 B. 4/15 C. 19/60 D. 11/30 E. 5/12

3.51 (4, 11/05, Q.29 & 2009 Sample Q.239) (2.9 points)
You are given the following data for the number of claims during a one-year period:
Number of Claims Number of Policies
0 157
1 66
2 19
3 4
4 2
5+ 0
Total 248
A geometric distribution is fitted to the data using maximum likelihood estimation.
Let P = probability of zero claims using the fitted geometric model.
A Poisson distribution is fitted to the data using the method of moments.
Let Q = probability of zero claims using the fitted Poisson model.
Calculate |P - Q|.
(A) 0.00 (B) 0.03 (C) 0.06 (D) 0.09 (E) 0.12

3.52 (CAS3, 5/06, Q.2) (2.5 points)


Annual claim counts follow a Negative Binomial distribution.
The following claim count observations are available:
Year Claim Count
2005 0
2004 3
2003 5
Assuming each year is independent, calculate the likelihood function of this sample.

A. {1/(β+1)}^(3r) {β/(β+1)}^8 r² (r + 2)² (r + 4) / (3! 5!)

B. {1/(β+1)}^(3r) {β/(β+1)}^8 r² (r + 2)² (r + 4) / (2! 4!)

C. {1/(β+1)}^(3r) {β/(β+1)}^8 r² (r + 1)² (r + 2)² (r + 3) / (2! 4!)

D. {1/(β+1)}^(3r) {β/(β+1)}^8 r² (r + 1)² (r + 2)² (r + 3) (r + 4) / (2! 4!)

E. {1/(β+1)}^(3r) {β/(β+1)}^8 r² (r + 1)² (r + 2)² (r + 3) (r + 4) / (3! 5!)

3.53 (4, 11/06, Q.12 & 2009 Sample Q.256) (2.9 points) You are given:
(i) The distribution of the number of claims per policy during a one-year period for
10,000 insurance policies is:
Number of Claims per Policy Number of Policies
0 5000
1 5000
2 or more 0
(ii) You fit a binomial model with parameters m and q using the method of maximum likelihood.
Determine the maximum value of the loglikelihood function when m = 2.
(A) -10,397 (B) -7,781 (C) -7,750 (D) -6,931 (E) -6,730

3.54 (4, 11/06, Q.15 & 2009 Sample Q.259) (2.9 points) You are given:
(i) A hospital liability policy has experienced the following numbers of claims over a 10-year period:
10 2 4 0 6 2 4 5 4 2
(ii) Numbers of claims are independent from year to year.
(iii) You use the method of maximum likelihood to fit a Poisson model.
Determine the estimated coefficient of variation of the estimator of the Poisson parameter.
(A) 0.10 (B) 0.16 (C) 0.22 (D) 0.26 (E) 1.00

3.55 (4, 5/07, Q.18) (2.5 points) You are given:


(i) The distribution of the number of claims per policy during a one-year period for a block of 3000
insurance policies is:
Number of Claims per Policy Number of Policies
0 1000
1 1200
2 600
3 200
4+ 0
(ii) You fit a Poisson model to the number of claims per policy using the method of maximum
likelihood.
(iii) You construct the large-sample 90% confidence interval for the mean of the underlying Poisson
model that is symmetric around the mean.
Determine the lower end-point of the confidence interval.
(A) 0.95 (B) 0.96 (C) 0.97 (D) 0.98 (E) 0.99

3.56 (CAS3L, 11/08, Q.6) (2.5 points) You are given the following:
• An insurance company provides a coverage which can result in only three loss amounts in the
event that a claim is filed: $0, $500 or $1,000.
• The probability, p, of a loss being $0 is the same as the probability of it being $1,000.
• The following 3 claims are observed: $0 $0 $1,000
What is the maximum likelihood estimate of p?
A. Less than 0.20
B. At least 0.20, but less than 0.40
C. At least 0.40, but less than 0.60
D. At least 0.60, but less than 0.80
E. At least 0.80

3.57 (CAS3L, 11/09, Q.19) (2.5 points) You are given the following information:
• The number of trials before success follows a geometric distribution.
• A random sample of size 10 from that process is:
0 1 2 3 4 4 5 6 7 8
Calculate the maximum likelihood estimate of the variance for the underlying geometric distribution.
A. Less than 10
B. At least 10, but less than 12
C. At least 12, but less than 14
D. At least 14, but less than 16
E. At least 16

3.58 (CAS3L, 5/10, Q.21) (2.5 points) You are given the following information:
• Daily claim counts follow a Poisson distribution with mean λ.
• Exactly five of the last nine days have zero claims.
Calculate the maximum likelihood estimate of λ.
A. Less than 0.25
B. At least 0.25, but less than 0.35
C. At least 0.35, but less than 0.45
D. At least 0.45, but less than 0.55
E. At least 0.55

3.59 (2 points) In the previous question, CAS3L, 5/10, Q.21, assume instead that daily claim
counts follow a Geometric distribution with mean β.
Calculate the maximum likelihood estimate of β.
A. 0.4 B. 0.5 C. 0.6 D. 0.7 E. 0.8

3.60 (CAS3L, 11/12, Q.17) (2.5 points)


A colleague produces a maximum likelihood estimate of q̂ = 0.40 for a binomial distribution with
m = 1 using the sample data below.
Observation 1 2 3
Sample Size 100 50 y
Successes 35 18 41
Determine the value of y.
A. Less than 80
B. At least 80, but less than 82
C. At least 82, but less than 84
D. At least 84, but less than 86
E. At least 86
Note: I have rewritten this past exam question.

3.61 (CAS3L, 11/13, Q.18) (2.5 points) You are given the following information:
• The number of trials before success follows a geometric distribution.
• A random sample of size 10 from that process is: 1, 0, 5, 6, 4, 8, 2, 3, 7, 4.
Calculate the maximum likelihood estimate of the variance for the underlying geometric distribution.
A. Less than 10
B. At least 10, but less than 12
C. At least 12, but less than 14
D. At least 14, but less than 16
E. At least 16

Solutions to Problems:

3.1. B. For the Poisson with parameter λ , the method of maximum likelihood gives the same result
as the method of moments: λ = Σni / N = 1200 / 1000 = 1.2.

3.2. C. The estimated mean is λ̂ = 1.2.
ln f(x) = x ln(λ) - λ - ln(x!). ∂ln[f(x)]/∂λ = x/λ - 1. ∂²ln[f(x)]/∂λ² = -x/λ².
E[∂²ln[f(x)]/∂λ²] = E[-x/λ²] = -E[x]/λ² = -λ/λ² = -1/λ.
Thus Var[λ̂] ≅ -1/{n E[∂²ln[f(x)]/∂λ²]} = λ/n = 1.2/1000 = 0.0012.
Standard Deviation = √0.0012 = 0.0346. An approximate 95% confidence interval for the mean
frequency is the estimated mean ± 1.96 standard deviations:
1.20 ± (1.96)(0.0346) = 1.20 ± 0.07. The upper end is 1.27.
Alternately, maximum likelihood is equal to the method of moments.
Var[λ̂] = Var[X̄] = Var[X]/n = λ/n = 1.2/1000 = 0.0012. Proceed as before.

3.3. E. The estimated chance of two or more claims is: 1 - e−λ - λe−λ = 1 - 2.2e^(-1.2) = 0.337.
h = 1 - (1 + λ)e−λ. ∂h/∂λ = (1 + λ)e−λ - e−λ = λe−λ = 1.2e^(-1.2) = 0.361.
Var[h] = (∂h/∂λ)² Var[λ̂] = (0.361)² (0.0012) = 0.000156.
Standard Deviation = √0.000156 = 0.0125.
An approximate 95% confidence interval for the chance of two or more claims is: 0.337 ± 0.025.
Comment: A one-dimensional example of the delta method, as discussed in “Mahlerʼs Guide to
Fitting Loss Distributions.”

3.4. B. For m fixed, maximum likelihood equals the method of moments. The mean is:
{(100)(0) + (267)(1) + (311)(2) + (208)(3) + (87)(4) + (23)(5) + (4)(6)}/ 1000 = 2.000.
q = mean/ m = 2.000/7= 0.286.

3.5. A. For the Poisson, the method of maximum likelihood equals the method of moments.
λ = (60 + 80 + 100)/(1500 + 1700 + 2000) = 0.0462.
Alternately, in year one, the number of claims is a Poisson frequency process with mean 1500 λ.

In year one, the likelihood is f(60) = e−1500λ (1500 λ)60 / 60!.


In year one, the loglikelihood is ln(f(60)) = 60 ln(λ) + 60 ln(1500) - 1500λ - ln(60!).
Thus the sum of the loglikelihoods over the three years is:
(60 + 80 + 100)ln(λ) + 60ln(1500) + 80ln(1700) + 100ln(2000) - (1500 + 1700 + 2000)λ - ln(60!)
- ln(80!) - ln(100!).
Taking the partial derivative with respect to λ and setting it equal to zero:

0 = 240/λ - 5200. ⇒ λ = 240/5200 = 0.0462.


Comment: Similar to Course 4 Sample Exam, question 3.

3.6. C. The likelihood is the product: f(5) f(9) f(3) = (λ⁵ e−λ/5!) (λ⁹ e−λ/9!) (λ³ e−λ/3!) =
λ¹⁷ e^(-3λ) / 261,273,600.
Trying all the given values of lambda, the maximum likelihood occurs at λ = 6.
Lambda Likelihood
4 0.00040
5 0.00089
6 0.00099
7 0.00068
8 0.00033
Comments: Similar to 4B, 11/98, Q.17.

3.7. D. For m fixed, maximum likelihood equals the method of moments. So for fixed m,
q = mean/m = 1.5/m. Thus the table contains the best set of parameters for each m.
The best loglikelihood is for m = 7 and q = 0.2143. f(0) = (1 - 0.2143)7 = 0.1848.

3.8. A. The likelihood is the product of terms, one for each interval. For each interval one takes the
probability covered by the interval, to the power equal to the number of insureds observed for that
interval. For example, the probability covered by the interval from 0 to 5 is:
f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = e−λ {1 + λ + λ²/2! + λ³/3! + λ⁴/4! + λ⁵/5!} = e−λ Σ_{i=0..5} λ^i/i!.
This term will be taken to the power 372, since that is the number of insureds observed in the
interval from 0 to 5: e^(-372λ) {Σ_{i=0..5} λ^i/i!}^372.
In total the likelihood function is the product of the contributions of each interval:
e^(-1000λ) {Σ_{i=0..5} λ^i/i!}^372 {Σ_{i=6..10} λ^i/i!}^549 {Σ_{i=11..15} λ^i/i!}^78 {Σ_{i=16..20} λ^i/i!}.

3.9. B. The loglikelihood is the sum of terms, one for each interval.
For each interval one takes the log of the probability covered by the interval times the number of
insureds observed for that interval.
Interval Number of Insureds Probability Contribution to the Loglikelihood
0 85 e−λ (85)(-λ)

1 or more 15 1 - e−λ (15)ln(1 - e−λ)

Loglikelihood = -85λ + 15ln(1 - e−λ).


In order to maximize the loglikelihood, take its derivative and set it equal to zero:
0 = -85 + 15e−λ/(1 - e−λ). ⇒ e−λ = 0.85. ⇒ λ = 0.163.
Alternately, let 1 - q be the probability covered by the first interval and q be the probability covered
by the second interval. Then the Iikelihood is: (1-q)85 q15.
This is the same as the Iikelihood from a Bernoulli when we observe 15 events out of 100 trials.
Therefore, the maximum likelihood value of q is 15/100 = 0.15.
0.15 = 1- q. ⇒ e−λ = 0.85. ⇒ λ = 0.163.
Comment: One could have instead maximized the likelihood.

3.10. E. For a Poisson Distribution, f(x) = e−λ λ^x/x!. ln f(x) = -λ + x ln(λ) - ln(x!).
Rural Loglikelihood is: Σ_Rural {-λR + xi ln(λR) - ln(xi!)} = -250,000 λR + 5000 ln(λR) - Σ_Rural ln(xi!).
Assuming λU = 1.5 λR, Urban Loglikelihood is: Σ_Urban {-1.5λR + xi ln(1.5λR) - ln(xi!)} =
-312,500 (1.5λR) + 10,000 ln(1.5λR) - Σ_Urban ln(xi!).
Total Loglikelihood = -{250,000 + (1.5)(312,500)} λR + 15,000 ln(λR) + 10,000 ln(1.5) - Σ ln(xi!).
Setting the partial derivative with respect to λR equal to zero:
0 = -718,750 + 15,000/λR. ⇒ λR = 15,000/718,750 = 2.087%. ⇒ 250,000 λR = 5217.
Alternately, 312,500 urban exposures are expected to produce as many claims as (1.5)(312,500)
= 468,750 rural exposures.
For the Poisson, maximum likelihood = method of moments.
λR = (5000 + 10,000)/(250,000 + 468,750) = 2.087%. ⇒ 250,000 λR = 5217.
Comment: This trick of adjusting exposures works for a Poisson frequency, but does not work in
general for other frequency distributions. Similar to 4, 11/00, Q. 34, involving exponential severities,
in “Mahlerʼs Guide to Fitting Loss Distributions.” A similar trick of adjusting losses works for an
Exponential severity, but does not work in general for other severity distributions.

3.11. C. f(0) = 1/(1+β)^r. f(2) = {r(r+1)/2!} β²/(1+β)^(r+2). f(3) = {r(r+1)(r+2)/3!} β³/(1+β)^(r+3).
f(4) = {r(r+1)(r+2)(r+3)/4!} β⁴/(1+β)^(r+4).
Likelihood is: f(0) f(2) f(3) f(4) = {r³ (r+1)³ (r+2)² (r+3)/288} β⁹/(1+β)^(4r+9).


Comment: One would need a computer in order to maximize the likelihood.
Similar to CAS3, 5/06, Q.2.

3.12. A. For grouped data, the loglikelihood is: Σ ni ln[F(bi) - F(ai)] =


100 ln f(0) + 267 ln f(1) + 311 ln f(2) + 208 ln f(3) + 114 ln(1 - f(0) - f(1) - f(2) - f(3)) =
100 ln(e−λ) + 267 ln(λe−λ) + 311 ln(λ2e−λ/2) + 208 ln(λ3e−λ/6) +

114 ln(1 - e−λ - λe−λ - λ2e−λ/2 - λ3e−λ/6) =


-100λ - 267λ + 267ln(λ) - 311λ + (2)(311)ln(λ) - 311 ln(2) - 208λ + (3)(208)ln(λ)

- 208 ln(6) - 114λ + 114 ln(eλ - 1 - λ - λ2/2 - λ3/6) =

-1000λ + 1513 ln(λ) + 114 ln(eλ - 1 - λ - λ2/2 - λ3/6) - 519 ln2 - 208 ln3.
Comment: For the Poisson, for grouped data, the method of maximum likelihood generally differs
from the method of moments. One could instead maximize the likelihood:
λ^1513 e^(-886λ) (1 - e−λ - λe−λ - λ²e−λ/2 - λ³e−λ/6)^114 / {2^311 6^208}.
Here is a graph of the loglikelihood as a function of lambda:
[Graph omitted: the loglikelihood (vertical axis, roughly -2000 to -1600) plotted against λ (horizontal axis, 1.5 to 4), with a single maximum near λ ≈ 2.]


The maximum loglikelihood is -1533.12 at lambda = 2.03011.

3.13. C. X̄ = {(0)(390) + (1)(324) + (2)(201) + (3)(77) + (4)(8)} / 1000 = 0.989.
For the Poisson, for ungrouped data, Method of Maximum Likelihood = Method of Moments.
λ̂ = X̄ = 0.989. Var[λ̂] = Var[X̄] = Var[X]/n = λ/1000 = 0.989/1000 = 0.000989.

3.14. D. For the Negative Binomial with r fixed, for ungrouped data,
Method of Maximum Likelihood = Method of Moments. β̂ = X̄/r = 0.989/3 = 0.330.
Var[β̂] = Var[X̄/3] = Var[X̄]/9 = Var[X]/(9n) = rβ(1+β)/9000 = (3)(0.330)(1.330)/9000 = 0.000146.
Comment: Var[β̂] = β(1+β)/(r n) = (0.330)(1.330)/3000 = 0.000146.

3.15. E. f(1) = λe−λ. f(2) = λ2e−λ / 2. f(4) = λ4e−λ / 24. f(5) = λ5e−λ / 120.

Likelihood is: f(1)f(2)f(4)f(5) = λ12 e−4λ / 5760.


Comment: The likelihood is maximized for λ = X̄ = 3.

3.16. E. The first five years each contribute the appropriate Poisson density to the likelihood.
The sixth year contributes the probability of either 0 or 1 claim: e−λ + λe−λ = e−λ(1 + λ).

The likelihood is: {e−λ λ10/10!}{e−λ λ2/2!} {e−λ λ4/4!} {e−λ} {e−λ λ6/6!} e−λ(1 + λ).

Ignoring the factorials, this is proportional to: e−6λ λ22 + e−6λ λ23.
Setting the derivative with respect to lambda equal to zero:
0 = -6e−6λ λ22 + 22e−6λ λ21 - 6e−6λ λ23 + 23e−6λ λ22.

0 = -6λ + 22 - 6λ2 + 23λ. ⇒ 6λ2 - 17λ - 22 = 0.

⇒ λ = {17 + √(17² - (4)(6)(-22))} / {(2)(6)} = 3.80.

Comment: The first five observations are ungrouped data, while the sixth observation is grouped.
If there were 0 claims in year 6, then maximum likelihood is equal to method of moments:
λ̂ = (10 + 2 + 4 + 0 + 6 + 0)/6 = 3.67.
If there were 1 claim in year 6, then maximum likelihood is equal to method of moments:
λ̂ = (10 + 2 + 4 + 0 + 6 + 1)/6 = 3.83.
Thus we expect the answer to this question to be somewhere between 3.67 and 3.83.
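A minimal Python sketch, assuming scipy is available, that confirms this estimate numerically; the names are illustrative only.

import math
from scipy.optimize import minimize_scalar

def negative_loglik(lam):
    ll = sum(k * math.log(lam) - lam - math.log(math.factorial(k)) for k in [10, 2, 4, 0, 6])
    ll += math.log(math.exp(-lam) * (1 + lam))        # year 6: at most one claim
    return -ll

print(minimize_scalar(negative_loglik, bounds=(0.1, 10), method='bounded').x)   # about 3.80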

3.17. B. Mean = {(9000)(0) + (800)(1) + (180)(2) + (20)(3)} / 10,000 = 1220/10,000 = 0.1220.


2nd moment = {(9000)(0) + (800)(1) + (180)(4) + (20)(9)} / 10,000 = 1700/10,000 = 0.1700.
Variance = 0.1700 - 0.1220² = 0.15512.
For the Poisson, maximum likelihood is equal to the method of moments. λ = 0.1220.
P = Prob[N > 0] = 1 - e−λ = 1 - e^(-0.1220) = 0.11485.
For the Negative Binomial, rβ = 0.1220, and rβ(1+β) = 0.15512.
⇒ β = 0.2715. ⇒ r = 0.4494.
Q = Prob[N > 0] = 1 - 1/(1+β)^r = 1 - 1/1.2715^0.4494 = 0.10232.
|P - Q| = | 0.11485 - 0.10232 | = 0.01253.
Comment: Similar to 4, 11/05, Q.29.

3.18. A. X̄ = 372/4000 = 0.093.
For m fixed, maximum likelihood is equal to the method of moments.
mq = 2q = 0.093. q = 0.0465. f(2) = 0.0465² = 0.216%. (4000)(0.216%) = 8.6.

3.19. C. X̄ = 372/4000 = 0.093. Maximum likelihood is equal to the method of moments.
λ = 0.093. f(2) = 0.093² e^(-0.093) / 2! = 0.394%. (4000)(0.394%) = 15.8.

3.20. E. X̄ = 372/4000 = 0.093. Maximum likelihood is equal to the method of moments.
β = 0.093. f(2) = β²/(1+β)³ = 0.093² / (1.093)³ = 0.662%. (4000)(0.662%) = 26.5.

3.21. D. X̄ = 372/4000 = 0.093.
For r fixed, maximum likelihood is equal to the method of moments.
rβ = β/2 = 0.093. β = 0.186. f(2) = {(0.5)(1.5)/2!} 0.186² / (1.186)^2.5 = 0.847%.
(4000)(0.847%) = 33.9.

3.22. A. In this situation, the method of moments is equal to maximum likelihood.


λ̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) =
48,117/1,851,775 = 2.60%.

3.23. In this situation, the method of moments is equal to maximum likelihood.


2q̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) =
48,117/1,851,775 = 2.60%. ⇒ q̂ = 2.60%/2 = 1.30%.
Alternately, the first year has a Binomial with m = (2)(632,121) = 1,264,242.
f(16363) = {1,264,242!/(16363! (1,264,242 - 16363)!)} q^16363 (1-q)^((2)(632,121) - 16363).
The contribution to the loglikelihood is: 16,363 ln(q) + {(2)(632,121) - 16,363} ln(1-q) + constants.
Summing up the contributions from the years, the loglikelihood is:
48,117 ln(q) + {(2)(1,851,775) - 48,117} ln(1-q) + constants.
Setting the partial derivative with respect to q equal to zero:
0 = 48,117/q - {(2)(1,851,775) - 48,117}/(1-q).
⇒ 48,117(1-q) = q {(2)(1,851,775) - 48,117}. ⇒ q̂ = 48,117/{(2)(1,851,775)} = 1.30%.

3.24. In this situation, the method of moments is equal to maximum likelihood.
β̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) = 48,117/1,851,775 = 2.60%.
Alternately, the first year has a Negative Binomial with r = 632,121.
f(16363) = {(632,121 + 16,363 - 1)!/((632,121 - 1)! 16363!)} β^16363/(1+β)^(16363 + 632,121).
The contribution to the loglikelihood is: 16,363 ln(β) - (16,363 + 632,121) ln(1+β) + constants.
Summing up the contributions from the years, the loglikelihood is:
48,117 ln(β) - (1,851,775 + 48,117) ln(1+β) + constants.
Setting the partial derivative with respect to β equal to zero:
0 = 48,117/β - (1,851,775 + 48,117)/(1+β).
⇒ 48,117(1+β) = β (1,851,775 + 48,117). ⇒ β̂ = 48,117/1,851,775 = 2.60%.



3.25. In this situation, the method of moments is equal to maximum likelihood.
1.5 β̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) = 2.60%.
⇒ β̂ = 2.60%/1.5 = 1.73%.
Alternately, the first year has a Negative Binomial with r = (1.5)(632,121).
f(16363) = (constants) β^16363/(1+β)^(16363 + (1.5)(632,121)).
The contribution to the loglikelihood is: 16,363 ln(β) - {16,363 + (1.5)(632,121)} ln(1+β) + constants.
Summing up the contributions from the years, the loglikelihood is:
48,117 ln(β) - {(1.5)(1,851,775) + 48,117} ln(1+β) + constants.
Setting the partial derivative with respect to β equal to zero:
0 = 48,117/β - {(1.5)(1,851,775) + 48,117}/(1+β).
⇒ 48,117(1+β) = β {(1.5)(1,851,775) + 48,117}. ⇒ β̂ = 48,117/{(1.5)(1,851,775)} = 1.73%.

3.26. Using either the method of moments or maximum likelihood, λ̂ = 40/100 = 0.4 per inning.
Var[λ̂] = (Variance for a single inning) / (number of innings observed) = λ/100 = 0.004.
A 90% confidence interval for λ is: 0.40 ± 1.645 √0.004 = 0.400 ± 0.104 = 0.296 to 0.504.
To convert to an ERA we multiply by 9: 2.66 to 4.54.
Comment: There is a lot of random fluctuation in the results of pitching only 100 innings.

3.27. E. From the fitted numbers of contracts in the table, for the fitted distribution:
f(0) = 851/1000 = 0.851, f(1) = 0.117, f(2) = 0.024, f(3) = 0.006,
Prob[4 or more] = 0.002.
Based on the observations, the loglikelihood is:
852 ln[f(0)] + 113 ln[f(1)] + 28 ln[f(2)] + 7 ln[f(3)] =
852 ln[0.851] + 113 ln[0.117] + 28 ln[0.024] + 7 ln[0.006] = -520.16.
Comment: The fitted model is a Negative Binomial with r = 0.501 and β = 0.379.
See Table 16.19 in Loss Models.

3.28. D. f(0) = 1/(1+β)^r. f(1) = rβ/(1+β)^(r+1). f(2) = {r(r+1)/2} β²/(1+β)^(r+2).

loglikelihood is: 70 ln[f(0)] + 25 ln[f(1)] + 5 ln[f(2)] =


-70 r ln(1.2) + (25){ln(r) + ln(0.2) - (r+1)ln(1.2)} + (5){ln(r) + ln(r+1) - ln(2) + 2ln(0.2) - (r+2)ln(1.2)}.
Setting the partial derivative of the loglikelihood with respect to r equal to 0:
0 = -70 ln(1.2) + 25/r - 25ln(1.2) + 5/r + 5/(r+1) - 5 ln(1.2).
0 = (r)(r+1) 100 ln(1.2) - 30(r+1) - 5r.
0 = 100 ln(1.2) r2 + {100 ln(1.2) - 35}r - 30.
18.232 r² - 16.768 r - 30 = 0. ⇒
r = {16.768 + √(16.768² - (4)(18.232)(-30))} / {(2)(18.232)} = 1.823. (r has to be positive.)

Comment: Since there are no policies with more than 2 claims, there is no contribution from ln[f(3)],
etc., and therefore we can solve in closed form for r.
The Method of Moments fit is: r = 0.35/0.2 = 1.75.
Here is a graph of the loglikelihood as a function of r:
[Graph omitted: the loglikelihood (vertical axis, roughly -82 to -76) plotted against r (horizontal axis, 1.0 to 3.0), with a single maximum near r ≈ 1.8.]
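A minimal Python sketch, assuming scipy is available, that reproduces this maximization numerically (scipy parameterizes the Negative Binomial by r and p = 1/(1+β)); the names are illustrative only.

from scipy.optimize import minimize_scalar
from scipy.stats import nbinom

def negative_loglik(r):
    p = 1 / (1 + 0.2)              # beta is fixed at 0.2
    return -(70 * nbinom.logpmf(0, r, p) + 25 * nbinom.logpmf(1, r, p)
             + 5 * nbinom.logpmf(2, r, p))

print(minimize_scalar(negative_loglik, bounds=(0.1, 10), method='bounded').x)   # about 1.823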

3.29. B. The likelihood associated with at most one is: F(1) = f(0) + f(1).
Thus the likelihood is: f(3) f(0) f(2) F(1) f(1) = f(3) f(0) f(2) {f(0) + f(1)} f(1) =
(λ3 e-λ / 6) e-λ (λ2 e-λ / 2) (e-λ + λ e-λ) (λ e-λ) = λ6 e-5λ (1 + λ) / 12.
Thus the loglikelihood is: 6 ln(λ) - 5λ + ln(1+λ) - ln(12).
Setting the derivative of the loglikelihood equal to zero:
6/λ - 5 + 1/(1+λ) = 0. ⇒ 6(1+λ) - 5(1+λ)λ + λ = 0. ⇒ 5λ2 - 2λ - 6 = 0.
⇒ λ = {2 + √(2² - (4)(5)(-6))} / {(2)(5)} = 1.314.

Comment: If we were to treat the fourth year as an observation of 1/2, then the sample mean would
be: 6.5/5 = 1.3, close to the right answer.
When dealing with the Poisson, if you do not know what else to do, try something intuitive.
The fourth year acts like grouped data.

3.30. f(x) = {(3)(4)...(2 + x)/x!} β^x / (1+β)^(x+3).
ln f(x) = ln[3] + ln[4] + ... + ln[2 + x] - ln[x!] + x ln[β] - (x+3) ln[1+β].
∂ln[f(x)]/∂β = x/β - (x+3)/(1+β). ∂²ln[f(x)]/∂β² = -x/β² + (x+3)/(1+β)².
E[X] = 3β. Thus E[∂²ln[f(x)]/∂β²] = -3β/β² + (3β+3)/(1+β)² = 3 {1/(1+β) - 1/β} = -3/{β(1+β)}.
Fisherʼs Information = -n E[∂²ln[f(x)]/∂β²] = (-700) {-3/(β(1+β))} = 2100/{β(1+β)}.
Alternately, for the Negative Binomial with r fixed, maximum likelihood is equal to the method of
moments. β̂ = X̄/3. ⇒ Var[β̂] = Var[X̄]/9 = Var[X]/6300 = {(3)β(1+β)}/6300 = β(1+β)/2100.
However, Var[β̂] = 1/(Fisherʼs Information). ⇒ Fisherʼs Information = 2100/{β(1+β)}.
Comment: For a Negative Binomial with r fixed, for sample size n, Fisherʼs Information = rn/{β(1+β)}.

3.31. C. Let p = f(0) + f(1). Then the loglikelihood is: 20 ln[p] + 80 ln[1 - p].
Setting the derivative with respect to p equal to zero:
20/p - 80/(1-p) = 0. ⇒ p = 0.2. ⇒ 0.2 = f(0) + f(1) = (1-q)10 + 10 q (1 - q)9 = (1 + 9q) (1 - q)9 .
Try the choices and for q = 0.27, (1 + 9q) (1 - q)9 = 0.202.
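A minimal Python sketch of this last step, assuming scipy is available; it solves for q directly and also evaluates the answer choices. The names are illustrative only.

from scipy.optimize import brentq

g = lambda q: (1 + 9*q) * (1 - q)**9 - 0.2     # f(0) + f(1) = 0.2 for a Binomial with m = 10
print(brentq(g, 0.01, 0.9))                    # root is about 0.27

for q in [0.23, 0.25, 0.27, 0.29, 0.31]:       # the answer choices
    print(q, (1 + 9*q) * (1 - q)**9)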

3.32. For the Geometric, f(x) = β^x/(1+β)^(x+1). ln[f(x)] = x ln[β] - (x+1) ln[1+β].
∂ln[f(x)]/∂β = x/β - (x+1)/(1+β). ∂²ln[f(x)]/∂β² = -x/β² + (x+1)/(1+β)².
E[∂²ln[f(x)]/∂β²] = -β/β² + (β+1)/(1+β)² = -1/β + 1/(1+β) = -1/{β(1+β)}.
-n E[∂²ln[f(x)]/∂β²] = 80,000/{(5)(6)} = 2667. Var[β̂] = 1/2667 = 0.000375.
The quantity of interest is the density at three: h(β) = β³/(1+β)⁴.
∂h/∂β = {3β²(1+β)⁴ - β³ 4(1+β)³}/(1+β)⁸ = 3β²/(1+β)⁴ - 4β³/(1+β)⁵
= (3)(5²)/6⁴ - (4)(5³)/6⁵ = -0.00643.
Var[h] = (∂h/∂β)² Var[β̂] = (-0.00643)² (0.000375). StdDev[h] = 0.000125.
Thus an approximate 90% confidence interval for the probability of 3 claims next year is:
5³/6⁴ ± (1.645)(0.000125) = 0.09645 ± 0.00021.
Comment: For the Geometric Distribution, maximum likelihood is equal to the method of moments.
β̂ = X̄. Thus, Var[β̂] = Var[X̄] = Var[X]/n = β(1+β)/n = (5)(6)/80,000 = 0.000375.

3.33. B. Maximum likelihood is the same as the method of moments:


mq = X . ⇒ 4q = 0.521. ⇒ q = 0.13025.
ln[f(0)] = ln[(1-q)4 ] = 4 ln(1-0.13025) = -0.55820.
ln[f(1)] = ln[4q(1-q)3 ] = ln[4] + ln[0.13025] + 3 ln(1-0.13025) = -1.07065.
ln[f(2)] = ln[6q2 (1-q)2 ] = -2.56394. ln[f(3)] = ln[4q3 (1-q)] = -4.86815.
ln[f(4)] = ln[q4 ] = -8.15320.
The loglikelihood is:
(579)(-0.55820) + (332)(-1.07065) + (79)(-2.56394) + (9)(-4.86815) + (1)(-8.15320) =
-933.171.

3.34. B. X̄ = {(0)(104) + (1)(37) + (2)(11) + (3)(6) + (4)(2)}/160 = 85/160 = 0.53125.
For the Geometric, β = X̄ = 0.53125. f(2) = β²/(1+β)³ = 0.07861.
For the Poisson, maximum likelihood is equivalent to method of moments, λ = X̄ = 0.53125.
f(2) = λ²e−λ/2 = 0.08296.


10,000 |0.07861 - 0.08296| = 43.5.
Comment: Similar to 4, 11/05, Q.29.

3.35. f(x) = {5!/(x! (5-x)!)} (1-q)^(5-x) q^x.
ln f(x) = ln[5!] - ln[x!] - ln[(5-x)!] + (5-x) ln[1-q] + x ln[q].
∂ln[f(x)]/∂q = -(5-x)/(1-q) + x/q.
∂²ln[f(x)]/∂q² = -(5-x)/(1-q)² - x/q².
E[X] = 5q.
Thus E[∂²ln[f(x)]/∂q²] = -(5-5q)/(1-q)² - 5q/q² = -5 {1/(1-q) + 1/q} = -5/{q(1-q)}.
Fisherʼs Information = -n E[∂²ln[f(x)]/∂q²] = (-100) {-5/(q(1-q))} = 500/{q(1-q)}.
Alternately, for the Binomial with m fixed, maximum likelihood is equal to the method of moments.
q̂ = X̄/5. ⇒ Var[q̂] = Var[X̄]/25 = Var[X]/2500 = {(5)q(1-q)}/2500 = q(1-q)/500.
However, Var[q̂] = 1/(Fisherʼs Information). ⇒ Fisherʼs Information = 500/{q(1-q)}.
Comment: For a Binomial with m fixed, for sample size n, Fisherʼs Information = mn/{q(1-q)}.

3.36. E. The probability covered by the first interval is:
f(0) + f(1) = 1/(1+β) + β/(1+β)² = (1+2β)/(1+β)².
The probability covered by the second interval is:
f(2) + f(3) = β²/(1+β)³ + β³/(1+β)⁴ = (1+2β)β²/(1+β)⁴.
The probability covered by the final interval is:
f(4) + f(5) + ... = β⁴/(1+β)⁵ + β⁵/(1+β)⁶ + ... = {β⁴/(1+β)⁵} / {1 - β/(1+β)} = β⁴/(1+β)⁴.
Therefore, the loglikelihood is:
5 ln[(1+2β)/(1+β)²] + 7 ln[(1+2β)β²/(1+β)⁴] + 3 ln[β⁴/(1+β)⁴] =
12 ln[1+2β] + 26 ln[β] - 50 ln[1+β].
Setting the derivative with respect to beta equal to zero:
0 = 24/(1+2β) + 26/β - 50/(1+β). ⇒ 24β² - 52β - 26 = 0. ⇒ 12β² - 26β - 13 = 0. ⇒
β = {26 ± √(26² - (4)(12)(-13))} / {(2)(12)}. Taking the positive root, β = 2.586.

Comment: A graph of the loglikelihood as a function of β:


Loglikelihood

- 17.4

- 17.6

- 17.8

- 18.0

- 18.2

- 18.4

Beta
2.0 2.5 3.0 3.5 4.0 4.5

3.37. D. The probability of seeing zero claims is: e−λ.

The probability of seeing more than zero claims is: 1 - e−λ.

Thus the likelihood is: (e−λ)70 (1 - e−λ)130.

Therefore, the loglikelihood is: 70 (-λ) + 130 ln(1 - e−λ).


Setting the partial derivative of the loglikelihood with respect to λ equal to zero:

-70 + 130e−λ/(1 - e−λ) = 0. ⇒ 130e−λ = (70)(1 - e−λ). ⇒ e−λ = 70/200 = 0.35. ⇒ λ = 1.05.
Alternately, when we have data grouped into only two intervals, one can fit a single parameter via
maximum likelihood by setting the theoretical and empirical distribution functions equal at the
boundary between the two intervals. (The empirical distribution function at x is the percentage of
observations that are less than or equal to x.) In this case, we set e−λ = F(0) = 0.35. ⇒ λ = 1.05.
Comment: Similar to CAS3L, 5/10, Q.21.
This is an example of data grouped into intervals; in this case there are two intervals:
0 claims, and at least one claim.

3.38. For the Binomial, f(x) = {6!/(x! (6-x)!)} q^x (1-q)^(6-x).
ln[f(x)] = ln[6!/(x! (6-x)!)] + x ln[q] + (6-x) ln[1-q].
∂ln[f(x)]/∂q = x/q - (6-x)/(1-q). ∂²ln[f(x)]/∂q² = -x/q² - (6-x)/(1-q)².
E[∂²ln[f(x)]/∂q²] = -(6q)/q² - (6-6q)/(1-q)² = -6/q - 6/(1-q) = -6/{q(1-q)}.
-n E[∂²ln[f(x)]/∂q²] = (2000)(6)/{(0.15)(0.85)} = 94,118.
Var[q̂] = 1/94,118 = 0.00001063.
The quantity of interest is the density at zero: h(q) = (1-q)⁶. ∂h/∂q = -6(1-q)⁵.
Var[h] = {-6(1-q)⁵}² Var[q̂] = (36)(0.85¹⁰)(0.00001063). StdDev[h] = 0.00868.
Thus an approximate 95% confidence interval for the probability of no claims next year is:
0.85⁶ ± (1.960)(0.00868) = 0.377 ± 0.017.
Comment: For a Binomial with m fixed, maximum likelihood is equal to the method of moments.
q̂ = X̄/m. Thus, Var[q̂] = Var[X̄]/m² = (Var[X]/n)/m² = q(1-q)/(mn) = (0.15)(0.85)/{(6)(2000)} = 0.00001063.

3.39. f(x) = λ^x e−λ/x!. ln f(x) = x ln[λ] - λ - ln[x!].
∂ln[f(x)]/∂λ = x/λ - 1. ∂²ln[f(x)]/∂λ² = -x/λ².
E[X] = λ. Thus E[∂²ln[f(x)]/∂λ²] = -λ/λ² = -1/λ.
Information = -n E[∂²ln[f(x)]/∂λ²] = 40/λ.
Alternately, for the Poisson, maximum likelihood is equal to the method of moments.
λ̂ = X̄. ⇒ Var[λ̂] = Var[X̄] = Var[X]/40 = λ/40.
However, Var[λ̂] = 1/Information. ⇒ Information = 40/λ.
Comment: For a Poisson, for sample size n, (Fisherʼs) Information = n/λ.

3.40. A. In order to have mean 2β, the Negative Binomial distribution in region 2 has r = 2.
Ignoring constants, the likelihood in region 1 is: β31 / (1+β)32.
Ignoring constants, the likelihood in region 2 is: β50 / (1+β)50+2.
Thus ignoring constants, the combined likelihood is: β81 / (1+β)84.
Ignoring constants, the combined loglikelihood is: 81ln[β] - 84ln[1+β].
Setting the derivative with respect to beta equal to zero:
0 = 81/β - 84 / (1+β). ⇒ β = 81/3 = 27.
Alternately, a year in Region 2 is expected to produce as many claims as 2 years in Region 1.
β = (31 + 50)/(1 + 2) = 81/3 = 27.
Comment: This trick of adjusting exposures works for a Poisson frequency and here, but does not
work in general for other frequency distributions.
Similar to 4, 11/00, Q. 34, involving exponential severities, in “Mahlerʼs Guide to Fitting Loss
Distributions.” A similar trick of adjusting losses works for an Exponential severity, but does not work
in general for other severity distributions.

3.41. D. X is Bernoulli with f(x) = (1-q)^(1-x) q^x. ln f(x) = (1-x) ln(1-q) + x ln[q].
∂ln[f(x)]/∂q = -(1-x)/(1-q) + x/q. ∂²ln[f(x)]/∂q² = -(1-x)/(1-q)² - x/q².
E[X] = q.
Thus E[∂²ln[f(x)]/∂q²] = -(1-q)/(1-q)² - q/q² = -1/(1-q) - 1/q = -1/{q(1-q)}.
(Fisherʼs) Information = -n E[∂²ln[f(x)]/∂q²] = (-200) {-1/(q(1-q))} = 200/{q(1-q)}.
The fitted q is 40/200 = 0.2. Thus the information is: 200/{(0.2)(0.8)} = 1250.
Alternately, the sum of 200 independent Bernoullis is a Binomial with m = 200.
f(x) = {200!/(x! (200-x)!)} (1-q)^(200-x) q^x.
ln f(x) = ln[200!] - ln[x!] - ln[(200-x)!] + (200-x) ln[1-q] + x ln[q].
∂ln[f(x)]/∂q = -(200-x)/(1-q) + x/q. ∂²ln[f(x)]/∂q² = -(200-x)/(1-q)² - x/q².
For the Binomial, E[X] = 200q.
Thus E[∂²ln[f(x)]/∂q²] = -(200-200q)/(1-q)² - 200q/q² = -200 {1/(1-q) + 1/q} = -200/{q(1-q)}.
We have a sample of size one from the Binomial.
(Fisherʼs) Information = -n E[∂²ln[f(x)]/∂q²] = (-1) {-200/(q(1-q))} = 200/{q(1-q)} = 200/{(0.2)(0.8)} = 1250.
Alternately, for the Bernoulli, maximum likelihood is equal to the method of moments.
q̂ = X̄. ⇒ Var[q̂] = Var[X̄] = Var[X]/200 = q(1-q)/200 = (0.2)(0.8)/200 = 0.0008.
However, Var[q̂] = 1/(Fisherʼs Information). ⇒ (Fisherʼs) Information = 1/0.0008 = 1250.
Comment: For a Binomial with m fixed, for sample size n, Fisherʼs Information = mn/{q(1-q)}.
Here m = 1 and n = 200, and the information is: 200/{q(1-q)}.

3.42. A. ln f(x) = 2x lnθ - θ^2 - ln(x!).

Loglikelihood is: 2 lnθ Σxi - nθ^2 - Σln(xi!).

Set the partial derivative with respect to θ equal to zero:

0 = 2Σxi/θ - 2nθ. ⇒ θ^2 = Σxi/n = (17 + 10 + 32 + 5)/4 = 16. ⇒ θ̂ = 4.

Alternately, let θ^2 = λ. Then we have a Poisson, and maximum likelihood is the same as the method
of moments: λ̂ = X̄ = 16. ⇒ θ̂ = 4.

3.43. C. ln f(x) = ln(p) + (x - 1)ln(1-p).


Loglikelihood is: 3 ln(p) + (6 + 8 + 1)ln(1-p).
Set the derivative with respect to p equal to zero:
0 = 3/p - 15/(1-p). ⇒ 3(1-p) = 15p. ⇒ p = 1/6.
Alternately, X follows one plus a Geometric Distribution as per Loss Models.
⇒ X - 1 follows a Geometric Distribution as per Loss Models, with p = 1/(1+β).
For the Geometric, the method of moments is equal to maximum likelihood.

⇒ Working with X - 1, β^ = (6 + 8 + 1)/3 = 5. p = 1/(1+β) = 1/6.

3.44. A. f(x) = λ^x e^−λ/x!. ln f(x) = x ln(λ) - λ - ln(x!). Therefore, the loglikelihood is:
X1 ln(θ) - θ - ln(X1 !) + X2 ln(2θ) - 2θ - ln(X2 !) + X3 ln(3θ) - 3θ - ln(X3 !) =
X1 ln(θ) + X2 ln(θ) + X2 ln(2) + X3 ln(θ) + X3 ln(3) - 6θ - ln(X1 !) - ln(X2 !) - ln(X3 !).

Setting the partial derivative with respect to θ equal to zero:

(X1 + X2 + X3)/θ - 6 = 0. ⇒ θ̂ = (X1 + X2 + X3)/6 = X̄/2.

3.45. B. A priori variance = (0.9)(0.1)/100 = 0.0009. Standard deviation = 0.03.


Maximum likelihood estimate ⇔ method of moments estimate: q = 50/100 = 0.5.
Estimated variance = (0.5)(0.5)/100 = 0.0025. Standard deviation = 0.05.
Revised estimate is: {(0.03)(0.9) + (0.05)(0.5)} / (0.03 + 0.05) = 0.65.
2016-C-5, Fitting Frequency § 3 Maximum Likelihood, HCM 10/27/15, Page 84

3.46. B. The likelihood is the product of: f(2)f(1) = (λ^2 e^−λ/2)(λ e^−λ) = λ^3 e^−2λ/2.


Trying all the given values of lambda, the maximum likelihood occurs at λ = 2.
Lambda Likelihood
1 0.0677
2 0.0733
3 0.0335
4 0.0107
5 0.0028
Comment: The method of maximum likelihood applied to the Poisson is equal to the method of
moments, so the result would ordinarily be λ = 3/2. However, here λ has been restricted to be an
integer. Thus one expects the answer to be either 1 or 2.
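A brute-force check of the table above (a minimal sketch in Python):

```python
import math

# likelihood of observing 2 claims and then 1 claim from a Poisson with mean lam:
# f(2) f(1) = lam^3 exp(-2 lam) / 2
for lam in range(1, 6):
    print(lam, round(lam ** 3 * math.exp(-2 * lam) / 2, 4))
# the largest likelihood among the permitted integer values occurs at lam = 2
```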

3.47. For the Poisson, the method of maximum likelihood equals the method of moments.
λ = (70 + 60 + 100) / (500 + 750 + 1000) = 0.102.
Alternately, in year one, the number of claims is a Poisson frequency process with mean 500 λ.

In year one, the likelihood is f(70) = e^−500λ (500λ)^70 / 70!.


In year one, the loglikelihood is ln(f(70)) = 70 ln(λ) +70 ln(500) - 500λ - ln(70!).
Thus the sum of the loglikelihoods over the three years is:
(70 + 60 + 100) ln( λ) + 70 ln(500) + 60 ln(750) + 100 ln(1000) - (500+750+1000)λ - ln(70!) -
ln(60!) - ln(100!).
Taking the partial derivative with respect to λ and setting it equal to zero:

0 = 230/λ - 2250. ⇒ λ = 230/2250 = 0.102.


2016-C-5, Fitting Frequency § 3 Maximum Likelihood, HCM 10/27/15, Page 85

3.48. B. For r fixed, for the Negative Binomial Distribution, maximum likelihood equals the method
of moments. Set rβ = x̄. ⇒ β = x̄/r.
Alternately, f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r).
ln f(x) = ln(r) + ln(r+1) + ... + ln(r+x-1) - ln(x!) + xln(β) - (x+r)ln(1+β).

Σ ln f(xi) = Σ {ln(r) + ln(r+1) + ... + ln(r+xi -1) - ln(xi!)} + ln(β)Σxi - ln(1+β)Σ(xi + r).
Taking the partial derivative with respect to β and setting it equal to zero:

0 = Σxi/β - Σ(xi + r)/(1 + β). ⇒ βΣ(xi + r) = (1 + β)Σxi. ⇒ β = Σxi / Σr = Σxi/(nr) = x̄/r.

Alternately, f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r).


ln f(x) = x ln β - (x + r) ln(1 + β) + “constants”, where since r is known we treat terms not involving β
as a “constant” that will drop out when we take the partial derivative with respect to β.
Loglikelihood: lnβ Σxi - ln(1 + β) (nr + Σxi) + constants.

Set the partial derivative with respect to β equal to 0:


0 = Σxi/β - (nr + Σxi)/(1 + β). ⇒ (1 + β)Σxi = β(nr + Σxi). ⇒ β = Σxi/(nr) = x̄/r.

3.49. D. Mean = (5 + 4 + 4 + 9 + 3)/5 = 5.


Sample Variance = {(5 - 5)^2 + (4 - 5)^2 + (4 - 5)^2 + (9 - 5)^2 + (3 - 5)^2} / (5 - 1) = 5.5.
For m fixed, Maximum Likelihood ⇔ Method of Moments. Set 100q = 5.

⇒ q = 0.05. Thus for the Binomial, the variance is: q(1 - q)100 = (0.05)(0.95)(100) = 4.75.
5.5 - 4.75 = 0.75.

3.50. A. The number of bets won on each visit to the track is a Binomial with m = 3.
Maximum likelihood is equal to the method of moments.
mq = 3q = X̄ = {(10)(0) + (7)(1) + (3)(2) + (0)(3)}/(10 + 7 + 3 + 0) = 13/20. ⇒ q = 13/60.

3.51. C. X̄ = {(0)(157) + (1)(66) + (2)(19) + (3)(4) + (4)(2)}/(157 + 66 + 19 + 4 + 2) = 0.5.

For the Geometric, maximum likelihood is equivalent to the method of moments.
β = X̄ = 0.5. f(0) = 1/(1 + β) = 1/1.5 = 0.6667.

For the Poisson, method of moments, λ = X̄ = 0.5. f(0) = e^−λ = e^−0.5 = 0.6065.
The absolute difference is: |0.6667 - 0.6065| = 0.0601.
Comment: For the Poisson, maximum likelihood is equivalent to the method of moments.
2016-C-5, Fitting Frequency § 3 Maximum Likelihood, HCM 10/27/15, Page 86

3.52. E. f(0) = 1/(1 + β)^r. f(3) = {r(r+1)(r+2)/3!} β^3/(1 + β)^(r+3).

f(5) = {r(r+1)(r+2)(r+3)(r+4)/5!} β^5/(1 + β)^(r+5).

Likelihood is: f(0)f(3)f(5) = {1/(β+1)}^(3r) {β/(β+1)}^8 r^2 (r+1)^2 (r+2)^2 (r+3)(r+4) / (3! 5!).

Comment: Using a computer one can maximize the loglikelihood: r = 1.871, β = 1.425.
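The numerical maximization mentioned in the comment can be sketched as follows (an illustration only, assuming SciPy; the Loss Models density f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r) is written via log-gamma functions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

obs = np.array([0, 3, 5])   # the three observed claim counts

def negloglik(params):
    r, beta = params
    # Negative Binomial loglikelihood: sum over observations of
    # ln Gamma(r+x) - ln Gamma(r) - ln(x!) + x ln(beta) - (x+r) ln(1+beta)
    return -np.sum(gammaln(r + obs) - gammaln(r) - gammaln(obs + 1)
                   + obs * np.log(beta) - (obs + r) * np.log(1 + beta))

result = minimize(negloglik, x0=[1.0, 1.0], bounds=[(1e-6, None), (1e-6, None)])
print(result.x)   # approximately r = 1.871, beta = 1.425
```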

3.53. B. X̄ = {(5000)(0) + (5000)(1)}/(5000 + 5000) = 0.5.

For the Binomial with m fixed, the method of maximum likelihood is equal to the method of
moments. mq = 0.5. ⇒ q̂ = 0.5/2 = 0.25.
loglikelihood is: 5000 ln[f(0)] + 5000 ln[f(1)] = 5000 ln[(1 - q)^2] + 5000 ln[2(1-q)(q)]
= 15000 ln[1 - q] + 5000 ln[q] + 5000 ln[2].
At q̂ = 0.25, the loglikelihood is: 15000 ln[0.75] + 5000 ln[0.25] + 5000 ln[2] = -7781.

3.54. B. λ̂ = X̄ = (10 + 2 + 4 + 0 + 6 + 2 + 4 + 5 + 4 + 2)/10 = 3.9.
Var[λ̂] = Var[X̄] = Var[X]/N = λ̂/10 = 0.39.
CV[λ̂] = √0.39 / 3.9 = 0.160.

Alternately, f(x) = e^−λ λ^x/x!. ln f(x) = -λ + x ln(λ) - ln(x!).

∂ln[f(x)]/∂λ = -1 + x/λ.   ∂^2 ln[f(x)]/∂λ^2 = -x/λ^2.   E[∂^2 ln[f(x)]/∂λ^2] = -E[X]/λ^2 = -λ/λ^2 = -1/λ.

Var[λ̂] = -1 / {N E[∂^2 ln[f(x)]/∂λ^2]} = λ/N. Proceed as before.

3.55. C. The method of maximum likelihood is equivalent to the method of moments in this case.
λ̂ = X̄ = {(0)(1000) + (1)(1200) + (2)(600) + (3)(200)}/3000 = 1.
Var[λ̂] = Var[X̄] = Var[X]/n = λ/n = 1/3000.

Therefore, a 90% confidence interval for λ is: 1 ± (1.645)√(1/3000) = 0.970 to 1.030.

3.56. C. Prob[X = 0] = Prob[X = 1000] = p.


Therefore, Prob[X = 500] = 1 - 2p. Therefore, 0 ≤ p ≤ 1/2.
The likelihood of the observation is: (p)(p)(p) = p^3, an increasing function of p.
This is maximized for p as big as possible; p = 1/2.
2016-C-5, Fitting Frequency § 3 Maximum Likelihood, HCM 10/27/15, Page 87

3.57. E. For the Geometric Distribution, maximum likelihood is equal to the method of moments.
β̂ = (0 + 1 + 2 + 3 + 4 + 4 + 5 + 6 + 7 + 8)/10 = 4.
The variance of the Geometric is: β(1+β) = (4)(5) = 20.

3.58. E. The probability of seeing zero claims in a day is: e−λ.

The probability of seeing more than zero claims in a day is: 1 - e−λ.
5 days with zero claims and 4 days with more than zero claims, thus the likelihood is:
(e^−λ)^5 (1 - e^−λ)^4.

Therefore, the loglikelihood is: 5 (-λ) + 4 ln(1 - e−λ).


Setting the partial derivative of the loglikelihood with respect to λ equal to zero:

-5 + 4e−λ/(1 - e−λ) = 0. ⇒ e−λ = 5/9. ⇒ λ = 0.588.

Alternately, set y = e^−λ. Then the likelihood is: y^5 (1-y)^4.


Since this is a one-to-one monotonic transformation, we can find the y that maximizes y^5 (1-y)^4, and
the likelihood will be maximized for the corresponding λ.
Setting the derivative with respect to y equal to zero:
0 = 5y^4 (1-y)^4 - 4y^5 (1-y)^3. ⇒ 5(1-y) = 4y. ⇒ y = 5/9. ⇒ e^−λ = 5/9. ⇒ λ = 0.588.
Alternately, when we have data grouped into only two intervals, one can fit a single parameter via
maximum likelihood by setting the theoretical and empirical distribution functions equal at the
boundary between the two intervals. (The empirical distribution function at x is the percentage of
observations that are less than or equal to x.) In this case, we set e−λ = F(0) = 5/9. ⇒ λ = 0.588.
Comment: This is an example of data grouped into intervals; in this case there are two intervals:
0 claims, and at least one claim.
In general, maximum likelihood is invariant under monotonic one-to-one change of variables, such as:
x^2, √x, 1/x, e^x, e^−x, and ln(x).
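A numerical check of this grouped-data fit (a minimal sketch, assuming SciPy):

```python
import math
from scipy.optimize import minimize_scalar

# loglikelihood for 5 days with no claims and 4 days with at least one claim,
# when the number of claims per day is Poisson with mean lam
negloglik = lambda lam: -(-5 * lam + 4 * math.log(1 - math.exp(-lam)))

result = minimize_scalar(negloglik, bounds=(1e-6, 10), method="bounded")
print(result.x, math.exp(-result.x))   # about 0.588 and 5/9
```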
2016-C-5, Fitting Frequency § 3 Maximum Likelihood, HCM 10/27/15, Page 88

3.59. E. The probability of seeing zero claims in a day is: 1/(1+β).


The probability of seeing more than zero claims in a day is: 1 - 1/(1+β) = β/(1+β).
5 days with zero claims and 4 days with more than zero claims, thus the likelihood is:
{1/(1+β)^5} {β^4/(1+β)^4} = β^4/(1+β)^9.
Therefore, the loglikelihood is: 4 lnβ - 9 ln(1+β).
Setting the partial derivative of the loglikelihood with respect to β equal to zero:

4/β - 9/(1+β) = 0. ⇒ 4(1+β) = 9β. ⇒ β = 0.8.


Alternately, when we have data grouped into only two intervals, one can fit a single parameter via
maximum likelihood by setting the theoretical and empirical distribution functions equal at the
boundary between the two intervals.
In this case, we set 1/(1+β) = F(0) = 5/9. ⇒ β = 0.8.

3.60. D. For a Binomial with m fixed, maximum likelihood is equal to the method of moments.
Then q = (35 + 18 + 41) / (100 + 50 + y). ⇒ y = 85.
Comment: Backwards question, they give you an output and ask you to solve for a missing input.

3.61. E. For the Geometric, maximum likelihood is equal to the method of moments:
β̂ = (1 + 0 + 5 + 6 + 4 + 8 + 2 + 3 + 7 + 4)/10 = 4.
The variance of the fitted Geometric is: β̂(1 + β̂) = (4)(5) = 20.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 89

Section 4, Chi-Square Test33

The principle method to test the fits of frequency distributions is the Chi-Square Test.
A smaller Chi-Square statistic is better, assuming the same degrees of freedom; as discussed
below we can compare fits using the corresponding p-values.

Chi-Square Statistic:

One can use the Chi-Square Statistic to test the fit of Frequency Distributions or Loss Distributions.
The Chi-Square Statistic is computed as a sum of terms, one for each interval.
For each interval one computes: (observed number - expected number)^2 / (expected number).

For example, compare a Poisson Distribution with λ = 1/2 to the following data:
Number of Observed
Claims # Insureds
0 22,281
1 10,829
2 2,706
3 429
4 44
5 and over 6
Sum 36,295

Then the expected number of insureds with n claims is: 36,295 f(n) = 36,295 e^−λ λ^n / n!.
For example, for n = 3: (36,295)(e^−0.5 0.5^3 / 3!) = (36,295)(0.012636) = 458.6.

Number of    Observed      Poisson        Expected      Chi-Square =
Claims       # Insureds    Distribution   # Insureds    (observed - expected)^2 / expected
0 22,281 0.606531 22,014.03 3.238
1 10,829 0.303265 11,007.02 2.879
2 2,706 0.075816 2,751.75 0.761
3 429 0.012636 458.63 1.914
4 44 0.001580 57.33 3.099
5 and over 6 0.000172 6.25 0.010
Sum 36,295 1.000000 36,295.00 11.900

33
See also “Mahlerʼs Guide to Fitting Loss Distributions.”
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 90

The contribution to the Chi-Square Statistic from the n = 3 interval is:


(429 - 458.63)^2 / 458.63 = 1.914.

The Chi-Square Statistic is: 3.238 + 2.879 + 0.761 + 1.914 + 3.099 + 0.010 = 11.900.
The closer the match between the observed and expected numbers in an interval, the smaller the
contribution to the Chi-Square Statistic. A small Chi-Square Statistic indicates a good
match between the data and the assumed distribution.

Note that the sum of the assumed column is equal to the sum of the observed column.
That should always be the case, when you compute a Chi-Square Statistic.
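Here is a minimal Python sketch of this computation (it assumes SciPy for the Poisson densities; the last group collects 5 and over):

```python
from scipy.stats import poisson

observed = [22281, 10829, 2706, 429, 44, 6]    # insureds with 0, 1, 2, 3, 4, 5+ claims
n = sum(observed)                              # 36,295

probs = [poisson.pmf(k, 0.5) for k in range(5)]
probs.append(1 - sum(probs))                   # probability of 5 or more claims
expected = [n * p for p in probs]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 3))                    # about 11.900
```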

In this case, we just assumed the given Poisson Distribution, and did not fit it to this data.
In the case of a distribution fit to a data set, one can calculate the Chi-Square Statistic in the same
manner.

The Negative Binomial distribution previously fit by Method of Moments to the following data has
parameters β = 0.230 and r = 1.455.
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Drivers 17649 4829 1106 229 44 9 4 1 1 23872

Here is the computation of the Chi-Square Statistic for this case:

Number of    Observed    Fitted          Fitted       Chi-Square =
Claims                   Neg. Binomial   # Drivers    (observed # - fitted #)^2 / fitted #
0 17,649 0.7399149 17,663.25 0.01
1 4,829 0.2013197 4,805.90 0.11
2 1,106 0.0462114 1,103.16 0.01
3 229 0.0099522 237.58 0.31
4 44 0.0020728 49.48 0.61
5 and over 15 0.0005291 12.60 0.46
Sum 23,872 1.0000000 23,871.97 1.50

For example, for 3 claims the fitted number of drivers is:


(total # of drivers observed)(f(3) of the fitted Negative Binomial Distribution) = (23872)(.0099522)
= 237.58.
The total of the observed and fitted columns are equal, as they should be.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 91

Note that in this case I have chosen to group the intervals for 5 and over, so as to have at least 5
expected observations.34 However, unless an exam question has specifically told you which
groups to use, use the groups for the data given in the exam question.35

Hypothesis Testing:36

One could test the fit of the method of moments Negative Binomial Distribution to the observed
data by comparing the computed Chi-Square Statistic of 1.50 to the Chi-Square Distribution with
degrees of freedom equal to:
the number of intervals - 1 - number of fitted parameters.37
In this case with six intervals and two fitted parameters, one would compare 1.50 to the
Chi-Square for 6 - 1 - 2 = 3 degrees of freedom.

Degrees Significance Levels


of Freedom 0.100 0.050 0.025 0.010 0.005
3 6.251 7.815 9.348 11.345 12.838

One would not reject this fit at a 10% significance level, since 1.50 < 6.251. We do not reject at
10% the null hypothesis that the data was generated by this Negative Binomial Distribution. The
alternate hypothesis is that it was not.

In general, the number of degrees of freedom is used to determine which row of the
Chi-Square Table to consult.

We then see which two columns bracket the value of the Chi-Square statistic and we then reject at
the significance level of the left hand of the two columns, and we do not reject at the significance level
of the right hand of the two columns, which in this case is 10%.
In general, reject to the left and do not reject to the right.
If instead the Chi-Square statistic had turned out to be 11, since 9.348 < 11 < 11.345, we would
have rejected at 2.5% and not rejected at 1.0%.

34
Six and over would have only 2.5 expected observations.
In practical applications, there are a number of different rules of thumb one can use for determining the groups to
use. See the footnote in Section 16.4.3 of Loss Models. I use one of the rules mentioned in Loss Models:
One should have an expected number of claims in each interval of 5 or more, so that the normal approximation that
underlies the theory, is reasonably close; therefore, some of the given intervals for grouped data may be combined
for purposes of applying the Chi-Square test.
35
As in for example 4, 5/00, Q.29.
36
See “Mahlerʼs Guide to Fitting Loss Distributions” for a more detailed discussion of Hypothesis Testing.
37
In computing the number of degrees of freedom, we use the number of intervals used to compute the
Chi-Square statistic. As discussed above, this may be less than the number of intervals in the original data set, if we
have combined some intervals.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 92

In determining the number of degrees of freedom, we only subtract the number of parameters when
the distribution has been fit to the data set we are using to compute the Chi-Square. We do not
decrease the number of degrees of freedom if this distribution has been fit to some similar but
different data set. If the distribution is assumed, then the number of degrees of freedom is: number
of intervals minus one.

Exercise: Use the Chi-Square Statistic computed previously as 11.900, in order to test the null
hypothesis H0 : that the data with 36,295 insureds is a random sample from the Poisson Distribution
with λ = 1/2, against the alternative hypothesis H1 : this data is not a random sample from the

Poisson Distribution with λ = 1/2.


[Solution: With six intervals and no fitted parameters, there are: 6 - 1 = 5 degrees of freedom.
Degrees Significance Levels
of Freedom 0.100 0.050 0.025 0.010 0.005
5 9.236 11.070 12.832 15.086 16.750
11.070 < 11.900 < 12.832; reject H0 at 5%, and do not reject H0 at a 2.5%.]

To compute the number of Degrees of Freedom:


1. Determine the groups to use in computing the Chi-Square statistic. Unless the exam
question has specifically told you which groups to use, use the groups for the data
given in the question.
2. Determine whether any parameters have been fit to this data, and if so how many.
3. Degrees of freedom =
(# intervals from step 1) - 1 - (# of fitted parameters, if any, from step #2).
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 93

Exercise: Assume one has observed drivers over nine years and got the following distribution of
drivers by number of claims over the nine year period:
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Drivers 17649 4829 1106 229 44 9 4 1 1 23872
Use the Chi-Square Statistic to test the fit of a Poisson Distribution fit to this data via the Method of
Maximum Likelihood.
Group the intervals for 4 and over in computing the Chi-Square Statistic.
[Solution: For the Poisson, the Method of Maximum Likelihood and the Method of Moments applied
to ungrouped data produce the same result. By taking the average value of the number of claims,
one can calculate the first moment:
{(0)(17649) + (1)(4829) + (2)(1106) + (3)(229) + (4)(44) + (5)(9) + (6)(4) + (7)(1) + (8)(1)} / 23,872
= 7988 / 23,872 = 0.3346.
The fitted Poisson parameter λ is equal to the observed mean of 0.3346.
Number of    Observed    Fitted Poisson    Chi-Square =
Claims                   Distribution      (observed number - fitted number)^2 / fitted #
0 17,649 17,083.4 18.73
1 4,829 5,716.1 137.67
2 1,106 956.3 23.43
3 229 106.7 140.33
4 and over 59 9.5 257.92
Sum 23,872 23872.0 578.08
Note that the total of the observed and fitted columns are equal, as they always should be for the
computation of a Chi-Square. For example, for 2 claims, the fitted number of observations is the
total number of observed claims times the Poisson density at 2:
(λ^2 e^−λ/2!)(23,872) = {(0.3346^2)(e^−0.3346)/2} (23,872) = (0.04006)(23,872) = 956.3.
Then the contribution to the Chi-Square is: (1106 - 956.3)^2 / 956.3 = 23.43.
One tests the fit by comparing the computed value of 578.08 to the Chi-Square Distribution with
degrees of freedom equal to the number of intervals - 1 - number of fitted parameters.
In this case with one fitted parameter and 5 intervals, one would compare to the Chi-Square for
5 - 1 - 1 = 3 degrees of freedom.
Degrees Significance Levels
of Freedom 0.100 0.050 0.025 0.010 0.005
3 6.251 7.815 9.348 11.345 12.838
One would reject this fit at a 0.5% significance level, since 12.838 < 578.08.
Comment: The Poisson is too short-tailed to fit this data. ]
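For readers who want to verify this exercise by computer, a minimal sketch (assuming SciPy):

```python
from scipy.stats import poisson

counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}
n = sum(counts.values())                              # 23,872 drivers
lam = sum(k * v for k, v in counts.items()) / n       # fitted lambda, about 0.3346

# group 4 and over, leaving 5 intervals
observed = [counts[0], counts[1], counts[2], counts[3],
            sum(v for k, v in counts.items() if k >= 4)]
probs = [poisson.pmf(k, lam) for k in range(4)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 1))                           # about 578
```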
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 94

p-values:

The probability value or p-value is the value of the Survival Function of the Chi-Square
Distribution (for the appropriate number of degrees of freedom) at the value of the Chi-Square
Statistic. If the data set came from the fitted or assumed distribution, then the p-value is the
probability that the Chi-Square statistic would be greater than its observed value, due to random
fluctuation.

Good Match between data and distribution. ⇒ Small Chi-Square Statistic. ⇒


A small Chi-Square distribution function value. ⇒ A large Chi-Square survival function value. ⇒
A large p-value.

A large p-value indicates a good fit. Thus one can compare the fit of two distributions to a data set
by comparing the p-values corresponding to their Chi-Square Statistics; the distribution with the
larger p-value is the better fit. In general, the p-value is the value at which one can reject the fit or
the null hypothesis.38

For the previous exercise, the p-value was less than 0.5%, since 12.838 < 578.08.
If instead the Chi-Square statistic with 3 degrees of freedom had been 8.2, then the p-value would
have been 5% > p > 2.5%, since 7.815 < 8.2 < 9.348.

Exercise: The Chi-Square statistic with 3 degrees of freedom is 10.


Estimate the corresponding p-value.
[Solution: 2.5% > p > 1%, since 9.348 < 10 < 11.345.
Comment: Thus we reject at 2.5% and do not reject at 1%.]

Note that using the Chi-Square table one can only get interval estimates of p. Using computer
software one can instead get specific values for p. For example, for the Chi-Square statistic of 10
with 3 degrees of freedom, the corresponding p-value is 1.86%.39
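For example, a one-line check with SciPy (the survival function of the Chi-Square Distribution gives the p-value directly):

```python
from scipy.stats import chi2

print(chi2.sf(10, 3))   # about 0.0186: reject at 2.5%, do not reject at 1%
```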

Exercise: A Negative Binomial distribution has a Chi-Square statistic of 14.8 with 6 degrees of
freedom, while a Poisson distribution has a Chi-Square statistic of 15.4 with 7 degrees of freedom.
Which one has the better p-value?
[Solution: Since 14.449 < 14.8 < 16.812, the p-value for the Negative Binomial is between 2.5%
and 1%. Since 14.067 < 15.4 < 16.013, the p-value for the Poisson is between 5% and 2.5%.
The Poisson has a larger and therefore better p-value.]

38
When using the Chi-Square Table, one rejects at the significance value in the table that first exceeds the
p-value. For example, with a p-value of 0.6% one rejects at 1%. Reject to the left and do not reject to the right.
39
The Chi-Square Distribution for 3 d.f. is a Gamma Distribution with α = 3/2 and θ = 2. The survival function at 10 is
1 - Γ[1.5; 10/2] = 0.0186.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 95

Years of Data:40

Exercise: Use the following data for three years:


Year Exposures Claims
2001 900 158
2002 1000 160
2003 1100 162
Use the Chi-Square Goodness of Fit Test to test the null hypothesis that the frequency per
exposure is the same for all three years.
[Solution: The observed overall frequency is: (158 + 160 + 162)/(900 + 1000 + 1100) = 0.16.
The expected number of claims in 2001 is: (900)(0.16) = 144.
Year Observed Exposures Assumed Freq. Expected Chi-Square
2001 158 900 0.16 144 1.361
2002 160 1,000 0.16 160 0.000
2003 162 1,100 0.16 176 1.114
Sum 480 3,000 0.16 480 2.475
There are 3 - 1 = 2 degrees of freedom. We lose one degree of freedom because the sum of the
expected is equal to the observed column.
The 10% critical value is 4.605. 2.475 < 4.605, so we do not reject H0 at 10%.
Comment: Using a computer, the p-value is 29.0%.]

When given years of data, the Chi-Square test statistic is computed as: χ^2 = Σk (nk - Ek)^2 / Vk,
where nk is the number of claims for year k, Ek is the expected number for year k,
and Vk is the variance for year k.41

For the above data, n1 = 158 and E1 = (900)(mean frequency).

With no assumption as to the form of the frequency distribution, we take Vk = Ek.


If frequency is assumed to be Poisson, then E1 = 900λ, and V1 = 900λ = E1 .

If frequency is assumed to be Binomial, then E1 = 900mq, and V1 = 900mq(1-q) = E1 (1-q).

If frequency is assumed to be Negative Binomial, then E1 = 900rβ,

and V1 = 900rβ(1+β) = E1 (1+β).

40
See Example 16.8 and Exercise 16.14 in Loss Models.
41
nk is approximately Normally distributed. (nk - Ek)/√Vk follows approximately a Standard Normal Distribution.
The sum of the squares of Standard Normal Distributions follows a Chi-Square Distribution.
We lose one degree of freedom because the total of the expected Ek, is equal to the total of the observed nk.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 96

Exercise: Use the same data for three years:


Year Exposures Claims
2001 900 158
2002 1000 160
2003 1100 162
Fit a Poisson Distribution to this data via method of moments.
Use the Chi-Square Goodness of Fit Test to test the null hypothesis that the frequency per
exposure is the same for all three years.
[Solution: The observed overall frequency is: (158 + 160 + 162)/(900 + 1000 + 1100) = 0.16.
λ̂ = 0.16. We get the same expected numbers of claims by year and the same Chi-Square
Statistic as in the previous exercise.
Year Observed Exposures Assumed Freq. Expected Chi-Square
2001 158 900 0.16 144 1.361
2002 160 1,000 0.16 160 0.000
2003 162 1,100 0.16 176 1.114
Sum 480 3,000 0.16 480 2.475
As before, there are 3 - 1 = 2 degrees of freedom.
The 10% critical value is 4.605. 2.475 < 4.605, so we do not reject H0 at 10%.
Comment: For years of data, for the Poisson, maximum likelihood is equal to the method of
moments.]

Even though in the second exercise we fit a Poisson Distribution, given the form of the data we
compared to, we only used the mean of that Poisson Distribution. Unlike in previous situations, we
did not compare the expected and observed numbers of insureds with zero claims, one claim, two
claims, etc. We did not use the density at 0, 1, 2, etc., of the fitted Poisson Distribution.

Therefore, we were really only assuming the same mean frequency in each year, just as in the
previous exercise. Therefore, in this situation, we only subtract one from the degrees of freedom;
in other words we subtract the number of fitted parameters.

In general, if applying the Chi-Square Goodness of Fit Test to data with total claims and
exposures by year, and one has a fit distribution to the data, then the number of degrees
of freedom is the number of years minus the number of fitted parameters.42

Recall that in the first example, where we just assumed each year has the same frequency per
exposure we subtracted one from the number of degrees of freedom.

42
Recall, that in the more common case where we are given the number of insureds with zero claims, with one claim,
etc., we subtracted one plus the number of fitted parameters.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 97

Exercise: Use the same data for three years:


Year Exposures Claims
2001 900 158
2002 1000 160
2003 1100 162
Fit a Geometric Distribution to this data via method of moments.
Use the Chi-Square Goodness of Fit Test to test the null hypothesis that the frequency per
exposure is the same for all three years.
[Solution: The observed overall frequency is: (158 + 160 + 162) / (900 + 1000 + 1100) = 0.16.
β̂ = 0.16. Thus, we get the same expected numbers of claims by year as in the previous exercise.
However, the denominator, Vk = β(1+β)(exposures) = Ek(1+β).

χ^2 = Σk (nk - Ek)^2 / Vk =
(158 - 144)^2 /{(144)(1.16)} + (160 - 160)^2 /{(160)(1.16)} + (162 - 176)^2 / {(176)(1.16)} = 2.133.
As before, there are 3 - 1 = 2 degrees of freedom.
The 10% critical value is 4.605. 2.133 < 4.605, so we do not reject H0 at 10%.
Comment: For years of data, for the Geometric, maximum likelihood is equal to the method of
moments. The Chi-Square statistic here is that for the Poisson assumption divided by 1.16, subject
to rounding: 2.475/1.16 = 2.134.]
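Both of the preceding years-of-data calculations can be reproduced with a few lines of Python (a sketch only; no special libraries are needed):

```python
exposures = [900, 1000, 1100]
claims = [158, 160, 162]

freq = sum(claims) / sum(exposures)             # 0.16, the fitted mean frequency
expected = [e * freq for e in exposures]        # 144, 160, 176

# Poisson assumption (or no distributional assumption): V_k = E_k
chi_poisson = sum((n - E) ** 2 / E for n, E in zip(claims, expected))

# Geometric assumption: V_k = E_k (1 + beta), with beta = 0.16
chi_geometric = sum((n - E) ** 2 / (E * (1 + freq)) for n, E in zip(claims, expected))

print(round(chi_poisson, 3), round(chi_geometric, 3))   # about 2.475 and 2.133
```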
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 98

Kolmogorov-Smirnov Statistic:43

The Kolmogorov-Smirnov Statistic is computed by finding the maximum absolute difference
between the observed distribution function and the fitted distribution function:

Max over x of | observed distribution function at x - theoretical distribution function at x |.
Shown below is the computation for the Negative Binomial fit above to data via the Method of
Moments. In this case, the maximum absolute difference between the Fitted and Observed
Distribution Functions is .00060 and occurs at 0. This K-S statistic of .0006 can be used to compare
the fit of the distribution to that of some other type of distribution. The smaller the
K-S Statistic the better the fit.

Number of    Observed    Observed Distribution    Method of Moments    Fitted Distribution    Absolute
Claims                   Function                 Density              Function               Difference
0 17,649 0.73932 0.73991 0.73991 0.00060
1 4,829 0.94161 0.20132 0.94123 0.00037
2 1,106 0.98794 0.04621 0.98745 0.00049
3 229 0.99753 0.00995 0.99740 0.00013
4 44 0.99937 0.00207 0.99947 0.00010
5 9 0.99975 0.00042 0.99989 0.00015
6 4 0.99992 0.00009 0.99998 0.00006
7 1 0.99996 0.00002 1.00000 0.00004
8 1 1.00000 0.00000 1.00000 0.00000
9 0 1.00000 0.00000 1.00000 0.00000
10 0 1.00000 0.00000 1.00000 0.00000

Note that we do not attach any specific significance level, as we did with the Chi-Square Statistic.
The use of the K-S Statistic to reject or not reject a fit should be confined to ungrouped data and
continuous distributions. The K-S Statistic should not be applied to discrete distributions or grouped
data in order to reject or not reject a fit. Nevertheless, even when dealing with discrete distributions,
a smaller value of the K-S Statistic does indicate a better fit.
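A minimal sketch of the computation above (assuming SciPy; scipy.stats.nbinom is parameterized by r and a success probability corresponding to 1/(1+β) in the Loss Models parameterization):

```python
import numpy as np
from scipy.stats import nbinom

drivers = np.array([17649, 4829, 1106, 229, 44, 9, 4, 1, 1])   # 0 through 8 claims
observed_cdf = np.cumsum(drivers) / drivers.sum()

r, beta = 1.455, 0.230                         # method of moments fit from above
fitted_cdf = nbinom.cdf(np.arange(9), r, 1 / (1 + beta))

ks = np.max(np.abs(observed_cdf - fitted_cdf))
print(round(ks, 5))                            # about 0.0006, attained at 0 claims
```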

43
See “Mahlerʼs Guide to Fitting Loss Distributions” for a more extensive discussion of the Kolmogorov-Smirnov
Statistic. For a discrete distribution one compares at all the (available) points, rather than “just before” and “just after”
each observed claim, as with a continuous distribution.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 99

Chi-Square Table attached to Exam 4/C:

The Chi-Square Table that has been attached to Exam 4/C is shown on the next page.
As usual there are different rows corresponding to different degrees of freedom; in this case the
degrees of freedom extend from 1 to 20.

The values shown in each row are the places where the Chi-Square Distribution Function for that
number of degrees of freedom has the stated P values. The value of the distribution function is
denoted by P (capital p.)

For example, for 4 degrees of freedom, F(9.488) = 0.950.

[Figure: Chi-Square density for 4 degrees of freedom; 95% of the probability lies to the left of 9.488.]

Unity minus the distribution function is the Survival Function; the value of the Survival Function is the
p-value (small p) or the significance level, sometimes denoted by α.

For example, for 4 degrees of freedom, 9.488 is the critical value corresponding to a significance
level of 1 - 0.95 = 5%. The critical values corresponding to a 5% significance level are in the column
labeled P = 0.950.

Similarly, the critical values corresponding to a 1% significance level are in the column labeled
P = 0.990.
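If you want to reproduce the attached table with software, the critical values are quantiles of the Chi-Square Distribution (a minimal sketch using SciPy):

```python
from scipy.stats import chi2

# 4 degrees of freedom: the P = 0.950 column (5% significance level) and
# the P = 0.990 column (1% significance level)
print(chi2.ppf(0.950, 4))   # 9.488
print(chi2.ppf(0.990, 4))   # 13.277
```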
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 100

For the following questions, use the following Chi-Square table:

[Figure: Chi-Square Distribution density; the area to the left of χ0^2 is P, and the area to the right is 1 - P.]

The table below gives the value of χ0^2 for which Prob[χ^2 < χ0^2] = P for a given number of degrees
of freedom and a given value of P.

Degrees of Value of P
Freedom 0.005 0.010 0.025 0.050 0.900 0.950 0.975 0.990 0.995
1 0.000 0.000 0.001 0.004 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 9.236 11.070 12.832 15.086 16.750
6 0.676 0.872 1.237 1.635 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 22.307 24.996 27.448 30.578 32.801
16 5.142 5.812 6.908 7.962 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 28.412 31.410 34.170 37.566 39.997
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 101

Problems:

Use the following information for the next four questions:

You observe 10,000 trials of a process you believe is Binomial with parameters m = 8 and q
unknown.
Number of Claims Number of Observations
0 188
1 889
2 2123
3 2704
4 2362
5 1208
6 426
7 88
8 12
10000

4.1 (2 points) Determine q by the method of moments.


A. 0.40 B. 0.42 C. 0.44 D. 0.46 E. 0.48

4.2 (4 points) What is the Chi-Square statistic for the fitted Binomial Distribution from the previous
question?
A. 11 B. 12 C. 13 D. 14 E. 15

4.3 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the
hypothesis H0 that the data is drawn from the fitted Binomial Distribution.
Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050. Reject H0 at 0.100.

4.4 (3 points) What is the Kolmogorov-Smirnov Statistic for the fitted Binomial Distribution?
A. 0.005 B. 0.010 C. 0.015 D. 0.020 E. 0.025
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 102

4.5 (2 points) Assume one has observed drivers over nine years and got the following distribution
of drivers by number of claims over the nine year period:
Number of Claims 0 1 2 3 4 5 6 7 8 All
Number of Drivers 17649 4829 1106 229 44 9 4 1 1 23872
One fits a Negative Binomial Distribution via the method of moments to this data, assuming
r = 1.8. What is the value of β?
A. less than 0.16
B. at least 0.16 but less than 0.17
C. at least 0.17 but less than 0.18
D. at least 0.18 but less than 0.19
E. at least 0.19

4.6 (3 points) What is the Chi-Square statistic for the fitted Negative Binomial Distribution from the
previous question?
Group the intervals so that each has at least 5 expected observations.
A. less than 4
B. at least 4 but less than 6
C. at least 6 but less than 8
D. at least 8 but less than 10
E. at least 10

4.7 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the
hypothesis H0 that the data is drawn from the fitted Negative Binomial Distribution.
Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050. Reject H0 at 0.100.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 103

Use the following information for the next nine questions:

Number of Claims 0 1 2 3 4 All


Number of Policies 90,000 9,000 700 50 5 99,755

4.8 (2 points) One fits a Poisson Distribution via the method of moments to this data.
What is the value of the fitted parameter λ?
A. less than 0.10
B. at least 0.10 but less than 0.11
C. at least 0.11 but less than 0.12
D. at least 0.12 but less than 0.13
E. at least 0.13

4.9 (3 points) What is the Chi-Square statistic for the fitted Poisson Distribution from the previous
question?
Group the intervals so that each has at least 5 expected observations.
A. less than 10
B. at least 10 but less than 20
C. at least 20 but less than 30
D. at least 30 but less than 40
E. at least 40

4.10 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the
hypothesis H0 that the data is drawn from the fitted Poisson Distribution.
How many degrees of freedom does one use?
A. 1 B. 2 C. 3 D. 4 E. 5

4.11 (1 point) Based on the Chi-Square statistic computed previously, one tests the hypothesis
H0 that the data is drawn from the fitted Poisson Distribution.
Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050. Reject H0 at 0.100.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 104

4.12 (2 points) One fits a Negative Binomial Distribution via the method of moments to this data.
What is the value of the fitted parameter β?
A. less than 0.06
B. at least 0.06 but less than 0.08
C. at least 0.08 but less than 0.10
D. at least 0.10 but less than 0.12
E. at least 0.12

4.13 (1 point) One fits a Negative Binomial Distribution via the method of moments to this data.
What is the value of the fitted parameter r?
A. less than 1.5
B. at least 1.5 but less than 1.6
C. at least 1.6 but less than 1.7
D. at least 1.7 but less than 1.8
E. at least 1.8

4.14 (3 points) What is the Chi-Square statistic for the fitted Negative Binomial Distribution from the
previous two questions?
Group the intervals so that each has at least 5 expected observations.
A. less than 1
B. at least 1 but less than 3
C. at least 3 but less than 5
D. at least 5 but less than 7
E. at least 7

4.15 (1 point) Based on the Chi-Square statistic computed in the previous question, what is the
p-value of the fitted Negative Binomial Distribution.
A. 0.005 < p < 0.010
B. 0.010 < p < 0.025
C. 0.025 < p < 0.050
D. 0.050 < p < 0.100
E. 0.100 < p

4.16 (3 points) What is the loglikelihood to be maximized in order to fit a Negative Binomial
Distribution, with parameters β and r, to the above data via the method of maximum likelihood?
A. 10570ln(β) + 9755ln(r) + 755ln(r+1) + 55ln(r+2) + 5ln(r+3) - 110325 ln(1+β)
B. 10570ln(β) + 9000ln(r) + 700ln(r+1) + 50ln(r+2) + 5ln(r+3) - 110325 ln(1+β)
C. 10570ln(β) + 9755ln(r) + 755ln(r+1) + 55ln(r+2) + 5ln(r+3) - (99755r + 10570)ln(1+β)
D. 10570ln(β) + 9000ln(r) + 700ln(r+1) + 50ln(r+2) + 5ln(r+3) - (99755r + 10570)ln(1+β)
E. None of the above.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 105

4.17 (2 points) A Geometric, a compound Poisson-Binomial, and a mixed Poisson-Inverse


Gaussian Distribution have each been fit to the same set of data.
The set of data has 8 intervals.
(For each distribution each interval has more than 5 expected observations.)
Use the Chi-Square Statistics to rank the fits from best to worst.
Distribution # of Parameters Chi-Square
1. Geometric 1 15.5
2. Compound Poisson-Binomial 3 14.6
3. Mixed Poisson-Inverse Gaussian 2 12.7
A. 1, 2, 3 B. 1, 3, 2 C. 2, 1, 3 D. 2, 3, 1 E. 3, 1, 2

4.18 (3 points) The following data is for the number of hurricanes hitting the continental United States
from 1900 to 1999.
Decade Observed
0 15
1 20
2 15
3 17
4 23
5 18
6 15
7 12
8 16
9 13
Sum 164
Let H0 be the hypothesis that the mean frequency during each decade is the same.
Using the Chi-Square test, which of the following are true?
A. Do not reject H0 at 0.5%. Reject H0 at 1.0%.
B. Do not reject H0 at 1.0%. Reject H0 at 2.5%.
C. Do not reject H0 at 2.5%. Reject H0 at 5.0%.
D. Do not reject H0 at 5.0%. Reject H0 at 10.0%.
E. Do not reject H0 at 10.0%.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 106

Use the data shown below for the next two questions:
Region Exposures Claims
1 1257 124
2 1025 119
3 1452 180
4 1311 177

Total 5045 600

4.19 (3 points) Assume that each exposure in every region has a Poisson Distribution, with the
same mean λ.
Assume that each Poisson Distribution process is independent across exposures and regions.
Fit λ via Maximum Likelihood.
Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted
Poisson Distribution.
Which of the following statements is true?
A. H0 will be rejected at the 1% significance level.
B. H0 will be rejected at the 2.5% significance level, but not at the 1% level.
C. H0 will be rejected at the 5% significance level. but not at the 2.5% level.
D. H0 will be rejected at the 10% significance level, but not at the 5% level.
E. H0 will not be rejected at the 10% significance level.

4.20 (4 points) Assume that each exposure in every region has a Binomial Distribution,
with m = 2 and the same parameter q.
Assume that each Binomial frequency process is independent across exposures and regions.
Fit q via Maximum Likelihood.
Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Binomial
Distribution.
Which of the following statements is true?
A. H0 will be rejected at the 1% significance level.
B. H0 will be rejected at the 2.5% significance level, but not at the 1% level.
C. H0 will be rejected at the 5% significance level. but not at the 2.5% level.
D. H0 will be rejected at the 10% significance level, but not at the 5% level.
E. H0 will not be rejected at the 10% significance level.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 107

4.21 (5 points) You are given the following data on the number of runs scored during half innings of
major league baseball games from 1980 to 1998:
Runs Number of Occurrences
0 518,228
1 105,070
2 47,936
3 21,673
4 9736
5 4033
6 1689
7 639
8 274
9 107
10 36
11 25
12 5
13 7
14 1
15 0
16 1

Total 709,460
Fit a Negative Binomial via the method of moments.
Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Negative
Binomial Distribution.
Group the intervals so that each has at least 5 expected observations.
Which of the following statements is true?
A. H0 will be rejected at the 0.005 significance level.
B. H0 will be rejected at the 0.01 significance level, but not at the 0.005 level.
C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level.
D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level.
E. H0 will not be rejected at the 0.05 significance level.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 108

4.22 (4 points) You are given the following data on the number of injuries per claim on automobile
bodily injury liability insurance:
Number of Injuries Number of Policies
1 4121
2 430
3 71
4 19
5 6
6 4
7 and more 1
Total 4652
Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the following
distribution: f(n) = 0.2^n / {-n ln(0.8)}.

Group the intervals so that each has at least 5 expected observations.


Which of the following statements is true?
A. H0 will be rejected at the 0.005 significance level.
B. H0 will be rejected at the 0.01 significance level, but not at the 0.005 level.
C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level.
D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level.
E. H0 will not be rejected at the 0.05 significance level.

4.23 (3 points) Ten thousand insureds are assigned to one of five classes based on their total
number of claims over the prior 5 years, as follows:
Class Number of Claims Number of Insureds in Class
A No Claims 6825
B One Claim 2587
C Two Claims 499
D Three Claims 78
E Four or More Claims 11
The null hypothesis is that annual claim frequency follows a Poisson distribution with mean 0.08,
which implies over 5 years a Poisson distribution with mean 0.4.
Which of these classes has the largest contribution to the Chi-square goodness-of-fit statistic?
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 109

4.24 (4 points) You are given the following data on the number of claims per year on automobile
insurance:
Number of Claims Number of Policies
0 20,592
1 2,651
2 297
3 41
4 7
5 0
6 1
7 and more 0
Total 23,589
Fit a Negative Binomial via the method of moments.
Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Negative
Binomial Distribution. Group the intervals so that each has at least 5 expected observations. Which
of the following statements is true?
A. H0 will be rejected at the 0.01 significance level.
B. H0 will be rejected at the 0.025 significance level, but not at the .01 level.
C. H0 will be rejected at the 0.05 significance level. but not at the 0.025 level.
D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level.
E. H0 will not be rejected at the 0.10 significance level.

4.25 (2 points) You are given the following data on the number of claims:
Number of Claims Number of Insureds
0 91,000
1 8,100
2 800
3 or more 100
Total 100,000
You compare this data to a Geometric Distribution with β = 0.1.
Compute the chi-square goodness-of-fit statistic.
A. less than 11
B. at least 11 but less than 12
C. at least 12 but less than 13
D. at least 13 but less than 14
E. at least 14
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 110

Use the data shown below for the next three questions:
Year Exposures Claims
1 3000 200
2 3300 250
3 3700 310

Total 10,000 760

4.26 (2 points) Assume that the expected frequency per exposure is the same in every year.
Conduct a Chi-Square goodness-of-fit test of this hypothesis.
Which of the following statements is true?
A. H0 will be rejected at the 1% significance level.
B. H0 will be rejected at the 2.5% significance level, but not at the 1% level.
C. H0 will be rejected at the 5% significance level. but not at the 2.5% level.
D. H0 will be rejected at the 10% significance level, but not at the 5% level.
E. H0 will not be rejected at the 10% significance level.

4.27 (2 points) Determine the maximum likelihood estimate of the Poisson parameter for the above
data. Conduct a Chi-Square goodness-of-fit test for this fitted Poisson model.
Which of the following statements is true?
A. H0 will be rejected at the 1% significance level.
B. H0 will be rejected at the 2.5% significance level, but not at the 1% level.
C. H0 will be rejected at the 5% significance level. but not at the 2.5% level.
D. H0 will be rejected at the 10% significance level, but not at the 5% level.
E. H0 will not be rejected at the 10% significance level.

4.28 (2 points) Fit instead a Geometric Distribution via maximum likelihood to the above data.
Conduct a Chi-Square goodness-of-fit test for this fitted model.
Which of the following statements is true?
A. H0 will be rejected at the 1% significance level.
B. H0 will be rejected at the 2.5% significance level, but not at the 1% level.
C. H0 will be rejected at the 5% significance level. but not at the 2.5% level.
D. H0 will be rejected at the 10% significance level, but not at the 5% level.
E. H0 will not be rejected at the 10% significance level.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 111

4.29 (3 points) Use the following information on the number of accidents over a nine year period for
a set of drivers:
Number of Accidents Number of Drivers
0 17,649
1 4,829
2 1,106
3 229
4 44
5 9
6 4
7 1
8 1
9 0
Total 23,872
A Negative Binomial has been fit to this data via maximum likelihood, with fitted parameters
r = 1.4876 and β = 0.2249. Calculate the Chi-Square Goodness of Fit Statistic. Group the data using
the largest number of groups such that the expected number of drivers in each group is at least 5.
A. less than 1
B. at least 1 but less than 2
C. at least 2 but less than 3
D. at least 3 but less than 4
E. at least 4

4.30 (3 points) For a set of insurance policies, there is at most one claim per policy per year.  
Year Number of Policies Number of Claims
2006 9,000 842
2007 10,500 1016
2008 12,000 1258
2009 13,500 1380
2010 15,000 1594
Use Maximum Likelihood to fit a Bernoulli Distribution.
Test this fit using the Chi-Square Goodness-of-Fit Test.
What is your conclusion?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject Ho at 0.010.
C. Do not reject H0 at 0.010. Reject Ho at 0.025.
D. Do not reject H0 at 0.025. Reject Ho at 0.050.
E. Do not reject H0 at 0.050.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 112

4.31 (2 points) You are given the following data for private passenger automobile insurance:
Number of Accidents Number of Drivers
0 81,714
1 11,306
2 1618
3 250
4 40
5 7
You fit a Poisson Distribution via maximum likelihood.
Using the Chi-Square Goodness of Fit Test, what is you conclusion?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050.

4.32 (2 points) Use the following data:


Year Exposures Claims
1 4000 220
2 4500 210
3 5000 200
4 5500 270

Total 19,000 900

Assume that the frequency for each exposure follows a Negative Binomial Distribution with r = 1/2.
Fit β via maximum likelihood.
Conduct a Chi-Square goodness-of-fit test for this fitted Negative Binomial model.
Which of the following statements is true?
A. H0 will be rejected at the 1% significance level.
B. H0 will be rejected at the 2.5% significance level, but not at the 1% level.
C. H0 will be rejected at the 5% significance level. but not at the 2.5% level.
D. H0 will be rejected at the 10% significance level, but not at the 5% level.
E. H0 will not be rejected at the 10% significance level.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 113

4.33 (10 points) 1000 adults have been surveyed on how many countries they have been in
during their lifetime (including the country in which they currently reside.).
Number of Countries Number of People
1 250
2 132
3 95
4 69
5 59
6 43
7 32
8 25
9 26
10 29
11 16
12 20
13 13
14 15
15 11
16 13
17 15
18 9
19 4
20 5
21 to 25 28
26 to 30 20
31 to 35 15
36 to 40 13
41 to 45 11
46 to 50 9
more than 50 23

Let the null hypothesis be that this data was drawn from a Logarithmic Distribution with β = 30.
With the aid of a computer, calculate the Chi-square goodness-of-fit-statistic.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 114

4.34 (3 points) You are investigating the length of hospital stays.


You have a “standard” distribution for the typical hospital.
You also have data for St. Eligius Hospital.
Number Days Standard Probability Observed Number for St. Eligius
1 0.15 165
2 0.30 291
3 0.17 163
4 0.10 109
5 0.08 70
6 0.06 75
7 0.04 44
8 to 20 0.06 41
21 or more 0.04 42

Total 1.00 1000

Determine the result of a chi-square test of the null hypothesis that hospital stays at St. Eligius follow
the standard distribution.
(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

4.35 (3 points) The following data is for the number of people riding in each of 10,000 cars exiting
the Abraham Tunnel:
Number of People 1 2 3 4 5 6
Number of Cars 3560 3610 2020 700 100 10
H0 : This data was drawn from a Zero-Truncated Binomial Distribution with m = 6 and q = 0.3.
Which of the following is the result of applying the Chi-square goodness-of-fit test?
A. H0 will be rejected at the 0.005 significance level.
B. H0 will be rejected at the 0.01 significance level, but not at the 0.005 level.
C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level.
D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level.
E. H0 will not be rejected at the 0.05 significance level.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 115

4.36 (2 points) You are given a long sequence of digits.


You count the lengths of the runs odd and even digits:
Length of run: 1 2 3 4 5 or more
Number of runs: 169 56 42 22 11
If the digits are random, then Prob(run of length n) = 1/2^n, n ≥ 1.
H0 : The digits are random.
Which of the following is the result of applying the Chi-square goodness-of-fit test?
(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05
significance level.
(C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the
0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the
0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.

4.37 (3 points) The following data is for the number of automobiles insured on 200 private
passenger automobile insurance policies:
Number of Automobiles: 1 2 3 4 or more
Number of Policies: 90 60 40 10
H0 : This data was drawn from a Zero-Truncated Poisson Distribution with λ = 1.5.
Which of the following is the result of applying the Chi-square goodness-of-fit test?
(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05
significance level.
(C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the
0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the
0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 116

4.38 (2, 5/83, Q. 38) (1.5 points) A die was rolled 30 times with the results shown below.
Number of Spots 1 2 3 4 5 6
Frequency 1 4 9 9 2 5
If a chi-square goodness-of-fit test is used to test the hypothesis that the die is fair at a significance
level of α = 0.05, then what is the value of the chi-square statistic and the decision reached?
A. 11.6; reject hypothesis
B. 11.6; do not reject hypothesis
C. 22.1; reject hypothesis
D. 22.1; do not reject hypothesis
E. 42.0; reject hypothesis

4.39 (4, 5/89, Q.51) (3 points)


The following claim frequency observations were made for a group of 1,000 policies:
# of Claims # of Policies
0 800
1 180
2 19
3 or more 1
Your hypothesis is that the number of claims per policy follows a Poisson distribution with a mean of
µ = 0.20. What is the Chi-square statistic for this data under your hypothesis?
Group the intervals so that each has at least 5 expected observations.
A. Less than 2.4
B. At least 2.4, but less than 2.6
C. At least 2.6, but less than 2.8
D. At least 2.8, but less than 3.0
E. 3.0 or more
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 117

4.40 (4B, 5/92, Q.30) (3 points)


You are given the following information for 10,000 risks grouped by number of claims:
• A negative binomial distribution was fit to the grouped risks.
• Minimum chi-square estimation was used to estimate the two parameters of the
negative binomial.
• The results are as follows:
Number of Actual Number Estimated Number Of Risks
Claims Of Risks Using Negative Binomial
0 8,725 8,738
1 1,100 1,069
2 135 162
3 35 26
4 3 4
5 2 1

10,000 10,000
You are to use the Chi-square statistic to test the hypothesis, H0 , that the negative binomial
provides an acceptable fit. Group the intervals so that each has at least 5 expected observations.
Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject Ho at 0.010.
C. Do not reject H0 at 0.010. Reject Ho at 0.025.
D. Do not reject H0 at 0.025. Reject Ho at 0.050.
E. Do not reject H0 at 0.050.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 118

4.41 (4B, 11/92, Q.22) (3 points)


You are given the following information for 10,000 risks grouped by number of claims.
• A Poisson distribution with mean λ was fit to the grouped risks.
• Minimum chi-square estimation has been used to estimate λ.
• The results are as follows:
Number of Claims Actual Number of Risks Estimated Number of Risks Using Poisson
0 7,673 7,788
1 2,035 1,947
2 262 243
3 or more 30 22

10,000 10,000
You are to use the Chi-square statistic to test the hypothesis, H0 , that the Poisson provides an
acceptable fit. Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050.
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 119

Use the following information for the next three questions:


A portfolio of 5000 insureds are grouped by number of claims as follows:
Number of Claims Number of Insureds
0 4,101
1 806
2 85
3 8
The underlying distribution for number of claims is assumed to be Poisson with mean µ. Maximum
likelihood estimation is used to estimate µ for the above data.

4.42 (4B, 5/93, Q.16) (2 points) Determine the Chi-square statistic, χ2, using the above
information.
A. Less than 0.50
B. At least 0.50 but less than 1.50
C. At least 1.50 but less than 2.50
D. At least 2.50 but less than 3.50
E. At least 3.50

4.43 (4B, 5/93, Q.17) (1 point) How many degrees of freedom are associated with the χ2 statistic
that was calculated for the previous question?
A. 1 B. 2 C. 3 D. 4 E. 5

4.44 (4B, 5/93, Q.18) (2 points) Determine the Kolmogorov-Smirnov statistic using the above
information.
A. Less than 0.0003
B. At least 0.0003 but less than 0.0008
C. At least 0.0008 but less than 0.0013
D. At least 0.0013 but less than 0.0018
E. At least 0.0018
2016-C-5, Fitting Frequency § 4 Chi-Square Test, HCM 10/27/15, Page 120

4.45 (4B, 5/94, Q.7) (2 points) You are given the following information for a portfolio of 10,000
risks grouped by number of claims:
A Poisson distribution is fitted to the grouped risks with these results:
Number of Risks
Number of Claims Actual Expected
0 9,091 9,048
1 838 905
2 51 45
3 or more 20 2
Total 10,000 10,000
Determine the Kolmogorov-Smirnov statistic for the fitted Poisson distribution.
A. Less than 0.001
B. At least 0.001, but less than 0.003
C. At least 0.003, but less than 0.005
D. At least 0.005, but less than 0.007
E. At least 0.007

4.46 (4B, 11/96, Q.9) (3 points) You are given the following:
• The observed number of claims for a group of 1,000 risks has been recorded as
follows:
Number of Claims Number of Risks
0 729
1 242
2 29
3 or more 0
• The null hypothesis, H0 , is that the number of claims per risk follows a Poisson
distribution.
• A chi-square test is performed with three classes. The first class contains those
risks with 0 claims, the second contains those risks with 1 claim, and the third
contains those risks with 2 or more claims.
• The minimum chi-square estimate of the mean of the Poisson distribution is 0.3055.
Which of the following statements is true?
A. H0 will be rejected at the 0.005 significance level.
B. H0 will be rejected at the 0.01 significance level, but not at the 0.005 level.
C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level.
D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level.
E. H0 will not be rejected at the 0.05 significance level.

4.47 (4B, 11/97, Q.20) (3 points) You are given the following:
• The observed number of claims for a group of 100 risks has been recorded as
follows:

Number of Claims Number of Risks


0 80
1 20

• The null hypothesis, H0 , is that the number of claims per risk follows a Bernoulli
distribution with mean q.
• A chi-square test is performed.

Determine the smallest value of q for which H0 will not be rejected at the 0.01 significance level.
A. Less than 0.08
B. At least 0.08, but less than 0.09
C. At least 0.09, but less than 0.10
D. At least 0.10, but less than 0.11
E. At least 0.11

Use the following information for the next two questions:

You are given the following:


• The observed number of claims for a group of 50 risks has been recorded as follows:
Number of Claims Number of Risks
0 7
1 10
2 12
3 17
4 4
• The null hypothesis, H0 , is that the number of claims per risk follows a uniform distribution on
0, 1, 2, 3, and 4.

4.48 (4B, 5/98, Q.10) (2 points) A chi-square test is performed with five classes.
Which of the following statements is true?
A. H0 will be rejected at the 0.01 significance level.
B. H0 will be rejected at the 0.025 significance level, but not at the 0.01 level.
C. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level.
D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level.
E. H0 will not be rejected at the 0.10 significance level.

4.49 (4B, 5/98, Q.11) (2 points) Two adjacent classes of the five classes above are combined,
and a chi-square test performed with four classes.
Determine which of the following combinations will result in a p-value of the Chi-Square test most
different from the one in the previous question.
A. Combining the risks with 0 claims and the risks with 1 claim
B. Combining the risks with 1 claim and the risks with 2 claims
C. Combining the risks with 2 claims and the risks with 3 claims
D. Combining the risks with 3 claims and the risks with 4 claims
E. Can not be determined.
Note: This exam question has been rewritten.

4.50 (4, 5/00, Q.29 & 4, 11/02, Q.28 & 2009 Sample Q. 47) (2.5 points) You are given the
following observed claim frequency data collected over a period of 365 days:
Number of Claims per Day Observed Number of Days
0 50
1 122
2 101
3 92
4+ 0
Fit a Poisson distribution to the above data, using the method of maximum likelihood.
Regroup the data, by number of claims per day, into four groups:
0 1 2 3+
Apply the chi-square goodness-of-fit test to evaluate the null hypothesis that the claims follow
a Poisson distribution.
Determine the result of the chi-square test.
(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

4.51 (IOA 101, 4/01, Q.14) (12 points) Consider a group of 1000 policyholders, all of the same
age, and each of whose lives is insured under one or more policies. The following frequency
distribution gives the number of claims per policyholder in 1999 for this group.
Number of claims per policyholder (i) 0 1 2 3 ≥4
Number of policyholders (fi) 826 128 39 7 0
A statistician argues that an appropriate model for the distribution of X, the number of claims per
policyholder, is X ~ Poisson. Under this proposal, the frequencies expected are as follows (you are
not required to verify these):
Number of claims per policyholder 0 1 2 3 ≥4
Expected number of policyholders 796.9 180.9 20.5 1.6 0.1
A second statistician argues that a more appropriate model for the distribution of X is given by:
P(X = x) = p(1 - p)^x, x = 0, 1, 2, ...
(i) (1.5 points) Without doing any further calculations, comment on the first statisticianʼs
proposed model for the data.
(ii) (3 points) Consider the second statistician's proposed model.
Verify that the mean of the distribution of X is (1 - p)/p and hence calculate the
method of moments estimate of p.
(Note: this estimate is also the maximum likelihood estimate.)
(iii) (2.25 points) Verify that the frequencies expected under the second statisticianʼs
proposed model are as follows:
Number of claims per policyholder 0 1 2 3 ≥4
Expected number of policyholders 815.0 150.8 27.9 5.2 1.2
(iv) (5.25 points) (a) Test the goodness-of-fit of the second statisticianʼs proposed model to
the data, quoting the p-value of your test statistic and your conclusion.
(b) Assuming that you had been asked to test the goodness-of-fit “at the 1% level”,
state your conclusion.

4.52 (4, 5/01, Q.19) (2.5 points)


During a one-year period, the number of accidents per day was distributed as follows:
Number of
Accidents Days
0 209
1 111
2 33
3 7
4 3
5 2
You use a chi-square test to measure the fit of a Poisson distribution with mean 0.60.
The minimum expected number of observations in any group should be 5.
The maximum possible number of groups should be used.
Determine the chi-square statistic.
(A) 1 (B) 3 (C) 10 (D) 13 (E) 32

4.53 (4, 11/01, Q.25 & 2009 Sample Q.71) (2.5 points) You are investigating insurance fraud
that manifests itself through claimants who file claims with respect to auto accidents with which they
were not involved. Your evidence consists of a distribution of the observed number of claimants per
accident and a standard distribution for accidents on which fraud is known to be absent.
The two distributions are summarized below:
Number of Claimants
per Accident Standard Probability Observed Number of Accidents
1 0.25 235
2 .35 335
3 .24 250
4 .11 111
5 .04 47
6+ .01 22
Total 1.00 1000
Determine the result of a chi-square test of the null hypothesis that there is no fraud in the observed
accidents.
(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

4.54 (4, 11/03, Q.16 & 2009 Sample Q.13) (2.5 points)
A particular line of business has three types of claims.
The historical probability and the number of claims for each type in the current year are:
Historical Number of Claims
Type Probability in Current Year
A 0.2744 112
B 0.3512 180
C 0.3744 138
You test the null hypothesis that the probability of each type of claim in the current year is the
same as the historical probability.
Calculate the chi-square goodness-of-fit test statistic.
(A) Less than 9
(B) At least 9, but less than 10
(C) At least 10, but less than 11
(D) At least 11, but less than 12
(E) At least 12

4.55 (4, 11/05, Q.10 & 2009 Sample Q.222) (2.9 points)
1000 workers insured under a workers compensation policy were observed for one year. The
number of work days missed is given below:
Number of Days of Work Missed Number of Workers
0 818
1 153
2 25
3 or more 4
Total 1000
Total Number of Days Missed 230
The chi-square goodness-of-fit test is used to test the hypothesis that the number of work
days missed follows a Poisson distribution where:
(i) The Poisson parameter is estimated by the average number of work days missed.
(ii) Any interval in which the expected number is less than one is combined with the
previous interval.
Determine the results of the test.
(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05
significance level.
(C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the
0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the
0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.

4.56 (CAS3L, 11/12, Q.23) (2.5 points)


A six-sided die is rolled 120 times with the following distribution of outcomes:
Outcome Frequency
1 15
2 13
3 28
4 25
5 12
6 27
The following hypothesis test has been set up:
H0 : The die is fair (outcomes are equally likely).
H1 : The die is not fair.
Determine the significance level at which one would reject the null hypothesis given the outcomes in
the table above.
A. Less than 0.5%
B. At least 0.5%, but less than 1.0%
C. At least 1.0%, but less than 2.5%
D. At least 2.5%, but less than 5.0%
E. At least 5.0%

Solutions to Problems:

4.1. A. The observed mean is 32003 / 10000 = 3.200. Set mq = 8q = 3.2.


Thus q = 0.400.
Number of Claims Observed Observed times # of Claims
0 188 0
1 889 889
2 2123 4246
3 2704 8112
4 2362 9448
5 1208 6040
6 426 2556
7 88 616
8 12 96
Sum 10000 32003
Comment: This is also the solution for the Method of Maximum Likelihood.

4.2. C. One needs to compute 10000 times the fitted Binomial with parameters m = 8 and
q = 0.40. Then, in order to compute the Chi-Square one sums:
(observed number - fitted number)^2 / fitted number
Number of Claims    Observed    Fitted via Method of Moments    Chi Square
0 188 167.962 2.39
1 889 895.795 0.05
2 2123 2090.189 0.52
3 2704 2786.918 2.47
4 2362 2322.432 0.67
5 1208 1238.630 0.76
6 426 412.877 0.42
7 88 78.643 1.11
8 12 6.554 4.53
Sum 10000 10000 12.91

4.3. E. One tests the significance by using the Chi-Square Distribution for 9 - 1 - 1 = 7 degrees of
freedom, # of intervals - 1 - number of fitted parameters.
Using the 12.91 Chi-Square Statistic computed in the previous question, we reject at 10% and do
not reject at 5%, since 14.067 > 12.91 > 12.017.
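To see the whole procedure of solutions 4.1 to 4.3 in one place, here is a minimal computational sketch. It assumes Python with numpy and scipy available; the variable names are mine, not part of the exam solution.

    import numpy as np
    from scipy.stats import binom, chi2

    observed = np.array([188, 889, 2123, 2704, 2362, 1208, 426, 88, 12])  # risks with 0..8 claims
    claims = np.arange(9)
    n = observed.sum()                                  # 10,000 risks

    q = (observed * claims).sum() / n / 8               # method of moments: 3.2003/8, about 0.400
    expected = n * binom.pmf(claims, 8, q)              # fitted number of risks in each cell

    chi_square = ((observed - expected) ** 2 / expected).sum()
    dof = len(observed) - 1 - 1                         # 9 intervals - 1 - one fitted parameter
    print(chi_square, dof, chi2.sf(chi_square, dof))    # about 12.9 with 7 d.f.; 5% < p-value < 10%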

4.4. A. Calculate the densities for the Binomial and then cumulate them to get the Distribution
Function. For example, F(2) = 0.0168 + 0.0896 + 0.2090 = 0.3154.
The Kolmogorov-Smirnov Statistic is computed by finding the maximum absolute difference
between the observed distribution function and the fitted distribution function:
Observed Method of Fitted
Number of Distribution Moments Distribution Absolute
Claims Observed Function Binomial Function Difference
0 188 0.0188 0.0168 0.0168 0.0020
1 889 0.1077 0.0896 0.1064 0.0013
2 2123 0.3200 0.2090 0.3154 0.0046
3 2704 0.5904 0.2787 0.5941 0.0037
4 2362 0.8266 0.2322 0.8263 0.0003
5 1208 0.9474 0.1239 0.9502 0.0028
6 426 0.9900 0.0413 0.9915 0.0015
7 88 0.9988 0.0079 0.9993 0.0005
8 12 1.0000 0.0007 1.0000 0.0000
This maximum absolute difference is 0.0046 and occurs at 2 claims.
Comment: Note that we do not attach any specific significance level as we did with the
Chi-Square Statistic. Nevertheless, even when dealing with discrete distributions, a smaller value of
the K-S Statistic does indicate a better fit.
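A corresponding sketch of the Kolmogorov-Smirnov comparison in solution 4.4 (again assuming Python with numpy and scipy); for this discrete fit one simply compares the two distribution functions at each number of claims.

    import numpy as np
    from scipy.stats import binom

    observed = np.array([188, 889, 2123, 2704, 2362, 1208, 426, 88, 12])
    empirical_cdf = observed.cumsum() / observed.sum()
    fitted_cdf = binom.cdf(np.arange(9), 8, 0.40)       # method-of-moments Binomial, m = 8, q = 0.40

    print(np.abs(empirical_cdf - fitted_cdf).max())     # about 0.0046, attained at 2 claims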

4.5. D. By taking the average value of the number of claims, one can calculate that the first moment
is 7988 / 23,872 = 0.3346. Set the mean = rβ = 1.8β = 0.3346. ⇒ β = 0.3346/1.8 = 0.186.
Number of Observed Observed
Claims times
# of claims
0 17,649 0
1 4,829 4829
2 1,106 2212
3 229 687
4 44 176
5 9 45
6 4 24
7 1 7
8 1 8
Sum 23872 7988

4.6. D. One needs to compute 23,872 times the fitted Negative Binomial with parameters
r = 1.8 and β = 0.186. Then, in order to compute the Chi-Square one sums:
(observed number - fitted number)^2 / fitted number
A B C D E
Number of Negative Binomial Fitted via Chi
Claims Observed r = 1.8, beta = .186 Method of Moments Square
0 17,649 0.73561 17,560.5 0.45
1 4,829 0.20766 4,957.2 3.32
2 1,106 0.04559 1,088.4 0.28
3 229 0.00906 216.2 0.76
4 44 0.00170 40.7 0.27
5 and over 15 0.00038 9.0 4.01
Sum 23872 1 23,872.0 9.08
Comment: We were told to group intervals so that the fitted column has entries that are at least 5
on each row. If we instead grouped as “6 and over”, the observed column would have more than 5,
but not the fitted column.

4.7. E. We have a number of degrees of freedom =


(#intervals - 1) - (# fitted parameters) = (6 - 1) - 1 = 4.
Degrees Significance Levels
of Freedom 0.100 0.050 0.025 0.010 0.005
4 7.779 9.488 11.143 13.277 14.860
The computed Chi-Square of 9.08 is less than 9.488 and greater than 7.779.
Do not reject at 5%. Reject at 10%.

4.8. B.
A B C
Number of Col. A
Claims Observed times Col. B
0 90,000 0
1 9,000 9000
2 700 1400
3 50 150
4 5 20
Sum 99755 10570
By taking the average value of the number of claims, one can calculate that the mean is
10570 / 99,755 = 0.10596.
Using the method of moments, match the first moment: λ = 0.106.

4.9. E. One needs to compute 99,755 times the fitted Poisson Distribution with parameter
λ = 0.106. Then, in order to compute the Chi-Square one sums:
(observed number - fitted number)^2 / fitted number
Fitted Chi
Number of Poisson Fitted Square
Claims Observed Distribution Number
0 90,000 0.89942 89,722.1 0.86
1 9,000 0.09534 9,510.5 27.41
2 700 0.00505 504.1 76.17
3 and over 55 0.00018 18.3 73.66
Sum 99755.0 99755.0 178.1
Comment: We were told to group intervals so that the fitted column has entries that are at least 5
on each row. The intervals for 3 and over were grouped together so as to get at least 5 expected
insureds. (The interval 4 and over would have had only 0.5 expected insureds.)

4.10. B. Number of degrees of freedom =


(number of intervals) -1 - (number of fitted parameters) = 4 - 1 - 1 = 2.

4.11. A. With two degrees of freedom, we reject at 0.005 since 178.1 > 10.597.
Comment: This data is fit well by a Negative Binomial rather than a Poisson Distribution; the variance
is significantly greater than the mean.

4.12. B. and 4.13. D. By taking the average value of the number of claims, one can calculate that
the first moment is: 10570 / 99,755 = 0.1060.
By taking the average value of the square of the number of claims observed for each driver, one can
calculate that the second moment is: 12330 / 99755 = 0.1236.
Thus the estimated variance is: 0.1236 - 0.1060^2 = 0.1124.
A B C D
Number of Col. A Square of Col. A
Claims Observed times Col. B times Col. B
0 90,000 0 0
1 9,000 9000 9000
2 700 1400 2800
3 50 150 450
4 5 20 80
Sum 99755 10570 12330
Using the method of moments one would try to match the first two moments by fitting the two
parameters of the Negative Binomial Distribution β and r.
One can write down two equations in two unknowns, by matching the mean and the variance:
mean = 0.1060 = rβ. variance = 0.1124 = rβ(1+β).

1 + β = Variance / Mean = 0.1124 / 0.1060 = 1.0604. ⇒ β = 0.0604.


r = mean/β = 0.1060 / 0.0604 = 1.755.

4.14. A. One needs to compute 99,755 times the fitted Negative Binomial with parameters
r = 1.755 and β = 0.0604. Then, in order to compute the Chi-Square one sums:
(observed number - fitted number)^2 / fitted number
Number of Claims    Observed    Fitted via Method of Moments    Chi Square
0 90,000 89,998.5 0.0000
1 9,000 8,996.6 0.0013
2 700 705.9 0.0492
3 and over 55 54.0 0.0196
Sum 99755 99755 0.0700
Comment: We were told to group intervals so that the fitted column has entries that are at least 5
on each row. The number of expected observations for 4 and more would be only 3.65, thus we
group the final interval as 3 and more.

4.15. E. Number of degrees of freedom = (number of intervals) - 1 - (number of fitted parameters)


= 4 - 1 - 2 = 1. With one degree of freedom, since 0.070 < 2.706, 0.100 < p.
Comment: In other words, we do not reject at 10% since 0.070 < 2.706. The p-value is the survival
function of the Chi-Square Distribution evaluated at 0.0700. Using a computer, the p-value is 79.1%.
This data is fit very closely by a Negative Binomial.

4.16. C. The likelihood is the product of terms: f(0)^90000 f(1)^9000 f(2)^700 f(3)^50 f(4)^5 =


{(1+β)^-r}^90000 {rβ/(1+β)^(r+1)}^9000 {r(r+1)β^2 / (2(1+β)^(r+2))}^700
{r(r+1)(r+2)β^3 / (6(1+β)^(r+3))}^50 {r(r+1)(r+2)(r+3)β^4 / (24(1+β)^(r+4))}^5,
which is proportional to: β^10570 r^9755 (r+1)^755 (r+2)^55 (r+3)^5 / (1+β)^(99755r+10570).
Thus, other than an additive constant, the loglikelihood is:
10570 ln(β) + 9755 ln(r) + 755 ln(r+1) + 55 ln(r+2) + 5 ln(r+3) - (99755r + 10570) ln(1+β).
Comment: Maximizing the loglikelihood will also maximize the likelihood. Any terms which do not
depend on the parameters, do not affect which parameters produce the maximum likelihood.
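As a check on this algebra, the loglikelihood above can be evaluated and maximized numerically. The following sketch assumes Python with numpy and scipy; the starting point is the method-of-moments fit from solutions 4.12-4.13, and the maximizing (r, β) should land near it.

    import numpy as np
    from scipy.optimize import minimize

    def negative_loglikelihood(params):
        r, beta = params
        if r <= 0 or beta <= 0:
            return np.inf
        # loglikelihood from solution 4.16, dropping the additive constant
        loglik = (10570 * np.log(beta) + 9755 * np.log(r) + 755 * np.log(r + 1)
                  + 55 * np.log(r + 2) + 5 * np.log(r + 3)
                  - (99755 * r + 10570) * np.log(1 + beta))
        return -loglik

    fit = minimize(negative_loglikelihood, x0=[1.755, 0.0604], method="Nelder-Mead")
    print(fit.x)    # maximum likelihood estimates of (r, beta)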

4.17. E. The degrees of freedom are 7 - number of fitted parameters.


Geometric Distribution has 1 parameter so 6 d.f.; 14.449 < 15.5 < 16.812 so 1% < p < 2.5%.
Compound Poisson-Binomial Distribution has 3 parameters so 4 d.f.; 13.277 < 14.6 < 14.860 so
1/2% < p < 1%.
Mixed Poisson-Inverse Gaussian Distribution has 2 parameters so 5 d.f.;
11.070 < 12.7 < 12.832 so 2.5% < p < 5%. The larger the p-value, the better the fit.
Therefore, the fits from best to worst are: Mixed Poisson-Inverse Gaussian Distribution, Geometric,
Compound Poisson-Binomial Distribution or 3, 1, 2.

4.18. E. Chi-Square of 5.88 with 10 - 1 = 9 degrees of freedom. (There is an assumed rather than
fitted distribution. Thus there are no fitted parameters.)
Decade Observed Assumed Assumed Chi-Square
Number Distribution Number ((Observed - Assumed)^2)/Assumed
0 15 0.1 16.4 0.12
1 20 0.1 16.4 0.79
2 15 0.1 16.4 0.12
3 17 0.1 16.4 0.02
4 23 0.1 16.4 2.66
5 18 0.1 16.4 0.16
6 15 0.1 16.4 0.12
7 12 0.1 16.4 1.18
8 16 0.1 16.4 0.01
9 13 0.1 16.4 0.70
Sum 164 1 164.0 5.88
Since 5.88 < 14.684, we do not reject at 10%.
Comment: The p-value is 75%; we do not reject the hypothesis that the expected frequency by
decade is the same. The data was taken from “A Macro Validation Dataset for U.S. Hurricane
Models”, by Douglas J. Collins and Stephen P. Lowe, CAS Forum, Winter 2001.

4.19. D. The overall mean is: 600/5045 = 0.119.


Claim data by region is mathematically the same as claim data by years.
When applied to years of data, the Method of Maximum Likelihood applied to the Poisson,
produces the same result as the Method of Moments. λ = 0.119.
In each case, the expected number of claims is the number of exposures times λ = 0.119.
Region Observed Number Exposures Expected Number ((Observed - Expected)^2)/Expected
1 124 1257 149.49 4.348
2 119 1025 121.90 0.069
3 180 1452 172.69 0.310
4 177 1311 155.92 2.851
Sum 600 5045 600 7.578
We have 4 years (regions) of data, and one fitted parameter,
so there are 4 - 1 = 3 degrees of freedom.
Since 6.251 < 7.578 < 7.815, H0 is rejected at the 10% significance level, but not at 5%.
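A minimal sketch of this calculation (Python with numpy assumed); the expected count for each region is its exposures times the fitted Poisson mean.

    import numpy as np

    observed = np.array([124, 119, 180, 177])
    exposures = np.array([1257, 1025, 1452, 1311])

    lam = observed.sum() / exposures.sum()              # 600 / 5045, about 0.119
    expected = exposures * lam
    chi_square = ((observed - expected) ** 2 / expected).sum()
    print(chi_square)    # about 7.58; compare to the Chi-Square table with 3 degrees of freedom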

4.20. C. In region 1, the number of claims is the sum of 1311 independent Binomial Distributions
each with the same parameters m = 2 and q, which is a Binomial frequency process with parameters
m = (2)(1311) = 2622 and q.
In region 1 the likelihood is f(170) = {2622! / (170! (2622-170)!)} q^170 (1-q)^(2622-170).
ln f(170) = ln[2622!] - ln[170!] - ln[(2622-170)!] + 170 ln(q) + (2622-170) ln(1-q).
Sum of the loglikelihoods is: ln[2514!] + ln[2050!] + ln[2904!] + ln[2622!] - ln[130!] - ln[120!] - ln[180!]
- ln[170!] - ln[(2514-130)!] - ln[(2050-120)!] - ln[(2904-180)!] - ln[(2622-170)!] + 600lnq +
{(2)(5045)-600} ln(1-q).
Taking the partial derivative with respect to q and setting it equal to zero:
600/q = {(2)(5045)-600}/(1-q). 600(1-q) = q((2)(5045)-600). q = 600/{(2)(5045)} = 0.0595.
In each case, the fitted number of claims is the number of exposures times m = 2 times q = 0.0595.
χ^2 = ∑ (n_k - E_k)^2 / V_k, where the denominator is based on the variance of the fitted Binomial.
While the expected number would be the variance of a Poisson, the variance of a Binomial is:
mq(1-q)(total exposures) = (1-q) (Expected Number).
The Chi-Square Statistic would be somewhat bigger:
Region Observed Number Exposures Expected Number Chi-Square
1 124 1257 149.49 4.623
2 119 1025 121.90 0.073
3 180 1452 172.69 0.329
4 177 1311 155.92 3.031
Sum 600 5045 600 8.057
We have 4 years (regions) of data, and one fitted parameter,
so there are 4 - 1 = 3 degrees of freedom.
Since 7.815 < 8.057 < 9.348, H0 is rejected at the 5% significance level, but not at 2.5%.
Comment: The Chi-Square Statistic here with the Binomial assumption is that for the Poisson
divided by 1 - q: 8.057 = 7.578/(1 - 0.0595).
For years of data, for the Binomial with m fixed, the method of moments is equal to maximum likelihood.
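The only computational change from solution 4.19 is the denominator: the fitted Binomial variance (1 - q)(expected number) rather than the expected number itself. A brief sketch (Python with numpy assumed):

    import numpy as np

    observed = np.array([124, 119, 180, 177])
    exposures = np.array([1257, 1025, 1452, 1311])

    q = observed.sum() / (2 * exposures.sum())          # 600 / 10,090, about 0.0595
    expected = 2 * exposures * q
    variance = expected * (1 - q)                       # Binomial variance for each region
    chi_square = ((observed - expected) ** 2 / variance).sum()
    print(chi_square)    # about 8.06, i.e. the Poisson-denominator 7.58 divided by (1 - q)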

4.21. A. The mean is: {(518228)(0) + (105070)(1) + ... + (1)(16)} / 709,460 = 0.48438.
The 2nd moment is: {(518228)(0^2) + (105070)(1^2) + ... + (1)(16^2)} / 709,460 = 1.23442.
rβ = 0.48438, and rβ(1 + β) = 1.23442 - 0.48438^2 = 0.99980.

⇒ 1 + β = 0.99980/0.48438 = 2.0641. ⇒ β = 1.0641. ⇒ r = 0.48438/1.0641 = 0.4552.


So that each interval has at least 5 expected observations, we group 15 or more.
Number of Observed Fitted Fitted Chi
runs Number Neg. Bin. Number Square
0 518,228 0.71900940 510,108.4 129.2
1 105,070 0.16872853 119,706.1 1789.5
2 47,936 0.06328968 44,901.5 205.1
3 21,673 0.02670240 18,944.3 393.0
4 9,736 0.01189091 8,436.1 200.3
5 4,033 0.00546216 3,875.2 6.4
6 1,689 0.00256021 1,816.4 8.9
7 639 0.00121713 863.5 58.4
8 274 0.00058474 414.8 47.8
9 107 0.00028320 200.9 43.9
10 36 0.00013804 97.9 39.2
11 25 0.00006764 48.0 11.0
12 5 0.00003329 23.6 14.7
13 7 0.00001644 11.7 1.9
14 1 0.00000815 5.8 4.0
15 or more 1 0.00000808 5.7 3.9
Sum 709,460 1 709,460 2957.2
There are 16 intervals, and 16 - 1 - 2 = 13 degrees of freedom.
For 13 degrees of freedom, the critical value for 1/2% is 29.819.
Since 2957.2 > 29.819, reject at 1/2%!
Comment: Even though the variance is greater than the mean, the Negative Binomial is a terrible fit
to this data. The data is taken from “An Analytic Model for Per-inning Scoring Distributions,” by Keith
Woolner.

4.22. A. So that each interval has at least 5 expected observations, we group 4 or more.
Number of Observed Assumed Assumed Chi
Claims Number Density Number Square
1 4,121 0.89628 4,169.5 0.564
2 430 0.08963 417.0 0.408
3 71 0.01195 55.6 4.270
4 or more 30 0.00214 9.9 40.468
Sum 4,652 1 4,652 45.710
There are 4 intervals, and 4 - 1 = 3 degrees of freedom; the critical value for 0.5% is 12.838.
Since 12.838 < 45.710, reject at 0.5%.
Comment: Data set taken from page 301 of Insurance Risk Models by Panjer and WiIlmot.
A Logarithmic Distribution with β = 0.25.

4.23. B. For the Poisson, f(0) = e^-0.4 = 0.670320. f(1) = 0.4 e^-0.4 = 0.268128.
f(2) = 0.4^2 e^-0.4 / 2 = 0.053626. f(3) = 0.4^3 e^-0.4 / 6 = 0.007150.
Prob[4 or more] = 1 - 0.670320 - 0.268128 - 0.053626 - 0.007150 = 0.000776.
There are a total of 10,000 observed insureds, and therefore the expected numbers by class are:
6703.20, 2681.28, 536.26, 71.50, and 7.76.
Chi-square statistic is: Σ (Observed - Expected)^2 / Expected = 10.058.
Class Observed Assumed Expected Chi-Square
Number Poisson Number ((Observed - Expected)^2)/Expected
A 6825 0.670320 6,703.20 2.213
B 2587 0.268128 2,681.28 3.315
C 499 0.053626 536.26 2.588
D 78 0.007150 71.50 0.591
E 11 0.000776 7.76 1.350
Sum 10000 1.000000 10,000.00 10.058
The largest contribution is from Class B: (2587 - 2681.28)^2 / 2681.28 = 3.315.

4.24. E. The mean is:


{(20,592)(0) + (2,651)(1) + (297)(2) + (41)(3) + (7)(4) + (1)(6)} / 23,589 = 0.14422.
The 2nd moment is:
{(20,592)(0^2) + (2,651)(1^2) + (297)(2^2) + (41)(3^2) + (7)(4^2) + (1)(6^2)} / 23,589 = 0.18466.
rβ = 0.14422, and rβ(1 + β) = 0.18466 - 0.14422^2 = 0.16386.

⇒ 1 + β = 0.16386/0.14422 = 1.1362. ⇒ β = 0.1362. ⇒ r = 0.14422/0.1362 = 1.0589.


So that each interval has at least 5 expected observations, we group 4 or more.
Number of Observed Fitted Fitted Chi
Claims Number Neg. Bin. Number Square
0 20,592 0.873532 20,605.8 0.009
1 2,651 0.110881 2,615.6 0.480
2 297 0.013683 322.8 2.057
3 41 0.001672 39.5 0.061
4 or more 8 0.000232 5.5 1.181
Sum 23,589 1 23,589 3.788
There are 5 intervals, and 5 - 1 - 2 = 2 degrees of freedom.
For 2 degrees of freedom, the critical value for 10% is 4.605.
Since 3.788 < 4.605, do not reject H0 at 10%.
Comment: Taken from pages 302-303 of Insurance Risk Models by Panjer and WiIlmot.
See also Example 14.5 in Loss Models.

4.25. E. For the Geometric: f(0) = 1/1.1 = 0.909091, f(1) = 0.1/1.1^2 = 0.082645,
f(2) = 0.1^2/1.1^3 = 0.007513.
Prob[3 or more] = 1 - 0.909091 - 0.082645 - 0.007513 = 0.000751.
Number Observed Assumed Expected Chi-Square
of Claims Number Geometric Number ((Observed - Expected)^2)/Expected
0 91,000 0.9090909 90,909.09 0.091
1 8,100 0.0826446 8,264.46 3.273
2 800 0.0075131 751.31 3.155
3 or more 100 0.0007513 75.13 8.231
Sum 100,000 1.0000000 100,000.00 14.750

4.26. C. The overall mean is: 760/10000 = 7.6%.


In each case, the expected number of claims is the number of exposures times λ = 0.076.
Year Observed Number Exposures Expected Number ((Observed - Expected)^2)/Expected
1 200 3000 228.00 3.439
2 250 3300 250.80 0.003
3 310 3700 281.20 2.950
Sum 760 10000 760 6.391
We have 3 years of data, so there are 3 - 1 = 2 degrees of freedom.
Since 5.991 < 6.391 < 7.378, H0 is rejected at the 5% significance level, but not at 2.5%.

4.27. C. The overall mean is: 760/10000 = 7.6%.


When applied to years of data, the Method of Maximum Likelihood applied to the Poisson,
produces the same result as the Method of Moments. λ = 0.076.

χ^2 = ∑ (n_k - E_k)^2 / V_k, where the denominator is the variance based on the fitted Poisson:
λ(exposures) = Expected Number.
In each case, the expected number of claims is the number of exposures times λ = 0.076.
Year Observed Number Exposures Expected Number ((Observed - Expected)^2)/Expected
1 200 3000 228.00 3.439
2 250 3300 250.80 0.003
3 310 3700 281.20 2.950
Sum 760 10000 760 6.391
We have 3 years of data, and one fitted parameter, so there are 3 - 1 = 2 degrees of freedom.
Since 5.991 < 6.391 < 7.378, H0 is rejected at the 5% significance level, but not at 2.5%.
Comment: The same answer as in the previous question. Similar to Example 16.8 in Loss Models.

4.28. D. The overall mean is: 760/10000 = 7.6%.


When applied to years of data, the Method of Maximum Likelihood applied to the Geometric,
produces the same result as the Method of Moments. β = 0.076.
In each case, the expected number of claims is the number of exposures times β = 0.076.

χ^2 = ∑ (n_k - E_k)^2 / V_k, where the denominator is the variance based on the fitted Geometric:
β(1+β)(exposures) = (1+β)(Expected Number).
The Chi-Square Statistic would be somewhat smaller than in the Poisson case:
(200 - 228)^2/{(228)(1.076)} + (250 - 250.8)^2/{(250.8)(1.076)} + (310 - 281.2)^2/{(281.2)(1.076)}
= 5.939.
We have 3 years of data, and one fitted parameter, so there are 3 - 1 = 2 degrees of freedom.
Since 4.605 < 5.939 < 5.991, H0 is rejected at the 10% significance level, but not at 5%.
Comment: The Chi-Square statistic here is that for the Poisson assumption divided by 1.076,
subject to rounding: 6.391/1.076 = 5.940.
Similar to Exercise 16.14 in Loss Models.

4.29. B. The last group is 5 and over, since 6 and over would have had only 2.4 expected drivers.
Number of Maximum Likelihood Chi
Claims Observed Negative Binomial Expected Square
0 17,649 0.7395056 17,653.5 0.001
1 4,829 0.2019838 4,821.8 0.011
2 1,106 0.0461271 1,101.1 0.021
3 229 0.0098458 235.0 0.155
4 44 0.0020281 48.4 0.403
5 and over 15 0.0005096 12.2 0.660
Sum 23,872 1 23,872.0 1.252
Comment: Data for female drivers in California, taken from Table 2 of “A Markov Chain Model of
Shifting Risk Parameters,” by Howard C. Mahler, PCAS 1997.
With 2 fitted parameters, we have 6 - 1 - 2 = 3 degrees of freedom.
For the Chi-Square with 3 d.f. the critical value for 10% is 6.251.
1.252 < 6.251. Thus we do not reject this fit at 10%.

4.30. B. For the Bernoulli, Method of Moments is equal to Maximum Likelihood.


The mean of the data is:
(842 + 1016 + 1258 + 1380 + 1594) / (9000 + 10,500 + 12,000 + 13,500 + 15,000) = 0.1015.
Thus the fitted q is 0.1015.
In each case, the expected number of claims is the number of exposures times 0.1015.
With years of data, χ^2 = ∑ (n_k - E_k)^2 / V_k, where the denominator is the variance based on the
fitted Bernoulli: V_k = q(1-q)(exposures) = (1-q)(Expected Number) = (0.8985)(Expected
Number).
For example, for 2006: (842 - 913.50)^2 / {(0.8985)(913.50)} = 6.229.
Year Observed Number Exposures Expected Number Chi-Square
2006 842 9,000 913.50 6.229
2007 1016 10,500 1065.75 2.585
2008 1258 12,000 1218.00 1.462
2009 1380 13,500 1370.25 0.077
2010 1594 15,000 1522.50 3.737
Sum 6090 60,000 6090 14.090
The Chi-Square Goodness-of-Fit statistic is: 14.090.
We have 5 years of data, so there are: 5 - 1 = 4 degrees of freedom.
Since 13.277 < 14.090 < 14.860, H0 is rejected at the 1% significance level, but not at 0.5%.
Comment: We are really only assuming the same mean frequency for every exposure in every
year. Therefore, we do not lose a degree of freedom for the fitted parameter.
Similar to Example 16.8 in Loss Models.

4.31. A. The sample mean is 0.1631. Thus the fitted Poisson has λ = 0.1631.
Then, in order to compute the Chi-Square one sums:
(observed number - fitted number)^2 / fitted number.
Number of Chi
Accidents Observed Fitted Square
0 81,714.00 80,647.87 14.09
1 11,306.00 13,153.67 259.54
2 1,618.00 1,072.68 277.22
3 250.00 58.32 630.03
4 40.00 2.38 595.23
5 or more 7.00 0.08 600.68
Sum 94,935.00 94,935.00 2376.80
Degrees of freedom: 6 - 1 - 1 = 4. 0.5% critical value is 14.860.
2376.80 > 14.860, so reject at 0.5%.
Alternately, group the last three intervals:
Number of Chi
Accidents Observed Fitted Square
0 81,714.00 80,647.87 14.09
1 11,306.00 13,153.67 259.54
2 1,618.00 1,072.68 277.22
3 or more 297.00 60.78 918.16
Sum 94,935.00 94,935.00 1469.01
Degrees of freedom: 4 - 1 - 1 = 2. 0.5% critical value is 10.597.
1469.01 > 10.597, so reject at 0.5%.
Comment: Data taken from “Some Considerations on Automobile Rating Systems Utilizing
Individual Driving Records,” by Lester B. Dropkin, PCAS 1959. While the Poisson Distribution is a
terrible fit, this data would be fit well by a Negative Binomial Distribution.

4.32. B. The overall mean is: 900/19,000 = 4.7368%.


When applied to years of data, the Method of Maximum Likelihood applied to the Negative
Binomial with r fixed, produces the same result as the Method of Moments.
(1/2)β = 4.7368%. ⇒ β = 0.094736.
In each case, the expected number of claims is the number of exposures times rβ = 4.7368%.

χ^2 = ∑ (n_k - E_k)^2 / V_k, where the denominator is the variance based on the fitted Negative
Binomial: β(1+β)(exposures) = (1+β)(Expected Number) = 1.09473(Expected Number).
The Chi-Square Statistic is:
(220 - 189.47)^2/{(189.47)(1.09473)} + (210 - 213.16)^2/{(213.16)(1.09473)}
+ (200 - 236.84)^2/{(236.84)(1.09473)} + (270 - 260.52)^2/{(260.52)(1.09473)} = 10.086.

We have 4 years of data, and one fitted parameter, so there are 4 - 1 = 3 degrees of freedom.
Since 9.348 < 10.086 < 11.345, H0 is rejected at the 2.5% significance level, but not at 1%.

4.33. f(x) = {β/(1+β)}^x / {x ln(1+β)}, for x = 1, 2, 3,...

f(1) = (30/31) / ln(31) = 0.28181. f(2) = (30/31)^2 / {2 ln(31)} = 0.13636.


f(3) = f(2) (30/31)(2/3) = 0.08797. f(4) = f(3) (30/31)(3/4) = 0.06385.
Number of Observed Logarithmic Expected Chi
Countries Number Probability Number Square
1 250 0.2818129 281.813 3.591
2 132 0.1363611 136.361 0.139
3 95 0.0879749 87.975 0.561
4 69 0.0638527 63.853 0.415
5 59 0.0494344 49.434 1.851
6 43 0.0398664 39.866 0.246
7 32 0.0330689 33.069 0.035
8 25 0.0280019 28.002 0.322
9 26 0.0240877 24.088 0.152
10 29 0.0209796 20.980 3.066
11 16 0.0184571 18.457 0.327
12 20 0.0163732 16.373 0.803
13 13 0.0146262 14.626 0.181
14 15 0.0131434 13.143 0.262
15 11 0.0118714 11.871 0.064
16 13 0.0107705 10.770 0.462
17 15 0.0098099 9.810 2.746
18 9 0.0089660 8.966 0.000
19 4 0.0082201 8.220 2.167
20 5 0.0075572 7.557 0.865
21 to 25 28 0.0300103 30.010 0.135
26 to 30 20 0.0208871 20.887 0.038
31 to 35 15 0.0150263 15.026 0.000
36 to 40 13 0.0110680 11.068 0.337
41 to 45 11 0.0082978 8.298 0.880
46 to 50 9 0.0063070 6.307 1.150
more than 50 23 0.0231677 23.168 0.001
Sum 1000 1.0000000 1000.000 20.796
Comment: There are 27 intervals, and thus 26 degrees of freedom.
Using a computer, the probability-value is 75.2%; there is a good match.

4.34. E. There are 9 intervals and thus 8 degrees of freedom.


Number of Observed Expected Chi
Days Number Probability Number Square
1 165 15% 150.000 1.500
2 291 30% 300.000 0.270
3 163 17% 170.000 0.288
4 109 10% 100.000 0.810
5 70 8% 80.000 1.250
6 75 6% 60.000 3.750
7 44 4% 40.000 0.400
8 to 20 41 6% 60.000 6.017
21 or more 42 4% 40.000 0.100
Sum 1,000 100% 1,000.000 14.385
For each interval one computes:
(observed number - expected number)^2 / (expected number).
For example, (165 - 150)^2 / 150 = 1.5.
14.385 < 15.507. ⇒ Do not reject at 5%.
Comment: Similar to 4, 11/01, Q.25 (2009 Sample Q.71).

4.35. D. For the Zero-Truncated Binomial: f(1) = (6)(0.3)(0.7^5)/(1 - 0.7^6) = 0.34286.
f(2) = (15)(0.3^2)(0.7^4)/(1 - 0.7^6) = 0.36735. f(3) = (20)(0.3^3)(0.7^3)/(1 - 0.7^6) = 0.20992.
f(4) = (15)(0.3^4)(0.7^2)/(1 - 0.7^6) = 0.06747. f(5) = (6)(0.3^5)(0.7)/(1 - 0.7^6) = 0.01157.
f(6) = (0.3^6)/(1 - 0.7^6) = 0.000826.
For example, (3560 - 3428.636)^2 / 3428.636 = 5.033.
Number of Observed Expected Chi
People per Car Number Probability Number Square
1 3,560 0.3428636 3,428.636 5.033
2 3,610 0.3673538 3,673.538 1.099
3 2,020 0.2099165 2,099.165 2.985
4 700 0.0674731 674.731 0.946
5 100 0.0115668 115.668 2.122
6 10 0.0008262 8.262 0.366
Sum 10,000 1 10,000.000 12.552
There are 6 intervals and thus 5 degrees of freedom.
11.070 < 12.552 < 12.833. ⇒ Reject at 5% but not at 2.5%.

4.36. D. For example, (169 - 150)^2 / 150 = 2.407.


Length Observed Expected Chi
of Run Number Probability Number Square
1 169 0.50000 150.000 2.407
2 56 0.25000 75.000 4.813
3 42 0.12500 37.500 0.540
4 22 0.06250 18.750 0.563
5 or more 11 0.06250 18.750 3.203
Sum 300 1.00000 300.000 11.527
There are 5 intervals and thus 4 degrees of freedom.
11.143 < 11.527 < 13.277. ⇒ Reject at 2.5%, but do not reject at 1%.
Comment: The distribution is a Zero-Truncated Geometric, with β = 1.

4.37. A. For the Zero-Truncated Poisson: f(1) = (1.5 e^-1.5) / (1 - e^-1.5) = 0.43083.
f(2) = f(1) (1.5)/2 = 0.32312. f(3) = f(2) (1.5)/3 = 0.16156.
The remaining probability goes with 4 or more.
For example, (90 - 86.165)^2 / 86.165 = 0.171.
Number of Observed Expected Chi
Cars per Policy Number Probability Number Square
1 90 0.4308254 86.165 0.171
2 60 0.3231190 64.624 0.331
3 40 0.1615595 32.312 1.829
4 or more 10 0.0844961 16.899 2.817
Sum 200 1.0000000 200.000 5.147
There are 4 intervals and thus 3 degrees of freedom.
5.147 < 6.251. ⇒ Do not reject at 10%.

4.38. A. The expected number in each case is: 30/6 = 5.


Chi-Square Statistic = Σ (observed - expected)^2 / expected =
(1 - 5)^2/5 + (4 - 5)^2/5 + (9 - 5)^2/5 + (9 - 5)^2/5 + (2 - 5)^2/5 + (5 - 5)^2/5 = 11.6.
There are 6 - 1 = 5 degrees of freedom.
The critical value for 5% is 11.07. The critical value for 2.5% is 12.83.
11.07 < 11.6 < 12.83. ⇒ Reject at 5% and do not reject at 2.5%.
Comment: The p-value is between 2.5% and 5%. The p-value is the S(11.6) for the Chi-Square
Distribution with 5 degrees of freedom, which using a computer turns out to be 4.07%.

4.39. A. For each interval one computes:


(observed number - fitted number)^2 / fitted number.
Group the last two intervals so as to have at least 5 expected observations per interval.
Number of Chi
Claims Observed Fitted Square
0 800 818.7 0.43
1 180 163.7 1.61
2 or more 20 17.5 0.35
Sum 1000 1000.0 2.39

4.40. C. Group the last two intervals, so that one has at least 5 expected observations per interval,
as recommended. In that case, one has 5 intervals, and one has fit two parameters, therefore one
has 5 - 1 - 2 = 2 degrees of freedom. One computes the Chi-Square Statistic as 8.53 as shown
below. Since 8.53 > 7.378, one rejects at 2.5%. Since 8.53 < 9.210, one does not reject at 1%.
# Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 8725 8738 0.02
1 1100 1069 0.90
2 135 162 4.50
3 35 26 3.12
4 & over 5 5 0.00
Sum 10000 10000 8.53

4.41. B. For each of the groupings one computes:


(observed number of risks - expected number of risks)^2 / expected number of risks
Taking the sum one gets a Chi-Square Statistic of 10.07.
Number Observed Expected Chi
of Claims # Risks # Risks Square
0 7673 7788.00 1.70
1 2035 1947.00 3.98
2 262 243.00 1.49
3 or more 30 22.00 2.91
10000 10000 10.07
There are four groupings. Since one parameter (the Poisson distribution is described by one
parameter) was fit to the data, we have 4 - 1 - 1 = 2 degrees of freedom. Use the row of the table
for 2 degrees of freedom. Since 10.07 > 9.210 one can reject at α = 0.010.
On the other hand, 10.07 < 10.597, so one can not reject at α = 0.005.
Comment: The fitted Poisson has parameter 0.25, with density function:
Number of Claims 0 1 2 3 4 5
Probability 0.778801 0.194700 0.024338 0.002028 0.000127 0.000006

4.42. B. The first step is to estimate µ. For the Poisson Distribution, maximum likelihood is the same
as the method of moments. The observed mean is (806 + (2)(85) + (3)(8)) / 5000 =
1000 / 5000 = 0.2. Thus µ = 0.2. Then, for example, 5000 f(2) = 5000 (0.2^2) e^-0.2 / 2! = 81.87.
The final interval of three or more is obtained as: 5000 - (4093.65 + 818.73 + 81.87) = 5.74.
The Chi-Square statistic of 1.218 is computed as follows:
Result Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 4101 4093.65 0.013
1 806 818.73 0.198
2 85 81.87 0.119
3 or more 8 5.74 0.888
Sum 5000 5000.00 1.218
Comment: The solution is sensitive to rounding.

4.43. B. Degrees of freedom = (# of intervals) - 1 - (#estimated parameters) = 4 - 1 - 1 = 2.

4.44. D. First, one calculates the Empirical Distribution Function. For example at x = 2, the Empirical
Distribution Function is (4101 + 806 + 85) / 5000 = 0.9984. Then one calculates the fitted
distribution function by summing the density function for a Poisson with mean of 0.2.
For example, F(1) = f(0) + f(1) = e^-0.2 + 0.2 e^-0.2 = 0.9825.
One then computes the absolute difference of the empirical and fitted distribution functions.
The K-S statistic is the maximum absolute difference of 0.0015.
Empirical Fitted
Distribution Distribution Absolute
x Function Function Difference
0 0.8202 0.8187 0.0015
1 0.9814 0.9825 0.0011
2 0.9984 0.9989 0.0005
3 1.0000 0.9999 0.0001
Comment: For a discrete distribution such as the Poisson, one compares at the actual points.
For a continuous distribution, one would compare each fitted probability to the observed distribution
function just before and just after each observed claim value.
See “Mahlerʼs Guide to Fitting Loss Distributions.”

4.45. C. The maximum absolute difference between the empirical and fitted distributions occurs at
x = 0, and is 0.0043.
Empirical Fitted
Distribution Distribution Absolute
x Function Function Difference
0 0.9091 0.9048 0.0043
1 0.9929 0.9953 0.0024
2 0.9980 0.9998 0.0018
Comment: For a discrete distribution one compares at all the (available) points.
Due to the grouping of data, comparisons canʼt be made for x ≥ 3.
(Both distribution functions equal one at infinity.)

4.46. E. There are 3 intervals and weʼve fit one parameter, so we have 3 - 1 - 1 = 1 degree of
freedom. The chance of 0 claims is e^-0.3055 = 0.73675.
The chance of 1 claim is: (0.3055) e^-0.3055 = 0.22508.
The chance of 2 or more claims is: 1 - 0.73675 - 0.22508 = 0.03817.
Chi-Square Statistic is 3.556. 3.556 < 3.841, so do not reject at 5%.
Number Number Fitted ((Observed - Fitted)^2)/Fitted
of Claims of Risks Number Chi-Square
0 729 736.75 0.082
1 242 225.08 1.272
2 or more 29 38.17 2.203
Sum 1000.00 1000.00 3.556
Comment: I have used the groups that we were told to use.

4.47. E.
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 80 100-100q (20-100q)^2 / (100-100q)
1 20 100q (20-100q)^2 / (100q)
There are 2 intervals, so we have 2 - 1 = 1 degree of freedom. The critical value for a 1%
significance level is 6.635. The Chi-square statistic is computed as the sum of the two contributions:
(20-100q)^2 {1/(100-100q) + 1/(100q)} = (2-10q)^2 / {q(1-q)}.
Setting the Chi-Square Statistic equal to the critical value of 6.635, we solve for q:
(2-10q)^2 / {q(1-q)} = 6.635. (2-10q)^2 = 6.635 q(1-q). 106.635q^2 - 46.635q + 4 = 0.
Thus q = {46.635 ± √[46.635^2 - (4)(106.635)(4)]} / {(2)(106.635)} = 0.2187 ± 0.1015
= 0.117 or 0.320.
We have a bad fit for q far from the observed mean of 0.2. Thus we reject the fit at 1% for
q < 0.117 or q > 0.320. Thus the smallest value of q for which H0 will not be rejected at the 0.01
significance level is about 0.117.
Comment: Note that we have assumed various values of q, rather than estimating q by fitting a
Bernoulli to this data. (For example, using the method of moments one would estimate
q = 20/100 = 0.2.) Therefore, we do not subtract any fitted parameters in order to determine the
number of degrees of freedom.
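The two boundary values of q can also be checked numerically with the same quadratic formula; a brief sketch (Python with numpy assumed):

    import numpy as np

    a, b, c = 106.635, -46.635, 4.0          # coefficients of the quadratic derived above
    disc = np.sqrt(b ** 2 - 4 * a * c)
    print((-b - disc) / (2 * a), (-b + disc) / (2 * a))   # about 0.117 and 0.320;
    # H0 is not rejected at the 1% level for q between these two values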

4.48. C. For 5 classes we have 5 -1 = 4 degrees of freedom. Chi-Square Statistic is 9.80.


Since 9.80 > 9.488, we reject at 5%; since 9.80 < 11.143, we do not reject at 2.5%.
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 7 10 0.90
1 10 10 0.00
2 12 10 0.40
3 17 10 4.90
4 4 10 3.60
Sum 50 50 9.80

4.49. D. From the solution to the previous question, the overwhelming majority of the
Chi-Square Statistic came from the last two intervals. In addition, the differences between fitted and
observed were in opposite directions. Thus combining the risks with 3 claims and the risks with 4
claims greatly reduces the Chi-Square Statistic.
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 7 10 0.90
1 10 10 0.00
2 12 10 0.40
3 or 4 21 20 0.05
Sum 50 50 1.35
For 4 classes we have 4 - 1 = 3 degrees of freedom. Chi-Square Statistic is 1.35, which is less than
6.251, so p-value > 10%. Thus combining the risks with 3 claims and the risks with 4
claims results in a much different p-value from the p-value in the previous question, which was
between 2.5% and 5%. The other listed combinations do not alter the p-value as much as does
combining the risks with 3 claims and the risks with 4 claims.
A. Combining the risks with 0 claims and the risks with 1 claim:
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 or 1 17 20 0.450
2 12 10 0.400
3 17 10 4.900
4 4 10 3.600
Sum 50 50 9.350
Since 9.348 < 9.350 < 11.345, the p-value is between 1% and 2.5%.
B. Combining the risks with 1 claim and the risks with 2 claims:
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 7 10 0.90
1 or 2 22 20 0.20
3 17 10 4.90
4 4 10 3.60
Sum 50 50 9.60
Since 11.345 > 9.60 > 9.348, the p-value is between 1% and 2.5%.
C. Combining the risks with 2 claims and the risks with 3 claims:
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 7 10 0.90
1 10 10 0.00
2 or 3 29 20 4.05
4 4 10 3.60
Sum 50 50 8.55
Since 9.348 > 8.55 > 7.815, the p-value is between 2.5% and 5%.
Comment: Combining classes after one has seen the data solely in order to decrease the
computed Chi-Square Statistic, would defeat the whole purpose of the test.

4.50. C. For the Poisson, the method of Maximum Likelihood is equal to the Method of Moments.
The observed mean is:
{(0)(50) + (1)(122) + (2)(101) + (3)(92)} / (50 + 122 + 101 + 92) = 600/365 = 1.644.
Thus λ = 1.644. For example, the fitted number of days with 2 claims is:

(365)(e^-λ λ^2 / 2!) = (365)(e^-1.644 × 1.644^2 / 2) = (365)(0.2611) = 95.30.


Using the groupings specified in the question, the Chi-Square Statistic is 7.55.
Number of Claims Observed Fitted Poisson Fitted Number ((Observed - Fitted)^2)/Fitted
0 50 19.32% 70.52 5.97
1 122 31.76% 115.93 0.32
2 101 26.11% 95.30 0.34
3 or more 92 22.81% 83.25 0.92
Sum 365 1 365 7.55
For 4 classes and one fitted parameter we have 4 - 1 - 1 = 2 degrees of freedom.
Since 7.55 > 7.378, we reject at 2.5%; since 7.55 < 9.210, we do not reject at 1%.
Comment: 7.378 < 7.55 < 9.210. Reject to the left (at 100% - 97.5% = 2.5%) and do not reject to
the right (at 100% - 99% = 1%.) Since an interval of 6 or more would only have about 2.5 fitted
insureds, the six groupings below are what I would have used in the absence of any special
instructions, such as those that were given in this question.
Number of Claims Observed Number Fitted Number ((Observed - Fitted)^2)/Fitted
0 50 70.52 5.97
1 122 115.93 0.32
2 101 95.30 0.34
3 92 52.22 30.30
4 0 21.46 21.46
5 or more 0 9.56 9.56
Sum 365 365 67.95
For 6 - 1 - 1 = 4 degrees of freedom, one would reject at 0.5%.
The assumed distribution does a terrible job of fitting the data in the righthand tail!
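A minimal sketch of the whole of solution 4.50 (Python with numpy and scipy assumed): fit λ by maximum likelihood, regroup into 0, 1, 2 and 3 or more, and compare the statistic to a Chi-Square with 2 degrees of freedom.

    import numpy as np
    from scipy.stats import poisson, chi2

    observed = np.array([50, 122, 101, 92])             # days with 0, 1, 2, 3+ claims
    lam = 600 / 365                                     # sample mean = maximum likelihood estimate

    probs = poisson.pmf([0, 1, 2], lam)
    probs = np.append(probs, 1 - probs.sum())           # lump 3 or more into the last cell
    expected = 365 * probs

    chi_square = ((observed - expected) ** 2 / expected).sum()
    print(chi_square, chi2.sf(chi_square, 4 - 1 - 1))   # about 7.55; reject at 2.5%, not at 1%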

4.51. (i) The Poisson has way too large a probability of one claim, while greatly underestimating the
number of insureds who have multiple claims. The Poisson is a poor match for this data.
(ii) E[X] = 0p + (1)p(1 - p) + 2p(1 - p)^2 + 3p(1 - p)^3 + ...
= p(1 - p){{1 + (1 - p) + (1 - p)^2 + ...} + {(1 - p) + (1 - p)^2 + ...} + {(1 - p)^2 + (1 - p)^3 + ...} + ...}
= p(1 - p){1/(1 - (1-p)) + (1-p)/(1 - (1-p)) + (1-p)^2/(1 - (1-p)) + ...}
= (1 - p){1 + (1-p) + (1-p)^2 + ...} = (1 - p){1/(1 - (1-p))} = (1 - p)/p.
Alternately, this is a Geometric Distribution with: (1 - p) = β/(1 + β). ⇒ Mean = β = (1- p)/p.

The sample mean is: {(0)(826) + (1)(128) + (2)(39) + (3)(7)}/1000 = 0.227. (1 - p)/p = 0.227. ⇒ p = 0.815.


(iii) f(0) = p = 0.815. f(1) = p(1 - p) = 0.150775. f(2) = p(1 - p)^2 = 0.02789.
f(3) = p(1 - p)^3 = 0.005160. The expected number of policyholders with four or more claims is:
(1000){1 - (0.815 + 0.150775 + 0.02789 + 0.005160)} = 1.2.
Multiplying by 1000 one gets the other expected numbers of policyholders:
815.0 @ 0, 150.8 @ 1, 27.9 @ 2, and 5.2 @ 3.
(iv) (a) The number of degrees of freedom is: 5 - 1 - 1 = 3.
Number Number Geometric Fitted Chi-Square =
of of Insureds Distribution Number (observed # - fitted #)^2 / fitted #
Claims Observed of Insureds
0 826 0.81500 815.00 0.148
1 128 0.15077 150.78 3.440
2 39 0.02789 27.89 4.422
3 7 0.00516 5.16 0.656
4 and more 0 0.00117 1.17 1.171
Sum 1000 1.00000 1,000.00 9.838
For 3 degrees of freedom, the 2.5% critical value is 9.348, while the 1% critical value is 11.345.
Since 9.348 < 9.838 < 11.345, the p-value is between 1% and 2.5%.
(b) Since 9.838 < 11.345, we do not reject the fit at the 1% significance level.
(Since 9.348 < 9.838, we reject the fit at the 2.5% significance level.)
Alternately, since the p-value is greater than 1%, we do not reject the fit at 1%.
Comment: In part (iv) one could combine the rows 3 and 4 or more, and arrive at the same
conclusion, based on a Chi-Square Statistic of 8.082 for 2 degrees of freedom.
Number Number Geometric Fitted Chi-Square =
of of Insureds Distribution Number (observed # - fitted #)^2 / fitted #
Claims Observed of Insureds
0 826 0.81500 815.00 0.148
1 128 0.15077 150.78 3.440
2 39 0.02789 27.89 4.422
3 or more 7 0.00633 6.33 0.071
Sum 1000 1.00000 1,000.00 8.082

4.52. B. The fitted number of days with n accidents is: (365)(e^-0.6 0.6^n / n!).
Number Number Fitted Fitted Chi-Square =
of of Days Poisson Number (observed # - fitted #)^2 / fitted #
Claims Observed Distribution of Days
0 209 0.549 200.316 0.38
1 111 0.329 120.190 0.70
2 33 0.099 36.057 0.26
3 and + 12 0.023 8.437 1.50
Sum 365 1.000 365.000 2.84
Chi-Square Statistic is: (209 - 200.316)^2/200.316 + (111 - 120.190)^2/120.190 +
(33 - 36.057)^2/36.057 + (12 - 8.437)^2/8.437 = 2.84.
Comment: In general, you should compute your Chi-Square Statistics to more accuracy than the
nearest integer. (I have shown a little more accuracy than needed.) An interval of 4 and over would
have had only 1.2 expected observations. Thus the final group used is “3 and over”.
This final groupʼs expected number of days is: 365 - (200.316 + 120.190 + 36.057) = 8.437.

4.53. A. Since there are no fitted parameters, there are 6 - 1 = 5 degrees of freedom.
# Observed Standard Expected ((Observed - Expected)^2)/Expected
Claimants Number Probability Number
1 235 25% 250 0.90
2 335 35% 350 0.64
3 250 24% 240 0.42
4 111 11% 110 0.01
5 47 4% 40 1.23
6 or more 22 1% 10 14.40
Sum 1000 100% 1000 17.59
Since 17.59 > 16.750, reject the null hypothesis at 1/2%.

4.54. B. For example, (0.2744)(430) = 117.99. (112 - 117.99)^2/117.99 = 0.304.


Number Historical Assumed (Observed - Assumed)^2/Assumed
Type of Claims Probability Number Chi-Square
A 112 0.2744 117.99 0.304
B 180 0.3512 151.02 5.563
C 138 0.3744 160.99 3.284
Sum 430.00 1.00 430.00 9.151
Comment: For 3 - 1 = 2 d.f., 7.378 < 9.151 < 9.210; reject at 2.5%, but not at 1%.

4.55. E. The estimate of λ is: 230/1000 = 0.23.


Number Number Fitted Fitted Chi-Square =
of of Workers Poisson Number (observed # - fitted #)^2 / fitted #
Days Observed Distribution of Days
0 818 0.79453 794.53 0.69
1 153 0.18274 182.74 4.84
2 25 0.02102 21.02 0.76
3 and + 4 0.00171 1.71 3.08
Sum 1000 1.000 1,000.00 9.36
We have 4 intervals and fit 1 parameter, and thus there are 4 - 1 - 1 = 2 degrees of freedom.
9.210 < 9.36 < 10.597, reject H0 at 1% (but not at 1/2%.)
Comment: The CAS/SOA also accepted choice D, presumably to allow for intermediate rounding
in computing the Chi-Square Statistic. For example, if one rounds the fitted values to the nearest
tenth, 794.5, 182.7, 21.0, and 1.8, where the final value is gotten by subtraction from 1000, then the
computed statistic is instead 8.97.
In general, when computing the Chi-Square Statistic avoid intermediate rounding.
Since the expected number in each interval is at least one, bullet number ii has no effect.

4.56. C. The expected values are each equal to: 120/6 = 20.
Chi-Square statistic is:
(15 - 20)^2/20 + (13 - 20)^2/20 + (28 - 20)^2/20 + (25 - 20)^2/20 + (12 - 20)^2/20 + (27 - 20)^2/20 = 13.80.
There are 6 - 1 = 5 degrees of freedom.
For 5 degrees of freedom, the 2.5% critical value is 12.83, while the 1% critical value is 15.09.
Since 12.83 < 13.80 < 15.09, reject at 2.5% but not 1%.
Comment: Using a computer the p-value is 1.69%.
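Since no parameters are fitted here, the whole test is a one-liner if scipy is available (an assumption, not part of the original solution); scipy.stats.chisquare defaults to equal expected frequencies.

    from scipy.stats import chisquare

    statistic, p_value = chisquare([15, 13, 28, 25, 12, 27])
    print(statistic, p_value)    # 13.8 and about 0.017, so reject at 2.5% but not at 1%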

Section 5, Likelihood Ratio Test44

Another way to rank fits of distributions is to compare the likelihoods or the loglikelihoods.
As discussed previously, the larger the likelihood or loglikelihood the better the fit, all other things
being equal. While more parameters usually results in a better fit, the improvement from additional
parameters may or may not be significant. It turns out that one can use the Chi-Square Distribution
to test whether a fit is significantly better.

For example, assume we have fit via Maximum Likelihood a Negative Binomial Distribution and a
Geometric Distribution to the same data. Assume the loglikelihoods are:
Geometric: -3112.2 and Negative Binomial: -3110.4.

The Negative Binomial has a larger loglikelihood, which is not surprising since the Geometric is a
special case of the Negative Binomial, with r = 1. The Negative Binomial has two parameters and
thus a greater ability to fit the peculiarities of a particular data set.

In order to determine whether the Negative Binomial fit is significantly better, we take twice the
difference of the loglikelihoods: 2{-3110.4 - (-3112.2)} = 3.6.

Since the difference in the number of parameters is one, we compare to the


Chi-Square Distribution with one degree of freedom:

Significance Level 0.100 0.050 0.025 0.010 0.005
Critical Value (1 d.f.) 2.706 3.841 5.024 6.635 7.879

Since 2.706 < 3.6 < 3.841, we reject at 10%, but not at the 5% significance level, the hypothesis
that the Geometric is a more appropriate model than the Negative Binomial.
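A minimal sketch of this test (Python with scipy assumed); only the two maximized loglikelihoods and the difference in the number of fitted parameters are needed.

    from scipy.stats import chi2

    loglik_geometric = -3112.2          # restricted model, 1 parameter
    loglik_negative_binomial = -3110.4  # full model, 2 parameters

    test_statistic = 2 * (loglik_negative_binomial - loglik_geometric)   # 3.6
    dof = 2 - 1                         # difference in number of fitted parameters
    print(test_statistic, chi2.sf(test_statistic, dof))   # p-value about 0.058: reject H0 at 10%, not at 5%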

The Principle of Parsimony says that one should use the smallest number of parameters that get the
job done.

At a 5% level of significance, the one parameter Geometric Distribution is preferred to the two
parameter Negative Binomial Distribution, even though the Negative Binomial Distribution has a
somewhat larger loglikelihood. At the 5% significance level, the improvement in the loglikelihood
would have to be somewhat larger in order to abandon the simpler Geometric model in favor of the
more complicated Negative Binomial model.

44
See Section 16.4.4 in Loss Models.
The Likelihood Ratio Test is also discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”

On the other hand, at a 10% level of significance, the two parameter Negative Binomial Distribution
is preferred to the one parameter Geometric Distribution. At the 10% significance level, the
improvement in the loglikelihood is large enough in order to abandon the simpler Geometric model
in favor of the more complicated Negative Binomial model.

In general the Likelihood Ratio Test (or Loglikelihood Difference Test) proceeds as follows:45

1. One has two distributions, one with more parameters than the other, both fit to the same data
via Maximum Likelihood.

2. One of the distributions is a special case of the other, with fewer parameters.46

3. One computes twice the difference in the loglikelihoods.47

4. One compares the result of step 3 to the Chi-Square Distribution, with a number of degrees of
freedom equal to the difference in the number of fitted parameters of the two distributions.

5. One draws a conclusion as to whether the more general distribution fits significantly better than its
special case. H0 is that the distribution with fewer parameters is appropriate.
The alternative hypothesis H1 is that the distribution with more parameters is appropriate.

Unlike some other hypothesis tests, the likelihood ratio test is set up to compare two possibilities.
H0 : We use the simpler distribution with fewer parameters.
H1 : We use the more complicated distribution with more parameters.

We always prefer a model with fewer parameters, unless a model with more parameters is a
significantly better fit.
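
As an aside, the arithmetic in steps 3 and 4 is easy to automate. The following is a brief sketch in
Python (not part of the original text; it assumes the scipy library is available), applied to the
Geometric versus Negative Binomial example above.

from scipy.stats import chi2

def likelihood_ratio_test(loglik_restricted, loglik_full, extra_parameters):
    # Test statistic: twice the improvement in the loglikelihood.
    statistic = 2.0 * (loglik_full - loglik_restricted)
    # Compare to the Chi-Square Distribution whose degrees of freedom equal
    # the difference in the number of fitted parameters.
    p_value = chi2.sf(statistic, df=extra_parameters)
    return statistic, p_value

# Geometric (H0, 1 parameter) versus Negative Binomial (H1, 2 parameters):
stat, p = likelihood_ratio_test(-3112.2, -3110.4, extra_parameters=1)
print(stat, p)   # 3.6 and about 0.058; reject H0 at 10%, do not reject at 5%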

For example,
H0 : Data was drawn from a Geometric Distribution (Negative Binomial with r = 1).
H1 : Data was drawn from a Negative Binomial.

The best fitting Negative Binomial has to have a loglikelihood that is greater than or equal to that of
the best fitting Geometric.

45 Note that twice the difference of the loglikelihoods approaches a Chi-Square Distribution as the data set gets
larger and larger. Thus one should be cautious about drawing any conclusion concerning fits to small data sets.
46 This test is sometimes applied by actuaries when one distribution is the limit of the other.
For example, the Poisson is a limit of Negative Binomial Distributions.
Loss Models at page 339 states that in this case the test statistic has a mixture of Chi-Square Distributions.
47 Equivalently, one computes twice the log of the ratio of the likelihoods.

Exercise: Both a Geometric Distribution and a mixed Poisson-Transformed Gamma Distribution
(a Poisson mixed via a Transformed Gamma Distribution) have been fit to the same data via the
Method of Maximum Likelihood. The loglikelihoods are -1737.3 for the mixed Poisson-Transformed
Gamma (3 parameters) and -1740.8 for the Geometric (1 parameter).
Use the Likelihood Ratio Test to determine whether the mixed
Poisson-Transformed Gamma fits this data significantly better than the Geometric.
[Solution: The Geometric is a special case of the Negative Binomial. A mixed
Poisson-Gamma is a Negative Binomial. Thus a Negative Binomial is a special case of a mixed
Poisson-Transformed Gamma Distribution. Therefore, a Geometric is a special case of a mixed
Poisson-Transformed Gamma Distribution. The Geometric has one parameter while the mixed
Poisson-Transformed Gamma Distribution has three parameters, those of the Transformed Gamma.
Therefore we compare to the Chi-Square Distribution with 3 - 1 = 2 degrees of freedom.

Degrees of    Significance Levels
Freedom       0.100    0.050    0.025    0.010    0.005
2             4.605    5.991    7.378    9.210    10.597

Twice the difference of the loglikelihoods is: 2{-1737.3 - (-1740.8)} = 7.0.
5.991 < 7.0 < 7.378. Thus we reject at the 5% level, but not at the 2.5% level, the hypothesis that
the simpler Geometric model is appropriate. In other words, at the 2.5% level we do not reject the null
hypothesis that the simpler Geometric model is appropriate rather than the alternative hypothesis
that the more complex mixed Poisson-Transformed Gamma model is appropriate.]

Exercise: One observes 663 claims from 2000 insureds. A Negative Binomial distribution is fit via
maximum likelihood, obtaining β = 0.230 and r = 1.442 with corresponding maximum loglikelihood of
-1793.24. One then takes r = 1.5 and finds the maximum likelihood β is 0.221, with corresponding
maximum loglikelihood of -1796.11.
Use the likelihood ratio test in order to test the hypothesis that r = 1.5.
[Solution: The Negative Binomial with r and β has two parameters, while the Negative Binomial with
r fixed at 1.5 is a special case with one parameter β. The difference in number of parameters is 1.
We compare to the Chi-Square Distribution with one degree of freedom.
Twice the difference in loglikelihoods is: 2{ -1793.24 - (-1796.11)} = 5.74.
5.024 < 5.74 < 6.635 ⇒ reject at 2.5% and do not reject at 1%, the hypothesis that r = 1.5.
Comment: See 4, 11/03, Q.28, involving a Pareto Distribution, in “Mahlerʼs Guide to Loss
Distributions.” H0 is that the distribution with fewer parameters is appropriate ⇔ r = 1.5.]

Exercise: One observes 2100 claims on 10,000 exposures. You fit a Poisson distribution via
maximum likelihood. Use the likelihood ratio test in order to test the hypothesis that λ = 0.2.
[Solution: The maximum likelihood fit is: λ = 2100/10000 = 0.21.
For 10,000 exposures we have a Poisson with mean 10000λ.

Loglikelihood = ln f(2100) = ln[e^(-10000λ) (10000λ)^2100 / 2100!] =
-10000λ + 2100 ln(λ) + 2100 ln(10000) - ln(2100!).
For λ = 0.2, loglikelihood = (-10000)(0.2) + 2100ln(0.2) + 2100ln(10000) - ln(2100!) =
-5379.820 + 2100ln(10000) - ln(2100!).
For λ = 0.21, loglikelihood = (-10000)(0.21) + 2100ln(0.21) + 2100ln(10000) - ln(2100!) =
-5377.360 + 2100ln(10000) - ln(2100!).
Twice the difference in loglikelihoods is: 2{-5377.360 - (-5379.820)} = 4.92.
The restriction that λ be 0.2 is one dimensional.
Alternately, the Poisson with λ unknown has one parameter, while the Poisson with λ = 0.2 is a
special case with zero parameters; the difference in number of parameters is 1.
In any case, we compare to the Chi-Square Distribution with one degree of freedom.
3.841 < 4.92 < 5.024 ⇒ reject at 5% and do not reject at 2.5%, the hypothesis that λ = 0.2.]

Testing Other Hypotheses:

Assume you observe 100,000 claims in Year 1 and 92,000 claims in Year 2.
Assume that claims the first year are Poisson with parameter λ1.

Assume that claims the second year are Poisson with parameter λ2.

Using maximum likelihood to estimate λ1 is the same as the method of moments.

Estimated λ1 = 100,000. Year 1 maximum loglikelihood is: -λ1 + 100000 ln(λ1) - ln(100000!) =
-100000 + 100000 ln(100000) - ln(100000!) = 1,051,292.55 - ln(100000!).
Similarly applying maximum likelihood to the data for Year 2: estimated λ2 = 92,000.
Year 2 maximum loglikelihood is: -92000 + 92000 ln(92000) - ln(92000!) =
959,518.03 - ln(92000!).

As discussed previously, instead of separately estimating λ1 and λ2, one can assume some sort of

relationship between them. For example, let us assume λ2 = 0.9λ1.



For a Poisson Distribution, f(x) = e−λλ x/x!. ln f(x) = -λ + xln(λ) - ln(x!).


Year 1 Loglikelihood is: -λ1 + 100000ln(λ1) - ln(100000!).

Assuming λ2 = 0.9λ1, Year 2 Loglikelihood is: -λ2 + 92000ln(λ2) - ln(92000!) =

-0.9λ1 + 92000ln(.9λ1) - ln(92000!) = -0.9λ1 + 92000ln(λ1) + 92000ln(0.9) - ln(92000!).


Total Loglikelihood =
-λ1 + 100000ln(λ1 ) - ln(100000!) - 0.9λ1 + 92000ln(λ1) + 92000ln(0.9) - ln(92000!) =

-1.9λ1 + 192000ln(λ1) - ln(100000!) + 92000ln(0.9) - ln(92000!) .

Setting the partial derivative with respect to λ1 equal to zero:

0 = -1.9 + 192000/λ1. ⇒ λ1 = 192000/1.9 = 101,052.63.


Maximum loglikelihood is: -1.9(101,052.63) + 192000ln(101,052.63) - ln(100000!)
+ 92000ln(0.9) - ln(92000!) = 2,010,799.01 - ln(100000!) - ln(92000!).

The unrestricted maximum loglikelihood is:


1,051,292.55 - ln(100000!) + 959,518.03 - ln(92000!) =
2,010,810.58 - ln(100000!) - ln(92000!), better than the restricted maximum loglikelihood of
2,010,799.01 - ln(100000!) - ln(92000!).

It is not surprising that without the restriction we can do a somewhat better job of fitting the data. The
unrestricted model involves two Poissons, while the restricted model is a special case in which one
of the Poissons has 0.9 times the mean of the other.

Let the null hypothesis H0 be that λ2 = 0.9λ1. Let the alternative H1 be that H0 is not true.
Then we can use the likelihood ratio test as follows.

We use the loglikelihood for the unrestricted model of 2,010,810.58 - ln(100000!) - ln(92000!), and
the loglikelihood for the restricted model of 2,010,799.01 - ln(100000!) - ln(92000!).

The test statistic is as usual twice the difference in the loglikelihoods:


(2)(2,010,810.58 - ln(100000!) - ln(92000!) - {2,010,799.01 - ln(100000!) - ln(92000!)}) = 23.14.

We compare to the Chi-Square Distribution with one degree of freedom, since the restriction is one
dimensional. Since 7.879 < 23.14, we reject H0 at 0.5%.
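
As a numerical check of the computation above, here is a short sketch in Python (not part of the
original text; scipy is assumed available). The -ln(100000!) and -ln(92000!) terms cancel when
taking the difference, so they are omitted.

import math
from scipy.stats import chi2

n1, n2 = 100_000, 92_000      # observed claim counts in Year 1 and Year 2

def poisson_loglik(lam, n):
    # Poisson loglikelihood of observing a total of n claims when the mean is lam,
    # omitting the -ln(n!) term, which cancels when taking differences.
    return -lam + n * math.log(lam)

# Unrestricted model: each year gets its own Poisson mean; the MLE is the observed count.
unrestricted = poisson_loglik(n1, n1) + poisson_loglik(n2, n2)

# Restricted model (H0: lambda2 = 0.9 lambda1): maximize over lambda1 alone.
lam1 = (n1 + n2) / 1.9        # = 101,052.63, as derived above
restricted = poisson_loglik(lam1, n1) + poisson_loglik(0.9 * lam1, n2)

statistic = 2 * (unrestricted - restricted)   # about 23.14
p_value = chi2.sf(statistic, df=1)            # far less than 0.5%, so reject H0
print(statistic, p_value)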

Problems:

5.1 (1 point) For some data, a Geometric distribution and a Negative Binomial distribution have
each been fit by maximum likelihood.
The fitted Geometric has a loglikelihood of -725.3, and the fitted Negative Binomial has a
loglikelihood of -722.9.
Treating the Geometric distribution as the null hypothesis which of the following is true?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

5.2 (1 point) A Binomial with m = 6 distribution and a mixture of two Binomial distributions each with
m = 6 have been fit by maximum likelihood to the same data.
The fitted Binomial has a loglikelihood of -1052.73,
and the fitted mixture of Binomials has a loglikelihood of -1049.16.
Which of the following is true?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

5.3 (3 points) You have the following data from the state of West Carolina:
Region Number of Claims Number of Exposures Claim Frequency
Rural 5000 250,000 2.000%
Urban 10,000 320,000 3.125%
You assume that the distribution of numbers of claims is Poisson.
Based on data from other states, you assume that the mean claim frequency for Urban insureds is
1.5 times that for Rural insureds.
Let H0 be the hypothesis that the mean claim frequency in West Carolina for Urban is 1.5 times that
for Rural. Using the likelihood ratio test, one tests the hypothesis H0 .
Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. Do not reject H0 at 5%.

5.4 (3 points) To the following data on the number of claims from each of 1000 policyholders,
a Geometric distribution and a Negative Binomial distribution have each been fit by maximum
likelihood.
Number       Number of        Expected Number of Policyholders     Expected Number of Policyholders
of Claims    Policyholders    based on the Fitted Geometric        based on the Fitted Negative Binomial
0 889 892.857 889.401
1 103 95.663 102.226
2 7 10.250 7.841
3 1 1.098 0.501
4 or more 0 0.132 0.031
Using the likelihood ratio test, which of the following is true?
A. H0 is rejected at the 1% significance level.
B. H0 is rejected at the 2.5% significance level, but not at the 1% level.
C. H0 is rejected at the 5% significance level, but not at the 2.5% level.
D. H0 is rejected at the 10% significance level, but not at the 5% level.
E. H0 is not rejected at the 10% significance level.

5.5 (1 point) A Negative Binomial Distribution and a Geometric Distribution have each been fit by
maximum likelihood to the same accident data.
The fitted Negative Binomial has a loglikelihood of -2602.78,
and the fitted Geometric Distribution has a loglikelihood of -2604.56.
If one had had twice as much data, with the same proportion of insureds with a given number of
accidents, which of the following would have been the conclusion of the likelihood ratio test?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

5.6 (3 points) You have the following data on automobile insurance theft claims:
Color of Car Number of Claims Number of Cars
Black 1200 30,000
Other 2250 75,000
You assume that the distribution of numbers of claims for each car is Geometric.
Using the likelihood ratio test, one tests the hypothesis H0 that the mean claim frequency for black
cars is 25% more than that for the other colored cars.
Which of the following is true?
A. Reject H0 at 1%.
B. Reject H0 at 2.5%. Do not reject H0 at 1%.
C. Reject H0 at 5%. Do not reject H0 at 2.5%.
D. Reject H0 at 10%. Do not reject H0 at 5%.
E. Do not reject H0 at 10%.

5.7 (3 points) To the following data on the number of claims from each of 400 policyholders,
a Poisson distribution and a mixture of two Poisson distributions have each been fit by maximum
likelihood.
Number       Number of        Expected Number of Policyholders     Expected Number of Policyholders based
of Claims    Policyholders    based on the Fitted Poisson          on the Fitted Mixture of Two Poissons
0 111 100.883 111.454
1 135 138.967 133.527
2 85 95.713 86.771
3 43 43.948 41.971
4 17 15.135 17.180
5 6 4.170 6.251
6 2 0.957 2.043
7 1 0.188 0.599
8 or more 0 0.039 0.204
Using the likelihood ratio test, which of the following is true?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

5.8 (3 points) You are given the following data from 500 insurance policies:
Number of Claims Number of Policies
0 314
1 105
2 56
3 21
4 4
H0 : The data was drawn from a Negative Binomial Distribution with r = 3.
H1 : The data was drawn from a Negative Binomial Distribution.
The maximum likelihood Negative Binomial has r = 1.207, β = 0.490, and loglikelihood -525.010.
What is the probability value of the Likelihood Ratio Test?
A. 10% B. 5% C. 2.5% D. 1% E. 0.5%
2016-C-5, Fitting Frequency § 5 Likelihood Ratio Test, HCM 10/27/15, Page 163

5.9 (3 points) You are given the following data from 1000 insurance policies:
Number of Accidents Number of Policies
0 900
1 80
2 15
3 5
4+ 0
The null hypothesis is that the data was drawn from a Poisson Distribution with λ = 10%.
The alternate hypothesis is that the data was drawn from the maximum likelihood Poisson.
Using the likelihood ratio test, what is the conclusion?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

5.10 (4 points) You are given the following data from 1000 insurance policies:
Number of Claims Number of Policies
0 657
1 233
2 79
3 27
4 3
5 1
H0 : The data was drawn from a Geometric Distribution.
H1 : The data was drawn from a Negative Binomial Distribution.
The maximum likelihood Negative Binomial has r = 1.55 and β = 0.315.
What is the conclusion of the Likelihood Ratio Test?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

5.11 (4, 5/01, Q.20) (2.5 points) During a one-year period, the number of accidents per day was
distributed as follows:
Number of
Accidents Days
0 209
1 111
2 33
3 7
4 3
5 2
For these data, the maximum likelihood estimate for the Poisson distribution is
λ = 0.60, and for the negative binomial distribution, it is r = 2.9 and β = 0.21.

The Poisson has a negative loglikelihood value of 385.9, and the negative binomial has a negative
loglikelihood value of 382.4.
Determine the likelihood ratio test statistic, treating the Poisson distribution as the null hypothesis.
(A) -1 (B) 1 (C) 3 (D) 5 (E) 7

Solutions to Problems:

5.1. D. The likelihood ratio test statistic = twice the difference in the loglikelihoods =
(2){(-722.9) - (-725.3)} = 4.8. Since the Negative Binomial has one more parameter, compare to
the Chi-Square distribution for one degree of freedom. Since 3.841 < 4.8 < 5.024, reject at 5%
and do not reject at 2.5%, the null hypothesis that the simpler Geometric Distribution should be
used rather than the more complicated Negative Binomial.

5.2. D. The likelihood ratio test statistic = twice the difference in the loglikelihoods =
(2){(-1049.16) - (-1052.73)} = 7.14.
The mixture has two more parameters: an extra q as well as the weight to the first component.
Thus compare to the Chi-Square distribution for two degrees of freedom.
Since 5.991 < 7.14 < 7.378, reject at 5% and do not reject at 2.5%, the null hypothesis that the
simpler Binomial Distribution should be used rather than the more complicated mixture.
Comment: For the likelihood ratio test, the null hypothesis is always to use the simpler distribution.

5.3. C. For a Poisson Distribution, f(x) = e−λλ x/x!. ln f(x) = -λ + xln(λ) - ln(x!).

Loglikelihood is: Σ -λ + xiln(λ) - ln(xi!) = -λE + Nln(λ) - Σln(xi!),


where E = exposures, and N = total # of claims.
Separate estimate of λ for Rural Poisson Distribution, λ = 0.02, same as the method of moments.
The corresponding maximum loglikelihood is:
-(0.02)(250000) + 5000 ln(0.02) - Σ_Rural ln(xi!) = -24560.12 - Σ_Rural ln(xi!).
Separate estimate of λ for Urban Poisson Distribution, λ = 0.03125.
The corresponding maximum loglikelihood is:
-(0.03125)(320000) + 10000 ln(0.03125) - Σ_Urban ln(xi!) = -44657.36 - Σ_Urban ln(xi!).

Restricted by H0 , λU = 1.5λR, the loglikelihood for the combined sample is:

-250,000λR + 5000ln(λR) - 320000(1.5λR) + 10000ln(1.5λR) - Σln(xi!).

Setting the partial derivative with respect to λR equal to zero, and solving:

λ R = (5000 + 10000) / {250000 + (320000)(1.5)} = 0.020548.

λ U = (1.5)(.020548) = 0.030822.
The corresponding maximum loglikelihood is:

-250,000(.020548) + 5000ln(.020548) - 320000(.030822) + 10000ln(.030822) - Σln(xi!) =


-69220.27 - Σln(xi!).

The unrestricted maximum loglikelihood is:
-24560.12 - Σ_Rural ln(xi!) - 44657.36 - Σ_Urban ln(xi!) = -69217.48 - Σ ln(xi!).
Twice the difference in the loglikelihoods: (2){-69217.48 - (-69220.27)} = 5.58.
The restriction is one dimensional, so compare to the Chi-Square with one degree of freedom.
Alternately, the unrestricted model has two parameters, while the restricted model has one
parameter; the difference in number of parameters is one, so compare to the Chi-Square with one
degree of freedom.
Since 5.024 < 5.58 < 6.635, we reject H0 at 2.5% and do not reject H0 at 1%.

5.4. E. For each fitted distribution the loglikelihood is: Σ ni ln f(xi) =


889 ln(E0 /1000) + 103 ln(E1 /1000) + 7 ln(E2 /1000) + 1 ln(E3 /1000) =
889 ln(E0 ) + 103 ln(E1 ) + 7 ln(E2 ) + 1 ln(E3 ) - 1000 ln(1000).
Thus the difference of the loglikelihoods for the Negative Binomial and the Geometric is:
889 ln(889.401/892.857) + 103 ln(102.226/95.663) + 7 ln(7.841/10.250) + 1 ln(0.501/1.098) =
0.727. The likelihood ratio test statistic is: (2)(0.727) = 1.454.
Since there is a difference of one parameter, compare to the Chi-Square with one degree of
freedom. 1.454 < 2.706. Do not reject H0 at 10%.
Comment: The loglikelihood for the fitted Negative Binomial, with r = 2.9984 and β = 0.04000, is:
-380.632. The loglikelihood for the fitted Geometric with β = 0.120, is: -381.360.
The null hypothesis is to use the simpler Geometric distribution; the alternative hypothesis is to use
the more complicated Negative Binomial distribution.

5.5. B. Each of the loglikelihoods would be twice as much.


The maximum likelihood fitted parameters would be the same.
Thus now the maximum likelihood Negative Binomial has a loglikelihood of:
(2)(-2602.78) = -5205.56.
Now the maximum likelihood Geometric has a loglikelihood of: (2)(-2604.56) = -5209.12.
The likelihood ratio test statistic = twice the difference in the loglikelihoods =
(2){-5205.56 - (-5209.12)} = 7.12.
The difference in number of parameters is: 2 - 1 = 1.
Comparing to the Chi-Square Distribution with one degree of freedom:
6.635 < 7.12 < 7.879.
Thus we reject H0 at 1% and not at 0.5%.

5.6. D. For a Geometric Distribution, f(x) = βx / (1+β)x+1. ln f(x) = x ln(β) - (x+1) ln(1+β).

Loglikelihood is: Σ xi ln(β) - (xi+1) ln(1+β) = N ln(β) - N ln(1+β) - E ln(1+β),

where E = exposures, and N = total # of claims.


Separate estimate of β for the Black Geometric Distribution, β = 1200/30,000 = 0.04,
the same as the method of moments.
The corresponding maximum loglikelihood is:
1200 ln(0.04) - 1200 ln(1.04) - 30,000 ln(1.04) = -5086.337.
Separate estimate of β for the Other Geometric Distribution, β = 2250/75,000 = 0.03.
The corresponding maximum loglikelihood is:
2250 ln(0.03) - 2250 ln(1.03) - 75,000 ln(1.03) = -10,173.173.
Restricted by H0 , βB = 1.25βO, the loglikelihood for the combined sample is:

1200 ln(1.25βO) - 1200 ln(1 + 1.25βO) - 30,000 ln(1 + 1.25βO) +

2250 ln(βO) - 2250 ln(1 + βO) - 75,000 ln(1 + βO) =

1200 ln(1.25) + 3450 ln(βO) - 31,200 ln(1 + 1.25βO) - 77,250 ln(1 + βO).

Setting the partial derivative with respect to βO equal to zero:

0 = 3450 / βO - 39,000 / (1 + 1.25βO) - 77,250 / (1 + βO).

⇒ 131,250 βO2 + 108,487.5 βO - 3450 = 0. ⇒ βO = 0.030664.


The corresponding maximum loglikelihood is:
1200 ln(1.25) + 3450 ln(0.030664) - 31,200 ln[1 + (1.25)(0.030664)] - 77,250 ln(1.030664) =
-15,261.073.
The unrestricted maximum loglikelihood is: -5086.337 + (-10,173.173) = -15,259.510.
Twice the difference in the loglikelihoods: (2){(-15,261.073) - (-15,259.510)} = 3.126.
The restriction is one dimensional, so compare to the Chi-Square with one degree of freedom.
Alternately, the unrestricted model has two parameters, while the restricted model has one
parameter; the difference in number of parameters is one, so compare to the Chi-Square with one
degree of freedom.
Since 2.706 < 3.126 < 3.841, we reject H0 at 10% and do not reject H0 at 5%.

5.7. E. For each fitted distribution the loglikelihood is: Σ ni ln f(xi) =


111 ln(E0 /400) + 135 ln(E1 /400) + 85 ln(E2 /400) + 43 ln(E3 /400) + 17 ln(E4 /400) +
6 ln(E5 /400) + 2 ln(E6 /400) + 1 ln(E7 /400) =
111 ln(E0 ) + 135 ln(E1 ) + 85 ln(E2 ) + 43 ln(E3 ) + 17 ln(E4 ) + 6 ln(E5 ) + 2 ln(E6 ) + ln(E7 ) -
400 ln(400).
Thus the difference of the loglikelihoods for the mixture of two Poissons and the Poisson is:
111 ln(111.454/100.883) + 135 ln(133.527/138.967) + 85 ln(86.771/95.713) +
43 ln(41.971/43.948) + 17 ln(17.180/15.135) + 6 ln(6.251/4.170) + 2 ln(2.043/0.957) +
ln(0.599/0.188) = 2.613.
The likelihood ratio test statistic is: (2)(2.613) = 5.226.
A mixture of Poissons has three parameters: λ1, λ2, and p the weight to the first Poisson.
Since there is a difference of two parameters, (three versus one), we compare to the Chi-Square
with two degrees of freedom. 4.605 < 5.226 < 5.991. ⇒ Reject H0 at 10%, but not at 5%.
Comment: The null hypothesis is to use the simpler Poisson distribution; the alternative hypothesis
is to use the more complicated mixture of two Poissons.
The loglikelihood for the fitted mixture of two Poissons is -612.319.
The loglikelihood for the fitted Poison is -614.932.
The fitted Poisson has λ = sample mean = 1.3775.

5.8. E. For r = 3, the maximum likelihood fit is equal to the method of moments.
β = X/r = 0.592/3 = 0.19733.
f(0) = 1/1.19733^3 = 0.58258. f(1) = (3)(0.19733)/1.19733^4 = 0.28804.
f(2) = {(3)(4)/2} (0.19733^2)/1.19733^5 = 0.09494.
f(3) = {(3)(4)(5)/6} (0.19733^3)/1.19733^6 = 0.02608.
f(4) = {(3)(4)(5)(6)/24} (0.19733^4)/1.19733^7 = 0.00645.
loglikelihood is:
314 ln[ 0.58258] + 105 ln[0.28804] + 56 ln[0.09494] + 21 ln[0.02608] + 4 ln[0.00645] =
-528.945.
Likelihood Ratio test statistic is: (2){-525.010 - (-528.945)} = 7.870.
The difference in fitted parameters is: 2 - 1 = 1.
For the Chi-Square Distribution with one degree of freedom, the 0.5% critical value is 7.879.
Thus the p-value of the test is about 0.5%.
Comment: Using a computer for the whole calculation, the test statistic is 7.864
and the p-value is 0.504%.

5.9. C. X = {(900)(0) + (80)(1) + (15)(2) + (5)(3)} / 1000 = 12.5%. Thus λ^ = 0.125.
We can think of the Poisson Distribution with λ = 10% as no fitted parameters, and thus a special
case of the Poisson fit via maximum likelihood.
The loglikelihood is: 900 ln[f(0)] + 80 ln[f(1)] + 15 ln[f(2)] + 5 ln[f(3)] =
-900λ + (80){ln(λ) -λ} + (15){2ln(λ) - λ - ln(2)} + (5){3ln(λ) - λ - ln(6)}.
Therefore, the difference between the maximum loglikelihood and that for λ = 0.10 is:
(-900)(0.125 - 0.10) + (80){ln(1.25) - 0.025} + (15){2 ln(1.25) - 0.025} + (5){3 ln(1.25) - 0.025} =
(1000)(-0.025) + 125 ln(1.25) = 2.893.
Thus the Likelihood Ratio Test Statistic is: (2)(2.893) = 5.786.
We are comparing zero and one fitted parameter, so we have one degree of freedom.
Comparing to the Chi-Square Distribution with one degree of freedom:
5.024 < 5.786 < 6.635.
Thus we reject H0 at 2.5% and not at 1%.

5.10. D. X = {(657)(0) + (233)(1) + (79)(2) + (27)(3) + (3)(4) + (1)(5)}/1000 = 0.489.


Thus the maximum likelihood Geometric Distribution has β = 0.489.
f(0) = 1/(1+β) = 0.671592. f(1) = f(0) β / (1 + β) = 0.220556.
f(2) = f(1) β / (1 + β) = 0.072433. f(3) = f(2) β / (1 + β) = 0.023787.
f(4) = f(3) β / (1 + β) = 0.007812. f(5) = f(4) β / (1 + β) = 0.002566.
The corresponding loglikelihood is: 657 ln(0.671592) + 233 ln(0.220556) + 79 ln(0.072433)
+ 27 ln(0.023787) + 3 ln(0.007812) + 1 ln(0.002566) = -942.605.
We are told that the maximum likelihood Negative Binomial has r = 1.55 and β = 0.315.
f(0) = 1/(1 + β)^r = 0.654132. f(1) = rβ/(1 + β)^(r+1) = 0.242874.
f(2) = {r(r+1)/2} β^2/(1 + β)^(r+2) = 0.074178. f(3) = {r(r+1)(r+2)/6} β^3/(1 + β)^(r+3) = 0.0210266.
f(4) = {r(r+1)(r+2)(r+3)/24} β^4/(1 + β)^(r+4) = 0.005729.
f(5) = {r(r+1)(r+2)(r+3)(r+4)/120} β^5/(1 + β)^(r+5) = 0.001523.
The corresponding loglikelihood is: 657 ln(0.654132) + 233 ln(0.242874) + 79 ln(0.074178)
+ 27 ln(0.0210266) + 3 ln(0.005729) + 1 ln(0.001523) = -940.354.
Thus the Likelihood Ratio Test Statistic is: (2) {-940.354 - (-942.605)} = 4.502.
Comparing to the Chi-Square Distribution with one degree of freedom:
3.841 < 4.502 < 5.024.
Thus we reject H0 at 5% and not at 2.5%.
Comment: Using a computer, the probability value is 3.4%.

5.11. E. The likelihood ratio test statistic = twice the difference in the loglikelihoods =
(2)(385.9 - 382.4) = 7.0.
Comment: We would compute the loglikelihoods in each case as Σni lnf(xi):

Number of    Number     Poisson    Contribution to    Neg. Binomial    Contribution to
Claims       of Days    Density    Loglikelihood      Density          Loglikelihood
0 209 0.54881 -125.40 0.57534 -115.53
1 111 0.32929 -123.30 0.28957 -137.57
2 33 0.09879 -76.39 0.09800 -76.65
3 7 0.01976 -27.47 0.02778 -25.08
4 3 0.00296 -17.46 0.00711 -14.84
5 2 0.00036 -15.88 0.00170 -12.75
Sum 365 0.99996 -385.91 0.99950 -382.43
The maximum likelihood Poisson is the same as the method of moments:
λ = {(209)(0) + (111)(1) + (33)(2) + (7)(3) + (3)(4) + (2)(5)}/365 = 0.6027.
The Poisson is a limit of Negative Binomials as beta approaches zero with rβ held fixed; if the null
hypothesis is true, then according to Loss Models the likelihood ratio test statistic follows a mixture of
Chi-Square Distributions.

Section 6, Fitting to the (a, b, 1) Class48

Members of the (a, b, 1) family include: all the members of the (a, b, 0) family,
Zero-Truncated Binomial, Zero-Truncated Poisson, Extended Truncated Negative Binomial,
the Logarithmic Distribution, and the corresponding zero-modified distributions.49

As with the members of the (a, b, 0) family, one can fit these distributions to data via Method of
Moments or Maximum Likelihood.

Method of Moments, Zero-Truncated Distributions:

Assume we have the following data on number of persons injured in bodily injury accidents:

Number of People Injured in the Accident: 1 2 3 4&+


Number of Accidents 1256 100 6 0

The mean of this data is: X = (1256 + 200 + 18) / (1256 + 100 + 6) = 1.08223.

The mean of a zero-truncated Poisson is: λ / {1 - f(0)} = λ / (1 - e−λ).

Therefore, using the Method of Moments to fit a zero-truncated Poisson:


λ / (1 - e−λ) = 1.08223. ⇒ 1.08223 - 1.08223e−λ - λ = 0.

One can solve this equation numerically. The result is λ = 16.02%.50

48 See Section 14.4 in Loss Models.
49 The (a, b, 1) class is discussed in “Mahlerʼs Guide to Frequency Distributions.”
50 λ = 0 is also a root, but this makes no sense for a zero-truncated Poisson.

Here is a graph of the lefthand side of this equation as a function of lambda:
[Graph omitted: the lefthand side, plotted for lambda between 0.05 and 0.2, crosses zero at about lambda = 0.16.]
We can see the lefthand side of this equation is zero for about λ = 16%.

Since the fitted lambda is small, we can use the approximation: e−λ ≅ 1 - λ + λ2/2 - λ3/6.

Therefore, (1 - e−λ)/λ ≅ 1 - λ/2 + λ2/6.

Therefore, the equation for method of moments becomes: 1 - λ/2 + λ2/6 ≅ 1/1.08223.

Solving this quadratic equation, λ ≅ 16.06%.51
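
For an exact answer, one can instead solve the moment equation numerically. Here is a brief sketch
in Python (not part of the original text; it assumes scipy is available), applied to the moment
equation above.

import math
from scipy.optimize import brentq

sample_mean = 1474 / 1362        # = 1.08223, from the injured-persons data above

def moment_equation(lam):
    # Mean of a zero-truncated Poisson minus the sample mean.
    return lam / (1 - math.exp(-lam)) - sample_mean

# Bracket the root away from lambda = 0, which is not a valid parameter.
fitted_lambda = brentq(moment_equation, 1e-6, 1.0)
print(fitted_lambda)             # about 0.1602, i.e. 16.02%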

Exercise: We observe 1936 ones, 449 twos, and 37 threes.


Fit via Method of Moments to the above data a zero-truncated Binomial with m = 3.
[Solution: X = {1936 + (2)(449) + (3)(37)}/(1936 + 449 + 37) = 2945/2422.
The mean of the zero-truncated Binomial is: 3q/{1 - (1 - q)3 } = 1/(1 - q + q2 /3).
Set the theoretical mean equal to the sample mean:
1 - q + q2/3 = 2422/2945. ⇒ 2945q2 - 8835q + 1569 = 0. ⇒ q = 18.96%.
Comment: For larger values of m, one would have to solve numerically.]

Exercise: We observe 1200 ones, 310 twos, 70 threes, 15 fours, and 5 fives.
Fit via Method of Moments to the above data a zero-truncated Geometric.
[Solution: X = {1200 + (2)(310) + (3)(70) + (4)(15) + (5)(5)}/(1200 + 310 + 70+ 15 + 5)
= 2115/1600 = 1.322.
The mean of the zero-truncated Geometric is: β/{1 - 1/(1 + β)} = 1 + β.
Set the theoretical mean equal to the sample mean: β = 0.322.]

51 2.84 is also a root to the approximate equation, but is not a solution to the original equation.

A zero-truncated Negative Binomial has a mean of: rβ / {1 - 1/(1 + β)r}.


Therefore for r fixed, X = rβ / {1 - 1/(1 + β)r}. One could solve numerically for β.

Similarly, one could fit a zero-truncated Negative Binomial with both r and β varying by matching the
first and second moments, and then solving numerically.

Method of Moments, Logarithmic Distribution:

Exercise: We observe 889 ones, 97 twos, 9 threes, 3 fours, and 2 fives.


Fit via Method of Moments to the above data a Logarithmic Distribution.
[Solution: X = {889 + (2)(97) + (3)(9) + (4)(3) + (5)(2)}/(889 + 97 + 9 + 3 + 2)
= 1132/1000 = 1.132. The mean of the Logarithmic Distribution is: β / ln[1 + β].
Set the theoretical mean equal to the sample mean: β = 1.132 ln[1 + β].

We can either solve numerically, or use the approximation that: ln[1 + β] ≅ β - β2/2 + β3/3.

Thus β ≅ 1.132β - 1.132β2/2 + 1.132β3/3. ⇒ β ≅ 0.29.


Comment: Solving numerically, β = 0.2751.]

Maximum Likelihood, Zero-Truncated Distributions:

When applied to individual ungrouped data, for the zero-truncated Poisson,


zero-truncated Binomial with m fixed, and zero-truncated Negative Binomial with r fixed,
Maximum Likelihood is equivalent to Method of Moments.

Exercise: Verify that for a zero-truncated Poisson, fitting to individual ungrouped data is the same for
Maximum Likelihood and the Method of Moments.
[Solution: f(x) = λx e−λ/x!. ln f(x) = x ln(λ) - λ + constants.

The zero-truncated density is: h(x) = f(x)/(1 - e−λ).


ln h(x) = x ln(λ) - λ - ln[1 - e−λ] + constants.
∂ ln[h(x)] / ∂λ = x/λ - 1 - e−λ/(1 - e−λ).
Set the partial derivative of the loglikelihood with respect to λ equal to zero:

0= ∑ xi / λ - n - n e−λ/(1 - e−λ). ⇒ X /λ = 1 + e−λ/(1 - e−λ) = 1/(1 - e−λ). ⇒ X = λ/(1 - e−λ).


The righthand side of the equation is the mean of the zero-truncated Poisson.
Thus Maximum Likelihood is the same as the Method of Moments.
Comment: See equation 14.10 in Loss Models, with p_0^M = 0.]

Method of Maximum Likelihood, Logarithmic Distribution:

f(x) = βx/{x (1+β)x ln(1+β)}.


ln f(x) = x lnβ - lnx - x ln(1+β) - ln[ln(1+β)].

Then the loglikelihood is: ∑ xi ln[β] - ∑ ln[xi] - ∑ xi ln[1+β] - n ln[ln(1+β)].


Set the partial derivative of the loglikelihood with respect to β equal to zero:
0 = ∑xi/β - ∑xi/(1+β) - n/{(1+β) ln(1+β)}. ⇒
∑xi / {β(1+β)} = n / {(1+β) ln(1+β)}. ⇒
X = β/ln(1+β).
Since the righthand side is the mean of the Logarithmic Distribution, this is the same as the method of
moments.

When applied to individual ungrouped data, for the Logarithmic Distribution, Maximum
Likelihood is equivalent to Method of Moments.

Method of Maximum Likelihood, Zero-Modified Distributions:

Exercise: Fit a Zero-Modified Poisson to the following data using maximum likelihood.
Number of Claims: 0 1 2 3 4&+
Number of Policies: 18638 1256 100 6 0

[Solution: f(x) = λ^x e−λ/x!. ln f(x) = x ln(λ) - λ + constants.
The zero-modified density is: h(x) = f(x)(1 - p_0^M)/(1 - e−λ), x > 0.
ln h(x) = x ln(λ) - λ + ln[1 - p_0^M] - ln[1 - e−λ] + constants. ln h(0) = ln(p_0^M).
Then the loglikelihood is:
18638 ln(p_0^M) + ln(λ)Σxi - 1362λ + 1362 ln[1 - p_0^M] - 1362 ln[1 - e−λ] + constants.
Set the partial derivative of the loglikelihood with respect to p_0^M equal to zero:
0 = 18638/p_0^M - 1362/(1 - p_0^M). ⇒ p_0^M = 18638/(18638 + 1362) = 18638/20,000 = 93.19%.
Set the partial derivative of the loglikelihood with respect to λ equal to zero:
0 = Σxi/λ - 1362 - 1362e−λ/(1 - e−λ). ⇒ Σxi/λ = 1362/(1 - e−λ). ⇒ Σxi/1362 = λ/(1 - e−λ).
This is the same equation as for fitting via maximum likelihood a zero-truncated Poisson to the data
x ≥ 1. (1256 + 200 + 18)/1362 = 1.0822 = λ/(1 - e−λ). ⇒ λ = 16.01%.]

Thus in this example, we would assign to zero the observed probability of zeros, and fit lambda as
one would fit a zero-truncated Poisson to the observations other than zeros. The latter is the same
as the method of moments.

In the above example, the mean of the zero-modified distribution is: λ(1 - p_0^M)/(1 - e−λ).
If we substitute the fitted value p_0^M = 18,638/20,000, then the mean of the zero-modified distribution
is: λ(1362/20,000)/(1 - e−λ).
Thus the equation for the fitted lambda can be rewritten as:
(1362/20,000) Σxi/1362 = λ(1362/20,000)/(1 - e−λ). ⇒ X = λ(1 - p_0^M)/(1 - e−λ).
In other words, we match the observed mean to the mean of the zero-modified distribution.

When applied to individual ungrouped data, for the zero-modified Poisson, zero-modified
Binomial with m fixed, and zero-modified Negative Binomial with r fixed, Maximum
Likelihood is equivalent to assigning to zero the observed proportion of zeros and
matching the mean of the zero-modified distribution to the sample mean.
Set p_0^M = the proportion of zeros in the data.
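
To illustrate this boxed statement, here is a short sketch in Python (not part of the original text;
scipy is assumed available) that fits the zero-modified Poisson to the data of the exercise above.

import math
from scipy.optimize import brentq

# Policy counts by number of claims, from the exercise above.
counts = {0: 18638, 1: 1256, 2: 100, 3: 6}
n = sum(counts.values())                  # 20,000 policies
n_zero = counts[0]                        # 18,638 policies with zero claims

p0_M = n_zero / n                         # fitted p_0^M = 0.9319

# Fit lambda by matching the mean of a zero-truncated Poisson to the mean of the nonzero data.
nonzero_mean = sum(k * c for k, c in counts.items()) / (n - n_zero)   # = 1.0822

fitted_lambda = brentq(lambda lam: lam / (1 - math.exp(-lam)) - nonzero_mean, 1e-6, 5.0)
print(p0_M, fitted_lambda)                # 0.9319 and about 0.160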

Variance of Maximum Likelihood Estimates:

As discussed previously, the approximate variance of the estimate of a single parameter using the
method of maximum likelihood is given by negative the inverse of the product of the number of
points times the expected value of the second partial derivative of the log likelihood:
Variance of θ^ ≅ -1 / {n E[∂² ln f(x) / ∂θ²]}.

For the zero-truncated Poisson we had:
ln f(x) = x ln(λ) - λ - ln[1 - e−λ] + constants.
∂ ln f(x) / ∂λ = x/λ - 1 - e−λ/(1 - e−λ) = x/λ - 1 - 1/(e^λ - 1).
∂² ln f(x) / ∂λ² = -x/λ^2 + e^λ/(e^λ - 1)^2.
E[∂² ln f(x) / ∂λ²] = -E[x]/λ^2 + e^λ/(e^λ - 1)^2 = -1/{λ(1 - e−λ)} + e−λ/(1 - e−λ)^2
= (λe−λ - 1 + e−λ)/{λ(1 - e−λ)^2}.

Variance of λ^ ≅ -1 / {n E[∂² ln f(x) / ∂λ²]} = λ(1 - e−λ)^2 / {n(1 - λe−λ - e−λ)}.

For example, the following data was fit previously to a zero-truncated Poisson:
Number of People Injured in the Accident: 1 2 3 4&+
Number of Accidents: 1256 100 6 0
λ^ = 16%.
Variance of λ^ = λ(1 - e−λ)^2 / {n(1 - λe−λ - e−λ)} =
(0.16)(1 - e^(-0.16))^2 / {(1362)(1 - 0.16e^(-0.16) - e^(-0.16))} = 0.000223.
Standard Deviation of λ^ = 1.5%.
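
As a quick numerical check of this formula, here is a brief sketch in Python (not part of the original
text), using only the fitted lambda and sample size given above.

import math

lam, n = 0.16, 1362    # fitted lambda and number of accidents from above

variance = lam * (1 - math.exp(-lam))**2 / (n * (1 - lam * math.exp(-lam) - math.exp(-lam)))
print(variance, math.sqrt(variance))    # about 0.000223 and 0.015, i.e. 1.5%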

For the zero-modified Poisson we had:
ln f(x) = x ln(λ) - λ + ln[1 - p_0^M] - ln[1 - e−λ] + constants, x > 0.
ln f(0) = ln(p_0^M).
When we observe n0 values of zero out of a sample of size n, the loglikelihood is:
n0 ln(p_0^M) + ln(λ)Σxi - (n - n0)λ + (n - n0) ln[1 - p_0^M] - (n - n0) ln[1 - e−λ] + constants.

∂ loglikelihood / ∂λ = Σxi/λ - (n - n0) - (n - n0)e−λ/(1 - e−λ) = Σxi/λ - (n - n0) - (n - n0)/(e^λ - 1).

∂² loglikelihood / ∂λ² = -Σxi/λ^2 + (n - n0)e^λ/(e^λ - 1)^2.

The mean of the zero-modified Poisson is: (1 - p_0^M)λ/(1 - e−λ). Therefore,
E[∂² loglikelihood / ∂λ²] = -n(1 - p_0^M)/{λ(1 - e−λ)} + (n - n0)e−λ/(1 - e−λ)^2.

Substituting for p_0^M its estimator n0/n:
E[∂² loglikelihood / ∂λ²] = (n - n0)(λe−λ + e−λ - 1)/{λ(1 - e−λ)^2}.

Variance of λ^ ≅ -1 / E[∂² loglikelihood / ∂λ²] = λ(1 - e−λ)^2 / {(n - n0)(1 - λe−λ - e−λ)}.52

∂ loglikelihood / ∂p_0^M = n0/p_0^M - (n - n0)/(1 - p_0^M). ∂² loglikelihood / ∂p_0^M ∂λ = 0.

Therefore, the covariance of the maximum likelihood estimates of λ and p_0^M is 0.53

∂² loglikelihood / ∂(p_0^M)^2 = -n0/(p_0^M)^2 - (n - n0)/(1 - p_0^M)^2. Substituting for p_0^M its estimator n0/n:
∂² loglikelihood / ∂(p_0^M)^2 = -n^2/n0 - (n - n0)/(1 - n0/n)^2 = -n^2/n0 - n^2/(n - n0) = -n^3/{n0(n - n0)}.

Variance of the estimate of p_0^M ≅ {n0(n - n0)}/n^3 = {(n0/n)(1 - n0/n)}/n = p_0^M (1 - p_0^M)/n.

This is the same result one would get by noting that each observation has a chance of p_0^M of being
zero and a chance of 1 - p_0^M of not being zero. Thus we have a Bernoulli for a single draw, and the
variance of the average goes down as 1/n.

52 By making use of the loglikelihood rather than the log density, we have used the “observed information” as
discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
53 As discussed in “Mahlerʼs Guide to Fitting Loss Distributions,” the information matrix would have zeros on the
off-diagonal, and thus so would its inverse, the covariance matrix.

Problems:

6.1. (3 points) The following data is for the number of people riding in each car crossing the
Washington Bridge:
Number of People 1 2 3 4 5 6
Number of Cars 200 300 150 60 30 10
Fit a Zero-Truncated Binomial Distribution with m = 6. What is the fitted q?
A. 32% B. 35% C. 38% D. 41% E. 44%

6.2 (4 points) The average number of claims is 0.2.


The proportion of insureds with zero claims is 85%.
Fit via Maximum Likelihood a zero-modified Poisson Distribution to this data.
What is the fitted density at 2?
A. less than 1.5%
B. at least 1.5% but less than 2.0%
C. at least 2.0% but less than 2.5%
D. at least 2.5% but less than 3.0%
E. at least 3.0%

6.3 (2 points) The following data is for the number of pages for the resumes submitted with each of
138 job applications to the 19th Century Insurance Company:
Number of Pages 1 2 3 4 5 6 7 8
Number of Resumes 60 35 19 11 6 4 2 1
Fit a Zero-Truncated Geometric Distribution. What is the fitted β?
A. 1.2 B. 1.3 C. 1.4 D. 1.5 E. 1.6

6.4 (3 points) What is the standard deviation of the estimate in the previous question?
A. 0.10 B. 0.12 C. 0.14 D. 0.16 E. 0.18

6.5 (3 points) The following data is for the number of people in each of 500 households:
Number of People 1 2 3 4 5 6 7 8 9 10
Number of Households 50 90 110 100 70 45 20 10 4 1
Fit a Zero-Truncated Poisson Distribution. What is the fitted λ?
A. 3.3 B. 3.4 C. 3.5 D. 3.6 E. 3.7

6.6 (4 points) What is the standard deviation of the estimate in the previous question?
A. 0.06 B. 0.08 C. 0.10 D. 0.12 E. 0.14

6.7. (3 points) Actuaries buy on average 0.2 computers per year.


The proportion of actuaries who do not buy a computer during a given year is 85%.
Fit via Maximum Likelihood a zero-modified Geometric Distribution to this data.
Les N. DeRisk is an actuary.
Using the fitted distribution, what is the probability that Les buys two computers next year?
A. 2.2% B. 2.4% C. 2.6% D. 2.8% E. 3.0%

6.8 (4 points) Verify that for a zero-truncated Binomial with m fixed, fitting to individual ungrouped
data is the same for Maximum Likelihood and the Method of Moments.

6.9 (3 points) Fit the following data for the number of persons injured in bodily injury accidents via
maximum likelihood to a zero-truncated Negative Binomial Distribution with r = 2.
Number of People 1 2 3 4
Number of Accidents 825 150 20 5
What is the fitted β?
A. 0.13 B. 0.15 C. 0.17 D. 0.19 E. 0.21

6.10 (3 points) Fit the following data via maximum likelihood to a zero-modified Poisson.
Number of claims 0 1 2 3 4 5
Count 1706 351 408 268 74 5
What is the density at three for the fitted distribution?
A. 6% B. 7% C. 8% D. 9% E. 10%

6.11 (3 points) The following data is for the number of automobiles insured on private passenger
automobile policies:
Number of Automobiles: 1 2 3 4
Number of Policies: 100 75 20 5
Fit the above data to a Logarithmic Distribution. What is the fitted β?
A. less than 1.2
B. at least 1.2 but less than 1.3
C. at least 1.3 but less than 1.4
D. at least 1.4 but less than 1.5
E. at least 1.5

6.12 (4 points) Verify that for a zero-truncated Negative Binomial with r fixed, fitting to individual
ungrouped data is the same for Maximum Likelihood and the Method of Moments.

6.13 (3 points) The following data is for the number of strokes needed on the windmill hole of the
Gulliver Miniature Golf Course for 1000 golfers:
Number of Strokes 1 2 3 4 5 6
Number of Golfers 100 200 300 250 100 50
You model the number of strokes as a Zero-Truncated Geometric Distribution.
You fit β via maximum likelihood.
What is the coefficient of variation of this estimate?
A. 0.01 B. 0.02 C. 0.04 D. 0.08 E. 0.16

6.14 (3 points) Use the following data on the number of claimants for each of 500 accidents:
Number of Claimants Number of Accidents
1 400
2 70
3 25
4 5
Assume this data set is drawn from a Zero-Truncated Geometric Distribution.
A likelihood ratio test is applied to test whether β = 1/3.
What is the conclusion?
A. H0 is rejected at the 0.005 significance level.
B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level.
C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level.
D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level.
E. H0 is not rejected at the 0.050 significance level.

Solutions to Problems:

6.1. B. Mean is: {(200)(1) + (300)(2) + (150)(3) + (60)(4) + (30)(5) + (10)(6)}/750 = 2.267.
Set this equal to the mean of the zero-truncated Binomial:
2.267 = 6q/{1 - (1 - q)6 }. ⇒ 1 - (1 - q)6 = 2.647q.

Let x = 1 - q. ⇒ x6 + 2.647(1- x) = 1. ⇒ x6 - 2.647x +1.647 = 0.


Solving numerically x = 0.651 and q = 1 - 0.651 = 0.349.

6.2. E. For the zero-modified Poisson, Maximum Likelihood is equivalent to assigning to zero the
observed proportion of zeros and matching the mean of the zero-modified distribution to the sample
mean. p_0^M = 0.85.
0.2 = mean of the zero-modified Poisson = (1 - 0.85)λ / (1 - e−λ) = (0.15)λ / (1 - e−λ). ⇒
1 - 0.75λ - e−λ = 0.
e−λ = 1 - λ + λ^2/2 - λ^3/6 ....
0 ≅ 0.25λ - λ^2/2 + λ^3/6. ⇒ 0 ≅ 1.5 - 3λ + λ^2. ⇒ λ ≅ (3 - √3)/2 = 0.634.
f(2) ≅ {e−λ λ^2/2}(1 - p_0^M)/{1 - e−λ} = (e^(-0.634) 0.634^2/2)(0.15)/(1 - e^(-0.634)) = 3.4%.
Comment: Solving numerically, λ = 0.606.
f(2) = (e^(-0.606) 0.606^2/2)(0.15)/(1 - e^(-0.606)) = 3.3%.

6.3. A. The mean of a Zero-Truncated Geometric Distribution is: β/{1 - 1/(1+β)} = 1 + β.


The mean of the data is: 307/138 = 2.2246.
Set this theoretical mean equal to the observed mean:
1 + β = 2.2246. ⇒ β = 1.2246.
Comment: Method of Maximum Likelihood gives the same answer as the method of moments.

6.4. C. Second moment of a Zero-Truncated Geometric Distribution is:


{β(1 + β) + β2 }/(1 - 1/(1+β)} = (1 + β)2 + β(1+β).
Therefore, the variance of a Zero-Truncated Geometric Distribution is:
(1 + β)2 + β(1+β) - (1 + β)2 = β(1+β).
β^ = X̄ - 1. ⇒ Var[β^] = Var[X̄] = Var[X]/N = β(1+β)/N = (1.2246)(2.2246)/138 = 0.01974.
StdDev[β^] = √0.01974 = 0.1405.
Comment: Due to the memoryless property of the Geometric Distribution, the zero-truncated
version has the same variance as the regular version.

6.5. D. The mean of a Zero-Truncated Poisson Distribution is: λ/(1 - e−λ).


The mean of the data is: 1846/500 = 3.692.
Set this theoretical mean equal to the observed mean:
λ/(1 - e−λ) = 3.692. Try values of lambda.
For example, 3.7/(1 - e-3.7) = 3.794. 3.6/(1 - e-3.6) = 3.701. 3.5/(1 - e-3.5) = 3.609.
Thus λ = 3.6.
Comment: Method of Maximum Likelihood gives the same answer as the method of moments.
More exactly, λ = 3.590.

6.6. B. The zero-truncated density is: h(x) = f(x)/(1 - e−λ).
ln h(x) = x ln(λ) - λ - ln[1 - e−λ] + constants.
∂ ln h(x) / ∂λ = x/λ - 1 - e−λ/(1 - e−λ) = x/λ - 1 - 1/(e^λ - 1).
∂² ln h(x) / ∂λ² = -x/λ^2 + e^λ/(e^λ - 1)^2 = -x/λ^2 + e−λ/(1 - e−λ)^2.
E[∂² ln h(x) / ∂λ²] = -{λ/(1 - e−λ)}/λ^2 + e−λ/(1 - e−λ)^2 = -1/{λ(1 - e−λ)} + e−λ/(1 - e−λ)^2 =
-1/{(3.6)(1 - e^(-3.6))} + e^(-3.6)/(1 - e^(-3.6))^2 = -0.2856 + 0.0289 = -0.2567.
Var[λ^] = -1/{n E[∂² ln h(x) / ∂λ²]} = -1/{(500)(-0.2567)} = 0.00779.
StdDev[λ^] = √0.00779 = 0.0883.

6.7. D. For the zero-modified Geometric, Maximum Likelihood is equivalent to assigning to zero
the observed proportion of zeros and matching the mean of the zero-modified distribution to the
sample mean. p_0^M = 0.85.
0.2 = mean of the zero-modified Geometric = (1 - p_0^M) (mean of zero-truncated Geometric) =
(1 - p_0^M)(1 + β) = (0.15)(1 + β).
⇒ β = 0.2/0.15 - 1 = 1/3.
f(2) = p_2^M = (1 - p_0^M) p_2^T = (1 - p_0^M) {β^(2-1)/(1+β)^2} = (0.15) {(1/3)/(4/3)^2} = 2.8%.

Comment: Mike Swaim's Les N. DeRisk actuarial cartoon appears in the “Actuarial Digest.”

6.8. f(x) = qx (1-q)m-x m!/(x! (m-x)!). ln f(x) = x ln(q) + (m-x)ln(1-q) + constants.


The zero-truncated density is: h(x) = f(x)/{1 - (1-q)m}.
ln h(x) = x ln(q) + (m-x)ln(1-q) - ln[{1 - (1-q)m}] + constants.
∂ ln h(x) / ∂ q = x/q + (m-x)/(1-q) - m(1-q)m-1/{1 - (1-q)m}.
Set the partial derivative of the loglikelihood with respect to q equal to zero:
0 = Σxi/q + Σ(m-xi)/(1-q) - Σm(1-q)m-1/{1 - (1-q)m}.

0 = n X /q + mn/(1-q) - n X /(1-q) - mn(1-q)m-1/{1 - (1-q)m}. ⇒

0 = X /q + m/(1-q) - X /(1-q) - m(1-q)m-1/{1 - (1-q)m}. ⇒

X /q - X /(1-q) = m/(1-q) - m(1-q)m-1/{1 - (1-q)m}. ⇒

X /{(q)(1-q)} = {m/(1-q)}(1 - (1-q)m/{1 - (1-q)m}) = {m/(1-q)}/{1 - (1-q)m}. ⇒ X = mq/{1 - (1-q)m}.


The righthand side of the equation is the mean of the zero-truncated Binomial.
Thus Maximum Likelihood is the same as the Method of Moments.
Comment: See equation 14.11 in Loss Models, with p_0^M = 0.

6.9. A. The mean of a zero-truncated Negative Binomial with r = 2 is: 2β/{1 - 1/(1+β)^2}.
The observed mean is: 1205/1000 = 1.205.
Set the theoretical and observed means equal:
1.205 = 2β/{1 - 1/(1+β)^2}. ⇒ 1.205 - 1.205/(1+β)^2 = 2β. ⇒ 1.205(1+β)^2 - 1.205 = 2β(1+β)^2.
⇒ 2.41β + 1.205β^2 = 2β + 4β^2 + 2β^3. ⇒ 2β^2 + 2.795β - 0.41 = 0.
β = {-2.795 + √(2.795^2 - (4)(2)(-0.41))}/4 = 0.1339.
Comment: Method of Maximum Likelihood gives the same answer as the method of moments.

6.10. B. Let p_0^M = 1706/2812 = 0.6067.


Then fit via Method of Moments a zero-truncated distribution to the non-zero observations.
λ/(1 - e−λ) = {(351)(1) + (408)(2) + (268)(3) + (74)(4) + (5)(5)}/(351 + 408 + 268 + 74 + 5)
= 2292/1106 = 2.072.
Trying values, for λ = 2, λ/(1 - e−λ) = 2.313.

For λ = 1.7, λ/(1 - e−λ) = 2.080.

For λ = 1.69, λ/(1 - e−λ) = 2.072.

The fitted zero-modified Poisson has: p_0^M = 0.6067, and λ = 1.69.
The density at three is: (1 - 0.6067)(1.69^3 e^(-1.69)/6)/(1 - e^(-1.69)) = 7.16%.


Comment: We want: λ/(1 - e−λ) = 2.072.
The denominator is less than one, so the function is greater than λ.
Depending on how good your first guess is, it may take you a little longer.
I tried λ = 2, since ignoring the denominator, that would be approximately okay.
One could instead for example start with λ = 1.
In my case, my first guess of λ = 2 resulted in a value that was too big by about 0.3,
so I reduced lambda by 0.3 and tried again.
The guessing part at the end of the solution is not a key skill for your exam.
The mean of the fitted distribution is: (1 - 0.6067)(1.69)/(1 - e-1.69) = 0.815.
This matches the observed mean of: 2292/2812 = 0.815.
Here is a comparison of the fitted distribution to the data:
Number of claims 0 1 2 3 4 5 6+
Observed Count 1706 351 408 268 74 5 0
Fitted Count 1706 423 357 201 85 29 11

6.11. E. X = 330/200 = 1.65.


The mean of the Logarithmic Distribution is: β / ln[1 + β].
Set the theoretical mean equal to the sample mean: β / ln[1 + β] = 1.65.
Try values of β: 1.6/ln[2.6] = 1.674. 1.55/ln[2.55] = 1.656. 1.535/ln[2.535] = 1.650.
β = 1.535.
Comment: Maximum Likelihood equals method of moments.

6.12. f(x) = {βx / (1+β)r+x} r (r+1) ... (r + x - 1)/x!.


ln f(x) = x ln(β) - (r+x)ln(1+β) + constants.
The zero-truncated density is: h(x) = f(x)/{1 - (1+β)-r}.
ln h(x) = x ln(β) - (r+x)ln(1+β) - ln[{1 - (1+β)-r}] + constants.

∂ ln h(x) / ∂ β = x/β - (r+x)/(1+β) - r(1+β)-(r+1)/{1 - (1+β)-r}.


Set the partial derivative of the loglikelihood with respect to β equal to zero:
0 = Σxi/β - Σ(r+xi)/(1+β) - Σr(1+β)-(r+1)/{1 - (1+β)-r}.

0 = n X /β - nr/(1+β) - n X /(1+β) - nr(1+β)-(r+1)/{1 - (1+β)-r}. ⇒

0 = X /β - r/(1+β) - X /(1+β) - r(1+β)-(r+1)/{1 - (1+β)-r}. ⇒

X /β - X /(1+β) = r/(1+β) - r(1+β)-(r+1)/{1 - (1+β)-r}. ⇒

X /{(β)(1+β)} = {r/(1+β)}(1 - (1+β)-r/{1 - (1+β)-r}) = {r/(1+β)}/{1 - (1+β)-r}. ⇒


X = rβ/{1 - (1+β)-r}.
The righthand side of the equation is the mean of the zero-truncated Negative Binomial.
Thus Maximum Likelihood is the same as the Method of Moments.

6.13. C. Method of Maximum Likelihood gives the same answer as the method of moments.
The mean of a Zero-Truncated Geometric Distribution is: β/{1 - 1/(1+β)} = 1 + β.
The mean of the data is: 3200/1000 = 3.2.
Set this theoretical mean equal to the observed mean:
1 + β = 3.2. ⇒ β = 2.2.
β^ = X̄ - 1. Var[β^] = Var[X̄] = Var[X]/1000 = β(1+β)/1000 = (2.2)(3.2)/1000 = 0.00704.
StdDev[β^] = √0.00704 = 0.0839. Coefficient of variation of β^ is: 0.0839/2.2 = 0.038.
Alternately, the zero-truncated density is: h(x) = f(x)/{1 - 1/(1+β)} = f(x)(1+β)/β = β^(x-1)/(1+β)^x.
ln h(x) = (x-1) ln(β) - x ln(1+β). ∂ ln h(x) / ∂β = (x-1)/β - x/(1+β).
∂² ln h(x) / ∂β² = -(x-1)/β^2 + x/(1+β)^2.
E[∂² ln h(x) / ∂β²] = -(E[x] - 1)/β^2 + E[x]/(1+β)^2 = -β/β^2 + (1+β)/(1+β)^2 = -1/β + 1/(1+β) =
-1/2.2 + 1/3.2 = -0.14205.
Var[β^] = -1 / {n E[∂² ln h(x) / ∂β²]} = -1 / {(1000)(-0.14205)} = 0.00704. Proceed as before.
Comment: The density, mean, and variance of the Zero-Truncated Geometric Distribution are shown
in Appendix B attached to the exam.
2016-C-5, Fitting Frequency § 6 (a, b, 1) Class, HCM 10/27/15, Page 188

6.14. D. The maximum likelihood fit is the same as the method of moments:
1 + β = X = 635/500 = 1.27. ⇒ β = 0.27.
The loglikelihood is: 400 lnf(1) + 70 lnf(2) + 25 lnf(3) + 5 lnf(4) =
400 ln[1/(1+β)] + 70 ln[β/(1+β)2 ]+ 25 ln[β2/(1+β)3 ] + 5 ln[β3/(1+β)4 ] = 135 ln[β] - 635 ln[1+β].
Thus the maximum loglikelihood is: 135 ln[0.27] - 635 ln[1.27] = -328.5357.
The loglikelihood for β = 1/3 is: 135 ln[1/3] - 635 ln[4/3] = -330.9908.
β = 1/3 is a special case of that with β varying; the test has: 1 - 0 = 1 degree of freedom.
Thus the likelihood ratio test statistic is: (2){-328.5357 - (-330.9908)} = 4.910.
3.841 < 4.910 < 5.024. Reject at 5%, do not reject at 2.5%.
Comment: Using a computer, the p-value is 2.67%.

Section 7, Important Formulas and Ideas

Here are what I believe are the most important formulas and ideas from this study guide to know for
the exam.

Method of Moments (Section 2):

If one has a single parameter, then one matches the observed mean to the theoretical
mean of the distribution. In the case of two parameters, one matches the first two
moments, or equivalently one matches the mean and variance.

In order to estimate the variance of a single parameter fit by the Method of Moments:
Write the estimated parameter as a function of the observed mean, X , and use the fact that
Var( X ) = Var(X) / n.

Method of Maximum Likelihood (Section 3):

For ungrouped data: Likelihood = Π f(xi). Loglikelihood = Σln f(xi).


Find the set of parameters such that the likelihood or the loglikelihood is maximized.

For the Poisson, Binomial with m fixed, or the Negative Binomial with r fixed, the method
of maximum likelihood is equal to the method of moments.

(Fisherʼs) Information = -n E[∂² ln f(x) / ∂θ²].

For a single parameter, Var[θ^] = -1 / {n E[∂² ln f(x) / ∂θ²]} = 1 / (the information).

The variance of the estimate of a function of a single parameter θ, h(θ), is:
Var[h(θ)] ≅ (∂h/∂θ)^2 Var[θ^].

For grouped data: Likelihood = Π {F(bi) - F(ai)}^ni. Loglikelihood = Σ ni ln[F(bi) - F(ai)].

When applied to years of data, the Method of Maximum Likelihood applied to the Poisson,
produces the same result as the Method of Moments.

Chi-Square Test (Section 4):

Chi-Square Statistic is computed as a sum of terms, for each interval one computes:
(observed number - expected number)2 / expected number.

A small Chi-Square Statistic indicates a good match between the data and the
distribution.

To compute the number of Degrees of Freedom:


1. Determine the groups to use in computing the Chi-Square statistic. Unless the exam
question has specifically told you which groups to use, use the groups for the data
given in the question.
2. Determine whether any parameters have been fit to this data, and if so how many.
3. Degrees of freedom =
(# intervals from step 1) - 1 - (# of fitted parameters, if any, from step #2).

The degrees of freedom gives the proper row; find the columns that bracket the Chi-Square
Statistic and then reject to the left and do not reject (accept) to the right.

The p-value is the value of the Survival Function of the Chi-Square Distribution.
A large p-value indicates a good fit.

If applying the Chi-Square Goodness of Fit Test to data with total claims and exposures by year,
then the number of degrees of freedom is the number of years minus the number of fitted
parameters, and χ^2 = Σ_k (nk - Ek)^2 / Vk, summing over the years k.

Likelihood Ratio Test (Section 5):

The Likelihood Ratio Test proceeds as follows:


1. One has two distributions, one with more parameters than the other, both fit to the same data via
Maximum Likelihood.
2. One of the distributions is a special case.
3. One computes twice the difference in the loglikelihoods.
4. One compares the result of step 3 to the Chi-Square Distribution, with a number of degrees of
freedom equal to the difference in the number of fitted parameters of the two distributions.
5. One draws a conclusion as to whether the more general distribution fits significantly better than its
special case. H0 is that the distribution with fewer parameters is appropriate.
The alternative hypothesis H1 is that the distribution with more parameters is appropriate.

Fitting to the (a, b, 1) Class (Section 6):

When applied to individual ungrouped data, for the zero-truncated Poisson,


zero-truncated Binomial with m fixed, zero-truncated Negative Binomial with r fixed, and
Logarithmic Distribution, Maximum Likelihood is equivalent to Method of Moments.

When applied to individual ungrouped data, for the zero-modified Poisson, zero-modified
Binomial with m fixed, and zero-modified Negative Binomial with r fixed, Maximum
Likelihood is equivalent to assigning to zero the observed proportion of zeros and
matching the mean of the zero-modified distribution to the sample mean.
Set p0M = the proportion of zeros in the data.
Mahlerʼs Guide to
Fitting Loss Distributions
Exam C

prepared by
Howard C. Mahler, FCAS
Copyright 2016 by Howard C. Mahler.

Study Aid 2016-C-6

Howard Mahler
hmahler@mac.com
www.howardmahler.com/Teaching
2016-C-6, Fitting Loss Distributions, HCM 10/22/15, Page 1

Mahlerʼs Guide to Fitting Loss Distributions


Copyright 2016 by Howard C. Mahler.

The Fitting Loss Distribution concepts on Exam C from Loss Models,


by Klugman, Panjer, and WiIlmot, are demonstrated.

Information in bold and sections whose titles are in bold, are more important to pass your exam.
Larger bold type indicates it is extremely important. Information presented in italics (and sections
whose titles are in italics) should not be needed to directly answer exam questions and should be
skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to
be useful in practical applications.

Highly Recommended problems are double underlined.


Recommended problems are underlined.

Solutions to the problems in each section are at the end of that section.
Note that problems include both some written by me and some from past exams1. The latter are
copyright by the Casualty Actuarial Society and the SOA and are reproduced here solely to aid
students in studying for exams.2

Greek letters used in Loss Models:


α = alpha, β = beta, γ = gamma, θ = theta, λ = lambda, µ = mu, σ = sigma, τ = tau
β = beta, used for the Beta and incomplete Beta functions.
Γ = Gamma, used for the Gamma and incomplete Gamma functions.
Φ = Phi, used for the Normal distribution. φ = phi, used for the Normal density function.

Π = Pi is used for the continued product just as Σ = Sigma is used for the continued sum

1
In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. In some cases
the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam
questions, but it will not be specifically tested.
2
The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for
their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you
in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to
produce quality exams.
2016-C-6, Fitting Loss Distributions, HCM 10/22/15, Page 2

Section # Pages Section Name


A 1 4-6 Introduction
2 7-8 Ungrouped Data
3 9 Grouped Data
4 10-12 The Modeling Process
5 13-46 Ogives and Histograms
B 6 47-133 Kernel Smoothing
7 134-144 Estimation of Percentiles
8 145-178 Percentile Matching
C 9 179-227 Method of Moments
10 228-327 Fitting to Ungrouped Data by Maximum Likelihood
11 328-351 Fitting to Grouped Data by Maximum Likelihood
D 12 352-412 Chi-Square Test
13 413-447 Likelihood Ratio Test
E 14 448-491 Hypothesis Testing
15 492-501 Schwarz Bayesian Criterion
16 502-566 Kolmogorov-Smirnov Test, Basic
F 17 567-598 Kolmogorov-Smirnov Test, Advanced
18 599-622 p-p Plots
19 623-651 Anderson-Darling Test
G 20 652-658 Percentile Matching Applied to Truncated Data
21 659-670 Method of Moments Applied to Truncated Data
22 671-725 Maximum Likelihood Applied to Truncated Data
H 23 726-742 Single Parameter Pareto Distribution, Data Truncated from Below
24 743-769 Fitting to Censored Data
25 770-794 Fitting to Data Truncated and Censored
I 26 795-864 Properties of Estimators
27 865-881 Variance of Estimates, Method of Moments
28 882-916 Variance of Estimated Single Parameters, Maximum Likelihood
29 917-946 Information Matrix and Covariance Matrix
J 30 947-1012 Variance of Functions of Maximum Likelihood Parameters
31 1013-1026 Non-Normal Confidence Intervals
32 1027-1038 Minimum Modified Chi-Square
33 1039-1052 Important Ideas & Formulas
2016-C-6, Fitting Loss Distributions, HCM 10/22/15, Page 3

Exam 4 Questions by Section of this Study Aid3 4

Section Sample 5/00 11/00 5/01 11/01 11/02 11/03 11/04 5/05 11/05 11/06 5/07
1
2
3
4
5 26 33 35
6 4 20 22 9 24 16
7 2 2 2
8 32 39 37 2 30 3 1 24, 28
9 8 36 2 39 33 8 24 14 24 21 10
10 6 34 16 30 40 10 34 6
11 37 23 33
12 9 10 33 5
13 28 25 14
14
15 10 22
16 23 11 12 17 22, 38 1, 19 34 20
17
18 6 5 31
19
20
21
22 34 10 26 32 31
23 21 18
24 22 7 40 6 24 31 5
25 18 27 1
26 14 31 40 16 28
27
28 22 18
29 13 18
30 33 25 34 25 9, 10 14 34
31
32 30

The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.

I have put past exam questions in the study guide and section where they seem to fit best.
However, exam questions often rely on material in more than one section or more than one study
guide. Therefore, one should use this chart to direct your study efforts with appropriate caution.

2014 Sample Exam question #300 is in Section 6.


3
Excluding any questions that are no longer on the syllabus.
4
Some former Exam 4 questions that cover more basic material are in “Mahlerʼs Guide to Loss Distributions.”
2016-C-6, Fitting Loss Distributions §1 Introduction, HCM 10/22/15, Page 4

Section 1, Introduction

The material in this study guide uses the ideas in “Mahlerʼs Guide to Loss Distributions.”
It is strongly recommended you review the important idea section of that study guide before
proceeding any further.

It would also be worthwhile to look through again that portion of Appendix A of Loss Models that will
be attached to your exam.

In this study guide are discussed a number of related topics on Fitting Loss Distributions:

1. Graphical Techniques to Display or Smooth Data:


Ogives, Histograms, Kernel Smoothing, p-p plots

2. Methods of Fitting Distributions:


Percentile Matching, Method of Moments, Maximum Likelihood

3. Statistical Tests of Fits:


Chi-Square Goodness-of-Fit, Likelihood Ratio Test, Schwarz Bayesian Criterion,
Kolmogorov-Smirnov

4. Properties of Estimators:
Section 26

5. Variances of Estimates Derived From Fitting Distributions:


Sections 27-30
2016-C-6, Fitting Loss Distributions §1 Introduction, HCM 10/22/15, Page 5

Loss Distributions as per Loss Models

Distribution Name: Distribution Function F(x); Probability Density Function f(x)

Exponential: F(x) = 1 - e^(-x/θ); f(x) = e^(-x/θ) / θ.

Single Parameter Pareto: F(x) = 1 - (θ/x)^α, x > θ; f(x) = α θ^α / x^(α+1).

Weibull: F(x) = 1 - exp[-(x/θ)^τ]; f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x.

Gamma: F(x) = Γ[α ; x/θ]; f(x) = (x/θ)^α e^(-x/θ) / {x Γ(α)} = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}.

LogNormal: F(x) = Φ[(ln(x) − µ) / σ]; f(x) = exp[-(ln(x) − µ)² / (2σ²)] / {x σ √(2π)}.

Pareto: F(x) = 1 - {θ/(θ + x)}^α; f(x) = α θ^α / (θ + x)^(α+1).

Inverse Gaussian: F(x) = Φ[(x/µ − 1)√(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1)√(θ/x)];
f(x) = √(θ/(2π)) exp[-θ(x/µ − 1)² / (2x)] / x^1.5.

Inverse Gamma: F(x) = 1 - Γ[α ; θ/x]; f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]}.
2016-C-6, Fitting Loss Distributions §1 Introduction, HCM 10/22/15, Page 6

Moments of Loss Distributions as per Loss Models

Distribution Name: Mean; Variance; nth Moment

Exponential: mean = θ; variance = θ²; E[X^n] = n! θ^n.

Single Parameter Pareto: mean = αθ/(α − 1); variance = α θ² / {(α − 1)²(α − 2)}; E[X^n] = α θ^n / (α − n), α > n.

Weibull: mean = θ Γ[1 + 1/τ]; variance = θ² {Γ[1 + 2/τ] − Γ[1 + 1/τ]²}; E[X^n] = θ^n Γ[1 + n/τ].

Gamma: mean = αθ; variance = αθ²; E[X^n] = θ^n (α)(α + 1)...(α + n − 1) = θ^n Γ[α + n] / Γ[α].

LogNormal: mean = exp[µ + σ²/2]; variance = exp[2µ + σ²] (exp[σ²] − 1); E[X^n] = exp[nµ + n²σ²/2].

Pareto: mean = θ/(α − 1); variance = α θ² / {(α − 1)²(α − 2)}; E[X^n] = n! θ^n / {(α − 1)...(α − n)}, α > n.

Inverse Gaussian: mean = µ; variance = µ³/θ; E[X^n] = √(2θ/(µπ)) e^(θ/µ) µ^n K(n - 1/2)(θ/µ).

Inverse Gamma: mean = θ/(α − 1); variance = θ² / {(α − 1)²(α − 2)}; E[X^n] = θ^n / {(α − 1)...(α − n)}, α > n.
2016-C-6, Fitting Loss Distributions §2 Ungrouped Data, HCM 10/22/15, Page 7

Section 2, Ungrouped Data

There are 130 losses of sizes:

300 37,300 86,600 150,300 423,200


400 39,500 88,600 171,800 437,900
2,800 39,900 91,700 173,200 442,700
4,500 41,200 96,600 177,700 457,800
4,900 42,800 96,900 183,000 463,000
5,000 45,900 106,800 183,300 469,300
7,700 49,200 107,800 190,100 469,600
9,600 54,600 111,900 209,400 544,300
10,400 56,700 113,000 212,900 552,700
10,600 57,200 113,200 225,100 566,700
11,200 57,500 115,000 226,600 571,800
11,400 59,100 117,100 233,200 596,500
12,200 60,800 119,300 234,200 737,700
12,900 62,500 122,000 244,900 766,100
13,400 63,600 123,100 253,400 846,100
14,100 66,400 126,600 261,300 852,700
15,500 66,900 127,300 261,800 920,300
19,300 68,100 127,600 273,300 981,100
19,400 68,900 127,900 276,200 988,300
22,100 71,100 128,000 284,300 1,078,800
24,800 72,100 131,300 316,300 1,117,600
29,600 79,900 132,900 322,600 1,546,800
32,200 80,700 134,300 343,400 2,211,000
32,500 83,200 134,700 350,700 2,229,700
33,700 84,500 135,800 395,800 3,961,000
34,300 84,600 146,100 406,900 4,802,200

Each individual value is shown, rather than the data being grouped into intervals.
The type of data shown here is called individual or ungrouped data.

Some students will find it helpful to put this data set on a computer and follow along with the
computations in the study guide to the best of their ability.5 The best way to learn is by doing.

5
Even this data set is far bigger than would be presented on an exam. In many actual applications, there are many
thousands of claims, but such a large data set is very difficult to present in a Study Aid. It is important to realize that
with modern computers, actuaries routinely deal with such large data sets. There are other situations where all that is
available is a small data set such as presented here.
2016-C-6, Fitting Loss Distributions §2 Ungrouped Data, HCM 10/22/15, Page 8

This ungrouped data set is used in many examples throughout this study guide:

300, 400, 2800, 4500, 4900, 5000, 7700, 9600, 10400, 10600, 11200, 11400, 12200, 12900,
13400, 14100, 15500, 19300, 19400, 22100, 24800, 29600, 32200, 32500, 33700, 34300,
37300, 39500, 39900, 41200, 42800, 45900, 49200, 54600, 56700, 57200, 57500, 59100,
60800, 62500, 63600, 66400, 66900, 68100, 68900, 71100, 72100, 79900, 80700, 83200,
84500, 84600, 86600, 88600, 91700, 96600, 96900, 106800, 107800, 111900, 113000,
113200, 115000, 117100, 119300, 122000, 123100, 126600, 127300, 127600, 127900,
128000, 131300, 132900, 134300, 134700, 135800, 146100, 150300, 171800, 173200,
177700, 183000, 183300, 190100, 209400, 212900, 225100, 226600, 233200, 234200,
244900, 253400, 261300, 261800, 273300, 276200, 284300, 316300, 322600, 343400,
350700, 395800, 406900, 423200, 437900, 442700, 457800, 463000, 469300, 469600,
544300, 552700, 566700, 571800, 596500, 737700, 766100, 846100, 852700, 920300,
981100, 988300, 1078800, 1117600, 1546800, 2211000, 2229700, 3961000, 4802200
2016-C-6, Fitting Loss Distributions §3 Grouped Data, HCM 10/22/15, Page 9

Section 3, Grouped Data

Unlike the ungrouped data in Section 2, often one is called upon to work with data grouped into
intervals.6 In this example, both the number of losses in each interval and the dollars of loss on those
losses are shown. Sometimes the latter information is missing or sometimes additional information
may be available.

Interval ($000) Number of Losses Total of Losses in the Interval ($000)


0-5 2208 5,974
5 -10 2247 16,725
10-15 1701 21,071
15-20 1220 21,127
20-25 799 17,880
25-50 1481 50,115
50-75 254 15,303
75-100 57 4,893
100 - ∞ 33 4,295

SUM 10,000 157,383

The estimated mean is $15,738.
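For those following along on a computer, this mean can be checked with a couple of lines of Python:

losses_in_interval = [5974, 16725, 21071, 21127, 17880, 50115, 15303, 4893, 4295]  # total losses ($000) by interval
number_of_losses = 10000

estimated_mean = 1000 * sum(losses_in_interval) / number_of_losses
print(estimated_mean)   # 15,738.3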

As will be seen, in some cases one has to deal with grouped data in a somewhat different manner
than ungrouped data. With modern computing power, the actuary is usually better off working
with the data in an ungrouped format if available. The grouping process discards valuable
information. The wider the intervals, the worse is the loss of information.

6
Note that in this example, for simplicity I have not made a big deal over whether for example the 10-15 interval
includes 15 or not. In many real world applications, in which claims cluster at round numbers, that can be important.
2016-C-6, Fitting Loss Distributions §4 Modeling Process, HCM 10/22/15, Page 10

Section 4, The Modeling Process and Parameters of Distributions

Actuaries construct and use many mathematical models of real world situations of interest.

Model selection is based on a balance between fit to the observed data and simplicity.

Six Steps of the Modeling Process:7

Loss Models lists six steps to the modeling process.8

1. Model Choice: Choose one or more models to investigate.

2. Model Calibration: Fit the model to the data.9

3. Model Validation: Using statistical tests or other techniques to determine if the fit(s) are
good enough to use.

4. Other Models: Possibly add additional models, and in that case return to step 1.

5. Model Selection: Select which model to use.

6. Modify for the Future: Make any changes needed to the selected model, so that it is
appropriate to apply to the future.10

7
See Section 1.1.1 of Loss Models, not on the syllabus. As with all such general lists of steps, any real world
application may be more closely or less closely approximated by this list. One would not always go through an
elaborate procedure, particularly if one desires a rough estimate of something that will only be used once.
8
The actuary makes use of his prior knowledge and experience. Prior to “step one”, the actuary should understand
the purpose of the model, read any relevant actuarial literature, talk to his colleagues, and investigate what
data/information is available or can be obtained.
9
The actuary should examine the quality and reasonableness of any data before it is used.
10
For example, one might need to take into account inflation.
2016-C-6, Fitting Loss Distributions §4 Modeling Process, HCM 10/22/15, Page 11

An Example of the Modeling Process:11

An actuary is interested in estimating excess ratios12 for Workers Compensation Insurance in


Massachusetts.13

0. The available data is examined and it is determined that several years of Unit Statistical
Plan data for Massachusetts at third, fourth, and fifth report, will be appropriate to use.
This data has already been used for other purposes and therefore has already been
checked for reasonableness and validity.

1. The mean, Coefficient of Variation, Skewness, and Kurtosis are calculated for each year of data.
Mean excess losses are also examined.
Based on this information, several heavier tailed distributions such as the LogNormal
and Pareto are investigated.

2. These models are fit via maximum likelihood to the data.14

3. These fits are tested via the Kolmogorov-Smirnov Statistic and are all rejected.15

4. Various mixtures are considered.16

1. Several 2-point mixtures are chosen.

2. These models are fit via maximum likelihood to the data.

3. These fits are tested via the Kolmogorov-Smirnov Statistic and are all rejected.

4. Splices are considered.17

11
A somewhat simplified version of what I did to develop the method described in “Workers Compensation Excess
Ratios: An Alternative Method of Estimation,” by Howard C. Mahler, PCAS 1998.
12
The excess ratio is one minus the loss elimination ratio.
13
These excess ratios will be used to determine Excess Loss Factors used in Retrospective Rating.
14
See “Mahlerʼs Guide to Statistics.” How to fit via maximum likelihood will be discussed subsequently.
15
The Kolmogorov-Smirnov Statistic will be discussed subsequently.
16
Mixtures are discussed in a subsequent section.
17
Splices are discussed in “Mahlerʼs Guide to Loss Distributions.”
2016-C-6, Fitting Loss Distributions §4 Modeling Process, HCM 10/22/15, Page 12

1. Several splices between the empirical distribution and continuous distributions are chosen.

2. These models are fit via maximum likelihood to the data.

3. These fits are tested via the Kolmogorov-Smirnov Statistic and are all rejected.

4. A combination of mixtures and splices are considered.

1. Several splices between the empirical distribution and 2-point mixtures are chosen.

2. These models are fit via maximum likelihood to the data.

3. These fits are tested via the Kolmogorov-Smirnov Statistic.

5. A splice between the empirical distribution and a 2-point mixture of an Exponential and a
Pareto is selected.18

6. Incorporate the effect of inflation and law amendments.

7. The model was then compared to more recent data not used in the selection process,
but otherwise similar to the data used in the selection process.
The selected model displayed a good fit to this more recent data.

18
The actual method uses something similar in concept to a splice, but not a traditional splice.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 13

Section 5, Ogives and Histograms19

There are several graphical techniques one can use to display size of loss data.

Ogives:

An ogive is an approximate graph of the Distribution function.


We assume a uniform distribution on each interval, as discussed previously.
The ogive is made up of straight line segments.

For example, for the grouped data in Section 3, an ogive would look something like this:20

[Ogive of the Grouped Data in Section 3: straight line segments connect the points (0, 0), (5, 0.2208), (10, 0.4455), (15, 0.6156), (20, 0.7376), (25, 0.8175), (50, 0.9656), (75, 0.9910), and (100, 0.9967), with accident size ($000) on the horizontal axis.]

Note that each of the points is connected by a straight line segment.21 There is no specific loss size
at which the distribution function reaches unity.22 The ogive is not unique, since it is an approximate
graph of the Distribution function.23 In this example, one could draw an ogive that connected fewer of
the points.
19
See Section 11.3 in Loss Models.
20
Note that there are 33 accidents larger than $100,000. There is no unique way to represent this since the interval
stretches to infinity.
21
For example, F(10) = .4455 and F(15) = .6156, so there is a straight line from (10,.4455) to (15,.6156).
22
Since for this set of ungrouped data, the last interval extends to infinity.
23
For ungrouped data, the more detailed information allows more possible choices of ogives.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 14

Connect by straight lines the points: (xi, empirical distribution function at xi) .24

Exercise: What is the height of the above ogive at $30,000?


[Solution: (80%)(.8175) + (20%)(.9656) = .8471.]
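For those who like to follow along on a computer, the same linear interpolation can be coded directly; here is a small Python sketch for the grouped data in Section 3 (the function name ogive is mine, for illustration):

import bisect

# Interval endpoints and the empirical distribution function at each, for the grouped data in Section 3.
endpoints = [0, 5000, 10000, 15000, 20000, 25000, 50000, 75000, 100000]
F_values  = [0, 0.2208, 0.4455, 0.6156, 0.7376, 0.8175, 0.9656, 0.9910, 0.9967]

def ogive(x):
    # Linearly interpolate the empirical distribution function, for 0 < x <= 100,000.
    j = bisect.bisect_left(endpoints, x)        # index of the right endpoint of the bracketing interval
    a, b = endpoints[j - 1], endpoints[j]
    return F_values[j - 1] * (b - x) / (b - a) + F_values[j] * (x - a) / (b - a)

print(ogive(30000))   # 0.8471, matching the exercise above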

Histograms:

A histogram is an approximate graph of the probability density function.


We assume a uniform distribution on each interval, as discussed previously.
For the grouped data in Section 3, a histogram would look as follows:25

[Histogram of the Grouped Data in Section 3, with accident size ($000) on the horizontal axis: heights 0.0000442 on 0-5, 0.0000449 on 5-10, 0.0000340 on 10-15, 0.0000244 on 15-20, 0.0000160 on 20-25, 0.0000059 on 25-50, 0.0000010 on 50-75, and 0.0000002 on 75-100.]

For example, for the interval from $25,000 to $50,000 of length $25,000, there are 1481 losses out
of a total of 10,000, so that the height is: (1481 / 10000) / 25000 = 0.0000059.
One has to remember to divide by both the total number of losses, 10,000, as well as the width of
the interval 25,000, so that the p.d.f. will integrate to unity. In other words, the total area under the
histogram should equal unity.

The height of each rectangle = (# losses in the interval) / {(total # losses) (width of interval)}.26
The histogram is the derivative of the ogive.
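A one-line Python version of this height calculation (the function name is just for illustration):

def histogram_height(count_in_interval, total_count, interval_width):
    # Height of the histogram rectangle: the share of losses in the interval, per unit of width.
    return (count_in_interval / total_count) / interval_width

print(histogram_height(1481, 10000, 25000))   # 0.0000059, for the 25,000 to 50,000 interval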
24
In Definition 11.8 in Loss Models, Fn (x) = Fn (cj-1) (cj - x)/(cj - cj-1) + Fn (cj) (x - cj-1)/(cj - cj-1), cj-1 ≤ x ≤ cj.
25
Note that there are 33 accidents larger than $100,000. There is no unique way to represent this as a probability
density, since the interval stretches to infinity.
26
In Definition 11.9 in Loss Models, fn (x) = {Fn (cj) - Fn (cj-1)}/(cj - cj-1) = nj/{n(cj - cj-1)}, cj-1 ≤ x < cj,
where each interval is closed at its bottom and open at its top.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 15

Exercise: Draw a histogram of the following grouped data: 0 -10: 6, 10-20: 11, 20-25: 3.
[Solution: The heights are: 6/{(20)(10)} = 0.03, 11/{(20)(10)} = 0.055, and 3/{(20)(5)} = 0.03.
0.055

0.03

10 20 25

Comparing a Histogram to a Continuous Distribution:27

It can sometimes be useful to compare a histogram to a continuous distribution.


For example, here is a comparison between the histogram of the grouped data in Section 3, and an
Exponential Distribution with θ = 15,636:28

[Figure: the histogram of the grouped data in Section 3 compared to the density of an Exponential Distribution with θ = 15,636, plotted for sizes from 0 to 100 ($000).]

This Exponential Distribution is not a good match to this data.29


27
Figure 16.2 of Loss Models is a comparison to a histogram. Figures 16.1 and 16.3 are comparisons to the empirical
distribution function. Graphs of the difference between the empirical distribution and a continuous distribution are
very useful and are discussed in a subsequent section.
28
As discussed in a subsequent section, this is the maximum likelihood Exponential Distribution fit to this data.
29
As discussed in a subsequent section, one can perform a Chi-Square Test.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 16

It turns out a Burr Distribution with parameters α = 3.9913, θ = 40,467, and γ = 1.3124, is a good
match to this data.30 Here is a comparison between the histogram of the grouped data in Section 3,
and this Burr Distribution:

[Figure: the histogram of the grouped data in Section 3 compared to the density of this Burr Distribution, plotted for sizes from 0 to 100 ($000).]

Variances:31

In the histogram of the grouped data from Section 3, the height at $12,000 is .00003402.
Thus we have an estimated density at $12,000 of 0.00003402. This was calculated as follows:
(# of losses in interval 10,000 to 15,000)/{(total # of losses)(15000 - 10000)} =
1701/{(10000)(5000)} = 0.00003402.

The number of losses observed in the interval 10,000 to 15,000 is random. Assuming the number
of such losses is Binomial with m = 10,000 and q = 1701/10000 = 0.1701, it has variance:
(10000)(0.1701)(1 - 0.1701) = (10,000)(0.1412).

The estimate of the density is the number divided by: (10,000)(5000). Thus the estimate of the
density has variance: (10000)(0.1412)/{(10000)(5000)}2 = 0.1412 / {(10000)(50002 )}.
The estimate of the density has standard deviation of: 0.00000075.

30
As discussed in a subsequent section, this is the maximum likelihood Burr Distribution fit to this data.
For the Burr Distribution, F(x) =1 - (1/(1+(x/θ)γ))α, and f(x) = αγ(x/θ)γ(1+(x/θ)γ)−(α + 1) /x.
As discussed in a subsequent section, one can perform a Chi-Square Test.
31
See Section 12.2 of Loss Models.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 17

In general, an estimate of the density from the histogram, fn (x), has variance:32
(losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²}.

Exercise: Given the following grouped data:


0 -10: 6, 10-20: 11, 20-25: 3.
Based on the histogram, what is the estimated density at 18 and what is the variance of that
estimate?
[Solution: f20(18) = 11 / {(20)(10)} = .055.
The variance of this estimate is: (11/20)(1 - 11/20) / {(20)(10²)} = 0.00012375.
Comment: Thus an approximate 95% confidence interval is: 0.055 ± 0.022.]

In the ogive of the grouped data from Section 3, the height at $30,000 is:
(80%)(0.8175) + (20%)(0.9656) = 0.8471. This is an estimate of the Distribution Function at
$30,000. Since the number of losses observed less than $25,000 and less than $50,000 are each
random, there is a variance of this estimate.

Let A = # of losses less than $25,000 and let B = # of losses from $25,000 to $50,000.

Then we can write the estimate as: Fn (30000) = (80%)(A/N) + (20%)(A+B)/N.


Var[Fn(30000)] = {0.8² Var[A] + 0.2² Var[A + B] + (2)(0.2)(0.8) Cov[A, A + B]} / N².

A is assumed Binomial with q = 0.8175. Var[A] = (0.8175)(1 - 0.8175)N = 0.1492N.


B is assumed Binomial with q = 0.9656 - 0.8175 = 0.1481.
Var[B] = (0.1481)(1 - 0.1481)N = 0.1262N.
A + B is assumed Binomial with q = 0.9656. Var[A + B] = (0.9656)(1 - 0.9656)N = 0.0332N.

Var[A + B] = Var[A] + Var[B] + 2Cov[A, B]. ⇒ Cov[A, B] = (Var[A + B] - Var[A] - Var[B])/ 2 =


(0.0332N - 0.1492N - 0.1262N)/2 = -0.1211N.
In fact, A and B are jointly multinomial distributed, with covariance:33
-N(A/N)(B/N) = -N(0.8175)(0.1481) = -0.1211N.

Cov[A, A + B] = Var[A] + Cov[A, B] = 0.1492N - 0.1211N = 0.0281N.


Therefore, Var[Fn(30000)] = {0.8² (0.1492N) + 0.2² (0.0332N) + (2)(0.2)(0.8)(0.0281N)} / N² =
0.1058/N = 0.1058/10,000 = 0.00001058.
32
In general, the variance of the estimated probability covered by an interval is:
(Probability in the interval) (1 - Probability in the interval) / N. The estimated density is this estimated probability
divided by the width of the interval; its variance is divided by the width squared.
33
See for example, A First Course in Probability by Sheldon Ross.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 18

In general, if x is in the interval from ai to bi, then


Fn (x) = Fn (ai) (bi - x)/(bi - ai) + Fn (bi) (x - ai)/(bi - ai).

Var[Fn (ai)] = Fn (ai)Sn (ai)/N. Var[Fn (bi)] = Fn (bi)Sn (bi)/N.

Cov[Fn(ai), Fn(bi)] = Cov[A/N, (A+B)/N] = Var[A]/N² + Cov[A, B]/N² =
N Fn(ai)Sn(ai)/N² - N Fn(ai){Fn(bi) - Fn(ai)}/N² = Fn(ai)Sn(bi)/N.

Var[Fn(x)] = {Var[Fn(ai)](bi - x)² + Var[Fn(bi)](x - ai)² + 2Cov[Fn(ai), Fn(bi)](bi - x)(x - ai)} / (bi - ai)².

Var[Fn(x)] = {Fn(ai)Sn(ai)(bi - x)² + Fn(bi)Sn(bi)(x - ai)² + 2Fn(ai)Sn(bi)(bi - x)(x - ai)} / {N (bi - ai)²}.

Note that since Fn (x) + Sn (x) = 1, Var[Sn (x)] = Var[Fn (x)].
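These formulas can be coded directly; here is a small Python sketch reproducing the variance of the ogive at $30,000 computed above (the function name is mine, for illustration):

def ogive_variance(F_a, F_b, a, b, x, n):
    # Variance of the ogive estimate of F(x), for x in the interval from a to b.
    S_a, S_b = 1 - F_a, 1 - F_b
    return (F_a * S_a * (b - x)**2 + F_b * S_b * (x - a)**2
            + 2 * F_a * S_b * (b - x) * (x - a)) / (n * (b - a)**2)

print(ogive_variance(0.8175, 0.9656, 25000, 50000, 30000, 10000))   # about 0.00001058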

Exercise: Given the following grouped data:


0 -10: 6, 10-20: 11, 20-25: 3.
Based on the ogive, what is the estimated distribution function at 18, and what is the variance of that estimate?
[Solution: F20(18) = (0.2)(6/20) + (0.8)(17/20) = (0.2)(0.3) + (0.8)(0.85) = 0.74.
The variance of this estimate is:
{0.2² (0.3)(0.7) + 0.8² (0.85)(0.15) + (2)(0.2)(0.8)(0.3)(0.15)} / 20 = 0.00522.
Alternately, F(18) = (X + 0.8Y)/20, where X = number in interval from 0 to 10, and Y is the
number in the interval from 10 to 20.
Var[X] = (20)(6/20)(14/20) = 4.2. Var[Y] = (20)(11/20)(9/20) = 4.95.
Cov[X, Y] = -(20)(6/20)(11/20) = -3.3.
Var[F(18)] = Var[X + 0.8Y]/400 = {Var[X] + 0.8² Var[Y] + (2)(0.8) Cov[X,Y]} / 400 =
{4.2 + (0.64)(4.95) + (1.6)(-3.3)} / 400 = 0.00522.
Comment: Thus an approximate 95% confidence interval is: 0.74 ± 0.14.]
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 19

Comparing Ogives:

One can compare ogives.34

The following data is taken from “Rating by Layer of Insurance,” by Ruth E. Salzmann, PCAS 1963.

For four different classes of building, shown are the number of fire losses of size less than or equal to
a given percent of value of the building.

For example, for Frame Protected Buildings, 4636 out of 4862 fire claims resulted in damage of
10% or less of the value of the building.

Percent Frame Brick Frame Brick


of Value Protected Protected Unprotected Unprotected
0.1 546 210 169 54
0.2 1157 398 383 120
0.3 1659 561 547 155
0.4 2041 670 662 191
0.5 2338 762 733 218
0.6 2610 840 811 237
0.7 2833 916 867 248
0.8 3003 964 902 257
0.9 3151 998 937 272
1 3310 1047 968 280
2 3981 1243 1095 323
3 4256 1307 1170 344
4 4388 1330 1203 349
5 4474 1344 1217 351
6 4520 1353 1224 353
7 4554 1361 1237 356
8 4585 1370 1239 356
9 4605 1373 1240 358
10 4636 1381 1254 362
20 4730 1400 1272 366
30 4767 1406 1280 370
40 4794 1411 1287 372
50 4810 1415 1294 373
60 4818 1421 1298 374
70 4828 1424 1300 374
80 4837 1427 1305 374
90 4843 1428 1308 375
100 4862 1432 1333 378

34
See Exercise 11.6 of Loss Models. I find comparing ogives to usually be of little value in practical applications.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 20

Exercise: Draw ogives comparing the protected to the unprotected distributions.


Put both axes on a log scale.
[Solution: For each distribution, we need to divide the given numbers by the total.
For example, for frame protected, at 1% of value, the empirical distribution function is:
3310/4862 = 0.685.

Here is a comparison of Frame Protected (solid) to Frame Unprotected (dashed):

[Figure: ogives of the Frame Protected (solid) and Frame Unprotected (dashed) empirical distribution functions, plotted against percent of value from 0.1 to 100, with both axes on log scales.]

Here is a comparison of Brick Protected (solid) to Brick Unprotected (dashed):

[Figure: ogives of the Brick Protected (solid) and Brick Unprotected (dashed) empirical distribution functions, plotted against percent of value from 0.1 to 100, with both axes on log scales.]

While the Protected and Unprotected distributions appear to differ, it is not very clear.]
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 21

Difference graphs, discussed subsequently, would be a better way to display this comparison.
Salzmann graphically compares the percent of total loss costs from losses of size less than or
equal to a certain value, which is much better at showing any differences between the
distributions than any comparison of ogives. The unprotected buildings have a much larger
percent of total loss costs from large losses than do the protected buildings, something that is
not visible from the ogives.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 22

Problems:

Use the following grouped data for each of the next six questions:
Range($) # of losses loss ($000)
0-100 6300 300
100-200 2350 350
200-300 850 200
300-400 320 100
400-500 110 50
over 500 70 50

10000 1050

5.1 (1 point) What is the value of the histogram at $230?


A. less than 0.0009
B. at least 0.0009 but less than 0.0010
C. at least 0.0010 but less than 0.0011
D. at least 0.0011 but less than 0.0012
E. at least 0.0012

5.2 (2 points) What is the standard deviation of the estimate in the previous question?
A. 0.000002 B. 0.000005 C. 0.00001 D. 0.00002 E. 0.00003

5.3 (1 point) What is the value of the ogive at $120?


A. 64% B. 66% C. 68% D. 70% E. 72%

5.4 (3 points) What is the standard deviation of the estimate in the previous question?
A. less than 0.0030
B. at least 0.0030 but less than 0.0035
C. at least 0.0035 but less than 0.0040
D. at least 0.0040 but less than 0.0045
E. at least 0.0045

5.5 (1 point) Use the ogive in order to estimate the 90th percentile of the size of loss distribution.
A. 230 B. 240 C. 250 D. 260 E. 270

5.6 (2 points) Use the ogive in order to estimate the probability of a loss being of size between 60
and 230.
A. 50% B. 51% C. 52% D. 53% E. 54%
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 23

5.7 (2 points) 100 losses are observed in intervals:


cj-1 cj nj
0 1 20
1 2 15
2 5 25
5 25 40
Let f100(x) be the histogram corresponding to this data.
Determine f100(0.5) + f100(1.5) + f100(2.5) + f100(5.5).
A. 0.41 B. 0.43 C. 0.45 D. 0.47 E. 0.49

5.8 (8 points) The following data is taken from "Comprehensive Medical Insurance - Statistical
Analysis for Ratemaking" by John R. Bevan, PCAS 1963. Shown are the empirical distribution
functions of the severity of loss at selected values. (The data has been grouped into intervals) The
data is truncated from below by a $25 deductible.
Size of Male Female
Loss Employees Employees Spouse Child
$49 19.3% 15.4% 14.2% 17.2%
99 47.6 39.4 38.5 38.2
199 64.3 59.7 59.7 70.5
299 73.5 72.3 71.2 80.9
399 80.1 80.3 78.6 87.9
499 85.0 85.6 83.1 91.4
999 91.4 95.2 93.8 96.7
1999 95.1 98.3 96.7 99.0
2999 97.0 99.2 98.3 99.2
3999 98.4 99.5 98.8 99.5
4999 98.6 99.6 99.0 99.7
6667 99.1 99.8 99.5 99.9
7499 99.4 99.9 99.5 99.9
10000 100.0 100.0 100.0 100.0
There are four separate distributions shown, based on the person incurring the medical expenses.
(They were based on the following numbers of claims: 955, 1291, 915, 994).
Draw ogives for each of these distributions, with however the x-axis on a log scale.
Does it appear as if some or all of this data came from the same distribution?
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 24

Use the following grouped data for each of the next 6 questions:
A random sample of 500 losses is distributed as follows:
Loss Range Frequency
[0, 10] 150
(10, 25] 90
(25, 100] 260

5.9 (1 point) What is the value of the histogram at 30?


A. 0.003 B. 0.004 C. 0.005 D. 0.006 E. 0.007

5.10 (2 points) What is the standard deviation of the estimate in the previous question?
A. 0.0002 B. 0.0003 C. 0.0004 D. 0.0005 E. 0.0006

5.11 (1 point) What is the value of the ogive at 15?


A. 34% B. 35% C. 36% D. 37% E. 38%

5.12 (3 points) What is the standard deviation of the estimate in the previous question?
A. 0.02 B. 0.03 C. 0.04 D. 0.05 E. 0.06

5.13 (1 point) Use the ogive in order to estimate the 70th percentile of the size of loss distribution.
A. less than 50
B. at least 50 but less than 55
C. at least 55 but less than 60
D. at least 60 but less than 65
E. at least 65

5.14 (2 points) Use the ogive in order to estimate the probability of a loss being of size between
13 and 42.
A. 20% B. 22% C. 24% D. 26% E. 28%

5.15 (4 points) You are given the following data on the size of 954 physician professional liability
claims, censored from above at 100,000:
0-1000 234
1001-5000 416
5001-10,000 134
10,001-25,000 101
25,001-50,000 36
50,001-100,000 33
Use an ogive to estimate the variance of the size of claims limited to 40,000, Var[X ∧ 40,000].
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 25

Use the following information for the next three questions:


• Twelve losses have been recorded as follows:
1050, 1100, 1300, 1500, 1900, 2100, 2200, 2400, 3000, 3200, 4100, 4400.
• An ogive and histogram have been fitted to this data grouped using endpoints:
1000, 2500, 5000.

5.16 (1 point) Determine the height of the corresponding relative frequency histogram at
x = 4000.
A. 0.00009 B. 0.00010 C. 0.00011 D. 0.00012 E. 0.00013

5.17 (1 point) Using the ogive, what is the estimate of the 81st percentile of the distribution function
underlying the empirical data?
A. 3000 B. 3200 C. 3400 D. 3600 E. 3800

5.18 (2 points) Using the ogive, estimate the hazard rate at 3000, h(3000).
A. 0.0001 B. 0.0002 C. 0.0003 D. 0.0004 E. 0.0005

5.19 (4 points) The following data is from the mortality study of Edmond Halley published in 1693.
x 0 5 10 15 20 25 30
S(x) 1 0.710 0.653 0.622 0.592 0.560 0.523

x 35 40 45 50 55 60 65
S(x) 0.481 0.436 0.387 0.335 0.282 0.232 0.182

x 70 75 80 85
S(x) 0.131 0.078 0.034 0

Using this data, draw a histogram.

5.20 (4, 5/85, Q.50) (1 point) Which of the following statements are true?
1) Two random variables are independent if their correlation coefficient is zero.
2) An ogive is an estimate of a sample's underlying continuous probability density function.
3) For any random variable X with distribution function F(x), Y = F(X) has a uniform
probability density function.
A. 2 B. 3 C. 1, 2 D. 1, 3 E. 2, 3
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 26

5.21 (4, 5/88, Q.51) (1 point) The ogive H(x) was fit to empirical data. H(x) consists of line
segments. and the values of H(x) at the endpoints of these segments are defined below:
x H(x) x H(x)
1 0.0 20 0.6
5 0.2 25 0.7
10 0.3 27 0.8
15 0.4 30 1.0
17 0.5
Using the ogive H(x), what is the estimate of the 55th percentile of the distribution function
underlying the empirical data?
A. Less than 18.0
B. At least 18.0, but less than 18.3
C. At least 18.3, but less than 18.6
D. At least 18.6, but less than 18.9
E. 18.9 or more

5.22 (2, 5/88, Q.21) (1.5 points) Observations are drawn at random from a continuous distribution.
The following histogram is constructed from the data.

[Figure: a histogram with frequency per unit x of 0.1 on the interval from 0 to 3, 0.2 on the interval from 3 to 5, and 0.3 on the interval from 5 to 6; the horizontal axis shows observation values (x).]
Which set of frequencies could yield this histogram for the intervals 0 < x ≤ 3, 3 < x ≤ 5, and
5 < x ≤ 6, respectively?
A. 1; 2; 3 B. 3; 2; 1 C. 3; 3; 3 D. 3; 4; 3 E. 3; 7; 10
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 27

5.23 (4, 5/88, Q.54) (1 point) Which of the following statements are true?
1. There is one and only one ogive that fits a given empirical distribution.
2. The Central Limit Theorem applies only to continuous distributions.
3. If the absolute deviation is used as the loss function for a Bayesian point estimate,
then the resulting estimator is the median of the posterior distribution.
A. 1 B. 2 C. 3 D. 1, 3 E. 2, 3

5.24 (160, 11/89, Q.6) (2.1 points) The observed number of failures in each week are:
Week: 1 2 3 4 5
Failures: 3 2 3 1 1
A histogram is constructed, with 5 intervals, one for each week.
The probability density function at the midpoint of week 3 is estimated from this histogram.
Calculate the estimated variance of this estimated probability density function.
(A) 0.016 (B) 0.021 (C) 0.024 (D) 0.035 (E) 0.048
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 28

5.25 (4, 5/91, Q.34) (2 points) The following relative frequency histogram depicts the expected
distribution of policyholder claims. The policyholder pays the first $1 of each claim, the insurer pays
the next $9 of each claim, and the reinsurer pays the remaining amount if the claim exceeds $10.
What is the insurer's average payment per (non-zero) payment by the insurer? Assume the claims
are distributed uniformly in each interval, (with probability densities of .10 from 0 to 3, .20 from 3 to
5, and .03 from 5 to 15.)
A. 3.7 B. 3.9 C. 4.1 D. 4.3 E. 4.5

[Figure: the relative frequency histogram, probability density (from 0.00 to 0.20) versus size of policyholder loss (from 0 to 16): density 0.10 from 0 to 3, 0.20 from 3 to 5, and 0.03 from 5 to 15.]
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 29

5.26 (4B, 5/93, Q.31) (2 points)


The following 20 wind losses, recorded in millions of dollars, occurred in 1992:
1, 1, 1, 1, 1, 2, 2, 3, 3, 4
6, 6, 8, 10, 13, 14, 15, 18, 22, 25
To construct an ogive H(x), the losses were segregated into four ranges:
(0.5, 2.5), (2.5, 8.5), (8.5, 15.5), (15.5, 29.5).
Determine the values of the probability density function h(x), corresponding to H(x), for the values
x1 = 4 and x2 = 10.
A. h(x1 ) = 0.300, h(x2 ) = 0.200
B. h(x1 ) = 0.050, h(x2 ) = 0.050
C. h(x1 ) = 0.175, h(x2 ) = 0.050
D. h(x1 ) = 0.500, h(x2 ) = 0.700
E. h(x1 ) = 0.050, h(x2 ) = 0.029

5.27 (4B, 11/94, Q.12) (1 point) You are given the following:
Nine observed losses have been recorded in thousands of dollars and are grouped as follows:
Interval [0,2) [2,5) [5,∞)
Number of claims 2 4 3
Determine the value of the relative frequency histogram (p.d.f) for those losses at x = 3.
A. Less than 0.15
B. At least 0.15, but less than 0.25
C. At least 0.25, but less than 0.35
D. At least 0.35, but less than 0.45
E. At least 0.45

5.28 (4B, 5/95, Q.1) (1 point) 50 observed losses have been recorded in millions and grouped
by size of loss as follows:
Size of Loss (X) Number of Observed Losses
( 0.5, 2.5] 25
( 2.5, 10.5] 10
( 10.5, 100.5] 10
(100.5, 1000.5] 5
Total 50
What is the height of the relative frequency histogram, h(x), at x = 50?
A. Less than 0.05
B. At least 0.05, but less than 0.10
C. At least 0.10, but less than 0.15
D. At least 0.15, but less than 0.20
E. At least 0.20
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 30

5.29 (4B, 11/96, Q.11) (2 points) You are given the following:
• Ten losses (X) have been recorded as follows:
1000, 1000, 1000, 1000, 2000, 2000, 2000, 3000, 3000, 4000.
• An ogive, H(x), has been fitted to this data using endpoints for the connecting
line segments with x-values as follows:
x = c0 = 500, x = c1 = 1500, x = c2 = 2500, x = c3 = 4500
Determine the height of the corresponding relative frequency histogram, h(x), at x = 3000.
A. 0.00010 B. 0.00015 C. 0.00020 D. 0.00025 E. 0.00030

5.30 (4B, 5/99, Q.30) (1 point)


The derivative of an ogive is an estimate of which of the following functions?
A. Probability density function
B. Cumulative distribution function
C. Limited expected value function
D. Mean residual life function
E. Loss function
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 31

5.31 (1, 11/00, Q.17) (1.9 points) A stock market analyst has recorded the daily sales revenue for
two companies over the last year and displayed them in the histograms below.

[Figure: histograms of the number of occurrences of daily sales revenue (from 92.5 to 107.5) for Company A and for Company B; Company Aʼs revenues are more tightly concentrated about 100 than Company Bʼs.]

The analyst noticed that a daily sales revenue above 100 for Company A was always
accompanied by a daily sales revenue below 100 for Company B, and vice versa.
Let X denote the daily sales revenue for Company A and let Y denote the daily sales
revenue for Company B, on some future day.
Assuming that for each company the daily sales revenues are independent and identically
distributed, which of the following is true?
(A) Var(X) > Var(Y) and Var(X + Y) > Var(X) + Var(Y).
(B) Var(X) > Var(Y) and Var(X + Y) < Var(X) + Var(Y).
(C) Var(X) > Var(Y) and Var(X + Y) = Var(X) + Var(Y).
(D) Var(X) < Var(Y) and Var(X + Y) > Var(X) + Var(Y).
(E) Var(X) < Var(Y) and Var(X + Y) < Var(X) + Var(Y).
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 32

5.32 (4, 5/05, Q.26 & 2009 Sample Q.195) (2.9 points)
You are given the following information regarding claim sizes for 100 claims:
Claim Size Number of Claims
0 - 1,000 16
1,000 - 3,000 22
3,000 - 5,000 25
5,000 - 10,000 18
10,000 - 25,000 10
25,000 - 50,000 5
50,000 - 100,000 3
over 100,000 1
Use the ogive to estimate the probability that a randomly chosen claim is between 2,000 and
6,000.
(A) 0.36 (B) 0.40 (C) 0.45 (D) 0.47 (E) 0.50

5.33 (4, 11/05, Q.33 & 2009 Sample Q.243) (2.9 points)
For 500 claims, you are given the following distribution:
Claim Size Number of Claims
[0, 500) 200
[500, 1,000) 110
[1,000, 2,000) x
[2,000, 5,000) y
[5,000, 10,000) ?
[10,000, 25,000) ?
[25,000, ∞) ?
You are also given the following values taken from the ogive:
F500(1500) = 0.689
F500(3500) = 0.839
Determine y.
(A) Less than 65
(B) At least 65, but less than 70
(C) At least 70, but less than 75
(D) At least 75, but less than 80
(E) At least 80
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 33

5.34 (4, 11/06, Q.35 & 2009 Sample Q.278) (2.9 points) You are given:
(i) A random sample of payments from a portfolio of policies resulted in the following:
Interval Number of Policies
(0, 50] 36
(50, 150] x
(150, 250] y
(250, 500] 84
(500, 1000] 80
(1000, ∞) 0
Total n
(ii) Two values of the ogive constructed from the data in (i) are:
Fn (90) = 0.21, and Fn (210) = 0.51
Calculate x.
(A) 120 (B) 145 (C) 170 (D) 195 (E) 220
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 34

Solutions to Problems:

5.1. A. In the interval 200 to 300 of length 100, there are 850 claims out of a total of 10000,
so that the density function is (850 / 10000) / 100 = 0.00085.

5.2. E. Variance = (losses in interval / N){1 - (losses in interval / N)} / {N (width of the interval)²} =
(0.085)(1 - 0.085) / {(10000)(100²)}. Standard Deviation = 0.0000279.

5.3. C. F(100) = 0.6300 and F(200) = 0.8650, thus by linear interpolation the ogive at 120 is:
(0.8)(0.6300) + (0.2)(0.8650) = 0.6770.

5.4. D. Variance =
{0.8² (0.63)(1 - 0.63) + 0.2² (0.865)(1 - 0.865) + (2)(0.8)(0.2)(0.63)(1 - 0.865)} / 10,000.
Standard Deviation = 0.00426.
Comment: Beyond what you are likely to be asked on your exam.

5.5. B. F(200) = 0.8650 and F(300) = 0.9500. Thus the estimate of the 90th percentile is
between 200 and 300. We want: 0.90 = (0.8650)(300 - x)/100 + (0.9500)(x - 200)/100.
⇒ 90 = 259.5 - 0.8650x + 0.95x - 190. ⇒ x = 20.5/0.085 = 241.2.
Check: (0.588)(0.8650) + (0.412)(0.9500) = 0.900.

5.6. B. F(200) = 0.8650 and F(300) = 0.9500, thus by linear interpolation the ogive at 230 is:
(0.7)(0.8650) + (.3)(0.9500) = 0.8905.
F(0) = 0 and F(100) = 0.6300, thus by linear interpolation the ogive at 60 is:
(0.4)(0) + (0.6)(0.6300) = 0.3780.
Prob[between 60 and 230] = 0.8905 - 0.3780 = 0.5125.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 35

5.7. C. fn (x) = nj/{n(cj - cj-1)}, cj-1 ≤ x < cj.


f100(0.5) = 20/{(100)(1 - 0)} = 0.20. f100(1.5) = 15/{(100)(2 - 1)} = 0.15.
f100(2.5) = 25/{(100)(5 - 2)} = 0.0833. f100(5.5) = 40/{(100)(25 - 5)} = 0.02.
f100(0.5) + f100(1.5) + f100(2.5) + f100(5.5) = 0.20 + 0.15 + 0.0833 + 0.02 = 0.4533.
Comment: A graph of the histogram would show heights of 0.20 on the interval from 0 to 1, 0.15 from 1 to 2, 0.0833 from 2 to 5, and 0.02 from 5 to 25.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 36

5.8. [Figure: ogive for Male Employees, size of loss from 50 to 10,000 on a log scale.]

[Figure: ogive for Female Employees, size of loss from 50 to 10,000 on a log scale.]
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 37

[Figure: ogive for Spouses, size of loss from 50 to 10,000 on a log scale.]

[Figure: ogive for Children, size of loss from 50 to 10,000 on a log scale.]
The incidence of smaller size claims is greater for children than for adults. All of the adult ogives look
somewhat similar to me. However, if one looks carefully, each of these distributions is somewhat
different than the others.
Comment: Similar to Exercise 11.6 in Loss Models. A portion of the data in my question is in Table
16.21 and analyzed in Example 16.17 in Loss Models. In the early 1960s, spouses of employees
who were covered under the employers health insurance plan were all or almost all female.
Some of the differences between the adult distributions may be due to age. Comparing ogives
is not a very useful way for most of us to distinguish between somewhat similar distributions.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 38

One could compare the difference of distribution functions. For example, here is a graph of the
difference between the male and female employee distributions:

[Figure: the difference between the Male Employee and Female Employee empirical distribution functions, plotted against size of loss from 50 to 10,000 on a log scale.]

5.9. E. In the interval 25 to 100 of length 75, there are 260 claims out of a total of 500,
so that the density function is: (260 / 500) / 75 = 13/1875 = 0.00693.
Comment: A graph of the histogram would show heights of 3/100 on the interval from 0 to 10, 3/250 from 10 to 25, and 13/1875 from 25 to 100.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 39

5.10. B. Variance = (losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²} =
(52%)(1 - 52%) / {(500)(75²)}. Standard Deviation = 0.0002979.


Alternately, the probability in the interval is estimated as: 260/500 = 52%.
The variance of the estimated probability in the interval is: (52%)(1 - 52%) / 500 = 0.0004992.
The estimate of the histogram is: (probability in the interval) / 75.
Thus the variance of the estimate the density at 30 is: 0.0004992 / 752 .
Standard Deviation = 0.0002979.

5.11. C. F(10) = 150/500 = 0.30, and F(25) = 240/500 = 0.48, thus by linear interpolation the
ogive at 15 is: (2/3)(0.30) + (1/3)(0.48) = 0.36.
Comment: A graph of the ogive would connect the points (0, 0), (10, 3/10), (25, 12/25), and (100, 1) with straight line segments.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 40

5.12. A. The estimate is: (2/3)Fn (10) + (1/3) Fn (25).


Therefore, its variance is:
(2/3)² Var[Fn(10)] + (1/3)² Var[Fn(25)] + (2)(2/3)(1/3) Cov[Fn(10), Fn(25)] =
(4/9)(0.3)(1 - 0.3)/500 + (1/9)(0.48)(1 - 0.48)/500 + (4/9)(0.3)(1 - 0.48)/500 = 0.0003808.
Standard Deviation = 0.01951.
Alternately, in general, if x is in the interval from ai to bi, then
Var[Fn(x)] = {Fn(ai)Sn(ai)(bi - x)² + Fn(bi)Sn(bi)(x - ai)² + 2Fn(ai)Sn(bi)(bi - x)(x - ai)} / {N (bi - ai)²}.

Variance = {(0.30)(1 - 0.30)(10²) + (0.48)(0.52)(5²) + (2)(0.30)(0.52)(10)(5)} / {(500)(15²)} =
0.0003808. Standard Deviation = 0.01951.
Comment: Beyond what you are likely to be asked on your exam.

5.13. C. F(25) = 0.48 and F(100) = 1. Thus the estimate of the 70th percentile is between 25 and
100. We want: 0.70 = (0.48)(100 - x)/75 + (1)(x - 25)/75.
⇒ 52.5 = 48 - 0.48x + x - 25. ⇒ x = 29.5/0.52 = 56.73.
Check: (0.48)(100 - 56.73)/75 + (1)(56.73 - 25)/75 = 0.700.

5.14. D. F(10) = 150/500 = 0.30, and F(25) = 240/500 = 0.48.


Thus by linear interpolation the ogive at 13 is: (0.3)(12/15) + (0.48)(3/15) = 0.336.
F(25) = 240/500 = 0.48, and F(100) = 1.
Thus by linear interpolation the ogive at 42 is: (0.48)(58/75) + (1)(17/75) = 0.5979.
Prob[between 13 and 42] = 0.5979 - 0.336 = 0.2619.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 41

5.15. An Ogive assumes that the data is uniform on each interval.


Of the 36 claims in the interval from 25,001 to 50,000, we assume 40% are of size 40,000 or more;
there are (60%)(36) = 21.6 claims in the interval from 25,001 to 40,000,
and (40%)(36) = 14.4 claims in the interval from 40,001 to 50,000.
Thus there are total of 33 + 14.4 = 47.4 claims of size more than 40,000.
E[X ∧ 40,000] =
{(234)(500) + (416)(3000) + (134)(7500) + (101)(17,500) + (21.6)(32,500) + (47.4)(40,000)} / 954
= 7060.
For each interval from a to b, the second moment of the uniform distribution is: (b³ - a³) / {3(b - a)}.
Lower Upper Number First Second
Endpoint Endpoint of Claims Moment Moment
0 1000 234 500 333,333
1000 5000 416 3,000 10,333,333
5000 10000 134 7,500 58,333,333
10000 25000 101 17,500 325,000,000
25000 40000 21.6 32,500 1,075,000,000
40000 40000 47.4 40,000 1,600,000,000
954 7,060 151,025,507
Var[X ∧ 40,000] = 151,025,507 - 7060² = 101,181,907.
Comment: Data summarized from Table 3 of Sheldon Rosenbergʼs discussion of “On the Theory
of Increased Limits and Excess of Loss Pricing”, PCAS 1977.
The final interval includes all of the large losses that have been limited to 40,000.

5.16. E. The first interval 1000 to 2500 includes 8/12 = 2/3 of the losses, while the second interval
contains 4/12 = 1/3 of the losses. 4000 is in the second interval, and in the second interval the
histogram has height (1/3)/(5000-2500) = 0.000133.

5.17. D. The first interval 1000 to 2500 includes 8/12 = 2/3 of the losses, while the second interval
contains 4/12 = 1/3 of the losses. Thus the second line segment of the Ogive goes
from (2500, 2/3) to (5000, 1). It has a slope of (1/3)/(5000 - 2500).
Thus for y = .81, x = 2500 + (.81 - .6667)/{(1/3)/(5000 - 2500)} = 3575.

5.18. E. The second line segment of the Ogive goes from (2500, 2/3) to (5000, 1).
Thus f(3000) = (1 - 2/3) / (5000 - 2500) = 1/7500.
F(3000) = (2/3)(4/5) + (1)(1/5) = 11/15.
h(3000) = f(3000) / S(3000) = (1/7500) / (4/15) = 0.0005.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 42

5.19. Take the difference of survival functions to get the probability in each interval, and divide by the width of the interval to get the height of the histogram.
For example, the height of the rectangle from 5 to 10 years old is: (0.710 - 0.653)/5 = 1.14%.

[Figure: histogram of the mortality data, probability density versus age from 0 to 85, with rectangles of width 5.]
Comment: This was the first published mortality study.

5.20. B. 1. False. Two random variables are independent if and only if their joint probability density
function is the product of their individual probability density functions. If X and Y are independent,
then E[XY] = E[X]E[Y], and thus both the covariance = E[XY] - E[X]E[Y] and the correlation =
Covar[X,Y] / Var[X] Var[Y] are zero. However, the converse is not true.
There are cases where the correlation is zero yet X and Y are dependent.
2. False. An Ogive is an estimate of the cumulative distribution function. A Histogram is an estimate
of the probability density function.
3. True. F(x) is uniformly distributed on the interval [0,1].

5.21. C. We wish to estimate the point at which F(x) = 0.55. F(17) = 0.5 and F(20) = 0.6.
Linearly interpolating we estimate F(18.5) = 0.55.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 43

5.22. D. The areas of the rectangles are: (3)(0.1) = 0.3, (2)(0.2) = 0.4, and (1)(0.3) = 0.3.
This is consistent with a frequency of 3, 4, 3, which has probabilities of 0.3, 0.4, 0.3.

5.23. C. 1. False. One can choose different ways to group the data, which results in different xi at
which you graph points, which produces somewhat different looking ogives. 2. False. The Central
Limit Theorem applies to either discrete or continuous distributions. The sum of many independent,
identically distributed variables (with finite mean and variance) approaches a Normal Distribution.
3. True.

5.24. B. The estimate of f(3) is the height of the histogram at 3: 3/{(1)(10)} = .3


Variance of this estimate is: (.3)(1 - .3)/10 = 0.021.

5.25. C. Size of Loss Amount Paid by Insurer


x<1 0
1 ≤ x ≤ 10 x-1
x > 10 9
The average amount per loss is:
∫1^10 (x - 1) f(x) dx + 9 S(10) = ∫1^3 (x - 1) f(x) dx + ∫3^5 (x - 1) f(x) dx + ∫5^10 (x - 1) f(x) dx + (9)(0.15)
= (0.10) ∫1^3 (x - 1) dx + (0.20) ∫3^5 (x - 1) dx + (0.03) ∫5^10 (x - 1) dx + 1.35
= (0.10)(2) + (0.20)(6) + (0.03)(32.5) + 1.35 = 3.725.
The average payment per non-zero payment is: 3.725 / 0.9 = 4.139.
Comment: The area under the histogram is: (.10)(3) + (2)(.20) + (10)(.03) = 30% + 40% + 30% =
100%. Thus this is indeed a probability density function. The average size of loss equals 5.05. The
average amount paid by an insured per loss is .95. The average amount paid by the reinsurer per
loss (whether or not the reinsurer makes a payment) is .375.
Note that .95 + 3.725 + .375 = 5.05.

5.26. E. The ogive is an approximation to the distribution function; the question asked for the
corresponding probability density function or histogram. x1 = 4 is in the second interval of width 6.
There are 6 out of 20 claims in this interval. Therefore, h(x1 ) = (6/20)/6 = 0.050.
x2 = 10 is in the third interval of width 7. There are 4 out of 20 claims in this interval.
Therefore h(x2 ) = (4/20)/7 = 0.029.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 44

5.27. A. The interval [2, 5) of length 3 has 4 claims out of the total of 9 claims.
Thus the empirical p.d.f. at x = 3 is (4/3)/9 = 0.148.

5.28. A. The histogram is an approximation to the Probability Density Function (p.d.f.).


The interval that contains claims of size 50 has 10 claims out of 50 claims, 20% of the total.
This interval has a width of 100.5 - 10.5 = 90. So the histogram is .2 / 90 = 0.0022.

5.29. B. The interval 2500 to 4500 has a width of 2000. Since, 3 out of 10 observed claims are in
this interval, the probability covered by this interval is: 3/10. Thus the height of the histogram in this
interval is (3/10)/2000 = 0.00015.

5.30. A. Since the Ogive is an approximate cumulative Distribution Function, its derivative is an
approximate probability density function.

5.31. E. Company Aʼs share price X is less dispersed about the mean share price of 100 than
Company Bʼs share price Y. ⇒ Var(X) < Var(Y).
A daily sales revenue above 100 for Company A was always accompanied by a daily sales
revenue below 100 for Company B, and vice versa. ⇒ Corr[X, Y] < 0. ⇒ Cov[X, Y] < 0.

⇒ Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X,Y) < Var(X) + Var(Y).


2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 45

5.32. B. (22/2 + 25 + 18/5)/100 = 0.396.


The empirical distribution function is 0.16 at 1000, 0.38 at 3000, 0.63 at 5000, and 0.81 at 10000.
Therefore, the height of the ogive at 2000 is: (0.16 + 0.38)/2 = 0.27.
The height of the ogive at 6000 is: (4/5)(0.63) + (1/5)(0.81) = 0.666.
Prob[2000 < X < 6000] = 0.666 - 0.27 = 0.396.
Comment: The ogive is an approximate distribution function.
Here is this ogive, shown up to 10,000: [Figure: the ogive rises linearly from 0 at 0 to 0.16 at 1,000, 0.38 at 3,000, 0.63 at 5,000, and 0.81 at 10,000.]


The desired probability is the difference of the height of the ogive at 6000, 0.666, and the height of
the ogive at 2000, 0.270.

5.33. E. At 1000 the empirical distribution function is: (200 + 110)/500 = 310/500.
At 2000 the empirical distribution function is: (310 + x)/500.
At 5000 the empirical distribution function is: (310 + x + y)/500.
Therefore, linearly interpolating, the height of the ogive at 1500 is:
(310/500)(0.5) + {(310 + x)/500}(0.5) = (310 + 0.5x)/500.
Similarly, the height of the ogive at 3500 is:
{(310 + x)/500}(0.5) + {(310 + x + y)/500}(0.5) = (310 + x + 0.5y)/500.
(310 + 0.5x)/500 = 0.689. ⇒ 310 + 0.5x = 344.5. ⇒ x = 69.

(310 + x + 0.5y)/500 = 0.839 ⇒ 310 + x + 0.5y = 419.5. ⇒ y = 219 - 2x = 81.


Comment: Given two outputs, solve for two missing inputs.
The ogive is a series of straight lines between the values of the empirical distribution function at the
endpoints of the intervals.
2016-C-6, Fitting Loss Distributions §5 Ogives & Histograms, HCM 10/22/15, Page 46

5.34. A. Fn (50) = 36/(200 + x + y). Fn (150) = (36 + x)/(200 + x + y).

Fn (90) = 0.6 Fn (50) + 0.4 Fn (150) = (36 + 0.4x)/(200 + x + y).

Fn (250) = (36 + x + y)/(200 + x + y).

Fn (210) = 0.4 Fn (150) + 0.6 Fn (250) = (36 + x + 0.6y)/(200 + x + y).

0.21 = (36 + 0.4x)/(200 + x + y). ⇒ (0.21)(200 + x + y) = 36 + 0.4x. ⇒ 0.19x - 0.21y = 6.

0.51 = (36 + x + 0.6y)/(200 + x + y). ⇒ (0.51)(200 + x + y) = 36 + x + 0.6y.

⇒ 0.49x + 0.09y = 66.

Solving these 2 equations in 2 unknowns: x = {(6)(0.09) + (66)(0.21)} / {(0.19)(0.09) + (0.49)(0.21)} = 120,

and y = {(0.19)(120) - 6}/0.21 = 80.


2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 47

Section 6, Kernel Smoothing35

As discussed previously, the empirical distribution function and corresponding empirical


model assigns probability 1/n to each of n observed values. For example, with the following
observations: 81, 157, 213, the probability function (pdf) of the corresponding empirical model is:
p(81) = 1/3, p(157) = 1/3, p(213) = 1/3.36

In a Kernel Smoothing Model, such a discrete model is smoothed using a “kernel” function. We
create a continuous random variable that is an approximation to the discrete empirical model.
Examples are the uniform kernel, the triangular kernel, and the gamma kernel.
In each case, the mean of the smoothed model is the same as the original empirical mean.

Uniform Kernel:

The simplest case uses the uniform kernel. In general, for a uniform kernel of bandwidth b, we have a
uniform distribution from yj - b to yj + b, centered at each of the data points yj:37

[Diagram: a rectangle of width 2b and height 1/(2b), centered at the data point.]

Then one weights these uniform distributions together in order to get the smoothed model. In the
case of the uniform kernel, the wider the bandwidth of each uniform distribution, the more smoothing.

One could smooth the above empirical model using a uniform kernel with for example a bandwidth
of 50. Rather than a point mass of probability at each data point, we spread the probability over an
interval. For example, the 1/3 point mass at 81 is spread over the interval from 81 - 50 = 31 to
81 + 50 = 131, of width (2)(50) = 100 centered at 81.

For the uniform kernel, the kernel density is: ky(x) = 1 / (2b), y - b ≤ x ≤ y + b.38
So for example, for a bandwidth of 50, for the kernel centered at 157,
k157(x) = 1/100, 107 ≤ x ≤ 207, and zero elsewhere.
The Uniform Kernel smoothing model is:
Uniform[31, 131] / 3 + Uniform[107, 207] / 3 + Uniform[163, 263] / 3.
35
See Section 12.3 in Loss Models.
36
Note that the empirical model depends only on the data set, and does not depend on which type of kernel we use
for smoothing.
37
“Bandwidth” here differs from the somewhat similar concept of the “span” as used in the method of rounding.
38
Note that the endpoints are included in the uniform kernel.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 48

It has pdf of: 0 for x < 31, 1/300 for 31 ≤ x < 107, 2/300 for 107 ≤ x ≤ 131,
1/300 for 131 < x < 163, 2/300 for 163 ≤ x ≤ 207, 1/300 for 207 < x ≤ 263, 0 for x > 263.39
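
As an illustration (not from Loss Models), here is a short Python sketch of this density; the function name uniform_kernel_density is my own:

    # Sketch: uniform-kernel density estimate for the data 81, 157, 213 with bandwidth b = 50.
    def uniform_kernel_density(x, data, b):
        """Average of uniform kernels of height 1/(2b) centered at each data point."""
        return sum(1.0 / (2 * b) for y in data if y - b <= x <= y + b) / len(data)

    data = [81, 157, 213]
    print(uniform_kernel_density(120, data, 50))   # 2/300 = 0.00667 (kernels at 81 and 157 both cover 120)
    print(uniform_kernel_density(140, data, 50))   # 1/300 = 0.00333 (only the kernel at 157 contributes)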

The three separate uniform kernels, centered at 81, 157 and 213, look as follows:
[Figure: the three uniform kernels, each of height 0.01, centered at 81 (on 31 to 131), 157 (on 107 to 207), and 213 (on 163 to 263).]

39
Each uniform has density of 1/100.  The empirical model is 1/3 at each of the observed points. So each
contribution is 1/300. The first uniform starts to contribute at 31. So starting at 31 we have 1/300.
The second uniform starts to contribute at 107. So starting at 107 we have: 1/300 + 1/300 = 2/300.
The first uniform stops contributing after 131. So after 131 we have 1/300, etc.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 49

Note that each uniform kernel is discontinuous at its endpoints.

This uniform kernel smoothed density is an average of the individual uniform kernels, and looks as
follows:40

[Figure: the uniform kernel smoothed density, taking the values 1/300 and 2/300 on the intervals listed above, plotted for x from 0 to 300.]

Note that the kernel smoothed density has jump discontinuities at: 31, 107, 131, 163, 207, and 263.

One could apply smoothing using the uniform kernel to larger data sets in a similar manner.

Exercise: What is the density at 95,000 of the kernel smoothed density for the ungrouped data in
Section 2, using a uniform distribution with a bandwidth of 5,000?
[Solution: The loss of size 91,700 contributes: Uniform Distribution[86,700, 96,700]/130.
The loss of size 96,600 contributes: Uniform Distribution[91,600, 101,600]/130.
The loss of size 96,900 contributes: Uniform Distribution[91,900, 101,900]/130.
Thus the density at 95,000 of the kernel smoothed density is: (3/130)/10,000 = 0.000002308.
Comment: The bandwidth is 5000. Since these are the only three loss sizes within 5000 of 95,000,
these are the only three that contribute to the kernel smoothed density at 95,000.
The ungrouped data in Section 2 has 130 losses.]

In general, the larger the bandwidth, the more smoothing. In practical applications, one wants to
smooth out the noise (random fluctuation) and retain the signal (useful information). An actuary will try
several different bandwidths, and choose one that appears to have an appropriate balance of these
two competing goals, resulting in a kernel smoothed density that hopefully is a good approximation
to the density from which the data were drawn.
40
In general we would weight the kernels together, with for example a point that appeared twice in the data set
getting twice the weight.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 50

For example, shown out to 200,000, here is 100,000 times the kernel smoothed density for the
ungrouped data in Section 2, using a uniform kernel with a bandwidth of 5,000:41

[Figure: this kernel smoothed density, plotted for sizes 0 to 200,000.]

Here is 100,000 times the uniform kernel smoothed density with a wider bandwidth of 10,000,
and thus more smoothing:

[Figure: 100,000 times the uniform kernel (bandwidth 10,000) smoothed density, plotted for sizes 0 to 200,000.]
41
Note that a very small probability has been assigned to negative loss sizes. This could be avoided by using a more
complicated kernel which is not on the Syllabus.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 51

Here is 100,000 times the uniform kernel smoothed density with an even wider bandwidth of
25,000, and thus even more smoothing:42

[Figure: 100,000 times the uniform kernel (bandwidth 25,000) smoothed density, plotted for sizes 0 to 200,000.]

Based on this graph, one would estimate that the mode, the place where the density is largest, is
somewhere near 25,000.

42
In practical applications, one would want enough smoothing in order to remove most of the effects of random
fluctuations, while avoiding too much smoothing which removes the informational content.
In this case, a bandwidth of 25,000 seems to produce a little too much smoothing, while a bandwidth of 5000
seems to produce not enough smoothing.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 52

Variance of the Estimate of the Density at a Single Point:

For the ungrouped data in Section 2, out of 130 losses, 21 are in the interval [100,000, 150,000].
Thus using a bandwidth of 25,000, the kernel smoothed density at 125,000 is:
(21/ 130) / 50,000 = 0.000003231.

This is an estimate of f(125,000) for the distribution from which this data was drawn. It is
mathematically equivalent to the estimate from a histogram with an interval [100,000, 150,000].

The histogram has variance: (losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²}
= (21/130)(109/130) / {(130)(50,000²)}. The standard deviation of the estimate is 0.000000646.

Thus a 95% confidence interval for f(125,000) is: 0.000003231 ± (1.96)(0.000000646).


For 100,000 times the uniform kernel smoothed density with a bandwidth of 25,000,
here is a graph of the point estimate plus or minus 1.96 standard deviations of that estimate:

[Figure: 100,000 times this kernel smoothed density, together with the point estimate plus or minus 1.96 standard deviations, plotted for sizes 0 to 200,000.]

In general, the variance of the estimate of the density at x from using a uniform kernel is:
(z/N) {1 - (z/N)} / {N (2b)²}, where z is the number of items in the interval [x - b, x + b].
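
A hedged Python sketch of this calculation (assuming, as above, N = 130 losses with 21 of them in [100,000, 150,000]; the variable names are my own):

    # Variance of the uniform-kernel (b = 25,000) density estimate at 125,000.
    N, z, b = 130, 21, 25000
    f_hat = (z / N) / (2 * b)                          # point estimate of the density
    var = (z / N) * (1 - z / N) / (N * (2 * b) ** 2)   # variance of that estimate
    sd = var ** 0.5
    print(f_hat, sd)                                   # 0.000003231 and 0.000000646
    print(f_hat - 1.96 * sd, f_hat + 1.96 * sd)        # approximate 95% confidence interval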
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 53

Kernels Related to the Uniform Distribution:

The uniform kernel is a member of a family of kernels of the form:43

ky(x) = cn {1 - ((x - y)/b)²}ⁿ / b, y - b ≤ x ≤ y + b,

where cn is the constant that makes the area under the kernel equal to one.

n = 0 ⇔ Uniform Kernel. ky(x) = 1/(2b), y - b ≤ x ≤ y + b.

n = 1 ⇔ Epanechnikov Kernel. ky(x) = {3/(4b)} {1 - ((x - y)/b)²}, y - b ≤ x ≤ y + b.

Here is a graph of an Epanechnikov Kernel, centered at 0 with bandwidth 1,


k0(x) = (3/4)(1 - x²), -1 ≤ x ≤ 1:

[Figure: the Epanechnikov kernel, an inverted parabola on (-1, 1) with maximum density 0.75 at 0.]

We can think of the uniform kernel as coming from the density: 1/2, - 1 ≤ x ≤ 1.
We center each kernel at an observed point, and introduce a scale via the bandwidth b.

Similarly, we can think of an Epanechnikov Kernel as coming from the density: (3/4)(1 - x²), -1 ≤ x ≤ 1.
We center each kernel at an observed point, and introduce a scale via the bandwidth b.

43
The area under the kernel is 1. Do not memorize the formulas for this family of kernels.
See Klein and Moeshberger, Survival Analysis, no longer on the Syllabus.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 54

Exercise: Verify that as with all kernels, the Epanechnikov Kernel has an area of 1.
[Solution: ∫ from y-b to y+b of {3/(4b)} {1 - ((x - y)/b)²} dx = {3/(4b)} {2b - b³/(3b²) + (-b)³/(3b²)} = {3/(4b)}(4b/3) = 1.]

Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6.
Using an Epanechnikov kernel with bandwidth 1, estimate the density function at 1.5.
[Solution: Only those kernels centered at data points within 1 of 1.5 contribute.
k1.8(1.5) = (3/4)(1 - 0.3²) = 0.6825.
k2.2(1.5) = (3/4)(1 - 0.7²) = 0.3825.
The estimate of f(1.5) is: (0 + 0.6825 + 0.3825 + 0)/4 = 0.26625.
Comment: A graph of the kernel smoothed density shows humps near each data point, with the largest density, about 0.35, near 2, where the kernels centered at 1.8 and 2.2 overlap.]
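
A short Python sketch of this exercise (the function name epanechnikov_density is my own):

    # Epanechnikov-kernel density estimate at 1.5 for the sample 0.3, 1.8, 2.2, 3.6, bandwidth b = 1.
    def epanechnikov_density(x, data, b):
        total = 0.0
        for y in data:
            if y - b <= x <= y + b:
                total += 0.75 / b * (1 - ((x - y) / b) ** 2)
        return total / len(data)

    print(epanechnikov_density(1.5, [0.3, 1.8, 2.2, 3.6], 1))   # 0.26625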
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 55

Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6.
Using an Epanechnikov kernel with bandwidth 1, estimate the distribution function at 1.5.
[Solution: All of the kernel centered at 0.3 is to left of 1.5; it contributes 1 to F(1.5).
None of the kernel centered at 3.6 is to left of 1.5; it contributes 0 to F(1.5).
The contribution of the kernel centered at 1.8 is:
(3/4) ∫ from 0.8 to 1.5 of 1 - (x - 1.8)² dx = (3/4) {0.7 - (-0.3)³/3 + (-1)³/3} = 0.28175.
The contribution of the kernel centered at 2.2 is:
(3/4) ∫ from 1.2 to 1.5 of 1 - (x - 2.2)² dx = (3/4) {0.3 - (-0.7)³/3 + (-1)³/3} = 0.06075.
The estimate of F(1.5) is: (1 + 0.28175 + 0.06075 + 0)/4 = 0.335625.
Comment: In general, for an Epanechnikov kernel centered at y, the contribution to F(x) is:
0, for x ≤ y - b;
1/2 + 3(x - y)/(4b) - (x - y)³/(4b³), for y - b < x < y + b;
1, for x ≥ y + b.]

n = 2 ⇔ Biweight or Quartic Kernel. ky(x) = {15/(16b)} {1 - ((x - y)/b)²}², y - b ≤ x ≤ y + b.

We can think of a Biweight Kernel as coming from the density: (15/16)(1 - x²)², -1 ≤ x ≤ 1.
We center each kernel at an observed point, and introduce a scale via the bandwidth b.

Exercise: Verify that as with all kernels, the Biweight Kernel has an area of 1.
[Solution: ∫ from y-b to y+b of {15/(16b)} {1 - ((x - y)/b)²}² dx
= {15/(16b)} ∫ from y-b to y+b of 1 - 2((x - y)/b)² + ((x - y)/b)⁴ dx
= {15/(16b)} {2b - 2b³/(3b²) + 2(-b)³/(3b²) + b⁵/(5b⁴) - (-b)⁵/(5b⁴)} = (15/16)(2 - 4/3 + 2/5) = 1.]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 56

Here is a graph comparing a Biweight Kernel and an Epanechnikov Kernel,


each with bandwidth of one, each centered at zero:

[Figure: the Biweight kernel, with maximum density 15/16 ≈ 0.94 at 0, and the Epanechnikov kernel, with maximum density 0.75 at 0, each on (-1, 1).]

The BiWeight kernel is more highly peaked than the Epanechnikov kernel, which in turn is more
highly peaked than the uniform kernel (which has no peak).

Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6.
Using a BiWeight kernel with bandwidth 1, estimate the density function at 1.5.
[Solution: Only those kernels centered at data points within 1 of 1.5 contribute.
k1.8(1.5) = (15/16)(1 - 0.3²)² = 0.77634.
k2.2(1.5) = (15/16)(1 - 0.7²)² = 0.24384.
The estimate of f(1.5) is: (0 + 0.77634 + 0.24384 + 0)/4 = 0.25505.]
4
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 57

Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6.
Using a Biweight kernel with bandwidth 1, estimate the distribution function at 1.5.
[Solution: All of the kernel centered at 0.3 is to left of 1.5; it contributes 1 to F(1.5).
None of the kernel centered at 3.6 is to left of 1.5; it contributes 0 to F(1.5).
The contribution of the kernel centered at 1.8 is:
(15/16) ∫ from 0.8 to 1.5 of {1 - (x - 1.8)²}² dx = (15/16) ∫ from 0.8 to 1.5 of 1 - 2(x - 1.8)² + (x - 1.8)⁴ dx =
(15/16) {0.7 - (2)(-0.3)³/3 + (2)(-1)³/3 + (-0.3)⁵/5 - (-1)⁵/5} = 0.23517.
The contribution of the kernel centered at 2.2 is:
(15/16) ∫ from 1.2 to 1.5 of {1 - (x - 2.2)²}² dx = (15/16) ∫ from 1.2 to 1.5 of 1 - 2(x - 2.2)² + (x - 2.2)⁴ dx =
(15/16) {0.3 - (2)(-0.7)³/3 + (2)(-1)³/3 + (-0.7)⁵/5 - (-1)⁵/5} = 0.02661.
The estimate of F(1.5) is: (1 + 0.23517 + 0.02661 + 0)/4 = 0.31545.]
4

One can generalize this family by allowing n to be non-integer:44

ky(x) = {1 - ((x - y)/b)²}ⁿ / {b β(n+1, 1/2)}, y - b ≤ x ≤ y + b.

For n = 1/2, ky(x) = (2/π) √(1 - {(x - y)/b}²) / b, y - b ≤ x ≤ y + b.45

Here is a graph of an example of this kernel, a semicircle, centered at 0 with bandwidth 1:

[Figure: the semicircular kernel on (-1, 1), with maximum density 2/π ≈ 0.64 at 0.]
44
The Beta Function is discussed in "Mahler's Guide to Loss Distributions." The area under the kernel is 1.
45
See 4, 5/05, Q.22.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 58

Triangular Kernel:

The triangular kernel centers a triangular density at each of the observed points.
The triangular density has width 2b and height 1/b, where b is the bandwidth.

[Diagram: a triangle of width 2b and height 1/b, centered at the data point.]
The area of this triangle is 1; the area under any density is 1.

One could smooth the above empirical model, p(81) = 1/3, p(157) = 1/3, p(213) = 1/3, using a
triangular kernel with a bandwidth of for example 50. Rather than a point mass of probability at each
data point, we spread the probability using the triangular density. The 1/3 point mass at 81 is
spread over the interval 31 to 131, of width (2)(50) = 100 centered at 81, with more weight to
values near 81 and less weight to those near the endpoints of the interval.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 59

Exercise: What is the density at 61 for a triangular density centered at 81 with bandwidth 50?
[Solution: The density is 0 at 81 - 50 = 31 and 1/50 = 0.02 at 81.
[Diagram: the left side of the triangle centered at 81, rising from 0 at 31 to 0.02 at 81, with height 0.012 at 61.]
Linearly interpolating, the density at 61 is: (3/5)(0.02) = 0.012.]

Thus the kernel smoothed density at 61 is: 0.012/3 = 0.004. Only the triangle centered at 81
contributes at 61. However at 120, both the triangles centered at 81 and 157 contribute.

The density at 120 of the first triangular kernel centered at 81 is: (0.02)(131 - 120)/50 = 0.0044.
[Diagram: the triangle centered at 81 on (31, 131), with height 0.0044 at 120, near its right endpoint of 131.]

Instead of drawing a diagram, one can use the following formula for the triangular kernel:
ky(x) = (b - |x - y|) / b², y - b ≤ x ≤ y + b.
For example, k81(120) = (50 - |120 - 81|)/50² = 11/50² = 0.0044.

(b - |x - y|) is the distance of x from the closer endpoint, y - b or y + b.
(b - |x - y|)/b is the ratio of the distance of x from the closer endpoint to the distance from the center to this endpoint.
{(b - |x - y|)/b} (1/b) = (b - |x - y|) / b², is the linearly interpolated height of the triangle at x.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 60

The density at 120 of the 2nd triangular kernel centered at 157 is: (0.02)(120 - 107)/50 = 0.0052.
Therefore, the smoothed density at 120 is: (0.0044)/3 + (0.0052)/3 = 0.0032.

This smoothing model is: Triangle[31, 131] / 3 + Triangle[107, 207] / 3 + Triangle[163, 263] / 3.
It has pdf of: 0 for x < 31, (x - 31)/7500 for 31 ≤ x < 81, (131 - x)/7500 for 81 ≤ x < 107,
(131 - x)/7500 + (x - 107)/7500 = 0.0032 for 107 ≤ x ≤ 131, (x - 107)/7500 for 131 < x < 157,
(207 - x)/7500 for 157 < x < 163, (207 - x)/7500 + (x - 163)/7500 = 0.00587 for 163 ≤ x ≤ 207,
(x - 163)/7500 for 207 < x ≤ 213, (263 - x)/7500 for 213 < x ≤ 263, 0 for x > 263.
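
A short Python sketch of the triangular kernel formula applied to this example (the function name triangle_kernel_density is my own):

    # Triangular-kernel density estimate for the data 81, 157, 213 with bandwidth b = 50.
    def triangle_kernel_density(x, data, b):
        total = 0.0
        for y in data:
            if abs(x - y) <= b:
                total += (b - abs(x - y)) / b ** 2
        return total / len(data)

    print(triangle_kernel_density(61, [81, 157, 213], 50))    # 0.004
    print(triangle_kernel_density(120, [81, 157, 213], 50))   # 0.0032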

The three separate triangular kernels, centered at 81, 157 and 213, look as follows:
[Figure: the three triangular kernels, each of height 0.02, centered at 81 (on 31 to 131), 157 (on 107 to 207), and 213 (on 163 to 263).]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 61

Note that the slope of each triangular kernel changes at its endpoints and peak. So for example, the
first triangular kernel is not differentiable at: 31, 81, and 131.

This triangular kernel smoothed density is an average of the individual triangular kernels, and looks as
follows:46

[Figure: the triangular kernel smoothed density, with kinks at 31, 81, 107, 131, 157, 163, 207, 213, and 263, level at 0.0032 between 107 and 131 and at 0.00587 between 163 and 207.]

Note that this triangular kernel smoothed density is not differentiable at:
31, 81, 107, 131, 157, 163, 207, 213, and 263.

It is level between 107 and 131, as well as between 163 and 207;
in these intervals the decreasing contribution from one triangle kernel is offset by the increasing
contribution from another triangle kernel.

46
In general we would weight the kernels together, with for example a point that appeared twice in the data set
getting twice the weight.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 62

Shown out to 200,000, here is 100,000 times the kernel smoothed density for the ungrouped data
in Section 2, using a triangular kernel with a bandwidth of 10,000:47
[Figure: 100,000 times the triangular kernel (bandwidth 10,000) smoothed density, plotted for sizes 0 to 200,000.]

For the ungrouped data in Section 2, here is 100,000 times the triangular kernel smoothed density
with a wider bandwidth of 50,000, and thus more smoothing:

[Figure: 100,000 times the triangular kernel (bandwidth 50,000) smoothed density, plotted for sizes 0 to 200,000.]

47
The computer, in graphing this density, has made the triangles less obvious than they in fact are. Note that a very
small probability has been assigned to negative loss sizes. This can be avoided by using a more complicated kernel,
not on the Syllabus.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 63

Gamma Kernel:

One can turn a size of loss distribution into a kernel. For example, the Gamma Kernel has a mean
equal to an observed point. Specifically, if y is the observed point, the Gamma Kernel is a Gamma
density with parameters α and θ = y/α, mean y, and coefficient of variation 1/√α.
The smaller α, the larger the CV and the more smoothing.
For example, here is a Gamma Kernel with α = 10, θ = 8.1, and mean 81:

[Figure: the Gamma kernel with α = 10 and θ = 8.1, a density with mean 81, plotted for x from 0 to 200.]

Unlike the previous kernels, the gamma kernel has support 0 to ∞. Therefore, all of the individual
gamma distributions contribute something to the gamma kernel smoothed density at any point
greater than zero.48

One could smooth the above empirical model, p(81) = 1/3, p(157) = 1/3, p(213) = 1/3, using a
gamma kernel with for example α = 10. Rather than a point mass of probability at each data point,
we spread the probability using the gamma density.

Exercise: What is the density at 120 for a Gamma with mean 81 and α = 10?

[Solution: α = 10 and θ = 81/10 = 8.1. The density of a Gamma is: f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ(α)
= 8.1^(-10) x⁹ e^(-x/8.1) / 9! = 2.267 x 10^(-15) x⁹ e^(-x/8.1). f(120) = 0.00431.]

Similarly, the density at 120 for a gamma density with mean 157 and α = 10 is 0.00749,
and the density at 120 for a gamma density with mean 213 and α = 10 is 0.00264.
The smoothed density at 120 is: (0.00431 + 0.00749 + 0.00264)/3 = 0.00481.
48
Also, there is no density assigned to negative values as can be the case with either the uniform or triangular
kernels.
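
A short Python sketch of this calculation, assuming SciPy is available (scipy.stats.gamma parameterizes the Gamma by shape a and scale θ):

    # Gamma-kernel density estimate at 120 for the data 81, 157, 213,
    # using alpha = 10 and theta = y / alpha for the kernel centered at y.
    from scipy.stats import gamma

    alpha, data = 10, [81, 157, 213]
    estimate = sum(gamma.pdf(120, a=alpha, scale=y / alpha) for y in data) / len(data)
    print(estimate)   # about 0.00481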
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 64

The Gamma Kernel smoothing model is:


Gamma[10, 8.1] / 3 + Gamma[10, 15.7] / 3 + Gamma[10, 21.3] / 3.
This gamma kernel smoothed density looks as follows:

[Figure: the gamma kernel smoothed density, plotted for x from 0 to 400, with maximum density of about 0.006.]


2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 65

Shown out to 200,000, here is 100,000 times the kernel smoothed density for the ungrouped data
in Section 2, using a gamma kernel with α = 4:

[Figure: 100,000 times the gamma kernel (α = 4) smoothed density, plotted for sizes 0 to 200,000.]

For the ungrouped data in Section 2, 100,000 times the gamma kernel smoothed density with a
smaller α of 2, and thus more smoothing:

[Figure: 100,000 times the gamma kernel (α = 2) smoothed density, plotted for sizes 0 to 200,000.]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 66

Distribution Function, Uniform Kernel:49

One can also use kernels to smooth distribution functions. One sees how much of the area from a
kernel is to the left of the value at which one wants the smoothed distribution.

For the data 81, 157, 213, and a uniform kernel with a bandwidth of 50, here is how one computes
the smoothed distribution at 120.

The uniform kernel centered at 81 goes from 31 to 131.


Thus (120 - 31)/100 = 0.89 of its area is to the left of 120.50

[Diagram: the uniform kernel on (31, 131), with 89% of its area to the left of 120.]

The uniform kernel centered at 157 goes from 107 to 207.


Thus (120 - 107)/100 = 0.13 of its area is to the left of 120.

[Diagram: the uniform kernel on (107, 207), with 13% of its area to the left of 120.]

The uniform kernel centered at 213 goes from 163 to 263. Thus none of its area is to the left of 120.

The empirical model assigns 1/3 probability to each of the three data points.
Thus the smoothed distribution at 120 is: 0.89/3 + 0.13/3 + 0/3 = 0.340.

49
See 4, 11/04, Q.20, and 4, 11/06, Q.24, and 4, 5/07, Q.16.
50
Recall that every kernel has an area of one.
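
A short Python sketch of this calculation (the function name uniform_kernel_cdf is my own):

    # Uniform-kernel estimate of the distribution function at 120
    # for the data 81, 157, 213 with bandwidth b = 50.
    def uniform_kernel_cdf(x, data, b):
        total = 0.0
        for y in data:
            if x < y - b:
                pass                              # kernel entirely to the right of x
            elif x > y + b:
                total += 1.0                      # kernel entirely to the left of x
            else:
                total += (x - (y - b)) / (2 * b)  # fraction of the kernel's area left of x
        return total / len(data)

    print(uniform_kernel_cdf(120, [81, 157, 213], 50))   # 0.340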
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 67

A kernel density estimator of the distribution function corresponding to a discrete p(yj) is:
F̂(x) = Σ p(yj) Kyj(x), where the sum runs over j = 1 to n.

For the uniform kernel:
Ky(x) = 0, for x < y - b;
Ky(x) = {x - (y - b)} / (2b), for y - b ≤ x ≤ y + b;
Ky(x) = 1, for x > y + b.

For the uniform kernel with bandwidth 50 and centered at 81:
K81(x) = 0 for x < 31, (x - 31)/100 for 31 ≤ x ≤ 131, and 1 for x > 131.

For the uniform kernel centered at 157:
K157(x) = 0 for x < 107, (x - 107)/100 for 107 ≤ x ≤ 207, and 1 for x > 207.

For the uniform kernel centered at 213:
K213(x) = 0 for x < 163, (x - 163)/100 for 163 ≤ x ≤ 263, and 1 for x > 263.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 68

Then the uniform kernel smoothed Distribution Function, K81(x)/3 + K157(x)/3 + K213(x)/3, is:
0, for x < 31;
(x - 31)/300, for 31 ≤ x < 107;
(x - 31)/300 + (x - 107)/300, for 107 ≤ x < 131;
1/3 + (x - 107)/300, for 131 ≤ x < 163;
1/3 + (x - 107)/300 + (x - 163)/300, for 163 ≤ x < 207;
2/3 + (x - 163)/300, for 207 ≤ x < 263;
1, for 263 ≤ x.

Exercise: Using this algebraic form, determine the smoothed distribution at 120.
[Solution: (120 - 31)/ 300 + (120 - 107)/300 = 0.340, matching the previous result.]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 69

Here is a graph of the uniform kernel smoothed distribution function:

[Figure: the uniform kernel smoothed distribution function, a series of connected line segments passing through F(31) = 0, F(107) = 19/75, F(131) = 31/75, F(163) = 13/25, F(207) = 61/75, and F(263) = 1.]

The uniform kernel smoothed distribution is a series of connected line segments, where the slope
changes where the smoothed density has jump discontinuities.

F(131) = 31/75 < 0.5, and F(163) = 13/25 > 0.5, thus the median of the uniform kernel
smoothed distribution function is between 131 and 163.
Linearly interpolating, the median is at:
{(0.5 - 31/75)(163) + (13/25 - 0.5)(131)} / (13/25 - 31/75) = 157.
Alternately, use the algebraic form of F(x) for x between 131 and 163.
Set 0.5 = 1/3 + (x - 107)/300. ⇒ x = 157.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 70

Distribution Function, Triangle Kernel:51

For the triangle kernel, it takes more work to compute the areas than for the uniform kernel.
However, the idea is the same; we need to determine the area to the left of a vertical line.

Exercise: For the data 81, 157, 213, and a triangular kernel with a bandwidth of 50, what is the
smoothed distribution at 180?
[Solution: The triangular kernel centered at 81 goes from 31 to 131. Thus all of its area is to the left of
180. The triangular kernel centered at 157 goes from 107 to 207:
[Diagram: the triangular kernel of height 0.02 on (107, 207), centered at 157, with a vertical line at 180.]
The triangle to the right of 180 has width 27, height (27/50)(1/50), and area (27/50)2 /2.
Thus 1 - (27/50)2 /2 = 0.8542 of the large triangle's area is to the left of 180.
The triangular kernel centered at 213 goes from 163 to 263:
[Diagram: the triangular kernel of height 0.02 on (163, 263), centered at 213, with a vertical line at 180.]
The triangle to the left of 180 has width 17, height (17/50)(1/50), and area (17/50)2 /2.
Thus (17/50)2 /2 = 0.0578 of the large triangle's area is to the left of 180.
The smoothed distribution at 180 is: 1/3 + 0.8542/3 + 0.0578/3 = 0.637.]
51
See 4, 11/05, Q.9.
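
A short Python sketch of this exercise (the function name triangle_kernel_cdf is my own):

    # Triangular-kernel estimate of the distribution function at 180
    # for the data 81, 157, 213 with bandwidth b = 50.
    def triangle_kernel_cdf(x, data, b):
        total = 0.0
        for y in data:
            if x <= y - b:
                contribution = 0.0
            elif x < y:
                contribution = (x - (y - b)) ** 2 / (2 * b ** 2)
            elif x < y + b:
                contribution = 1 - ((y + b) - x) ** 2 / (2 * b ** 2)
            else:
                contribution = 1.0
            total += contribution
        return total / len(data)

    print(triangle_kernel_cdf(180, [81, 157, 213], 50))   # 0.637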
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 71

A kernel density estimator of the distribution function corresponding to a discrete p(yj) is:
F̂(x) = Σ p(yj) Kyj(x), where the sum runs over j = 1 to n.

For the triangle kernel:
Ky(x) = 0, for x < y - b;
Ky(x) = {x - (y - b)}² / (2b²), for y - b ≤ x < y;
Ky(x) = 1 - {x - (y + b)}² / (2b²), for y ≤ x < y + b;
Ky(x) = 1, for x ≥ y + b.

For the triangle kernel with bandwidth 50 and centered at 81:
K81(x) = 0 for x < 31, (x - 31)²/5000 for 31 ≤ x < 81, 1 - (x - 131)²/5000 for 81 ≤ x < 131, and 1 for x ≥ 131.

For the triangle kernel centered at 157:
K157(x) = 0 for x < 107, (x - 107)²/5000 for 107 ≤ x < 157, 1 - (x - 207)²/5000 for 157 ≤ x < 207, and 1 for x ≥ 207.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 72

For the triangle kernel centered at 213:
K213(x) = 0 for x < 163, (x - 163)²/5000 for 163 ≤ x < 213, 1 - (x - 263)²/5000 for 213 ≤ x < 263, and 1 for x ≥ 263.

Then the triangle kernel smoothed Distribution Function, K81(x)/3 + K157(x)/3 + K213(x)/3, is:
0, for x < 31;
(x - 31)²/15,000, for 31 ≤ x < 81;
1/3 - (x - 131)²/15,000, for 81 ≤ x < 107;
1/3 - (x - 131)²/15,000 + (x - 107)²/15,000, for 107 ≤ x < 131;
1/3 + (x - 107)²/15,000, for 131 ≤ x < 157;
2/3 - (x - 207)²/15,000, for 157 ≤ x < 163;
2/3 - (x - 207)²/15,000 + (x - 163)²/15,000, for 163 ≤ x < 207;
2/3 + (x - 163)²/15,000, for 207 ≤ x < 213;
1 - (x - 263)²/15,000, for 213 ≤ x < 263;
1, for x ≥ 263.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 73

Exercise: Using this algebraic form, determine the smoothed distribution at 180.
[Solution: 2/3 - (180 - 207)²/15,000 + (180 - 163)²/15,000 = 0.637, matching the previous result.]

Here is a graph of the triangle kernel smoothed distribution function:

[Figure: the triangle kernel smoothed distribution function, rising smoothly from 0 at x = 31 to 1 at x = 263.]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 74

General Formulas:

A kernel density estimator of a discrete density p(yj) is: f̂(x) = Σ p(yj) kyj(x), where the sum runs over j = 1 to n.52

For the uniform kernel, ky(x) = 1/(2b), y - b ≤ x ≤ y + b.53

For the triangular kernel, ky(x) = (b - |x - y|)/b², y - b ≤ x ≤ y + b.54

For the gamma kernel, ky(x) = (α/y)^α x^(α-1) e^(-xα/y) / Γ(α), 0 < x < ∞.

A kernel density estimator of the distribution function corresponding to a discrete p(yj) is:
F̂(x) = Σ p(yj) Kyj(x), where the sum runs over j = 1 to n.

For the uniform kernel, Ky(x) = 0 for x < y - b, {x - (y - b)}/(2b) for y - b ≤ x ≤ y + b, 1 for x > y + b.

For the triangular kernel, Ky(x) = 0 for x < y - b,
{x - (y - b)}²/(2b²) for y - b ≤ x ≤ y, 1 - {x - (y + b)}²/(2b²) for y ≤ x ≤ y + b, 1 for x > y + b.

Variances:

For the three observations: 81, 157, 213, we constructed the kernel smoothed density with a
uniform kernel with bandwidth 50. The Uniform Kernel smoothing model is:
Uniform[31, 131] / 3 + Uniform[107, 207] / 3 + Uniform[163, 263] / 3.
With pdf of: 0 for x < 31, 1/300 for 31 ≤ x < 107, 2/300 for 107 ≤ x ≤ 131,
1/300 for 131 < x < 163, 2/300 for 163 ≤ x ≤ 207, 1/300 for 207 < x ≤ 263, 0 for x > 263.

The mean of each kernel is its corresponding data point.55


Thus, the mean of the kernel smoothed density is the mean of the data.

In this case, the mean is: (81 + 157 + 213)/3 = 150.333.


52
See Definition 12.2 in Loss Models.
53
Note that the density is positive at the endpoints; in other words the endpoints are included.
54
Due to the nature of the triangle kernel, the density is zero at the endpoints.
55
Exception, see 2014 Exam C Sample Q.300.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 75

We can compute the second moment of the kernel smoothed density:


(1/3) (2nd moment of Uniform[31, 131]) + (1/3) (2nd moment of Uniform[107, 207])
+ (1/3) (2nd moment of Uniform[163, 263]) =
(1/3){(131³ - 31³)/300 + (207³ - 107³)/300 + (263³ - 163³)/300} =
(1/3){7394.33 + 25482.33 + 46202.33} = 26,359.7.

Therefore, the variance of the kernel smoothed density is: 26,359.7 - 150.333² = 3759.7.
This is the same manner in which we would get the variance of a mixture of three uniform distributions.

Alternately, we can think of the kernel smoothed density as 3 equally likely risk types.56
Then the process variance of each type of risk is that of a uniform distribution of width 100,
which is 100²/12. Thus the expected value of the process variance is: 100²/12 = 833.33.

The mean of each uniform kernel is the corresponding observed point. Thus the variance of the
hypothetical means is the variance of the observed data. In this case, the variance of the data is:57
(1/3){(81 - 150.33)² + (157 - 150.33)² + (213 - 150.33)²} = 2926.22.

The variance of the kernel smoothed density is: EPV + VHM = 2926.22 + 833.33 = 3759.6.58

The variance of the kernel smoothed density is the (biased) variance of the data (the
variance of the empirical distribution function) plus the (average) variance of the kernel.59

Each uniform kernel has variance: (2b)²/12 = b²/3.

A triangular density from 0 to 2b has mean b, and second moment:
∫ from 0 to b of x² (x/b²) dx + ∫ from b to 2b of x² (2b - x)/b² dx = b²/4 + 11b²/12 = 7b²/6.

Therefore, the variance of this triangular kernel is: 7b²/6 - b² = b²/6.
Each triangular kernel has variance: b²/6.

Exercise: For this same example, what is the variance of the triangle kernel smoothed density for a
bandwidth of 50?
[Solution: 2926.22 + 50²/6 = 3342.9.]
56
See "Mahler's Guide to Buhlmann Credibility."
57
We do not take the sample variance.
58
See "Mahler's Guide to Buhlmann Credibility."
59
In the case of the Gamma kernel, each kernel has a variance of αθ² = y²/α. So in this case one must take an
average of these variances, which is the second moment of the data divided by α.
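
A short Python sketch of these two variance results (the variable names are my own):

    # Variance of the kernel smoothed density as
    # (biased variance of the data) + (variance of the kernel), for the data 81, 157, 213.
    data = [81, 157, 213]
    n = len(data)
    mean = sum(data) / n
    var_data = sum((y - mean) ** 2 for y in data) / n     # biased (empirical) variance: 2926.22

    b = 50
    var_uniform = var_data + (2 * b) ** 2 / 12            # uniform kernel adds b^2/3  -> 3759.6
    var_triangle = var_data + b ** 2 / 6                  # triangular kernel adds b^2/6 -> 3342.9
    print(var_uniform, var_triangle)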
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 76

Pareto Kernel:60

One can use other size of loss distributions as the basis of kernels. For example, a Pareto with
parameters α > 1 and θ = y(α - 1) has a mean of y, and density: α θ^α (θ + x)^(-(α+1)).
Therefore, for the Pareto kernel, ky(x) = α {y(α - 1)}^α / {y(α - 1) + x}^(α+1), 0 < x < ∞.

Shown out to 200,000, here is the kernel smoothed density for the ungrouped data in Section 2,
using a Pareto kernel with α = 10:

[Figure: the Pareto kernel (α = 10) smoothed density, in units of one millionth, plotted for sizes 0 to 200,000.]

60
See Exercise 12.27 in Loss Models.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 77

Limited Expected Values, Uniform Kernel:

First, let us assume for simplicity that we have a sample of size one; one value at 213.
Let us use a uniform kernel with bandwidth 50:

[Figure: the uniform kernel of height 0.01 on (163, 263), centered at 213.]

The mean of the kernel is the midpoint, 213, the observed value.

Let us calculate some limited expected values for this kernel.

E[X ∧ 300] = 213, since all of the values are less than 300 the limit in this case had no effect.
E[X ∧ 150] = 150, since all of the values become 150 after we apply the limit.

With instead a limit of 200, some of the values are capped at 200 and some are unaffected.

E[X ∧ 200] = ∫ from 163 to 200 of x f(x) dx + (200)(263 - 200)/100 = ∫ from 163 to 200 of (x/100) dx + 126
= [x²/200] evaluated from x = 163 to x = 200, plus 126 = 67.155 + 126 = 193.155.

We note that: E[X ∧ 200] = 193.155 ≤ 213 = E[X], and E[X ∧ 200] = 193.155 ≤ 200.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 78

In order to compute E[X ∧ 200] geometrically, we can divide this kernel by a vertical line at 200:

[Diagram: the uniform kernel of height 0.01 on (163, 263), divided by a vertical line at 200.]

There is an area of: (200 - 163)/100 = 0.37 to the left of 200,
and an area of: (263 - 200)/100 = 0.63 to the right of 200.61
100

The values to the left of 200 each contribute their values to E[X ∧ 200].
The average of these small values is: (163 + 200)/2 = 181.5.
Thus the contribution of the small losses is: (0.37)(181.5) = 67.155.

The values to the right of 200 each contribute 200 to E[X ∧ 200].
Thus the contribution of the large losses is: (0.63)(200) = 126.

Therefore, E[X ∧ 200] = 67.155 + 126 = 193.155, matching the previous result.

61
The area under a kernel is one.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 79

In general, let us assume that we have a uniform kernel centered at y, with bandwidth b:
f(x) = 1 / (2b), y-b ≤ x ≤ y+b.

Then for L ≥ y + b, since all of the values are less than L the limit in this case had no effect.
Thus E[X ∧ L] = y.

For L ≤ y - b, all of the values become L after we apply the limit.


Thus E[X ∧ L] = L.

If y + b > L > y - b, some of the values are capped at L and some are unaffected.

E[X ∧ L] = ∫ from y-b to L of x f(x) dx + L (y + b - L)/(2b) = ∫ from y-b to L of x/(2b) dx + L (y + b - L)/(2b)
= [x²/(4b)] evaluated from x = y-b to x = L, plus L (y + b - L)/(2b) = {2L(y + b) - (y - b)² - L²} / (4b).

Thus for a uniform kernel centered at y with bandwidth b:62
E[X ∧ L] = L, for L ≤ y - b;
E[X ∧ L] = {2L(y + b) - (y - b)² - L²} / (4b), for y - b < L < y + b;
E[X ∧ L] = y, for L ≥ y + b.

Exercise: Use the above formula to compute E[X ∧ 200] for the uniform kernel centered at 213 with
bandwidth 50.
[Solution: {2L(y + b) - (y - b)² - L²} / (4b) = {(2)(200)(213 + 50) - (213 - 50)² - 200²} / {(4)(50)} = 193.155.
Comment: Matching the previous result.]

Exercise: Compute E[X ∧ 200] for the uniform kernel centered at 157 with bandwidth 50.
[Solution: {2L(y + b) - (y - b)² - L²} / (4b) = {(2)(200)(157 + 50) - (157 - 50)² - 200²} / {(4)(50)} = 156.755.

Comment: You can get the same result from first principles, either algebraically or geometrically.]

E[X ∧ 200] for the uniform kernel centered at 81 with bandwidth 50 is its mean of 81.

62
I would not memorize this formula.
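
A short Python sketch applying this formula to the example with data 81, 157, 213 (the function name lev_uniform_kernel is my own):

    # E[X ^ L] for a uniform kernel centered at y with bandwidth b, and for
    # the kernel smoothed density of the data 81, 157, 213 with b = 50 and L = 200.
    def lev_uniform_kernel(L, y, b):
        if L <= y - b:
            return L
        if L >= y + b:
            return y
        return (2 * L * (y + b) - (y - b) ** 2 - L ** 2) / (4 * b)

    data, b, L = [81, 157, 213], 50, 200
    print(sum(lev_uniform_kernel(L, y, b) for y in data) / len(data))   # 143.637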
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 80

Let us return to the example, where we have a data set of size three: 81, 157, and 213.
We apply a uniform kernel with bandwidth 50.

The mean of each kernel is the point at which it is centered.


Therefore, the mean of the kernel smoothed density is the mean of the data:
(1/3)(81) + (1/3)(157) + (1/3)(213) = 150.333.

E[X ∧ 200] for the kernel smoothed density is the average of the limited expected values for each
of the individual kernels computed previously:
(1/3)(81) + (1/3)(156.755) + (1/3)(193.155) = 143.637.

In general, the limited expected value for a kernel smoothed density is a weighted average of the
limited expected values of each of the individual kernels, with weights equal to the number of times
each value appears in the original data set.

For this example, here is a graph of the limited expected values of the uniform kernel smoothed
density as a function of the limit:

[Figure: the limited expected value of the uniform kernel smoothed density as a function of the limit L, rising to the mean of 150.3 as L approaches 263, with E[X ∧ 200] = 143.6.]

A limit of more than 213 + 50 = 263 has no effect; so that for such a limit the limited expected value
is just the mean of: (81 + 157 + 213) / 3 = 150.333.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 81

Limited Expected Values, Triangle Kernel:

Working with the Triangle Kernel is harder than working with the Uniform Kernel.
First, let us assume for simplicity that we have a sample of size one; one value at 213.
Let us use a triangle kernel with bandwidth 50:

[Diagram: the triangular kernel of height 0.02 on (163, 263), centered at 213, with a vertical line at 200.]

The mean of this kernel is the midpoint, 213, the observed value.

E[X ∧ 300] = 213, since all of the values are less than 300 the limit in this case had no effect.
E[X ∧ 150] = 150, since all of the values become 150 after we apply the limit.

With instead a limit of 200, some of the values are capped at 200 and some are unaffected.
For 163 ≤ x ≤ 213, f(x) = (0.02)(x - 163)/(213 - 163) = 0.0004x - 0.0652.63
Alternately, ky(x) = (b - |x - y|)/b², y - b ≤ x ≤ y + b.
Here, k213(x) = (50 - |x - 213|)/50², 163 ≤ x ≤ 263.
Thus f(x) = {50 - (213 - x)}/50² = 0.0004x - 0.0652, 163 ≤ x ≤ 213,
and f(x) = {50 - (x - 213)}/50² = -0.0004x + 0.1052, 213 ≤ x ≤ 263.

63
One can verify that f(163) = 0 and f(213) = 0.02.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 82

The area to the left of 200 is a fraction of the area of 1/2 that is to the left of 213:
(1/2) {(200 - 163)/(213 - 163)}² = 0.2738.

Alternately, the area to the left of 200 is a triangle, with base of 37,
and height of: (0.02)(200 - 163)/(213 - 163) = 0.0148.
Thus the area of this triangle is: (1/2)(37)(0.0148) = 0.2738.
Therefore, the area to the right of 200 is: 1 - 0.2738 = 0.7262.

E[X ∧ 200] = ∫ from 163 to 200 of x f(x) dx + (200)(1 - 0.2738) = ∫ from 163 to 200 of x (0.0004x - 0.0652) dx + 145.24
= [0.0004x³/3 - 0.0326x²] evaluated from x = 163 to x = 200, plus 145.24 = 51.383 + 145.24 = 196.623.
x = 163

We note that: E[X ∧ 200] = 196.623 ≤ 213 = E[X],


and E[X ∧ 200] = 196.623 ≤ 200.

One can instead use a geometric approach. In order to do so, I will use a result for right triangles.
Assume we have a right triangle with base from c to d:

[Diagram: a right triangle with base from c to d and its vertical side at d.]

Then the average value is: d - (d-c)/3.


In other words, the average is one third of the way from the "high end" to the "low end".
This result does not depend on the ratio of the base to the height.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 83

Instead, assume we have a right triangle as follows:

[Diagram: a right triangle with base from d to c and its vertical side at d.]

Then the average value is: d + (c-d)/3; again one third of the way from the high end to the low end.

[Diagram: the triangular kernel of height 0.02 on (163, 263), centered at 213, divided by a vertical line at 200.]

As determined before, the area of the triangle to the left of 200 is: (1/2) {(200 - 163)/(213 - 163)}² = 0.2738.
The average of this small right triangle is: 200 - (200 - 163)/3 = 187.667.

In computing the limited expected value at 200, any value greater than 200 contributes 200.
In contrast, all of the values less than 200 contribute their value to E[X ∧ 200].
Thus E[X ∧ 200] = (0.2738)(187.667) + (200)(1 - 0.2738) = 196.623,
matching the previous result.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 84

Exercise: Compute E[X ∧ 200] for the triangle kernel centered at 157 with bandwidth 50:
[Diagram: the triangular kernel of height 0.02 on (107, 207), centered at 157, with a vertical line at 200.]
[Solution: The area to the right of 200 is a fraction of the area of 1/2 that is to the right of 157:
(1/2) {(207 - 200)/(207 - 157)}² = 0.0098.
⎝ 207 - 157 ⎠
The average for this triangle is: 200 + (207 - 200)/3 = 202.333.
For the whole triangle kernel, E[X] = 157.
In computing the limited expected value at 200, any value greater than 200 contributes 200.
Thus while the small right triangle contributes to E[X] its area times its average, it only contributes
200 times its area to E[X ∧ 200].
In contrast, all of the values less than 200 contribute their value to both E[X] and E[X ∧ 200].
Thus E[X ∧ 200] = 157 - (202.333)(0.0098) + (200)(0.0098) = 156.977.
Alternately, for 107 ≤ x ≤ 157, f(x) = (0.02)(x - 107)/(157 - 107) = 0.0004x - 0.0428.
For 157 ≤ x ≤ 207, f(x) = (0.02)(207 - x)/(207 - 157) = 0.0828 - 0.0004x.
E[X ∧ 200] = ∫ from 107 to 200 of x f(x) dx + (200)(0.0098) =
∫ from 107 to 157 of x (0.0004x - 0.0428) dx + ∫ from 157 to 200 of x (0.0828 - 0.0004x) dx + 1.96 =
[0.0004x³/3 - 0.0214x²] from x = 107 to x = 157, plus [0.0414x² - 0.0004x³/3] from x = 157 to x = 200, plus 1.96
= 70.167 + 84.850 + 1.96 = 156.977.]


2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 85

For the triangle kernel centered at 81 with bandwidth 50, all of the area is to the left of 200, and thus
E[X ∧ 200] is just the average of this kernel: 81.

Let us return to the example, where we have a data set of size three: 81, 157 and 213.
We apply a triangle kernel with bandwidth 50.

The mean of each kernel is the point at which it is centered.


Therefore, the mean of the kernel smoothed density is the mean of the data:
(1/3)(81) + (1/3)(157) + (1/3)(213) = 150.333.

E[X ∧ 200] for the kernel smoothed density is the average of the limited expected values for each
of the individual kernels computed previously:
(1/3)(81) + (1/3)(156.977) + (1/3)(196.623) = 144.867.

In general, the limited expected value for a kernel smoothed density is a weighted average of the
limited expected values of each of the individual kernels, with weights equal to the number of times
each value appears in the original data set.

For this example, here is a graph of the limited expected values of the triangle kernel smoothed
density as a function of the limit:
[Figure: the limited expected value of the triangle kernel smoothed density as a function of the limit L, rising to the mean of 150.3 as L approaches 263, with E[X ∧ 200] = 144.9.]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 86

In general, let us assume that we have a triangle kernel centered at y, with bandwidth b.

Then for L ≥ y + b, since all of the values are less than L the limit in this case had no effect.
Thus E[X ∧ L] = y.

For L ≤ y - b, all of the values become L after we apply the limit.


Thus E[X ∧ L] = L.

If y + b > L > y - b, some of the values are capped at L and some are unaffected.

First let us assume that y + b > L ≥ y:

[Diagram: the triangular kernel of height 1/b on (y - b, y + b), with a vertical line at L between y and y + b.]

The area of the triangle to the right of L is: (1/2) {(y + b - L)/b}².
The average of this right triangle is: L + (y + b - L)/3.

Each of the values in this small triangle contributes L to E[X ∧ L] rather than its value.
Thus E[X ∧ L] = E[X] - (1/2) {(y + b - L)/b}² {L + (y + b - L)/3 - L}
= y - (y + b - L)³ / (6b²).
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 87

If instead y ≥ L > y - b:

[Diagram: the triangular kernel of height 1/b on (y - b, y + b), with a vertical line at L between y - b and y.]

The area of the triangle to the left of L is: (1/2) {(L - y + b)/b}².
The average of this right triangle is: L - (L - y + b)/3.

Each of the values in this triangle contributes its value to E[X ∧ L].
Those values to the right of L contribute L to E[X ∧ L].
Thus E[X ∧ L] = (1/2) {(L - y + b)/b}² {L - (L - y + b)/3} + L {1 - (1/2) {(L - y + b)/b}²}
= L - (L - y + b)³ / (6b²).

Putting all the cases together, we have for a triangle kernel centered at y with bandwidth b:64
E[X ∧ L] = L, for L ≤ y - b;
E[X ∧ L] = L - (L - y + b)³ / (6b²), for y - b ≤ L ≤ y;
E[X ∧ L] = y - (y + b - L)³ / (6b²), for y ≤ L ≤ y + b;
E[X ∧ L] = y, for L ≥ y + b.

64
I would not memorize this formula.
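
A short Python sketch applying this formula to the example with data 81, 157, 213 (the function name lev_triangle_kernel is my own):

    # E[X ^ L] for a triangular kernel centered at y with bandwidth b, and for
    # the kernel smoothed density of the data 81, 157, 213 with b = 50 and L = 200.
    def lev_triangle_kernel(L, y, b):
        if L <= y - b:
            return L
        if L <= y:
            return L - (L - y + b) ** 3 / (6 * b ** 2)
        if L <= y + b:
            return y - (y + b - L) ** 3 / (6 * b ** 2)
        return y

    data, b, L = [81, 157, 213], 50, 200
    print(sum(lev_triangle_kernel(L, y, b) for y in data) / len(data))   # 144.867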
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 88

Exercise: Use the above formula to compute E[X ∧ 200] for the triangle kernel centered at 213 with
bandwidth 50.
[Solution: L - (L - y + b)³ / (6b²) = 200 - (200 - 213 + 50)³ / {(6)(50²)} = 196.623.
Comment: Matching the previous result.]

Exercise: Use the above formula to compute E[X ∧ 200] for the triangle kernel centered at 157 with
bandwidth 50.
[Solution: y - (y + b - L)³ / (6b²) = 157 - (157 - 200 + 50)³ / {(6)(50²)} = 156.977.
Comment: Matching the previous result.]
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 89

Problems:

Use the following information for the next 8 questions:


A random sample of five claims yields the values:
100, 500, 1000, 2000, and 5000.

6.1 (1 point) Using a uniform kernel with bandwidth 1000,


what is the smoothed probability density function at 2500?
A. 0.0001 B. 0.0002 C. 0.0003 D. 0.0004 E. 0.0005

6.2 (2 points) What is the standard deviation of the estimate in the previous question?
A. 0.00006 B. 0.00007 C. 0.00008 D. 0.00009 E. 0.00010

6.3 (2 points) Using a triangular kernel with bandwidth 2000,


what is the smoothed probability density function at 6000?
A. 0.000025 B. 0.000050 C. 0.000075 D. 0.000100 E. 0.000125

6.4 (4 points) Using a gamma kernel with α = 3,


what is the smoothed probability density function at 3000?
A. 0.00004 B. 0.00005 C. 0.00006 D. 0.00007 E. 0.00008

6.5 (2 points) Using a uniform kernel with bandwidth 1000,


what is the smoothed distribution function at 1700?
A. 0.56 B. 0.58 C. 0.60 D. 0.62 E. 0.64

6.6 (2 points) Using a triangular kernel with bandwidth 2000,


what is the smoothed distribution function at 3000?
A. 0.76 B. 0.78 C. 0.80 D. 0.82 E. 0.84

6.7 (2 points) Using a uniform kernel with bandwidth 3000,


what is the variance of the kernel smoothed density?
A. 4.5 million B. 5.0 million C. 5.5 million D. 6.0 million E. 6.5 million

6.8 (3 points) Using a triangular kernel with bandwidth 3000,


what is the variance of the kernel smoothed density?
A. 4.5 million B. 5.0 million C. 5.5 million D. 6.0 million E. 6.5 million
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 90

Use the following information for the next 3 questions:


A random sample of 20 observations of a random variable X yields the following values:
0.5, 1.1, 1.6, 2.2, 2.4, 3.0, 3.6, 4.2, 4.5, 5.1, 6.3, 6.9, 8.2, 8.8, 9.9, 11.1, 12.5, 13.3, 14.2, 15.4.

6.9 (2 points) Using a uniform kernel with bandwidth 1.5, what is the smoothed pdf at 5.2?
A. Less than 0.03
B. At least 0.03, but less than 0.04
C. At least 0.04, but less than 0.05
D. At least 0.05, but less than 0.06
E. At least 0.06

6.10 (2 points) Using a uniform kernel with bandwidth 3, what is the smoothed pdf at 8?
A. 0.03 B. 0.04 C. 0.05 D. 0.06 E. 0.07

6.11 (3 points) Using a triangular kernel with bandwidth 2, what is the smoothed pdf at 10?
A. 0.03 B. 0.04 C. 0.05 D. 0.06 E. 0.07

For the next two questions, use the following information on the time required to close claims:
Time(weeks), tj Number of closures, sj Number of open claims, rj
1 30 100
2 20 70
3 10 50
4 8 40
5 3 32
6 4 29
7 5 25
8 2 20

6.12 (2 points) Using a uniform kernel with a bandwidth of 1.5, determine f̂(4).
A. 0.07 B. 0.08 C. 0.09 D. 0.10 E. 0.11

6.13 (3 points) Using a triangular kernel with a bandwidth of 1.5, determine f̂(5).
A. Less than 0.045
B. At least 0.045, but less than 0.050
C. At least 0.050, but less than 0.055
D. At least 0.055, but less than 0.060
E. At least 0.060
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 91

Use the following information for the next seven questions:


There are four losses of size: 500, 1000, 1000, 1500.

6.14 (1 point) Draw the kernel smoothed density for a uniform kernel with bandwidth 200.

6.15 (1 point) Draw the kernel smoothed density for a uniform kernel with bandwidth 500.

6.16 (1 point) Draw the kernel smoothed density for a uniform kernel with bandwidth 800.

6.17 (2 points) Draw the triangular kernel smoothed density with bandwidth 200.

6.18 (2 points) Draw the triangular kernel smoothed density with bandwidth 500.

6.19 (2 points) Draw the triangular kernel smoothed density with bandwidth 800.

6.20 (4 points) You use a Pareto kernel with α = 4.


(For each data point have the mean of the kernel equal to that data point.)
Determine the standard deviation of the kernel smoothed density.

6.21 (3 points) From a population having survival function S, you are given the following sample of
size six: 20, 20, 32, 44, 50, 50.
Colonel Klink uses a uniform kernel with bandwidth 14 in order to estimate S(35).
Colonel Sanders uses a uniform kernel with bandwidth 8 in order to estimate S(35).
What is the difference between the estimates of Colonel Klink and Colonel Sanders?
A. Less than -0.03
B. At least -0.03, but less than -0.01
C. At least -0.01, but less than 0.01
D. At least 0.01, but less than 0.03
E. At least 0.03

6.22 (2 points) For a data set of size n, what is the difference between the variance of the empirical
distribution function and the variance of the triangular kernel smoothed density with bandwidth b?

6.23 (2 points) From a population having density function f, you are given the following sample:
6, 10, 10, 15, 18, 20, 23, 27, 32, 32, 35, 40, 40, 46, 46, 55, 55, 55, 63, 69, 82.
Calculate the kernel density estimate of f, using the uniform kernel with bandwidth 10.
At which of the following values of x is the kernel density estimate of f(x) the largest?
A. 10 B. 20 C. 30 D. 40 E. 50
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 92

6.24 (3 points) You are given the following sample: 9, 15, 18, 20, 21.
Using the uniform kernel with bandwidth 4, estimate the median.
(A) 17.0 (B) 17.5 (C) 18.0 (D) 18.5 (E) 19.0

6.25 (2 points) For a data set of size n, what is the difference between the variance of the empirical
distribution function and the variance of the uniform kernel smoothed density with bandwidth b?

6.26 (1.5 points) For a data set of size 10, using a uniform kernel with bandwidth 4, the kernel
smoothed distribution function at 33 is 0.7375.
If 35 is added to the data set, what is the new kernel smoothed distribution function at 33?

6.27 (2 points) Use a kernel which does not have the same mean as the empirical estimate and that
has the following distribution function:
Ky(x) = 0, for x < y - 30; (x + 30 - y)/50, for y - 30 ≤ x ≤ y + 20; 1, for x > y + 20.

You observe 3 values: 60, 100, 200.


Determine the variance of the kernel density estimator of the distribution function.
(A) 2900 (B) 3100 (C) 3300 (D) 3500 (E) 3700

6.28 (2 points) From a population having distribution function F, you are given the following sample:
30, 50, 70, 70, 120, 200.
Calculate the kernel density estimate of S(100), using an exponential kernel.
(A) 0.24 (B) 0.26 (C) 0.28 (D) 0.30 (E) 0.32

6.29 (1 point) Which of the following statements are true?


1. For a triangular kernel, as the bandwidth increases, the amount of smoothing increases.
2. For a gamma kernel, as α increases, the amount of smoothing increases.
3. For an exponential kernel, the smoothed density always has a mode of zero.
A. 1 B. 2 C. 3 D. 1, 3 E. None of A, B, C, or D
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 93

Use the following information for the next 11 questions:


For eight widgets, their times until failure were in months: 7, 11, 13, 17, 19, 22, 28, 31.

6.30 (1 point) Using a uniform kernel with bandwidth 5, what is the smoothed pdf at 20?
A. 0.038 B. 0.040 C. 0.042 D. 0.044 E. 0.046

6.31 (2 points) Using a triangular kernel with bandwidth 5, what is the smoothed pdf at 20?
A. 0.039 B. 0.041 C. 0.043 D. 0.045 E. 0.047

6.32 (2 points) What is the variance of the uniform kernel smoothed density function, using a
bandwidth of 10?
A. 55 B. 65 C. 75 D. 85 E. 95

6.33 (2 points) What is the variance of the triangle kernel smoothed density function, using a
bandwidth of 10?
A. 70 B. 75 C. 80 D. 85 E. 90

6.34 (2 points) Calculate the kernel density estimate of F(15), using the uniform kernel with
bandwidth 5.
A. 0.38 B. 0.40 C. 0.42 D. 0.44 E. 0.46

6.35 (3 points) Calculate the kernel density estimate of F(16), using the triangle kernel with
bandwidth 5.
A. 0.38 B. 0.40 C. 0.42 D. 0.44 E. 0.46

6.36 (4 points) Using a gamma kernel with α = 5, what is the smoothed pdf at 20?
A. 0.025 B. 0.027 C. 0.029 D. 0.031 E. 0.033

6.37 (4 points) Using a Pareto kernel with α = 5, what is the smoothed pdf at 20?
A. 0.012 B. 0.014 C. 0.016 D. 0.018 E. 0.020

6.38 (3 points) Calculate the kernel density estimate of F(15), using the Pareto kernel with α = 5.
A. 0.68 B. 0.70 C. 0.72 D. 0.74 E. 0.76

6.39 (3 points) Using a uniform kernel with a bandwidth of 5,


what is the median of the kernel smoothed distribution?
A. 17.50 B. 17.75 C. 18.00 D. 18.25 E. 18.50

6.40 (3 points) Using a Normal kernel with σ = 5, estimate F(20).


A. 49% B. 51% C. 53% D. 55% E. 57%
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 94

Use the following information for the next 4 problems.


You observe 4 sizes of loss: 200, 200, 300, 500.

6.41 (3 points) Using a uniform kernel with a bandwidth of 150,


for the kernel smoothed density, determine E[X ∧ 250].
A. Less than 210
B. At least 210 but less than 215
C. At least 215 but less than 220
D. At least 220 but less than 225
E. At least 225

6.42 (4 points) Using a triangle kernel with a bandwidth of 150,


for the kernel smoothed density, determine E[X ∧ 250] .
A. Less than 210
B. At least 210 but less than 215
C. At least 215 but less than 220
D. At least 220 but less than 225
E. At least 225

6.43 (4 points) Determine the algebraic form of F(x) for a uniform kernel with a bandwidth of 150.

6.44 (6 points) Determine the algebraic form of F(x) for a triangle kernel with a bandwidth of 150.

Use the following information for the next 2 problems.


• You observe 5 values: 20, 30, 70, 120, 200.
• Define the Epanechnikov kernel with bandwidth b as:
ky(x) = {3/(4b)} {1 - ((x - y)/b)²}, y - b ≤ x ≤ y + b.

6.45 (2 points) Using an Epanechnikov kernel with bandwidth 50,


estimate the density function at 100.
A. 0.0040 B. 0.0042 C. 0.0044 D. 0.0046 E. 0.0048

6.46 (3 points) Using an Epanechnikov kernel with bandwidth 50,


calculate the kernel density estimate of F(100).
A. 0.54 B. 0.56 C. 0.58 D. 0.60 E. 0.62
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 95

Use the following information for the next 2 problems.


• You observe 4 values: 4, 6, 13, 17.
• g(t) = 0.1 + t³/2000, -5 ≤ t ≤ 5.
• ky(x) = g(x - y).

6.47 (2 points) Using this kernel, estimate the density function at 10.
A. 0.040 B. 0.045 C. 0.050 D. 0.055 E. 0.060

6.48 (3 points) Using this kernel, estimate the distribution function at 10.
A. 0.40 B. 0.45 C. 0.50 D. 0.55 E. 0.60

Use the following information for the next 2 problems.


• You observe 6 losses: 20, 20, 20, 30, 30, 45.
• g(t) = {1 - (t/15)²}² / 16, -15 ≤ t ≤ 15.
• ky(x) = g(x - y).

6.49 (2 points) Using this kernel, estimate the density function at 28.
A. 0.028 B. 0.030 C. 0.032 D. 0.034 E. 0.036

6.50 (3 points) Using this kernel, estimate the distribution function at 40.
A. 0.80 B. 0.82 C. 0.84 D. 0.86 E. 0.88

6.51 (3 points) You observe 5 losses: 2, 5, 5, 6, 10.


You use a triangle kernel with bandwidth 1.
Determine the median of the kernel smoothed distribution.

6.52 (4, 11/03, Q.4 & 2009 Sample Q. 3) (2.5 points) You study five lives to estimate the time
from the onset of a disease to death. The times to death are: 2 3 3 3 7.
Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.
(A) 8/40 (B) 12/40 (C) 14/40 (D) 16/40 (E) 17/40
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 96

6.53 (4, 11/04, Q.20 & 2009 Sample Q.147) (2.5 points) From a population having distribution
function F, you are given the following sample: 2.0, 3.3, 3.3, 4.0, 4.0, 4.7, 4.7, 4.7.
Calculate the kernel density estimate of F(4), using the uniform kernel with bandwidth 1.4.
(A) 0.31 (B) 0.41 (C) 0.50 (D) 0.53 (E) 0.63

6.54 (2 points) In the previous question, calculate the kernel density estimate of F(3), using the
uniform kernel with bandwidth 1.4.
(A) 0.22 (B) 0.24 (C) 0.26 (D) 0.28 (E) 0.30

6.55 (4, 5/05, Q.22 & 2009 Sample Q.192) (2.9 points) You are given the kernel:
ky(x) = (2/π) √(1 - (x - y)²), y - 1 ≤ x ≤ y + 1.
You are also given the following random sample:
1 3 3 5
Determine which of the following graphs shows the shape of the kernel density estimator.
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 97

6.56 (4, 11/05, Q.9 & 2009 Sample Q.221) (2.9 points) You are given:
(i) The sample:
1 2 3 3 3 3 3 3 3 3
(ii) F̂1(x) is the kernel density estimator of the distribution function using a uniform
kernel with bandwidth 1.
(iii) F̂2(x) is the kernel density estimator of the distribution function using a triangular
kernel with bandwidth 1.
Determine which of the following intervals has F̂1(x) = F̂2(x) for all x in the interval.
(A) 0 < x < 1
(B) 1 < x < 2
(C) 2 < x < 3
(D) 3 < x < 4
(E) None of (A), (B), (C) or (D)

6.57 (4, 11/06, Q.24 & 2009 Sample Q.268) (2.9 points)
You are given the following ages at time of death for 10 individuals:
25 30 35 35 37 39 45 47 49 55
Using a uniform kernel with bandwidth b = 10, determine the kernel density estimate of the
probability of survival to age 40.
(A) 0.377 (B) 0.400 (C) 0.417 (D) 0.439 (E) 0.485

6.58 (4, 5/07, Q.16) (2.5 points) You use a uniform kernel density estimator with b = 50 to smooth
the following workers compensation loss payments:
82 126 161 294 384
If F̂(x) denotes the estimated distribution function and F5(x) denotes the empirical distribution
function, determine | F̂(150) - F5(150) |.
(A) Less than 0.011
(B) At least 0.011, but less than 0.022
(C) At least 0.022, but less than 0.033
(D) At least 0.033, but less than 0.044
(E) At least 0.044
2016-C-6, Fitting Loss Distributions §6 Kernel Smoothing, HCM 10/22/15, Page 98

6.59 (2014 Exam C Sample Q.300) You are given:


i) Three observations:
2 5 8
ii) The selected kernel, which does not have the same mean as the empirical estimate,
has distribution function:
Ky(x) = 0, for x < y - 1,
Ky(x) = (x - y + 1)/3, for y - 1 ≤ x ≤ y + 2,
Ky(x) = 1, for x > y + 2.

Calculate the coefficient of variation of the kernel density estimator.


(A) 0.47 (B) 0.50 (C) 0.52 (D) 0.57 (E) 0.58

Solutions to Problems:

6.1. A. Only those losses within 1000 of 2500 contribute.


There is 1 such loss, at 2000, which contributes Uniform[1000, 3000]/5.
The p.d.f. at 2500 is: (1/2000)/5 = 0.0001.

6.2. D. The density estimate is based on the number of items in the interval [1500, 3500].
Let z = number of items in the interval [1500, 3500].
Then the estimated p.d.f. at 2500 is: (z/5)/2000.
Now the variance of the probability in an interval is a generalization of the variance of the empirical
distribution function: (probability in the interval) (1 - probability in the interval) / N.
Thus Var[z/5] = (1/5)(4/5)/5 = 4/125.
Var[(z/5)/2000] = (4/125)/2000² = 0.000000008.
Standard deviation of the estimate is: 0.0000894.
Equivalently, an estimate of the density from the histogram has variance:
(losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²}
= (1/5)(4/5) / {(5)(2000²)} = 0.000000008.
Standard deviation of the estimate is: 0.0000894.
Comment: We only have 5 data points, so the standard deviation of the estimate of 0.0000894 is
almost as big as the estimate itself of 0.0001.

6.3. B. Only those losses within 2000 of 6000 contribute. There is 1 such loss.
The triangular kernel centered at 5000 has density at 6000 of: (2000 - |6000 - 5000|)/2000² =
0.00025. Thus the smoothed pdf at 6000 is: 0.00025/5 = 0.00005.
Comment: Here is the triangular kernel smoothed density:
[Graph: the triangular kernel smoothed density, plotted for x from -2000 to 8000, with density values up to about 0.0014.]
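Such calculations are easy to check numerically. Below is a short Python sketch (my own, not part of the original solution) of the triangular kernel density estimate; it assumes the five losses 100, 500, 1000, 2000, 5000 used in the neighboring problems and a bandwidth of 2000, and the function and variable names are illustrative:

    # A minimal sketch of the triangular kernel density estimate.
    def triangle_kernel_pdf(x, y, b):
        # Density at x of the triangular kernel centered at y with bandwidth b.
        return max(b - abs(x - y), 0.0) / b**2

    def kernel_smoothed_pdf(x, data, b):
        # Each data point receives weight 1/n (the empirical model).
        return sum(triangle_kernel_pdf(x, y, b) for y in data) / len(data)

    losses = [100, 500, 1000, 2000, 5000]
    print(kernel_smoothed_pdf(6000, losses, 2000))   # 5e-05, matching the answer above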

6.4. D. The density of a Gamma is: f(x) = θ^(-α) x^(α-1) e^(-x/θ)/Γ(α) = θ^(-3) x² e^(-x/θ) / 2.
f(3000) = 4,500,000 e^(-3000/θ)/θ³. For the kernel with mean at point y, θ = y/α = y/3.
f(3000; y) = 121,500,000 e^(-9000/y)/y³. f(3000; 100) = 1.0 × 10⁻³⁷. f(3000; 500) = 1.5 × 10⁻⁸.
f(3000; 1000) = 0.000015. f(3000; 2000) = 0.000169. f(3000; 5000) = 0.000161.
The smoothed density at 3000 is:
(1.0 × 10⁻³⁷ + 1.5 × 10⁻⁸ + 0.000015 + 0.000169 + 0.000161)/5 = 0.000069.
Comment: Here is the gamma kernel smoothed density:
[Graph: the gamma kernel smoothed density, plotted for x from 0 to 6000, with density values up to about 0.008.]

6.5. E. The uniform kernel centered at 100 goes to 1100, thus all of it is to the left of 1700.
The uniform kernel centered at 500 goes to 1500, thus all of it is to the left of 1700.
The uniform kernel centered at 1000 goes from 0 to 2000, thus 0.85 of it is to the left of 1700.
The uniform kernel centered at 2000 goes from 1000 to 3000, thus 0.35 of its area is to the left of
1700. The uniform kernel centered at 5000 goes from 4000 to 6000, thus none of it is to the left of 1700.
The smoothed distribution at 1700 is: 1/5 + 1/5 + 0.85/5 + 0.35/5 + 0/5 = 0.64.
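The same idea works in code: sum, over the data points, the area of each kernel lying to the left of the point of interest. A short Python sketch (mine, not from the original solution), assuming the same five losses and a bandwidth of 1000:

    # A minimal sketch of the uniform kernel estimate of the distribution function.
    def uniform_kernel_cdf(x, y, b):
        # Fraction of the uniform kernel on [y - b, y + b] lying to the left of x.
        if x <= y - b:
            return 0.0
        if x >= y + b:
            return 1.0
        return (x - (y - b)) / (2.0 * b)

    def kernel_smoothed_cdf(x, data, b):
        return sum(uniform_kernel_cdf(x, y, b) for y in data) / len(data)

    losses = [100, 500, 1000, 2000, 5000]
    print(kernel_smoothed_cdf(1700, losses, 1000))   # 0.64, as above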

6.6. B. The triangular kernel centered at 100 goes to 2100, thus all of it is to the left of 3000.
The triangular kernel centered at 500 goes to 2500, thus all of it is to the left of 3000.
The triangular kernel centered at 1000 goes to 3000, thus all of it is to the left of 3000.
The triangular kernel centered at 2000 goes from 0 to 4000, thus 1 - (1/2)²/2 = 7/8 of its area is to
the left of 3000.
The triangular kernel centered at 5000 goes from 3000, thus none of it is to the left of 3000.
The smoothed distribution at 3000 is: 1/5 + 1/5 + 1/5 + (7/8)/5 + 0/5 = 0.775.

6.7. D. The mean of the data is 1720.


The second moment of the data is: (100² + 500² + 1000² + 2000² + 5000²) / 5 = 6,052,000.
The variance of the data is: 6,052,000 - 1720² = 3,093,600.
The variance of the uniform kernel is: b²/3 = 3000²/3 = 3,000,000.
The variance of the kernel smoothed density is the variance of the data plus the variance of the
kernel: 3,093,600 + 3,000,000 = 6,093,600.
Comment: A uniform kernel with bandwidth b, and width 2b, has variance: (2b)²/12 = b²/3.

6.8. A. The mean of the data is 1720. The second moment of the data is 6,052,000.
The variance of the data is: 6,052,000 - 1720² = 3,093,600.
The variance of the triangular kernel is: b²/6 = 3000²/6 = 1,500,000.
The variance of the kernel smoothed density is the variance of the data plus the variance of the
kernel: 3,093,600 + 1,500,000 = 4,593,600.
Comment: A triangular density from 0 to 2b has mean b, and second moment:
∫[0, b] x² (x/b²) dx + ∫[b, 2b] x² (2b - x)/b² dx = b²/4 + 11b²/12 = 7b²/6.
Therefore, the variance of this triangular kernel is: 7b²/6 - b² = b²/6.
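The relationship "variance of the kernel smoothed density = variance of the data + (average) variance of the kernel" can also be checked by simulation, since drawing from the kernel smoothed density is the same as picking a data point at random and then drawing from the kernel centered there. A rough Python sketch (mine), using the uniform kernel and bandwidth of problem 6.7:

    # Simulation check: variance of the uniform kernel smoothed density.
    import random

    losses = [100, 500, 1000, 2000, 5000]
    b = 3000   # bandwidth assumed in problem 6.7

    def draw():
        y = random.choice(losses)            # empirical model: weight 1/n per point
        return y + random.uniform(-b, b)     # uniform kernel centered at y

    sample = [draw() for _ in range(200000)]
    mean = sum(sample) / len(sample)
    var = sum((s - mean)**2 for s in sample) / len(sample)
    print(var)   # close to 3,093,600 + 3000^2/3 = 6,093,600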

6.9. E. Only those losses within 1.5 of 5.2 contribute. There are 4 such losses, and they each
contribute 1/{(20)(3)}, so the p.d.f. at 5.2 is: 4/60 = 0.067.

6.10. C. Only those losses within 3 of 8 contribute. There are 6 such losses, and they each
contribute 1/{(20)(6)}, so the p.d.f. at 8 is: 6/120 = 0.05.

6.11. C. Only those losses within 2 of 10 contribute. There are 4 such losses.
The triangular kernel centered at 8.2 has density at 10 of: (2 - |10 - 8.2|)/2² = 0.05.
The triangular kernel centered at 8.8 has density at 10 of: (2 - |10 - 8.8|)/2² = 0.2.
The triangular kernel centered at 9.9 has density at 10 of: (2 - |10 - 9.9|)/2² = 0.475.
The triangular kernel centered at 11.1 has density at 10 of: (2 - |10 - 11.1|)/2² = 0.225.
Thus the smoothed pdf at 10 is: (.05 + .2 + .475 + .225)/20 = 0.0475.

6.12. A. Only those times within 1.5 of 4 contribute. There are 3 such times: 3, 4, and 5.
The uniform kernel of bandwidth 1.5 has density 1/3.
p(3) = 10/100 = .1. p(4) = 8/100 = .08. p(5) = 3/100 = .03.

f̂(4) = Σ p(yj) kyj(4) = (0.1)(1/3) + (0.08)(1/3) + (0.03)(1/3) = 0.07.

Comment: The percentage of the original 100 that are closed in year 3 is 10/100 = 10%.
We divide by 100 rather than the 82 claims that are closed by time 8.
We assume that the remaining 18 claims will be closed after time 8.
This is a similar idea to having 18 out of 100 lives still alive at age 80.
We would assume they would die at some time after age 80.

6.13. B. Only those times within 1.5 of 5 contribute. There are 3 such times.
The triangular kernel centered at 4 has density at 5 of: (1.5 - |5 - 4|)/1.5² = 0.2222.
The triangular kernel centered at 5 has density at 5 of: (1.5 - |5 - 5|)/1.5² = 0.6667.
The triangular kernel centered at 6 has density at 5 of: (1.5 - |5 - 6|)/1.5² = 0.2222.
p(4) = 8/100 = .08. p(5) = 3/100 = .03. p(6) = 4/100 = .04.

f̂(5) = Σ p(yj) kyj(5) = (0.08)(0.2222) + (0.03)(0.6667) + (0.04)(0.2222) = 0.0467.

6.14. The value 1000 appears twice, so the corresponding kernel gets twice the weight.
Uniform[300, 700]/4 + Uniform[800, 1200]/2 + Uniform[1300, 1700]/4.

[Graph: the kernel smoothed density, consisting of three non-overlapping uniform blocks on [300, 700], [800, 1200], and [1300, 1700], the middle block twice as tall as the other two.]

Comment: With a small bandwidth, the three uniform kernels do not overlap. At each of 6 points:
300, 700, 800, 1200, 1300, and 1700, the kernel smoothed density is discontinuous.

6.15.

[Graph: the kernel smoothed density, plotted for x from 0 to 2000, with density values up to about 0.0007.]


Comment: At 4 points: 0, 500, 1500, and 2000, the kernel smoothed density is discontinuous.

6.16.
[Graph: the kernel smoothed density, plotted for x from -500 to 2500, with density values up to about 0.0006.]


Comment: At each of 6 points: -300, 200, 700, 1300, 1800, and 2300, the kernel smoothed
density is discontinuous.

6.17. The value 1000 appears twice, so the corresponding kernel gets twice the weight.
Triangle[300, 700]/4 + Triangle[800, 1200]/2 + Triangle[1300, 1700]/4.

[Graph: the kernel smoothed density, consisting of three non-overlapping triangles on [300, 700], [800, 1200], and [1300, 1700], the middle triangle twice as tall as the other two.]

Comment: With a small bandwidth, the three triangular kernels do not overlap.
The slope changes at: 300, 500, 700, 800, 1000, 1200, 1300, 1500, and 1700.
At each of these 9 points the kernel smoothed density is continuous, but not differentiable.

6.18. The three separate triangles are as follows:


[Graphs: the three separate triangular kernels, each plotted for x from 0 to 2000.]


Weighting these three densities together, with weights 1/4, 1/2, and 1/4 gives:

[Graph: the weighted average of the three triangular kernels: a single large triangle running from 0 to 2000 with its peak at 1000.]


Comment: It turns out that in this special case, the three triangles when weighted together form one
large triangle. The slope changes at: 0, 1000, and 2000. At each of these 3 points the kernel
smoothed density is continuous, but not differentiable.

6.19.

[Graph: the triangular kernel smoothed density, plotted for x from -500 to 2500, with density values up to about 0.0008.]


Comment: The slope changes at: -300, 200, 500, 700, 1000, 1300, 1500, 1800, and 2300.
At each of these 9 points the kernel smoothed density is continuous, but not differentiable.

6.20. The mean of the data is 1000. Thus the biased estimate of the variance is:
{(500 - 1000)² + (2)(1000 - 1000)² + (1500 - 1000)²} / 4 = 125,000.
The first Pareto is such that: θ/(4 - 1) = 500. ⇒ θ = 1500.
The second moment is: (2)(1500²) / {(4-1)(4-2)} = 750,000.
Variance is: 750,000 - 500² = 500,000.
The second Pareto is such that: θ/(4 - 1) = 1000. ⇒ θ = 3000.
The second moment is: (2)(3000²) / {(4-1)(4-2)} = 3,000,000.
Variance is: 3,000,000 - 1000² = 2,000,000.
The third Pareto is such that: θ/(4 - 1) = 1500. ⇒ θ = 4500.
The second moment is: (2)(4500²) / {(4-1)(4-2)} = 6,750,000.
Variance is: 6,750,000 - 1500² = 4,500,000.
Average variance of the Pareto kernels is:
{500,000 + (2)(2,000,000) + 4,500,000} / 4 = 2,250,000.
The variance of the kernel smoothed density is the variance of the data (the variance of the empirical
distribution function) plus the (average) variance of the kernel:
125,000 + 2,250,000 = 2,375,000. ⇒ Standard deviation is: 1541.
Alternately, for the Pareto Distribution the CV² = α/(α-2) = 4/(4-2) = 2. Thus variance = 2 mean².
Thus the Paretos have variances of: (2)(500²), (2)(1000²), (2)(1500²). Proceed as before.

6.21. B. The empirical model is: 1/3 at 20, 1/6 at 32, 1/6 at 44, and 1/3 at 50.
For Colonel Klink, using a bandwidth of 14: The uniform kernel centered at 20 stretches from 6 to 34,
and is all to the left of 35, so it contributes nothing to S(35). The uniform kernel centered at 32
stretches from 18 to 46, and (46-35)/28 = 11/28 is to the right of 35 and contributes to S(35). The
uniform kernel centered at 44 stretches from 30 to 58, and (58-35)/28 = 23/28 is to the right of 35
and contributes to S(35). The uniform kernel centered at 50 stretches from 36 to 64, and all of it is to
the right of 35 and contributes to S(35).
Colonel Klink's estimate of S(35) is:
(0)(1/3) + (11/28)(1/6) + (23/28)(1/6) + (1)(1/3) = 90/168 = 15/28.
For Colonel Sanders, using a bandwidth of 8: The uniform kernel centered at 20 is all to the left of
35, so it contributes nothing to S(35). The uniform kernel centered at 32 stretches from 24 to 40, and
(40-35)/16 = 5/16 is to the right of 35 and contributes to S(35). The uniform kernel centered at 44
stretches from 36 to 52, and all of it is to the right of 35 and contributes to S(35). The uniform kernel
centered at 50 stretches from 42 to 58, and all of it is to the right of 35 and contributes to S(35). For
Colonel Sanders, his estimate of S(35) is: (0)(1/3) + (5/16)(1/6) + (1)(1/6) + (1)(1/3) = 53/96.
The difference between the estimates of Colonel Klink and Colonel Sanders is:
15/28 - 53/96 = (360 - 371)/672 = -11/672 = -0.0164.

6.22. The variance of the kernel smoothed density is the variance of the data (the variance of the
empirical distribution function) plus the variance of the kernel. Therefore, the difference between the
variance of the empirical distribution function and the variance of the triangular kernel smoothed
density with bandwidth b is minus the variance of the triangular kernel: -b²/6.
Comment: If you do not remember the variance of the triangular kernel, one can compute it via
integration.

6.23. C. We want the largest number of values within 10 of x, since only these values contribute.
x = 10: [0, 20] has 6 values.
x = 20: [10, 30] has 7 values.
x = 30: [20, 40] has 8 values.
x = 40: [30, 50] has 7 values.
x = 50: [40, 60] has 7 values.
Comment: Values exactly 10 from x contribute to the uniform kernel density estimate of f(x).

6.24. B. In each case, we need to find what portion of the area of the uniform kernel centered at one
of the sample points is to the left of a certain value.
For each possibility, all of the area is included for the kernel centered at 9.
The kernel centered at 15 goes from 11 to 19.
For F(17.5), 6.5/8 of the area of the kernel centered at 15 is included, 3.5/8 of the area of the kernel
centered at 18 is included, 1.5/8 of the area of the kernel centered at 20 is included, and 0.5/8 of the
area of the kernel centered at 21 is included.
F(17.5) = (1/5)(1 + 6.5/8 + 3.5/8 + 1.5/8 + 0.5/8) = (1/5)(20/8) = 0.5.
Thus 17.5 is the estimated median.
Comment: Trying the other choices, the distribution function is either too small or too big.
For example, for F(17), 6/8 of the area of the kernel centered at 15 is included, 3/8 of the area of the
kernel centered at 18 is included, 1/8 of the area of the kernel centered at 20 is included, and none of
the area of the kernel centered at 21 is included.
F(17) = (1/5)(1 + 6/8 + 3/8 + 1/8) = (1/5)(18/8) = 0.45.
Similarly, F(18) = (1/5)(1 + 7/8 + 4/8 + 2/8 + 1/8) = (1/5)(22/8) = 0.55.

6.25. The variance of the kernel smoothed density is the variance of the data (the variance of the
empirical distribution function) plus the variance of the kernel. Therefore, the difference between the
variance of the empirical distribution function and the variance of the uniform kernel smoothed density
with bandwidth b is minus the variance of the uniform kernel: -(2b)²/12 = -b²/3.
Comment: If you remember that the result does not depend on the sample size n, then you can
take n = 1. For n = 1, the variance of the empirical distribution function is zero. For n = 1, the kernel
smoothed density is just a single uniform distribution with width 2b.

6.26. With 10 data points, the empirical model assigns 1/10 to each data point; with instead 11
data points, the empirical model assigns 1/11 to each data point. Thus the contribution to F(33) from
the old data points is now: (10/11)(0.7375).
2/8 of the uniform kernel centered at 35 is to the left of 33.

31 33 35 39
The contribution of 35 to F(33) is: (2/8)/11.
Thus, the new kernel smoothed distribution function at 33 is: (10/11)(0.7375) + (2/8)/11 = 0.6932.

6.27. E. The mean of the data is 120. The second moment of the data is 17,866.67.
The variance of the data is: 17,866.67 - 120² = 3466.67.
The given kernel is a uniform distribution with a total width of 50, except not centered at the data
point y. Its variance is: 50²/12 = 208.33.
The variance of the kernel smoothed density is the variance of the data plus the variance of the
kernel: 3466.67 + 208.33 = 3675.
Comment: Similar to 2014 Exam C Sample Q.300.
In Section 12.3, Loss Models specifies that Ky(x) be a continuous distribution with mean y.
However, in this case, the mean of each kernel is y - 5.
Thus the kernel smoothed density at x - 5 is the same as it would be at x for a uniform kernel with the
same width. While the mean of the kernel smoothed density is affected, subtracting a constant from
a variable does not affect its variance.

6.28. C. Each Exponential Kernel has its mean equal to the corresponding data point.
The empirical model is: 1/6, 1/6, 1/3, 1/6, 1/6.
S(100) = e^(-100/30)/6 + e^(-100/50)/6 + e^(-100/70)/3 + e^(-100/120)/6 + e^(-100/200)/6 = 0.282.
Comment: A Gamma kernel with α = 1.

6.29. D. 1. True. 2. False. As α increases, the amount of smoothing decreases.


3. True. Each exponential kernel (gamma kernel with α = 1) has a mode of zero, therefore so does
the smoothed density which is an average of the individual kernels.

6.30. A. Only those times within 5 of 20 contribute. There are 3 such times out of 8.
Each uniform kernel has a height of: 1/{(2)(5)} = 1/10. So the p.d.f. at 20 is: (3/8)(1/10) = 0.0375.

6.31. D. Only those times within 5 of 20 contribute. There are 3 such times out of 8.
The triangular kernel centered at 17 has density at 20 of: (5 - |20 - 17|)/5² = 0.08.
The triangular kernel centered at 19 has density at 20 of: (5 - |20 - 19|)/5² = 0.16.
The triangular kernel centered at 22 has density at 20 of: (5 - |20 - 22|)/5² = 0.12.
Thus the smoothed pdf at 20 is: (0.08 + 0.16 + 0.12)/8 = 0.045.
Comment: The three triangles that contribute to the kernel smoothed density at 20 are:
[Graphs: the three triangular kernels centered at 17, 19, and 22, each plotted for x from 15 to 30 with height 0.2 at its center.]
Each triangular kernel has base of (2)(5) = 10 and height of 1/5 = .2, for an area of one.

For example, the triangle centered at 22 has height at 20 of: (3/5)(0.2) = 0.12.
The entire kernel smoothed density is an average of the 8 triangular kernels:
[Graph: the full triangular kernel smoothed density, plotted for x from about 10 to 40, with density values up to about 0.04.]

6.32. E. The mean of the data is: 18.5. The second moment of the data is: 402.25.
Variance of the data is: 402.25 - 18.5² = 60.
The variance of the uniform kernel is: width²/12 = 20²/12 = 100/3.
Variance of the kernel smoothed density is: 60 + 100/3 = 93.33.

6.33. B. The mean of the data is: 18.5. The second moment of the data is: 402.25.
Variance of the data is: 402.25 - 18.5² = 60.
The variance of the triangle kernel is:
∫[0, 10] x² (10 - x) dx / ∫[0, 10] (10 - x) dx = (10000/3 - 10000/4)/(100/2) = 50/3.

Variance of the kernel smoothed density is: 60 + 50/3 = 76.67.


Comment: The variance of the triangle kernel is: b²/6 = 10²/6.

6.34. A. We need to determine how much of each kernel is to the left of 15.
The kernel centered at 7 is all to the left of 15.
The kernel centered at 11, goes from 6 to 16, and is 90% to the left of 15.
The kernel centered at 13, goes from 8 to 18, and is 70% to the left of 15.
The kernel centered at 17, goes from 12 to 22, and is 30% to the left of 15.
The kernel centered at 19, goes from 14 to 24, and is 10% to the left of 15.
The remaining kernels are all to the right of 15.
The kernel density estimate of F(15) is: (1 + 0.9 + 0.7 + 0.3 + 0.1)/8 = 0.375.

6.35. C. We need to determine how much of each kernel is to the left of 16.
The kernel centered at 7 is all to the left of 16. The kernel centered at 11 is all to the left of 16.
The kernel centered at 13, goes from 8 to 18, and is: 1 - (2)(0.08)/2 = 92% to the left of 16.
[Diagram: the triangular kernel centered at 13, spanning 8 to 18, with height 0.2 at 13 and height 0.08 at 16.]
The kernel centered at 17, goes from 12 to 22, and is: (4)(0.16)/2 = 32% to the left of 16.
[Diagram: the triangular kernel centered at 17, spanning 12 to 22, with height 0.2 at 17 and height 0.16 at 16.]
The kernel centered at 19, goes from 14 to 24, and is: (2)(0.08)/2 = 8% to the left of 16.
[Diagram: the triangular kernel centered at 19, spanning 14 to 24, with height 0.2 at 19 and height 0.08 at 16.]
The remaining kernels are all to the right of 16.
The kernel density estimate of F(16) is: (1 + 1 + 0.92 + 0.32 + 0.08)/8 = 0.415.

6.36. C. The density of a Gamma is: f(x) = θ^(-α) x^(α-1) e^(-x/θ)/Γ(α) = θ^(-5) x⁴ e^(-x/θ) / 24.
f(20) = 6666.67 e^(-20/θ)/θ⁵. For the kernel with mean at point y, θ = y/α = y/5.
f(20; y) = 20,833,333 e^(-100/y)/y⁵. f(20; 7) = 0.00077. f(20; 11) = 0.01458. f(20; 13) = 0.02560.
f(20; 17) = 0.04091. f(20; 19) = 0.04357. f(20; 22) = 0.04291. f(20; 28) = 0.03403.
f(20; 31) = 0.02891.
The smoothed density at 20 is:
(0.00077 + 0.01458 + 0.02560 + 0.04091 + 0.04357 + 0.04291 + 0.03403 + 0.02891)/8 =
0.02891.

6.37. B. The density of a Pareto is: f(x) = α θ^α / (x + θ)^(α+1) = 5 θ⁵ / (x + θ)⁶.
f(20) = 5 θ⁵ / (20 + θ)⁶. For the kernel with mean at point y, θ = y(α - 1) = 4y.
f(20; y) = 5 (4⁵) y⁵ / (20 + 4y)⁶ = 1.25 y⁵ / (5 + y)⁶. f(20; 7) = 0.00704. f(20; 11) = 0.01200.
f(20; 13) = 0.01365. f(20; 17) = 0.01565. f(20; 19) = 0.01620. f(20; 22) = 0.01663.
f(20; 28) = 0.01666. f(20; 31) = 0.01644.
The smoothed density at 20 is:
(0.00704 + 0.01200 + 0.01365 + 0.01565 + 0.01620 + 0.01663 + 0.01666 + 0.01644)/8 =
0.01428.

6.38. C. The distribution of a Pareto is: F(x) = 1 - θ^α / (x + θ)^α = 1 - 1/(1 + x/θ)⁵.
S(20) = 1/(1 + 20/θ)⁵. For the kernel with mean at point y, θ = y(α - 1) = 4y.
S(20; y) = 1/(1 + 5/y)⁵. S(20; 7) = 0.06754. S(20; 11) = 0.15359. S(20; 13) = 0.19650.
S(20; 17) = 0.27551. S(20; 19) = 0.31096. S(20; 22) = 0.35917. S(20; 28) = 0.43976.
S(20; 31) = 0.47347.
The smoothed survival function at 20 is:
(0.06754 + 0.15359 + 0.19650 + 0.27551 + 0.31096 + 0.35917 + 0.43976 + 0.47347)/8 =
0.28456. The smoothed distribution function at 20 is: 1 - 0.28456 = 0.71544.

6.39. B. Since there are an even number of items in the sample, the empirical median is:
(17 + 19)/2 = 18.
The contributions to F(18) from the various uniform kernels are:
1, 1, 1, 6/10, 4/10, 1/10, 0, 0.
Thus for the kernel smoothed distribution, F̂(18) = (1 + 1 + 1 + 0.6 + 0.4 + 0.1)/8 = 4.1/8 > 0.5.
Thus the median of the kernel smoothed distribution is less than 18.
For x = 18 - c, where c is small, the contributions to F̂(18 - c) from the various kernels are:
1, 1, 1 - c/10, (6 - c)/10, (4 - c)/10, (1 - c)/10, 0, 0.
Thus, F̂(18 - c) = (4.1 - 4c/10)/8 = 0.5125 - c/20.
Setting F̂(18 - c) equal to 0.5: 0.5 = 0.5125 - c/20. ⇒ c = 0.25.
Thus F̂(17.75) = 0.5, and 17.75 is the median of the kernel smoothed distribution.
Comment: F̂(17.75) = (1 + 1 + 0.975 + 0.575 + 0.375 + 0.075 + 0 + 0) / 8 = 0.5.
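Since the kernel smoothed distribution function is non-decreasing, its median can also be found numerically by bisection. A short Python sketch (mine, not part of the original solution), assuming the eight data points 7, 11, 13, 17, 19, 22, 28, 31 used in the neighboring solutions and a uniform kernel with bandwidth 5:

    # Find the median of the uniform kernel smoothed distribution by bisection.
    data = [7, 11, 13, 17, 19, 22, 28, 31]   # the sample assumed in these solutions
    b = 5

    def uniform_kernel_cdf(x, y, b):
        if x <= y - b:
            return 0.0
        if x >= y + b:
            return 1.0
        return (x - (y - b)) / (2.0 * b)

    def smoothed_cdf(x):
        return sum(uniform_kernel_cdf(x, y, b) for y in data) / len(data)

    lo, hi = min(data) - b, max(data) + b    # the smoothed CDF is 0 at lo and 1 at hi
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if smoothed_cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    print((lo + hi) / 2.0)   # 17.75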

6.40. E. For each data point, we take a Normal Distribution with mean equal to that data point.
For example, for the data point 22, F(20) = Φ[(20 - 22)/5] = Φ[-0.4] = 0.3446.
The estimate of F(20) is:
(1/8) {Φ[(20 - 7)/5] + Φ[(20 - 11)/5] + Φ[(20 - 13)/5] + Φ[(20 - 17)/5]
+ Φ[(20 - 19)/5] + Φ[(20 - 22)/5] + Φ[(20 - 28)/5] + Φ[(20 - 31)/5]} =
(1/8) {Φ[2.6] + Φ[1.8] + Φ[1.4] + Φ[0.6] + Φ[0.2] + Φ[-0.4] + Φ[-1.6] + Φ[-2.2]} =
(1/8) {0.9953 + 0.9641 + 0.9192 + 0.7257 + 0.5793 + 0.3446 + 0.0548 + 0.0139} =
0.5746.
Comment: Loss Models does not mention using the Normal Distribution as a kernel, but I have
used it in a manner parallel to the Gamma or the Pareto distributions.
One could use the density of the Normal to estimate the probability density function.

6.41. B.
The uniform kernel centered at 500 has all of its area to the right of 250, so E[X ∧ 250] = 250.
The uniform kernel centered at 300 spans 150 to 450.
It has 1/3 of its area to the left of 250, and 2/3 to the right of 250.
Thus for this kernel, E[X ∧ 250] = (1/3)(200) + (2/3)(250) = 233.333.
The uniform kernel centered at 200 spans 50 to 350.
It has 2/3 of its area to the left of 250, and 1/3 to the right of 250.
Thus for this kernel, E[X ∧ 250] = (2/3)(150) + (1/3)(250) = 183.333.
The overall limited expected value is a weighted average of that for the individual kernels:
E[X ∧ 250] = {(2)(183.333) + 233.333 + 250} / (2 + 1 + 1) = 212.5.

6.42. C.
The triangle kernel centered at 500 has all of its area to the right of 250, so E[X ∧ 250] = 250.
The triangle kernel centered at 300 spans 150 to 450.
The area of this triangle to the left of 250 is: (1/2)(100/150)² = 2/9.
The average of this right triangle is: 250 - 100/3 = 216.667.
Thus E[X ∧ 250] = (2/9)(216.667) + (1 - 2/9)(250) = 242.593.
The triangle kernel centered at 200 spans 50 to 350.
The area of this triangle to the right of 250 is: (1/2)(100/150)² = 2/9.
The average of this right triangle is: 250 + 100/3 = 283.333.
Each value greater than 250 contributes 250 rather than its value to E[X ∧ 250] .
Thus E[X ∧ 250] = E[X] - (2/9)(283.333 - 250) = 200 - 7.407 = 192.593.
The overall limited expected value is a weighted average of that for the individual kernels:
E[X ∧ 250] = {(2)(192.593) + 242.593 + 250} / (2 + 1 + 1) = 219.445.

6.43. For the uniform kernel centered at 200:
F(x) = 0 for x < 50; F(x) = (x - 50)/300 for 50 ≤ x ≤ 350; F(x) = 1 for x > 350.
For the uniform kernel centered at 300:
F(x) = 0 for x < 150; F(x) = (x - 150)/300 for 150 ≤ x ≤ 450; F(x) = 1 for x > 450.
For the uniform kernel centered at 500:
F(x) = 0 for x < 350; F(x) = (x - 350)/300 for 350 ≤ x ≤ 650; F(x) = 1 for x > 650.
The sample is: 200, 200, 300, 500.


Thus the empirical model assigns 1/2 to 200, 1/4 to 300, and 1/4 to 500.
Thus the kernel smoothed distribution function is:
For x ≤ 50, F(x) = 0.
For 50 ≤ x ≤ 150, F(x) = (1/2)(x-50)/ 300 = x/600 - 1/12.
For 150 ≤ x ≤ 350, F(x) = (1/2)(x-50)/ 300 + (1/4)(x-150)/ 300 = x/400 - 5/24.
For 350 ≤ x ≤ 450, F(x) = (1/2)(1) + (1/4)(x-150)/ 300 + (1/4)(x-350)/ 300 = x/600 + 1/12.
For 450 ≤ x ≤ 650, F(x) = (1/2)(1) + (1/4)(1) + (1/4)(x-350)/ 300 = x/1200 + 11/24.
For x ≥ 650, F(x) = 1.
In summary:
F(x) = 0 for x ≤ 50,
F(x) = x/600 - 1/12 for 50 ≤ x ≤ 150,
F(x) = x/400 - 5/24 for 150 ≤ x ≤ 350,
F(x) = x/600 + 1/12 for 350 ≤ x ≤ 450,
F(x) = x/1200 + 11/24 for 450 ≤ x ≤ 650,
F(x) = 1 for x ≥ 650.

Comment: A graph of the uniform kernel smoothed distribution function:

[Graph: the uniform kernel smoothed distribution function, rising from 0 at x = 50 to 1/6 at 150, 2/3 at 350, 5/6 at 450, and 1 at 650.]

6.44. The triangle kernel centered at 200 has f(x) = (x - 50)/150², for 50 ≤ x ≤ 200.
Therefore, F(x) = (x - 50)²/45,000, for 50 ≤ x ≤ 200.
F(200) = 1/2. f(x) = (350 - x)/150², for 200 ≤ x ≤ 350.
Therefore, F(x) = 1 - (350 - x)²/45,000, for 200 ≤ x ≤ 350.
For the triangle kernel centered at 200:
F(x) = 0 for x ≤ 50; F(x) = (x - 50)²/45,000 for 50 ≤ x ≤ 200;
F(x) = 1 - (350 - x)²/45,000 for 200 ≤ x ≤ 350; F(x) = 1 for x ≥ 350.
For the triangle kernel centered at 300:
F(x) = 0 for x ≤ 150; F(x) = (x - 150)²/45,000 for 150 ≤ x ≤ 300;
F(x) = 1 - (450 - x)²/45,000 for 300 ≤ x ≤ 450; F(x) = 1 for x ≥ 450.
For the triangle kernel centered at 500:
F(x) = 0 for x ≤ 350; F(x) = (x - 350)²/45,000 for 350 ≤ x ≤ 500;
F(x) = 1 - (650 - x)²/45,000 for 500 ≤ x ≤ 650; F(x) = 1 for x ≥ 650.

The sample is: 200, 200, 300, 500.


Thus the empirical model assigns 1/2 to 200, 1/4 to 300, and 1/4 to 500.
Thus the kernel smoothed distribution function is:
For x ≤ 50, F(x) = 0.
For 50 ≤ x ≤ 150, F(x) = (x - 50)²/90,000.
For 150 ≤ x ≤ 200, F(x) = (x - 50)²/90,000 + (x - 150)²/180,000.
For 200 ≤ x ≤ 300, F(x) = 1/2 - (350 - x)²/90,000 + (x - 150)²/180,000.
For 300 ≤ x ≤ 350, F(x) = 3/4 - (350 - x)²/90,000 - (450 - x)²/180,000.
For 350 ≤ x ≤ 450, F(x) = 3/4 - (450 - x)²/180,000 + (x - 350)²/180,000.
For 450 ≤ x ≤ 500, F(x) = 3/4 + (x - 350)²/180,000.
For 500 ≤ x ≤ 650, F(x) = 1 - (650 - x)²/180,000.
For x ≥ 650, F(x) = 1.

Comment: A graph of the triangle kernel smoothed distribution function:

[Graph: the triangle kernel smoothed distribution function, passing through F(150) = 1/9, F(200) = 19/72, F(300) = 43/72, F(350) = 25/36, F(450) = 29/36, F(500) = 7/8, and F(650) = 1.]

6.45. C. For a bandwidth of 50, only kernels centered at points within 50 of 100 contribute to the
estimate of f(100).
k70(100) = (3/200) {1 - ((100 - 70)/50)²} = 0.0096.
k120(100) = (3/200) {1 - ((100 - 120)/50)²} = 0.0126.
Thus the estimate of f(100) = (0 + 0 + 0.0096 + 0.0126 + 0)/5 = 0.00444.
Comment: Here is a graph of the kernel smoothed density:
[Graph: the Epanechnikov kernel smoothed density, plotted for sizes from 0 to 250, with density values up to about 0.007.]

6.46. E. For a bandwidth of 50, kernels centered at any point less than or equal to 100 - 50
contribute 1 to F(100). Kernels centered at any point greater than or equal to 100 + 50 contribute 0
to F(100).
k70(x) = (3/200) {1 - ((x - 70)/50)²}, 20 ≤ x ≤ 120.
The contribution from the kernel centered at 70 to F(100) is the area to the left of 100, which is:
∫[20, 100] (3/200) {1 - ((x - 70)/50)²} dx = (3/200) {80 - (100 - 70)³/((3)(50²)) + (20 - 70)³/((3)(50²))} = 0.896.
k120(x) = (3/200) {1 - ((x - 120)/50)²}, 70 ≤ x ≤ 170.
The contribution from the kernel centered at 120 to F(100) is the area to the left of 100, which is:
∫[70, 100] (3/200) {1 - ((x - 120)/50)²} dx = (3/200) {30 - (100 - 120)³/((3)(50²)) + (70 - 120)³/((3)(50²))} = 0.216.
Thus the estimate of F(100) = (1 + 1 + 0.896 + 0.216 + 0)/5 = 0.6224.
Comment: In general, for an Epanechnikov kernel centered at y, the contribution to F(x) is:
0, for x ≤ y - b;
1/2 + 3(x - y)/(4b) - (x - y)³/(4b³), for y - b < x < y + b;
1, for x ≥ y + b.

6.47. D. Only two of the four data points are within 5 of 10.
ky(x) = g(x - y) = 0.1 + (x - y)³/2000, y - 5 ≤ x ≤ y + 5.
k6(10) = 0.1 + (10 - 6)³/2000 = 0.132.
k13(10) = 0.1 + (10 - 13)³/2000 = 0.0865.
Thus the kernel smoothed density at 10 is: (0.132 + 0.0865) / 4 = 0.054625.
Comment: Not a sensible kernel to use for practical applications.
ky(x) is the density at x for a kernel “centered” at y.
In this case, the mean of the kernel is not y.

6.48. C. All of the kernel centered at the data point 4 is to the left of 10.
None of the kernel centered at the data point 17 is to the left of 10.
k6(x) = 0.1 + (x - 6)³/2000, 1 < x < 11.
K6(10) = ∫[1, 10] k6(t) dt = (9)(0.1) + ∫[1, 10] (t - 6)³/2000 dt = 0.9 - 0.046125 = 0.853875.
k13(x) = 0.1 + (x - 13)³/2000, 8 < x < 18.
K13(10) = ∫[8, 10] k13(t) dt = (2)(0.1) + ∫[8, 10] (t - 13)³/2000 dt = 0.2 - 0.0680 = 0.132.
Thus the kernel smoothed distribution at 10 is: (1 + 0.853875 + 0.132 + 0) / 4 = 0.4964.

6.49. E. The empirical model is: 1/2 @20, 1/3 @30, and 1/6 @45.
g(28 - 20) = g(8) = {1 - (8/15)²}² / 16 = 0.03200.
g(28 - 30) = g(-2) = {1 - (-2/15)²}² / 16 = 0.06030.
g(28 - 45) = g(-17) = 0.
Thus the kernel smoothed density at 28 is: (1/2)(0.03200) + (1/3)(0.06030) = 0.0361.
Comment: A Biweight kernel with bandwidth 15.

6.50. D. The empirical model is: 1/2 @20, 1/3 @30, and 1/6 @45.
g(t) = (1 - 2t²/225 + t⁴/50,625) / 16, -15 ≤ t ≤ 15.
Integrating g(s) from -15 to t:
G(t) = t/16 - t³/5400 + t⁵/4,050,000 + 15/16 - 15³/5400 + 15⁵/4,050,000 =
0.5 + t/16 - t³/5400 + t⁵/4,050,000.
The kernel centered at 20 is all to the left of 40, so its contribution is one to the distribution.
G(40 - 30) = 0.5 + 10/16 - 1000/5400 + 100,000/4,050,000 = 0.9645.
G(40 - 45) = 0.5 - 5/16 + 125/5400 - 3125/4,050,000 = 0.2099.
Thus the kernel smoothed distribution function at 40 is:
(1/2)(1) + (1/3)(0.9645) + (1/6)(0.2099) = 0.8565.

6.51. F(5) = (1 + 1/2 + 1/2)/5 = 0.4. F(6) = (1 + 1 + 1 + 1/2)/5 = 0.7.


Therefore, the median is in between 5 and 6.
For 5 < x < 6, the portion of the triangle centered at 5 that is to the right of x is: (6 - x)²/2.
For 5 < x < 6, the portion of the triangle centered at 6 that is to the left of x is: (x - 5)²/2.
Thus we want: 2.5 = 1 + 2{1 - (6 - x)²/2} + (x - 5)²/2.
⇒ 2.5 = 1 + 2 - 36 + 12x - x² + x²/2 - 5x + 25/2.
⇒ x²/2 - 7x + 23 = 0. ⇒ x = 7 ± √(7² - (4)(1/2)(23)) = 8.732 or 5.268.
Since we want 5 < x < 6, the median is 5.268.
Comment: All of the triangle centered at 2 is to the left of 5.268. This contributes 1/5 to F(5.268).
None of the triangle centered at 10 to the left of 5.268. This contributes 0 to F(5.268).
There are 2 out of five points in the data that are 5.
They contribute to the kernel smoothed density a triangle centered at 5 with height 2/5, spanning 4 to 6.
The area of this triangle to the right of 5.268 is: (1/2)(6 - 5.268)²(2/5) = 0.1072.
Thus the contribution to F(5.268) is: 2/5 - 0.1072 = 0.2928.
There is 1 out of five points in the data that is 6.
It contributes to the kernel smoothed density a triangle centered at 6 with height 1/5, spanning 5 to 7.
The area of this triangle to the left of 5.268 is: (1/2)(5.268 - 5)²(1/5) = 0.0072, which is the contribution to F(5.268).
Thus F(5.268) = 1/5 + 0 + 0.2928 + 0.0072 = 0.5000.
5.268 is indeed the median of the kernel smoothed density.

6.52. B. The empirical model is: 1/5@2, 3/5@3, and 1/5@7.


The triangular kernel centered at 2 with bandwidth 2, stretches from 0 to 4.
It has height 1/2 at 2, and thus area (1/2)(4)(1/2) = 1.
Thus it is: (1/2)(3/4) = 3/8 at 2.5.
The triangular kernel centered at 3 with bandwidth 2, stretches from 1 to 5.
It has height 1/2 at 3, and thus area (1/2)(4)(1/2) = 1.
2.5 is 3/4 of the way from 1 to 3, and thus this triangle density is: (1/2)(3/4) = 3/8 at 2.5.
The triangular kernel at 7 with bandwidth 2, stretches from 5 to 9 and does not contribute at 2.5.
The height of the each triangular kernel at 2.5 is weighted by the empirical probability of the
associated point.
Thus the estimated density at 2.5 is: (1/5)(3/8) + (3/5)(3/8) + (1/5)(0) = 3/10 = 12/40.
Comment: For example, here is the triangular kernel centered at 3:
[Graph: the triangular kernel centered at 3, spanning 1 to 5, with height 0.5 at 3.]
Here is the triangular kernel smoothed density:
[Graph: the triangular kernel smoothed density, plotted for x from 0 to 9, with density values up to about 0.35.]

6.53. D. 2 + 1.4 = 3.4 ≤ 4.


All of the area of the uniform kernel centered at 2.0 is to the left of 4.

2.0 - 1.4 = 0.6 2.0 2.0 + 1.4 = 3.4


Thus the uniform kernel at 2 contributes its full value to F(4).
4 - 1.4 ≤ 3.3 ≤ 4 + 1.4. (4 - 1.9)/2.8 = .75.
3/4 of the area of the uniform kernel centered at 3.3 is to the left of 4:

3.3 - 1.4 = 1.9 3.3 4.0 3.3 + 1.4 = 4.7


The uniform kernel centered at 3.3 contributes 75% of its value to F(4).
Half of the area of the uniform kernel centered at 4 is to the left of 4:

4.0 - 1.4 = 2.6 4.0 4.0 + 1.4 = 5.4


The uniform kernel centered at 4 contributes 50% of its value to F(4).
(4.0 - 3.3)/2.8 = .25 ⇒ 1/4 of the area of the uniform kernel centered at 4.7 is to the left of 4:

4.7 - 1.4 = 3.3 4.0 4.7 4.7 + 1.4 = 6.1


The uniform kernel centered at 4.7 contributes 25% of its value to F(4).
The sample was: 2.0, 3.3, 3.3, 4.0, 4.0, 4.7, 4.7, 4.7. For the empirical model we assign:
1/8 probability to 2, 2/8 probability to 3.3, 2/8 probability to 4, and 3/8 probability to 4.7.
The kernel density estimate of F(4) is: (1)(1/8) + (.75)(2/8) + (.5)(2/8) + (.25)(3/8) = 0.53125.
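This exam question can likewise be reproduced in a few lines of Python; the sketch below is my own check, not an official solution:

    # 4, 11/04, Q.20 (Sample Q.147): uniform kernel with bandwidth 1.4.
    data = [2.0, 3.3, 3.3, 4.0, 4.0, 4.7, 4.7, 4.7]
    b = 1.4

    def uniform_kernel_cdf(x, y, b):
        # Fraction of the uniform kernel on [y - b, y + b] lying to the left of x.
        if x <= y - b:
            return 0.0
        if x >= y + b:
            return 1.0
        return (x - (y - b)) / (2.0 * b)

    print(sum(uniform_kernel_cdf(4.0, y, b) for y in data) / len(data))   # 0.53125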

6.54. B. 3 - 1.4 ≤ 2 ≤ 3 + 1.4. (4.4 - 2)/2.8 = 6/7, so the uniform kernel centered at 2 contributes
6/7 of its value to F(3). (4.4 - 3.3)/2.8 = 11/28, so the uniform kernel centered at 3.3 contributes
11/28 of its value to F(3). (4.4 - 4)/2.8 = 1/7, so the uniform kernel centered at 4 contributes 1/7 of
its value to F(3). 4.7 > 3 + 1.4, so the uniform kernel centered at 4.7 contributes nothing to F(3).
The kernel density estimate of F(3) is:
(6/7)(1/8) + (11/28)(2/8) + (1/7)(2/8) + (0)(3/8) = 0.241.
Comment: Here is a graph of the kernel smoothed distribution:
[Graph: the uniform kernel smoothed distribution function, plotted for x from 0 to 7, rising from 0 to 1.]

6.55. D. The kernel is the equation for a semicircle of radius 1. Thus the kernels are three semicircles
centered at 1, 3, and 5, each with radius 1. However, the value 3 appears twice in the sample. Thus
the kernels are multiplied by 1/4, 1/2, and 1/4, with the figure centered at 3 having twice the height of
the others, as in figure D.

6.56. B. The Empirical Model is 1/10 @ 1, 1/10 @ 2, and 8/10 @ 3.


Bandwidth 1, means each uniform kernel has height 1/2 and stretches ±1 from each point.
Using the uniform kernel with bandwidth 1, the contributions to the smoothed density are:
first kernel: (.1)(1/2) = 0.05, 0 ≤ x ≤ 2.
second kernel: (.1)(1/2) = 0.05, 1 ≤ x ≤ 3.
third kernel: (.8)(1/2) = 0.40, 2 ≤ x ≤ 4.
Therefore, using the uniform kernel with bandwidth 1, the smoothed density is:
0.05, for 0 ≤ x ≤ 1. 0.05 + 0.05 = 0.1, for 1 ≤ x ≤ 2. 0.05 + 0.40 = 0.45, for 2 ≤ x ≤ 3.
(0.8)(1/2) = 0.40, for 3 ≤ x ≤ 4.

[Graph: this uniform kernel smoothed density: 0.05 on [0, 1], 0.1 on [1, 2], 0.45 on [2, 3], and 0.40 on [3, 4].]
Thus the distribution function is:
0.05x for 0 ≤ x ≤ 1, 0.05 + 0.1(x - 1) = 0.1x - 0.05 for 1 ≤ x ≤ 2,
0.15 + 0.45(x - 2) = 0.45x - 0.75 for 2 ≤ x ≤ 3, 0.6 + 0.4(x - 3) = 0.4x - 0.6 for 3 ≤ x ≤ 4,
and 1 for x > 4.
[Graph: the corresponding uniform kernel smoothed distribution function, rising from 0 at x = 0 to 1 at x = 4.]
The triangular kernel with bandwidth one has height 1 and stretches ±1 from each point.
Using the triangular kernel with bandwidth 1, the contributions to the smoothed density are:
first kernel: 0.1x, 0 ≤ x ≤ 1, and 0.1(2 - x), 1 ≤ x ≤ 2.
second kernel: 0.1(x - 1), 1 ≤ x ≤ 2, and 0.1(3 - x), 2 ≤ x ≤ 3.
third kernel: 0.8(x - 2), 2 ≤ x ≤ 3, and 0.8(4 - x), 3 ≤ x ≤ 4.
Therefore, the smoothed density is:
0.1x for 0 ≤ x ≤ 1, 0.1(2 - x) + 0.1(x - 1) = 0.1 for 1 ≤ x ≤ 2,
0.1(3 - x) + 0.8(x - 2) = 0.7x - 1.3 for 2 ≤ x ≤ 3,
0.8(4 - x) = -0.8x + 3.2, 3 ≤ x ≤ 4.

[Graph: the triangular kernel smoothed density, plotted for x from 0 to 4, with density values up to 0.8.]
Therefore, integrating the smoothed density, the smoothed distribution function is:
0.05x² for 0 ≤ x ≤ 1, 0.05 + 0.1(x - 1) = 0.1x - 0.05 for 1 ≤ x ≤ 2, 0.35x² - 1.3x + 1.35 for 2 ≤ x ≤ 3,
-0.4x² + 3.2x - 5.4 for 3 ≤ x ≤ 4, and 1 for x > 4.
[Graph: the corresponding triangular kernel smoothed distribution function, rising from 0 at x = 0 to 1 at x = 4.]
Therefore, F̂1(x) = F̂2(x) for 1 ≤ x ≤ 2.
Comment: The intervals in the choices should have included their endpoints. Note that on the interval
from 1 to 2, the decline in the contribution from the first triangular kernel is exactly canceled by the
increase in the contribution from the second triangular kernel. Therefore, on this interval the triangular
kernel smoothed density is constant. The triangular smoothed distribution function can only equal that
from the uniform kernel on such an interval where the triangular kernel smoothed density is constant. I
do not like this exam question; not only is it long, it seems to mostly test mathematical
cleverness rather than the application of kernel smoothing to actuarial work.

6.57. E. For the uniform kernel centered at each point, we need to compute what percent is to the
left of 40. For example, the kernel centered at 35, goes from 25 to 45, and 75% contributes to
F(40).
25 30 35 35 37 39 45 47 49 55
% contribution 100% 100% 75% 75% 65% 55% 25% 15% 5% 0
(1/10)(100% + 100% + 75% + 75% + 65% + 55% + 25% + 15% + 5% + 0) = 0.515.
S(40) = 1 - 0.515 = 0.485.

6.58. C. F5 (150) = (# ≤ 150)/(total #) = 2/5 = 0.4.

The uniform kernel from 82 - 50 = 32 to 82 + 50 = 132 is all to the left of 150.


The uniform kernel from 126 - 50 = 76 to 126 + 50 = 176 is 74% to the left of 150.
The uniform kernel from 161 - 50 = 111 to 161 + 50 = 211 is 39% to the left of 150.
The uniform kernel from 294 - 50 = 244 to 294 + 50 = 344 is 0% to the left of 150.
The uniform kernel from 384 - 50 = 334 to 384 + 50 = 434 is 0% to the left of 150.
Therefore, F̂(150) = (1/5)(1) + (1/5)(74%) + (1/5)(39%) + (1/5)(0) + (1/5)(0) = 0.426.
| F̂(150) - F5(150) | = | 0.426 - 0.4 | = 0.026.

6.59. A. The given distribution function of the kernel is zero at y-1, and one at y+2.
Differentiate and the density of the kernel is 1/3, a constant. 
Thus the kernel is uniform from y-1 to y+2.
The mean of this kernel is y + 1/2, rather than y.
Therefore, the mean of the kernel smoothed density is the mean of the data plus 1/2:
5 + 1/2 = 5.5.
Each kernel is uniform of width 3.
Thus each kernel has variance: 3²/12 = 0.75.
The biased variance estimator for the data is: (3² + 0² + 3²)/3 = 6.
The variance of the kernel smoothed density is the (biased) variance of the data plus the (average)
variance of the kernel.
Thus the variance of the kernel smoothed density is: 6 + 0.75 = 6.75.
Thus the coefficient of variation of the kernel smoothed density is: √6.75 / 5.5 = 0.472.
Alternately, K2(x) is uniform from 1 to 4, K5(x) is uniform from 4 to 7,
and K8(x) is uniform from 7 to 10.
The empirical model is 1/3 on each data point.
Thus the kernel smoothed density is:
(1/3)(uniform from 1 to 4) + (1/3)(uniform from 4 to 7) + (1/3)(uniform from 7 to 10) =
uniform from 1 to 10.
Thus the mean of the kernel smoothed density is: (1 + 10)/2 = 5.5.
The variance of the kernel smoothed density is: (10 - 1)²/12 = 6.75.
Thus the coefficient of variation of the kernel smoothed density is: √6.75 / 5.5 = 0.472.
Comment: In practical applications there would be no reason to have a kernel whose mean was not
equal to each data point.
It would have been better if the question instead read:
“Calculate the coefficient of variation of the kernel smoothed density.”
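As a further check, one can simulate from this kernel smoothed density; since each kernel is uniform on [y - 1, y + 2], a draw is a randomly chosen data point plus Uniform(-1, 2) noise. A rough Python sketch (mine, not part of the exam solution):

    # Simulation check of 2014 Exam C Sample Q.300.
    import random

    data = [2, 5, 8]

    def draw():
        y = random.choice(data)
        return y + random.uniform(-1.0, 2.0)   # the given (non-centered) uniform kernel

    sample = [draw() for _ in range(200000)]
    mean = sum(sample) / len(sample)
    var = sum((s - mean)**2 for s in sample) / len(sample)
    print(var**0.5 / mean)   # approximately 0.472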

Section 7, Estimation of Percentiles

For a continuous distribution function, the 95th percentile is x, such that F(x) = 0.95.
Similarly, the 80th percentile is x, such that F(x) = 0.80.
The median is the 50th percentile; at the median the distribution function is 0.5.

Exercise: Let F(x) = 1 - e^(-x/10). Find the 75th percentile of this distribution.
[Solution: 0.75 = 1 - e^(-x/10). x = -10 ln(1 - 0.75) = 13.86. Check: 1 - e^(-13.86/10) = 1 - 0.250 = 0.75.
As shown in Appendix A: VaRp(X) = -θ ln(1 - p). VaR0.75 = -(10) ln(0.25) = 13.86.]

An actuary often wants to estimate the 90th, 95th, or other percentile of the distribution that
generated a data set, without making an assumption as to the type of distribution. For example, for a
data set of 4 losses: 378, 552, 961, 2034, the percentiles are estimated as follows:
378 552 961 2034
20% 40% 60% 80%
Leaving 20% of the probability on either tail; we estimate a 1/(4+1) = 20% chance of a future loss
less than 378 and a 20% chance of a future loss greater than 2034. Similarly, we estimate a 20%
chance of a future loss of size between 961 and 2034.
These four losses divide the real line into 5 intervals, and we assign 20% probability to each of
these 5 intervals.

By using the number of losses plus one, 4 + 1 = 5, rather than the number of losses, 4, in the
denominator, room is left for a future loss bigger than 2034. Using 4 in the denominator, would result
in the 4th loss of 2034 being (incorrectly) used to estimate the 100th percentile; we would
(incorrectly) assume there could never be a loss greater than 2034 in the future. The use of N+1 in
the denominator leaves probability on either tail for future losses larger or smaller than those that
happen to appear in a finite sample.

This method is distribution free. While we are assuming that the data consists of independent
random draws from the same distribution, we have not assumed the losses came from any specific
type of distribution such as a Weibull or Pareto.

One can also estimate percentiles in between the 20th and 80th for this data set of 4 observed
losses of sizes: 378, 552, 961, 2034, as follows using linear interpolation:
20th: 378, 30th: 465, 40th: 552, 50th: 756.5, 60th: 961, 70th: 1497.5, 80th: 2034.



Exercise: Given 4 observed losses of sizes: 378, 552, 961, 2034, estimate the 65th percentile.
[Solution: The 60th percentile is estimated as 961 and the 80th percentile is estimated as 2034.
Linearly interpolating, the 65th percentile is estimated as:
(3/4)(961) + (1/4)(2034) = 1229.25.]

Alternately, one could have multiplied the percentile times the number of losses + 1 to get:
(0.65)(5) = 3.25. Then the “3.25th loss” is our estimate of the 65th percentile.
The 3.25th loss is linearly interpolated between the sizes of the 3rd and 4th losses:
(3/4)(961) + (1/4)(2034) = 1229.25.

This technique is referred to by Loss Models as the “smoothed empirical estimate” of the
percentile.65 Given a data set with N points, the smoothed empirical estimate of the pth
percentile is the p(N+1) loss from smallest to largest, linearly interpolating between two
loss amounts if necessary.
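The recipe is easy to express in code. A minimal Python sketch (my own, not from Loss Models) of the smoothed empirical percentile:

    # Smoothed empirical estimate of the 100p-th percentile.
    def smoothed_empirical_percentile(data, p):
        x = sorted(data)
        n = len(x)
        pos = p * (n + 1)            # position among the order statistics, counted from 1
        if pos < 1 or pos > n:
            raise ValueError("p is outside the estimable range [1/(n+1), n/(n+1)]")
        k = int(pos)                 # integer part of the position
        if k == n:                   # pos is exactly n
            return x[n - 1]
        frac = pos - k
        return (1 - frac) * x[k - 1] + frac * x[k]

    # The 50th percentile of the four losses 378, 552, 961, 2034:
    print(smoothed_empirical_percentile([378, 552, 961, 2034], 0.50))   # 756.5, as above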

Exercise: Estimate the 55th percentile of the ungrouped data in Section 2.


[Solution: There are 130 losses. (0.55)(130 + 1) = 72.05. The 72nd loss is 128,000.
The 73rd loss is 131,300. Thus the estimated 55th percentile is, by linear interpolation:
(0.95)(128,000) + (0.05)(131,300) = 128,165. ]

Exercise: Given 4 observed losses: 378, 552, 961, 2034, estimate the 90th percentile.
[Solution: The 80th percentile is estimated as 2034, the largest observed loss. Thus all we can say
is the 90th percentile is greater than 2034.]

In this case, one can not estimate percentiles less than 1/5 = 20% or greater than 4/5 = 80%.
In general, with N points, one can not estimate percentiles less than 1/(N+1) or greater than
N/(N+1). While this is a significant limitation for the small data sets common in exam questions, it is not
significant for large data sets.

Exercise: Estimate the 95th percentile of the ungrouped data in Section 2.


[Solution: There are 130 losses. (0.95)(130 + 1) = 124.45. The 124th loss is 1,078,800.
The 125th loss is 1,117,600. Thus the estimated 95th percentile is, by linear interpolation:
(0.55)(1,078,800) + (0.45)(1,117,600) = 1,096,260. ]

This is how one gets point estimates for percentiles.66

65 See Definition 13.3 in Loss Models.
66 How to get interval estimates of percentiles is not on the syllabus.
See for example "Mahler's Guide to Statistics," for CAS Exam ST.

Empirical Distribution vs. Smoothed Empirical Estimate of Percentiles:

The smoothed empirical estimate of percentiles differs from the Empirical Distribution Function. The
Empirical Distribution Function at x is defined as:
(# losses ≤ x)/(total # of losses).

Exercise: What is the Empirical Distribution Function for a data set of 4 losses:
378, 552, 961, 2034?
[Solution: The Empirical Distribution Function is 0 for x < 378. It jumps up to 1/4 at 378. It is 1/4 for
378 ≤ x < 552. There is another jump discontinuity at the next observed value of 552. The Empirical
Distribution Function is: 1/2 for 552 ≤ x < 961,
3/4 for 961 ≤ x < 2034, and 1 for 2034 ≤ x. ]

Below are shown the Empirical Distribution Function (thinner horizontal lines) and the smoothed
empirical estimates of percentiles (thicker line segments) for the data set: 378, 552, 961, 2034.

[Graph: the Empirical Distribution Function (a step function with jumps of 1/4 at 378, 552, 961, and 2034) together with the smoothed empirical percentile estimates (piecewise linear segments between the data points).]



N versus N+1:

For a data set of size N, there is an N in the denominator of the Empirical Distribution Function.67
In contrast, in the smoothed empirical estimate of percentiles, the ith loss is an estimate of the i/(N+1)
percentile; there is N+1 in the denominator.68

In some cases we use in the denominator N, the number of data points,


while in others we use N + 1:

Smoothed empirical estimate of percentiles ⇒ N+1 in the denominator.

p-p plots ⇒ N+1 in the denominator.

Empirical Distribution Function ⇒ N in the denominator.

Kolmogorov-Smirnov Statistic ⇒ N in the denominator.

67 The Empirical Distribution Function is used in the computation of the Kolmogorov-Smirnov Statistic, discussed in a
subsequent section.
68 The estimated percentile is also used in the p-p plots, discussed in a subsequent section. In p-p plots there is
N+1 in the denominator of the first component of each point.

Problems:

7.1 (1 point) There are seven losses of sizes: 15, 22, 25, 39, 43, 54, 76.
Estimate the 75th percentile of the size of loss distribution.
A. less than 53
B. at least 53 but less than 55
C. at least 55 but less than 57
D. at least 57 but less than 59
E. at least 59

Use the following information for the next two questions:


• The state of Minnehaha requires that each of its towns and cities budget for snow removal,
so as to expect to have enough money to remove all the snow in 19 out of 20 winters
• You are hired as a consultant by the state of Minnehaha to check the snow removal budgets
of each of its towns and cities
• The town of Frostbite Falls, Minnehaha pays $10,000 per inch for snow removal.
• Over the last 127 winters, the ten with the most snow in Frostbite Falls have had the
following numbers of inches: 133, 137, 142, 151, 162, 166, 176, 181, 190, 224.

7.2 (1 point) Determine the single best estimate of how much Frostbite Falls needs to budget for
snow removal for the coming winter.
A. Less than $1.50 million
B. At least $1.50 million, but less than $1.55 million
C. At least $1.55 million, but less than $1.60 million
D. At least $1.60 million, but less than $1.65 million
E. At least $1.65 million

7.3 (1 point) You are rehired as a consultant by the state of Minnehaha to do the same job one year
later. During the most recent winter, Frostbite Falls had 282 inches of snow!
Determine the revised single best estimate of how much Frostbite Falls needs to budget for snow
removal for the coming winter.
A. Less than $1.50 million
B. At least $1.50 million, but less than $1.55 million
C. At least $1.55 million, but less than $1.60 million
D. At least $1.60 million, but less than $1.65 million
E. At least $1.65 million

7.4 (2 points) You are given the following random sample of 19 claim amounts from policies with a
limit of 100:
5, 5, 5, 5, 10, 10, 15, 20, 25, 25, 25, 30, 40, 50, 75, 90, 100, 100, 100.
Determine the smoothed empirical estimate of the 70th percentile.
(A) 35 (B) 40 (C) 45 (D) 50 (E) can not be determined

7.5 (2 points) You are given the following random sample of 13 claim amounts:
99 133 175 216 250 277 651 698 735 745 791 906 947
Using the smoothed empirical estimates, to which percentile does 500 correspond?
(A) 45th (B) 46th (C) 47th (D) 48th (E) 49th

7.6 (2 points) You are given the following size of loss data for general liability insurance:
175 200 250 300 300 350 400 450 500 500 550 800 1000 1500
Calculate the smoothed empirical estimate of the 75th percentile.
(A) 550 (B) 600 (C) 650 (D) 700 (E) 750

7.7 (4, 5/86, Q.52) (1 point) You are given the following random sample 1.1, 1.3, 1.8, 2.4, 2.5,
2.6, 2.9, 3.0, 3.2 and 3.7 from an unknown continuous distribution.
Which of the following represents the 90th sample percentile?
A. Undefined B. 3.20 C. 3.25 D. 3.65 E. 3.70

7.8 (4, 5/87, Q.53) (1 point) You are given the following random sample from an unknown
continuous distribution: 34, 61, 20, 16, 91, 85, 6.
What are the 25th and 75th sample percentiles, respectively?
A. 13.4, 67
B. 27.25, 69.75
C. 16, 61
D. 11, 88
E. None of the above.

7.9 (4B, 11/92,Q.11) (1 point)


A random sample of 20 observations has been ordered as follows:
12, 16, 20, 23, 26, 28, 30, 32, 33, 35,
36, 38, 39, 40, 41, 43, 45, 47, 50, 57
Determine the 60th sample percentile, Π60.
A. 32.4 B. 36.0 C. 38.0 D. 38.4 E. 38.6

7.10 (4B, 5/93, Q.30) (1 point) The following 20 wind losses, recorded in millions of
dollars, occurred in 1992:
1, 1, 1, 1, 1, 2, 2, 3, 3, 4
6, 6, 8, 10, 13, 14, 15, 18, 22, 25
Calculate the 75th sample percentile.
A. 12.25 B. 13.00 C. 13.25 D. 13.75 E. 14.00

7.11 (4B, 11/97, Q.29) (2 points) You wish to calculate the (100p)th sample percentile based on
a random sample of 4 observations. Determine all values of p for which the (100p)th sample
percentile is defined.
A. 0 ≤ p ≤ 1 B. 0.20 ≤ p ≤ 0.80 C. 0.25 ≤ p ≤ 0.75 D. 0.33 ≤ p ≤ 0.67 E. p = 0.50

7.12 (IOA 101, 4/00, Q.1) (1.5 points) Fourteen economists were asked to provide forecasts for
the percentage rate of inflation for the third quarter of 2002.
They produced the forecasts given below.
1.2 1.4 1.5 1.5 1.7 1.8 1.8
1.9 1.9 2.1 2.7 3.2 3.9 5.0
Calculate the median and the 25th and 75th percentiles of these forecasts.

7.13 (4, 5/00, Q.2) (2.5 points) You are given the following random sample of ten claims:
46 121 493 738 775
1078 1452 2054 2199 3207
Determine the smoothed empirical estimate of the 90th percentile.
(A) Less than 2150
(B) At least 2150, but less than 2500
(C) At least 2500, but less than 2850
(D) At least 2850, but less than 3200
(E) At least 3200

7.14 (IOA 101, 4/01, Q.1) (1.5 points) The following amounts are the sizes of claims on
homeowners insurance policies for a certain type of repair.
198 221 215 209 224 210 223 215 203 210
220 200 208 212 216
Determine the smoothed empirical estimates of the 25th percentile (lower quartile), median, and
75th percentile (upper quartile) of these claim amounts.

7.15 (4, 11/02, Q.2 & 2009 Sample Q. 31) (2.5 points) You are given the following claim data for
automobile policies: 200 255 295 320 360 420 440 490 500 520 1020
Calculate the smoothed empirical estimate of the 45th percentile.
(A) 358 (B) 371 (C) 384 (D) 390 (E) 396

7.16 (4, 11/04, Q.2 & 2009 Sample Q.134) (2.5 points)
You are given the following random sample of 13 claim amounts:
99 133 175 216 250 277 651 698 735 745 791 906 947
Determine the smoothed empirical estimate of the 35th percentile.
(A) 219.4 (B) 231.3 (C) 234.7 (D) 246.6 (E) 256.8

Solutions to Problems:

7.1. B. 75th percentile is about the (0.75)(7+1) = 6th claim from smallest to largest, which is 54.

7.2. C. The estimate of the 95th percentile is the (127+1)(0.95) = 121.6 winter.
The 121st winter is 151, the 122nd winter is 162. Thus the estimated 95th percentile is:
(0.4)(151) + (0.6)(162) = 157.6 inches. (157.6 inches)($10,000 / inch) = $1.576 million.

7.3. D. We now have a sample size of 128. The estimate of the 95th percentile is the
(128+1)(0.95) = 122.55 winter. The 122nd winter is 162, the 123rd winter is 166.
Thus the estimated 95th percentile is (0.45)(162) + (0.55)(166) = 164.2 inches.
(164.2 inches)($10,000 / inch) = $1.642 million.
Comment: Note that we have relied upon an estimation technique that assumes that each year of
snow is an independent random draw from the same (unknown) distribution. If in fact the amount of
snowfall for two consecutive years is highly correlated, given the very large amount of snow in the
most recent winter, we may have underestimated the budget for the next year. In the case of
correlated years some sort of Simulation Model might be helpful. Insurance losses for
consecutive years tend to be positively correlated. If one has a better than expected year, the
next year is also likely to be better than expected, and vice versa. In any case, relying on a 127
or 128 year long record of weather, treating each year as equally relevant regardless of how
long ago it is, may be a questionable methodology. However, do not worry about all of these
types of issues, when answering what are intended to be straightforward questions on lower
numbered actuarial exams.

7.4. D. p(N+1) = (.7)(19 + 1) = 14. We want the 14th claim from smallest to largest: 50.
Comment: The 16th claim from smallest to largest, 90, is the estimate of the 16/20 = 80th percentile.
Due to the censoring from above, we can not estimate percentiles larger than the 80th.

7.5. C. 277 corresponds to: 6/(13 + 1) = 0.4286.


651 corresponds to: 7/(13 + 1) = 0.5.
500 corresponds to: (0.4286)(651- 500)/(651 - 277) + (0.5)(500 - 277)/(651 - 277) = 0.471.
Comment: Backwards question. (0.47)(13 + 1) = 6.58.
Thus, the smoothed empirical estimate of the 47th percentile is: (0.42)(277) + (0.58)(651) = 494.

7.6. B. p(N+1) = (.75)(14 + 1) = 11.25. 11th loss is 550. 12th loss is 800.
estimate of the 75th percentile is: (0.75)(550) + (0.25)(800) = 612.5.

7.7. D. Since one has 10 claims, the 90th percentile is estimated as the (1+10)(0.90) = 9.9th claim.
The 9th claim is 3.2 while the 10th claim is 3.7.
Interpolating linearly, one gets: (0.9)(3.7) + (0.1)(3.2) = 3.65.

7.8. E. Order the data points: 6, 16, 20, 34 , 61, 85, 91. We have 7 data points. The pth
percentile is therefore estimated as the p(7+1) point. Thus the 25th percentile is estimated as the
(0.25)(8) = 2nd point = 16. The 75th percentile is estimated as the (0.75)(8) = 6th point: 85.

7.9. E. With 20 observations one estimates the 60th percentile as the (0.60)(20+1) = 12.6 claim.
The 12th claim (from smallest to largest) is 38, while the 13th claim is 39. Interpolating one estimates
the 60th percentile as: (38)(0.4) + (39)(0.6) = 38.6.

7.10. D. For 20 claims, in order to estimate the 75th percentile, we look at the (20+1)(0.75) =
15.75 claim. The 15th claim is of size 13, while the 16th claim is of size 14.
Linearly interpolating: (1/4)(13) + (3/4)(14) = 13.75.

7.11. B. Let the 4 claims from smallest to largest be: v, x, y, and z. Then v is an estimate of the
100{1/(1+4)} = 20th percentile. x is an estimate of the 100{2/(1+4)} = 40th percentile. y is an
estimate of the 100{3/(1+4)} = 60th percentile. z is an estimate of the 100{4/(1+4)} =
80th percentile. Thus the (100p)th sample percentile is defined for 0.20 ≤ p ≤ 0.80.

7.12. N = 14, so the median is (0.5)(14 + 1) = 7.5th value: (1.8 + 1.9)/2 = 1.85.
25th percentile is (0.25)(14 + 1) = 3.75th value: (0.25)(1.5) + (.75)(1.5) = 1.5.
75th percentile is (0.75)(14 + 1) = 11.25th value: (0.75)(2.7) + (0.25)(3.2) = 2.825.

7.13. D. The estimate of the 90th percentile is the (0.9)(10 + 1) = 9.9th claim.
Thus we linearly interpolate between the 9th claim of 2199 and the 10th claim of 3207:
(0.1)(2199) + (0.9)(3207) = 3106.
[Diagram: linear interpolation between the 9th claim (2199) and the 10th claim (3207), evaluated at position 9.9, giving 3106.]

7.14. Sorting the 15 values from smallest to largest:


198, 200, 203, 208, 209, 210, 210, 212, 215, 215, 216, 220, 221, 223, 224.
(0.25)(15 + 1) = 4. ⇒ 208 is the estimated 25th percentile.

(0.5)(15 + 1) = 8. ⇒ 212 is the estimated median.

(0.75)(15 + 1) = 12. ⇒ 220 is the estimated 75th percentile.
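The smoothed empirical estimate used throughout these solutions is purely mechanical: sort the data, find position (N+1)p, and interpolate linearly between the two surrounding order statistics. A minimal Python sketch (illustrative only; not part of the syllabus, and the function name is my own) reproduces the answers to 7.14:

    def smoothed_percentile(data, p):
        # Smoothed empirical estimate: interpolate at position (N+1)p
        # between adjacent order statistics (1-indexed).
        x = sorted(data)
        n = len(x)
        pos = (n + 1) * p
        if pos <= 1:
            return x[0]      # not defined below 1/(N+1); return the smallest value
        if pos >= n:
            return x[-1]     # not defined above N/(N+1); return the largest value
        i = int(pos)         # index of the lower order statistic (1-indexed)
        frac = pos - i
        return (1 - frac) * x[i - 1] + frac * x[i]

    claims = [198, 221, 215, 209, 224, 210, 223, 215, 203, 210,
              220, 200, 208, 212, 216]
    print(smoothed_percentile(claims, 0.25))  # 208
    print(smoothed_percentile(claims, 0.50))  # 212
    print(smoothed_percentile(claims, 0.75))  # 220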

7.15. C. N = 11, p = 0.45, (N+1)p = 5.4. The 5th claim is 360 and the 6th claim is 420.
Linearly interpolating with more weight to 360: (0.6)(360) + (0.4)(420) = 384.
[Diagram: linear interpolation between the 5th claim (360) and the 6th claim (420), evaluated at position 5.4, giving 384.]

7.16. D. (0.35)(13 + 1) = 4.9. Linearly interpolate between the 4th and 5th value:
(0.1)(216) + (0.9)(250) = 246.6.
[Diagram: linear interpolation between the 4th claim (216) and the 5th claim (250), evaluated at position 4.9, giving 246.6.]

Section 8, Percentile Matching

In order to fit a distribution via Percentile Matching, one sets the distribution function equal to the
data at a number of percentiles equal to the number of parameters of the distribution.

Exponential Distribution:

To employ percentile matching with a one parameter distribution, one solves for the parameter
value such that F(x) equals the chosen percentile. For example, assume one were using percentile
matching at the 95th percentile to fit an exponential distribution to the ungrouped data in Section 2.

Exercise: For the ungrouped data set in Section 2, estimate the 95th percentile.
[Solution: There are 130 claims. The observed 95th percentile is at about the
0.95(131) = 124.45th claim. The 124th claim is 1,078,800. The 125th claim is 1,117,600.
Thus the 95th percentile = (0.55)(1,078,800) + (0.45)(1,117,600) = 1,096,260.]

The observed 95th percentile is somewhere around 1.096 million. For the fitted Exponential, we
want F(1.096 million) = 0.95. Therefore, one solves for the exponential parameter θ, such that

0.95 = 1 - exp(-1.096 x 10^6 /θ). ⇒ θ = 3.7 x 10^5. As shown in Appendix A: VaRp (X) = -θ ln(1-p).

1.096 million = VaR0.95 = -θ ln(0.05). ⇒ θ = 3.7 x 10^5.

Similarly, the observed 90th percentile is around 7.63 x 10^5, and
θ = -7.63 x 10^5 / ln(1-0.9) = 3.3 x 10^5.

Note that the value of the fitted parameter depends on the percentile at which the matching is done.
If the curve really fit the data perfectly, it would not matter what percentile we used; this rarely is the
case in the real world.
In general for percentile matching to an Exponential, if p1 is the percentile at which one is matching,
and x1 is the observed value for that percentile, then θ = -x1 / ln(1 - p1 ).

Exercise: For the ungrouped data set in Section 2, estimate the 75th percentile.
[Solution: There are 130 claims. The smoothed empirical estimate of the 75th percentile is the
(0.75)(131) = 98.25th claim. The 98th claim is 284,300. The 99th claim is 316,300.
Thus the 75th percentile = (0.75)(284,300) + (0.25)(316,300) = 292,300.]

Exercise: For the ungrouped data set in Section 2, fit an Exponential Distribution by matching to the
75th percentile.
[Solution: F(292,300) = 1 - exp[-292,300/θ] = 0.75. ⇒ θ = -292,300 / ln(1 - 0.75) = 210,850.]
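As a quick check, each of these Exponential fits takes one line. A Python sketch (illustrative only), using the smoothed empirical percentiles quoted above:

    import math

    def exponential_theta(x1, p1):
        # Percentile matching for an Exponential: F(x1) = 1 - exp(-x1/theta) = p1.
        return -x1 / math.log(1 - p1)

    print(exponential_theta(1_096_260, 0.95))  # about 366,000, i.e. 3.7 x 10^5
    print(exponential_theta(292_300, 0.75))    # about 210,850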

Two Parameter Distributions:

For distributions with two parameters the matching is done at two selected percentiles. Again the
fitted parameters would depend on the percentiles selected. For example, let's fit a LogNormal to
the ungrouped data set in Section 2, at the 75th and 95th percentiles.
As determined above, for this data set the estimated 75th percentile is 292,300 and the estimated
95th percentile is 1,096,260. For the fitted LogNormal we want F(292,300) = 0.75 and
F(1,096,260) = 0.95. We have two equations in the two unknown parameters µ and σ.
Φ[{ln(292,300)−µ} / σ] = 0.75, and Φ[{ln(1,096,260)−µ} / σ] = 0.95.

By use of the Standard Normal Table, Φ[0.674] = 0.75 and Φ[1.645] = 0.95.
Thus the two equations are equivalent to:
{ln(292,300)−µ} / σ = 0.674 and {ln(1,096,260)−µ} / σ = 1.645.
One can solve σ = {ln(1,096,260)-ln(292,300)} / (1.645 - 0.674) = 1.361, and then
µ = ln(1,096,260) - (1.361)(1.645) = 11.67.

Generally, it is relatively straightforward to check the results of percentile matching. Given the fitted
parameters, one goes back and checks the resulting values of the Distribution Function.
For example, for a LogNormal Distribution with µ = 11.67 and σ = 1.361,
F(1,096,260) = Φ[{ln(1,096,260)-11.67} / 1.361] = Φ[1.644] = 0.950, and
F(292,300) = Φ[{ln(292,300) - 11.67} / 1.361] = Φ[0.673] = 0.750. Thus this checks.69

69
It checks to the level of accuracy used. With no intermediate rounding I got µ = 11.6667 and σ = 1.36225. This is beyond the level of accuracy usually employed for the crude technique of percentile matching.

Below are some examples of results of percentile matching applied to the ungrouped data set in
Section 2.70
Distrib.      Perc. 1  Perc. 2  Perc. 3  Par. 1    Par. 2    Par. 3   Mean(000)  Coef. Var.  Skewness
Data                                                                   313        2.01        4.83
LogNormal     10       60                11.50     1.625              370        3.61        57.8
LogNormal     50       90                11.70     1.439              340        2.63        26.1
LogNormal     75       95                11.67     1.362              296        2.32        19.5
Loglogistic   10       60                1.0435    100,768            2323       N.A.        N.A.
Loglogistic   50       90                1.1911    120,650            659        N.A.        N.A.
Loglogistic   75       95                1.3964    133,089            385        N.A.        N.A.
Pareto        10       60                1.3035    145,749            480        N.A.        N.A.
Pareto        50       90                1.4584    198,286            433        N.A.        N.A.
Pareto        75       95                1.7548    242,895            322        N.A.        N.A.
Weibull       10       60                164,384   0.8672             177        1.16        2.48
Weibull       50       90                211,889   0.6508             289        1.59        3.97
Weibull       75       95                166,908   0.5829             261        1.82        4.85
Burr          10       50       90       2.0531    336,729   0.8888   365        N.A.        N.A.

In the following, let the percentiles at which the matching is done be p1 and p2 , with corresponding
loss amounts x1 and x2 .

LogNormal Distribution:

Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the
50th percentile.
[Solution: For the ungrouped data in Section 2 with 130 values, the 50th percentile is the (131)(0.5)
= 65.5th value from smallest to largest. (119,300)(0.5) + (122,000)(0.5) = 120,650.]

Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the
90th percentile.
[Solution: (131)(0.9) = 117.9. (0.1)(737,700) + (0.9)(766,100) = 763,260.]

Exercise: Fit a LogNormal to the ungrouped data in Section 2, matching at the 50th and 90th
percentiles.
[Solution: F(120,650) = Φ[(ln(120,650) - µ)/σ] = 0.5. ⇒ (11.7001 - µ)/σ = 0. ⇒ µ = 11.7001.

F(763,260) = Φ[(ln(763,260) - µ)/σ] = 0.9. ⇒ (13.5454 - µ)/σ = 1.282.

⇒ σ = (13.5454 - 11.7001)/1.282 = 1.439.]

70
I have used the following estimates of the percentiles: 10th 12,270, 50th 120,650, 60th 148,620,
75th 292,300, 90th 763,260, 95th 1,096,260.

Exercise: Verify that a LogNormal Distribution with parameters µ = 11.7001, and σ = 1.439, matches
the ungrouped data in Section 2 at the 50th, and 90th percentiles.
[Solution: F(120,650) = Φ[(ln(120,650) - 11.7001)/1.439] = Φ[0] = 0.50.
F(763,260) = Φ[(ln(763,260) - 11.7001)/1.439] = Φ[1.282] = 0.90.]

For the LogNormal, the two equations for matching percentiles are: Φ[(ln x1 - µ)/σ] = p1 and
Φ[(ln x2 - µ)/σ] = p2 . Let ξ1 = Φ^(-1)(p1 ) and ξ2 = Φ^(-1)(p2 ). Then (ln x1 - µ)/σ = ξ1 and (ln x2 - µ)/σ = ξ2 .
Therefore, σ = {ln(x2 ) - ln(x1 )} / (ξ2 - ξ1 ) = ln[x2 /x1 ] / (ξ2 - ξ1 ), and µ = ln(x2 ) - σξ2 .

For the above example, ξ1 = Φ-1(0.5) = 0, and ξ2 = Φ-1(0.9) = 1.282.

σ = ln[763,260/120,650] / (1.282 - 0) = 1.439. µ = ln(763,260) - (1.439)(1.282) = 11.70.
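These closed-form steps translate directly into code. A Python sketch (illustrative only; the function name is my own), using the inverse Normal from the standard library:

    import math
    from statistics import NormalDist

    def lognormal_percentile_match(x1, p1, x2, p2):
        # Match F(x1) = p1 and F(x2) = p2 for a LogNormal(mu, sigma).
        xi1 = NormalDist().inv_cdf(p1)
        xi2 = NormalDist().inv_cdf(p2)
        sigma = math.log(x2 / x1) / (xi2 - xi1)
        mu = math.log(x2) - sigma * xi2
        return mu, sigma

    print(lognormal_percentile_match(120_650, 0.50, 763_260, 0.90))
    # roughly (11.70, 1.44), matching the fit at the 50th and 90th percentiles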

Weibull Distribution:

Exercise: Fit the Weibull to the ungrouped data in Section 2, matching at the 50th and 90th
percentiles.
[Solution: For the Weibull F(x) = 1 - exp(-(x/θ)τ). The estimated 50th percentile is 120,650.
The estimated 90th percentile is 763,260. Therefore,
exp[-(120,650/θ)τ] = 0.5, and exp[-(763,260/θ)τ] = 0.1.

⇒ (120,650/θ)τ = -ln(0.5), and (763,260/θ)τ = -ln(0.1).


τ ln(120,650) - τln(θ) = ln(-ln(.5)), and τ ln(763,260) - τln(θ) = ln(-ln(0.1)).
Therefore, ln(θ) = {ln[-ln(0.1)] ln(120,650) - ln[-ln(0.5)] ln(763,260)} / {ln[-ln(0.1)] - ln[-ln(0.5)]}
= 14.7233/1.20055 = 12.2638.

θ = exp[12.2638] = 211,885.

τ = ln(-ln(0.1)) / {ln(763,260) - ln(θ)} = 0.6508.

Alternately, as shown in Appendix A: VaRp (X) = θ [ -ln(1-p) ]1/τ.

Thus, 120,650 = θ [ -ln(0.5) ]1/τ, and 763,260 = θ [ -ln(0.1) ]1/τ.

Dividing these two equations: 6.3262 = [ln(10)/ln(2)]1/τ.

⇒ ln(6.3262) = ln[ln(10)/ln(2)]/τ. ⇒ τ = 0.6508.


⇒ θ = 763,260 / ln[10]1/0.6508 = 211,886.]

In general, the percentile matching for a Weibull was done as follows:


Let ξ1 = -ln(1-p1 ), ξ2 = -ln(1-p2 ). Then

θ = exp[{ln(ξ2 ) ln(x1 ) - ln(ξ1 ) ln(x2 )} / {ln(ξ2 ) - ln(ξ1 )}], and τ = ln(ξ2 ) / {ln(x2 ) - ln(θ)}.
In the previous example, ξ1 = -ln(1-p1 ) = -ln(0.5) = ln(2). ln(ξ1 ) = ln(ln(2)) = -0.366.

ξ2 = -ln(1-p2 ) = -ln(0.1) = ln(10). ln(ξ2 ) = ln(ln(10)) = 0.834.


ln(x1 ) = ln(120,650) = 11.70. ln(x2 ) = ln(763,260) = 13.55.
Then θ = exp[{ln(ξ2 ) ln(x1 ) - ln(ξ1 ) ln(x2 )} / { ln(ξ2 ) - ln(ξ1 )}]
= exp[{(0.834)(11.70) - (-0.366)(13.55)} / {0.834-(-0.366)}] = 211,889.

If p1 = 0.25 and p2 = 0.75, then the general formulas become:


ξ1 = -ln(0.75) = ln(4/3) and ξ2 = -ln(0.25) = ln(4)
θ = exp[{ln( ln(4)) ln(x1 ) - ln(ln(4/3)) ln(x2 )}/{ ln( ln(4)) - ln(ln(4/3))}] =
exp[{g ln(x1 ) - ln(x2 )} / (g - 1)], where g = ln(ln(4)) / ln(ln(4/3)), and
τ = ln[ln(4)] / {ln(x2 ) - ln(θ)}.
These match the formulas in Appendix A of Loss Models.71
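A Python sketch of the same Weibull calculation (illustrative only; the function name is my own):

    import math

    def weibull_percentile_match(x1, p1, x2, p2):
        # Match F(x) = 1 - exp(-(x/theta)^tau) at (x1, p1) and (x2, p2).
        xi1 = -math.log(1 - p1)
        xi2 = -math.log(1 - p2)
        log_theta = ((math.log(xi2) * math.log(x1) - math.log(xi1) * math.log(x2))
                     / (math.log(xi2) - math.log(xi1)))
        theta = math.exp(log_theta)
        tau = math.log(xi2) / (math.log(x2) - log_theta)
        return theta, tau

    print(weibull_percentile_match(120_650, 0.50, 763_260, 0.90))
    # roughly (211,900, 0.651), matching the fit at the 50th and 90th percentiles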

Loglogistic:

For the Loglogistic Distribution, F(x) = (x/θ)γ / {1 + (x/θ)γ}.

Exercise: Fit a Loglogistic to the ungrouped data in Section 2, matching at the 75th and 95th
percentiles.

71
The formulas for percentile matching will not be attached to your exam in the abridged version of Appendix A.

[Solution: Matching, we want F(292,300) = 0.75, and F(1,096,260) = 0.95.


⇒ (292,300/θ)γ / {1 + (292,300/θ)γ} = 0.75, and (1,096,260/θ)γ / {1 + (1,096,260/θ)γ} = 0.95.
{1 + (292,300/θ)γ} / (292,300/θ)γ = 4/3, and {1 + (1,096,260/θ)γ} / (1,096,260/θ)γ = 20/19.

(292,300/θ)−γ + 1 = 4/3 and (1,096,260/θ)−γ + 1 = 20/19.

(292,300/θ)−γ = 1/3, and (1,096,260/θ)−γ = 1/19. (292,300/θ)γ = 3, and (1,096,260/θ)γ =19.

Thus (1,096,260/292,300)γ = 19/3. γ ln(1,096,260/292,300) = ln(19/3)

γ = ln(19/3) / ln(1,096,260/292,300) = 1.396. ⇒ θ = 1,096,260 / 19^(1/γ) = 133,015.

Alternately, as shown in Appendix A: VaRp (X) = θ {p-1 - 1}-1/γ.

Thus, 292,300 = θ {1/0.75 - 1}-1/γ, and 1,096,260 = θ {1/0.95 - 1}-1/γ.


Dividing the two equations: 3.7505 = {(1/0.75 - 1) / (1/0.95 - 1)}^(1/γ) = 6.3333^(1/γ).

⇒ γ = ln(6.3333)/ln(3.7505) = 1.396. ⇒ θ = 1,096,260 / 19^(1/γ) = 133,015.]

In general, the percentile matching for a Loglogistic is done as follows:


Let ξ1 = p1 / (1-p1 ), ξ2 = p2 / (1-p2 ). Then γ = ln(ξ2 / ξ1 ) / ln(x2 / x1 ), and θ = x2 / ξ2 1/γ.

If p1 = 0.25 and p2 = 0.75, then the formulas become: ξ1 = 1/3 and ξ2 = 3

γ = ln(9) / ln(x2 / x1 ) and θ = x2 / 3^(1/γ). We can get theta as well from θ = x1 / ξ1^(1/γ) = x1 3^(1/γ).

Thus in this case, θ = √(x1 x2 ). Writing for the 25th and 75th percentiles x1 = p and x2 = q,
then these formulas become: γ = 2 ln(3) / {ln(q) - ln(p)} and θ = exp[{ln(q) + ln(p)}/2].72
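A Python sketch of the Loglogistic formulas above (illustrative only; the function name is my own):

    import math

    def loglogistic_percentile_match(x1, p1, x2, p2):
        # Match F(x) = (x/theta)^gamma / {1 + (x/theta)^gamma} at two percentiles.
        xi1 = p1 / (1 - p1)
        xi2 = p2 / (1 - p2)
        gamma = math.log(xi2 / xi1) / math.log(x2 / x1)
        theta = x2 / xi2 ** (1 / gamma)
        return gamma, theta

    print(loglogistic_percentile_match(292_300, 0.75, 1_096_260, 0.95))
    # roughly (1.396, 133,000), matching the fit at the 75th and 95th percentiles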

Pareto Distribution:

Exercise: Set up the equations to be solved, to fit a Pareto to the ungrouped data in
Section 2, matching at the 50th and 90th percentiles.
[Solution: For the Pareto F(x) = 1 - (1 + x/θ)−α. The estimated 50th percentile is 120,650.
The estimated 90th percentile is 763,260. Therefore,
(1 + 120,650/θ)−α = 0.5 and (1 + 763,260/θ)−α = 0.1. One could rewrite this as

120,650/θ = 0.5−1/α - 1 and 763,260/θ = 0.1−1/α - 1.]

72
These match the formulas in Appendix A of Loss Models.
The formulas for percentile matching will not be attached to your exam in the abridged version of Appendix A.

If we were to fit to a Pareto to the ungrouped data in Section 2, matching at the 50th and 90th
percentiles, one could numerically solve for alpha: (0.1−1/α - 1) / (0.5−1/α - 1) = 763,260/120,650.

The solution turns out to be α = 1.4584. Then θ = 120,650 / (0.5−1/α -1) = 198,286.

Exercise: Check the above fit via matching at the 50th and 90th percentiles of a Pareto.
[Solution: F(x) = 1 - (1 + x/θ)−α. F(120,650) = 1 - (1 + 120650/198286)-1.4584 =
1 - 1/1.6085^1.4584 = 0.500. F(763,260) = 1 - (1 + 763260/198286)-1.4584 = 0.900.]

In general for the Pareto, let ξ1 (α) = (1-p1 )−1/α, ξ2 (α) = (1-p2 )−1/α.
Then solve the following equation numerically for alpha:73
{ξ2 (α) - 1} / {ξ1 (α) - 1} - x2 /x1 = 0. Then θ = x1 / {ξ1 (α) - 1}.
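Since alpha enters only through the ratio {ξ2 (α) - 1} / {ξ1 (α) - 1}, a one-dimensional root search is enough. A Python sketch using plain bisection (illustrative only; the search bracket of 0.01 to 100 is my own assumption, and any library root finder would do as well):

    import math

    def pareto_percentile_match(x1, p1, x2, p2, lo=0.01, hi=100.0):
        # Solve {(1-p2)^(-1/a) - 1} / {(1-p1)^(-1/a) - 1} = x2/x1 for alpha by bisection,
        # then back out theta = x1 / {(1-p1)^(-1/alpha) - 1}.
        def g(a):
            xi1 = (1 - p1) ** (-1 / a) - 1
            xi2 = (1 - p2) ** (-1 / a) - 1
            return xi2 / xi1 - x2 / x1
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        alpha = 0.5 * (lo + hi)
        theta = x1 / ((1 - p1) ** (-1 / alpha) - 1)
        return alpha, theta

    print(pareto_percentile_match(120_650, 0.50, 763_260, 0.90))
    # roughly (1.458, 198,000), matching the fit at the 50th and 90th percentiles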

Burr Distribution with Alpha Fixed:74 75

The Burr Distribution has parameters α, θ and γ. If alpha is known, then we have two parameters,
and therefore we match at two percentiles.

Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the
60th percentile.
[Solution: For the ungrouped data in Section 2 with 130 values, the 60th percentile is the (131)(0.6)
= 78.6th value from smallest to largest. (0.4)(146,100) + (0.6)(150,300) = 148,620.]

Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the
80th percentile.
[Solution: (131)(0.8) = 104.8. (0.2)(406,900) + (0.8)(423,200) = 419,940.]

Exercise: Set up the equations that need to be solved in order to fit a Burr Distribution with
α = 2 to the ungrouped data in Section 2, by matching at the 60th and 80th percentiles.

[Solution: For the Burr, the survival function S(x) = (1/{1 + (x/θ)γ})α = (1/{1 + (x/θ)γ})2 .

Thus the two equations are: (1/{1 + (148,620/θ)γ})2 = 0.4, and (1/{1 + (419,940/θ)γ})2 = 0.2.]

73
Note that in the case of the Pareto, as well as the Burr, one can reduce to one equation in one unknown to be
solved numerically. In more complicated cases, percentile matching could be performed by solving numerically two
equations in two unknowns. However, this is probably wasted effort due to the inherent lack of accuracy of
percentile matching.
74
See 4, 11/06, Q.1.
75
The Loglogistic Distribution is a Burr Distribution with α = 1. See 4, 11/05, Q.3.

1 / {1 + (148,620/θ)^γ}^2 = 0.4. ⇒ (148,620/θ)^γ = √2.5 - 1 = 0.5811.

1 / {1 + (419,940/θ)^γ}^2 = 0.2. ⇒ (419,940/θ)^γ = √5 - 1 = 1.2361.

Dividing the two equations eliminates θ: 2.8259^γ = 2.1272.

⇒ γ = ln(2.1272)/ln(2.8259) = 0.7266. ⇒ θ = 419,940 / 1.2361^(1/0.7266) = 313,687.

Exercise: Verify that a Burr Distribution with parameters α = 2, θ = 313,687, and γ = 0.7266,
matches the ungrouped data in Section 2 at the 60th and 80th percentiles.
[Solution: F(x) = 1 - 1/{1 + (x/θ)^γ}^α = 1 - 1/{1 + (x/313,687)^0.7266}^2.

F(148,620) = 1 - 1/{1 + (148,620/313,687)^0.7266}^2 = 0.600.

F(419,940) = 1 - 1/{1 + (419,940/313,687)^0.7266}^2 = 0.800.]
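With alpha fixed, the two steps above are closed form. A Python sketch (illustrative only; the function name is my own):

    import math

    def burr_fixed_alpha_match(x1, p1, x2, p2, alpha=2.0):
        # For a Burr with alpha known, S(x) = {1 + (x/theta)^gamma}^(-alpha).
        # Matching at two percentiles gives (x_i/theta)^gamma in closed form.
        u1 = (1 - p1) ** (-1 / alpha) - 1    # = (x1/theta)^gamma
        u2 = (1 - p2) ** (-1 / alpha) - 1    # = (x2/theta)^gamma
        gamma = math.log(u2 / u1) / math.log(x2 / x1)
        theta = x2 / u2 ** (1 / gamma)
        return gamma, theta

    print(burr_fixed_alpha_match(148_620, 0.60, 419_940, 0.80, alpha=2))
    # roughly (0.727, 313,700), matching the fit at the 60th and 80th percentiles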

Burr Distribution:

The Burr Distribution has three parameters α, θ and γ. Thus we match at three percentiles.

Exercise: Set up the equations that need to be solved in order to fit a Burr Distribution to the
ungrouped data in Section 2, by matching at the 10th, 50th and 90th percentiles.
[Solution: For the Burr, the survival function S(x) = (1/{1 + (x/θ)γ})α.
For the ungrouped data in Section 2, the 10th percentile is 12,270, the 50th percentile is 120,650
and the 90th percentile is 763,260. Thus the three equations are:
(1/{1 + (12,270/θ)γ})α = 0.9, (1/(1 + {120,650/θ)γ})α = 0.5, and (1/{1 + (763,260/θ)γ})α = 0.1.
These can be simplified to:
(12,270/θ)γ = 0.9−1/α - 1, (120,650/θ)γ = 0.5−1/α - 1, and (763,260/θ)γ = 0.1−1/α - 1.

γ ln(12,270/θ) = ln(0.9−1/α - 1), γ ln(120,650/θ) = ln(0.5−1/α - 1),

and γ ln(763,260/θ) = ln(0.1−1/α - 1).]



Let ξ1 (α) = (1-p1 )−1/α - 1, ξ2 (α) = (1-p2 )−1/α - 1, ξ3 (α) = (1-p3 )−1/α - 1.
Then one could rewrite the equations for percentile matching as:
γ ln(x1 /θ) = ln(ξ1 (α)), γ ln(x2 /θ) = ln(ξ2 (α)), γ ln(x3 /θ) = ln(ξ3 (α)).
Subtracting the first equation from the second equation gives:
γ {ln(x2 /θ) - ln(x1 /θ)} = ln(ξ2 (α)) − ln(ξ1 (α)), or

γ = ln[ξ1 (α)/ξ2 (α)] / ln[x1 /x2 ]. Similarly γ = ln[ξ2 (α)/ξ3 (α)] / ln[x2 /x3 ].

We need to solve numerically for alpha such that:


ln[ξ1 (α)/ξ2 (α)] / ln[x1 /x2 ] = ln[ξ2 (α)/ξ3 (α)] / ln(x2 /x3 ), with ξi(α) = (1-pi)−1/α -1.

Then γ = ln[ξ1 (α)/ξ2 (α)] / ln[x1 /x2 ] and θ = x1 / ξ1 (α)1/γ.

Exercise: Verify that a Burr Distribution with parameters α = 2.05305, θ = 336,729, and
γ = 0.888834, matches the ungrouped data in Section 2 at the 10th, 50th and 90th percentiles.

[Solution: F(x) = 1 - {1/(1 + (x/θ)γ)}α.


F(12270) = 1 - 1/{(1 + (12270/336729)0.888834)}2.05305 = 0.100
F(120650) = 1 - 1/{(1 + (120650/336729)0.888834)}2.05305 = 0.500
F(763260) = 1 - 1/{(1 + (763260/336729)0.888834)}2.05305 = 0.900.]
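With all three parameters free, alpha must be found numerically from the condition that the two expressions for gamma agree. A Python sketch using plain bisection (illustrative only; the bracket of 0.1 to 20 for alpha is my own assumption that happens to contain the root for this data):

    import math

    def burr_percentile_match(x1, p1, x2, p2, x3, p3, lo=0.1, hi=20.0):
        # Solve ln[xi1/xi2]/ln[x1/x2] = ln[xi2/xi3]/ln[x2/x3] for alpha,
        # where xi_i(alpha) = (1 - p_i)^(-1/alpha) - 1; then gamma and theta follow.
        def xi(p, a):
            return (1 - p) ** (-1 / a) - 1
        def g(a):
            left = math.log(xi(p1, a) / xi(p2, a)) / math.log(x1 / x2)
            right = math.log(xi(p2, a) / xi(p3, a)) / math.log(x2 / x3)
            return left - right
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        alpha = 0.5 * (lo + hi)
        gamma = math.log(xi(p1, alpha) / xi(p2, alpha)) / math.log(x1 / x2)
        theta = x1 / xi(p1, alpha) ** (1 / gamma)
        return alpha, theta, gamma

    print(burr_percentile_match(12_270, 0.10, 120_650, 0.50, 763_260, 0.90))
    # roughly (2.05, 337,000, 0.889), matching the Burr fit quoted above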

Summary:

One matches at a number of percentiles equal to the number of fitted parameters.


Important cases include: Exponential Distribution, LogNormal Distribution, Weibull Distribution,
LogLogistic, Burr with fixed alpha, and the Single Parameter Pareto Distribution.76

76
Fitting to the Single Parameter Pareto Distribution will be discussed in a subsequent section.

Grouped Data:

One can also apply percentile matching to grouped data.77 One must somehow estimate the
percentile(s) for the data. This is most easily done if one chooses one of the endpoints of the
intervals at which to match.
For example for the grouped data in Section 3, out of 10,000 there are 8175 accidents of size less
than or equal to 25,000.
Interval ($000) Number of Accidents Cumulative Number of Accidents
0-5 2208 2208
5 -10 2247 4455
10-15 1701 6156
15-20 1220 7376
20-25 799 8175
25-50 1481 9656
50-75 254 9910
75-100 57 9967
100 - ∞ 33 10000
SUM 10,000

Therefore, if one fit an Exponential Distribution to this data by percentile matching at 25,000,
we would set 1 - e-25000/θ = 8175/10000. ⇒ θ = -25000/ln(0.1825) = 14,697.

Exercise: Fit a LogNormal Distribution to the above grouped data from Section 3, via percentile
matching at 15,000 and 50,000.
[Solution: There are 6156 out of 10000 accidents less than or equal to 15,000.
Set Φ[(ln(15,000) - µ)/σ] = 0.6156. ⇒ (9.616 - µ)/σ = 0.294.
There are 9656 out of 10000 accidents less than or equal to 50,000.
Set Φ[(ln(50,000) - µ)/σ] = 0.9656. ⇒ (10.820 - µ)/σ = 1.820.

(10.820 - µ)/(9.616 - µ) = 1.820/ .294 = 6.190. ⇒ µ = 9.38. ⇒ σ = 0.79.]

Exercise: Fit a Weibull Distribution to the above grouped data from Section 3, via percentile
matching at 20,000 and 100,000.
[Solution: There are 7376 out of 10000 accidents less than or equal to 20,000.
Set 1 - exp[-(20000/θ)τ] = 0.7376. ⇒ (20000/θ)τ = 1.338.
There are 9967 out of 10000 accidents less than or equal to 100,000.
Set 1 - exp[-(100000/θ)τ] = 0.9967. ⇒ (100000/θ)τ = 5.714.

⇒ 5τ = 5.714/1.338 = 4.271. ⇒ τ = 0.902. ⇒ θ = 14,482.]
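For grouped data, the only extra step is reading the empirical distribution function off the interval endpoints. A Python sketch of the Weibull exercise just above (illustrative only):

    import math

    # Empirical F at two interval endpoints of the grouped data in Section 3.
    F_20000 = 7376 / 10000    # proportion of accidents of size <= 20,000
    F_100000 = 9967 / 10000   # proportion of accidents of size <= 100,000

    # Weibull: (x/theta)^tau = -ln(1 - F(x)); match at both endpoints.
    u1 = -math.log(1 - F_20000)     # = (20,000/theta)^tau, about 1.338
    u2 = -math.log(1 - F_100000)    # = (100,000/theta)^tau, about 5.714
    tau = math.log(u2 / u1) / math.log(100_000 / 20_000)
    theta = 100_000 / u2 ** (1 / tau)
    print(tau, theta)   # roughly 0.90 and 14,500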


77
See 4, 11/02, Q. 37.

Mixtures:

For example, one can model losses via a two-point mixture: F(x) = w A(x) + (1 - w) B(x),
where A(x) and B(x) are each distribution functions.78
w is an additional parameter. 0 ≤ w ≤ 1.

The number of parameters for this mixture is:


1 + (number of parameters for A) + (number of parameters for B).
Thus this mixture has at least three parameters.
On your exam, when fitting a mixture, all but one or two of these parameters will be fixed, since
otherwise one would need a computer to determine the answer.

Exercise: Let A(x) be an Exponential Distribution with mean 10.


Let B(x) be an Exponential Distribution with mean 20.
For some data, the smoothed empirical estimate of the 70th percentile is 15.
Fit a two-point mixture via percentile matching.
[Solution: 0.7 = w (1 - e-15/10) + (1 - w) (1 - e-15/20). ⇒
w = (0.3 - e-15/20) / (e-15/10 - e-15/20) = 0.692.
Comment: Note that for the first Exponential, F(15) = 1 - e-15/10 = 0.777 > 0.7.
For the second Exponential, F(15) = 1 - e-15/20 = 0.528 < 0.7.
The only way it is possible to get 0.7 by weighting together two numbers, with weights that sum to
one and are each at least 0 and at most 1, is if one of the numbers is greater than or equal to 0.7 and
the other number is less than or equal to 0.7.
More generally, for a two point mixture with all the parameters other than w fixed, in order for
percentile matching to result in 0 ≤ w ≤ 1, at the empirical pth percentile, one of the component
distributions has to be greater than or equal to p/100, while the other one is less than or equal to
p/100.]
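Since only the weight is unknown, percentile matching here is a linear equation in w. A Python sketch of the solution above (illustrative only; the function name is my own):

    import math

    def mixture_weight(x, p, theta_a, theta_b):
        # Solve p = w F_A(x) + (1 - w) F_B(x) for the weight w,
        # where A and B are Exponentials with means theta_a and theta_b.
        fa = 1 - math.exp(-x / theta_a)
        fb = 1 - math.exp(-x / theta_b)
        return (p - fb) / (fa - fb)

    print(mixture_weight(15, 0.70, 10, 20))   # about 0.692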

78
See “Mahlerʼs Guide to Loss Distributions.”
One could mix more than two distributions, and there are also continuous mixtures.

Problems:

Use the following information to answer each of the following two questions.

A data set of claim sizes has its 15th and 85th percentiles at 1000 and 5000 respectively.
A LogNormal distribution is fit to this data via percentile matching applied to these two percentiles.

8.1 (2 points) The fitted distribution has a σ parameter in which of the following intervals?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

8.2 (1 point) The fitted distribution has a µ parameter in which of the following intervals?
A. less than 7.2
B. at least 7.2 but less than 7.4
C. at least 7.4 but less than 7.6
D. at least 7.6 but less than 7.8
E. at least 7.8

Use the following information to answer each of the following two questions.
A data set of claim sizes has its 35th percentile and 75th percentile at
10 and 20 respectively. A Weibull distribution, is fit to this data via percentile matching applied to
these two percentiles.

8.3 (3 points) The fitted distribution has a τ parameter in which of the following intervals?
A. less than 1.3
B. at least 1.3 but less than 1.4
C. at least 1.4 but less than 1.5
D. at least 1.5 but less than 1.6
E. at least 1.6

8.4 (2 points) Determine the θ parameter of the fitted distribution.


A. 15.0 B. 15.5 C. 16.0 D. 16.5 E. 17.0

8.5 (2 points) You observe the following 6 claims: 162.22, 151.64, 100.42, 174.26, 20.29, 34.36.
A Distribution: F(x) = 1 - e-qx, x > 0, is fit to this data via percentile matching at the 57th percentile.
Determine the value of q.
A. less than 0.006
B. at least 0.006 but less than 0.007
C. at least 0.007 but less than 0.008
D. at least 0.008 but less than 0.009
E. at least 0.009

Use the following information for the next two questions:


You observe the following 9 values:
11.2, 11.4, 11.6, 11.7, 11.8, 11.9, 12.0, 12.3, 12.4
A Normal Distribution is fit to this data via percentile matching at the 10th and 80th percentiles.

8.6 (2 points) Determine the value of σ.


A. 0.4 B. 0.5 C. 0.6 D. 0.7 E. 0.8

8.7 (1 point) Determine the value of µ.


A. 11.5 B. 11.7 C. 11.9 D. 12.1 E. 12.3

8.8 (2 points) You are given the following information about a set of individual claims:
(i) 80th percentile = 115,000
(ii) 95th percentile = 983,000
A LogNormal Distribution is fit using percentile matching.
Using this LogNormal Distribution, estimate the average size of claim.
A. Less than 360,000
B. At least 360,000 but less than 380,000
C. At least 380,000 but less than 400,000
D. At least 400,000 but less than 420,000
E. At least 420,000

8.9 (2 points) For an Illustrative Life Table, 10 million people are alive at age 0.
6,616,155 people are still alive at age 70.
40,490 people are still alive at age 100.
Fit a Weibull distribution by percentile matching.

8.10 (2 points) You are given the following data:


51 66 94 180 317 502 672 1626 3542
You use the method of percentile matching at the 75th percentile to fit a mixed distribution to these
data: F(x) = w(1 - e-x/500) + (1 - w)(1 - e-x/1000).
Determine the estimate of w.
(A) Less than 0.35
(B) At least 0.35, but less than 0.40
(C) At least 0.40, but less than 0.45
(D) At least 0.45, but less than 0.50
(E) At least 0.50

8.11 (3 points) The smoothed empirical estimates of the 10th and 60th percentiles are 40
and 80 respectively. You use the method of percentile matching to fit a Gompertz Distribution:
F(t) = 1 - exp[-B(ct - 1)/ln(c)], B > 0, c > 1, t ≥ 0.
Estimate S(85).
(A) Less than 0.24
(B) At least 0.24, but less than 0.26
(C) At least 0.26, but less than 0.28
(D) At least 0.28, but less than 0.30
(E) At least 0.30

8.12 (2 points) Assume that the heights of 15 year old boys are Normally Distributed.
The 95th percentile of these heights is 184 centimeters.
The 5th percentile of these heights is 158 centimeters.
Using percentile matching, estimate σ.
A. 7 B. 8 C. 9 D. 10 E. 11

8.13 (4 points) F(10) = 0.58. F(20) = 0.78.


You fit via percentile matching an Inverse Burr Distribution with τ = 0.7.
Determine F(50).
A. 91% B. 92% C. 93% D. 94% E. 95%

8.14 (3 points) Let R be the weekly wage for a worker compared to the statewide average.
R follows a LogNormal Distribution, with σ < 2.
97.5% of workers have weekly wages at most twice the statewide average.
Determine what percentage of workers have weekly wages less than half the statewide average.
A. 2% B. 3% C. 4% D. 5% E. 6%

8.15 (2 points) You are given:


(i) Losses follow a Weibull distribution with τ = 0.7 and θ unknown.
(ii) A random sample of 200 losses is distributed as follows:
Interval Number of Losses
x ≤ 100 90
100 < x ≤ 250 60
x > 250 50
Estimate θ by matching at the 75th percentile.
(A) Less than 150
(B) At least 150, but less than 175
(C) At least 175, but less than 200
(D) At least 200, but less than 225
(E) At least 225

8.16 (3 points) Assume that when professional golfers try to make a put, their chance of failure is a
function of the distance of the golf ball from the cup.
Assume that their chance of failure follows a Loglogistic Distribution.
Their chance of success at 8 feet is 50%.
Their chance of success at 16 feet is 20%.
Estimate their chance of success at 40 feet.
A. 2% B. 3% C. 4% D. 5% E. 6%

8.17 (4 points) Annual income follows a LogNormal Distribution.


The 50th percentile of annual income is $50,000.
The 95th percentile of annual income is $250,000.
Determine the percentage of total income earned by the top 1% of earners.
A. 8% B. 9% C. 10% D. 11% E. 12%

8.18 (2 points) You are given the following data:


42, 123, 140, 151, 209, 327, 435, 479, 721, 1358, 1625.
You use the method of percentile matching at the 50th and 85th percentiles to fit
a LogNormal distribution to these data.
What is the second moment of the fitted distribution?.
(A) Less than 500,000
(B) At least 500,000, but less than 1 million
(C) At least 1 million, but less than 3 million
(D) At least 3 million, but less than 5 million
(E) At least 5 million

8.19 (4 points) You have data on the number of Temporary Total claims for
870 Workers Compensation Insurance classes.
358 of these classes have 250 or fewer claims.
440 of these classes have 500 or fewer claims.
You fit via percentile matching a mixture of two Exponential Distributions, with weights 55% and
45%.
What is the resulting estimate of the number of classes with more than 5000 claims?
A. 100 B. 110 C. 120 D. 130 E. 140

8.20 (2 points) For a portfolio of policies, you are given:


(i) Losses follow a Weibull distribution with parameters θ and τ.
(ii) A sample of 13 losses is:
18 26 28 29 35 43 57 94 119 166 400 569 795
(iii) The parameters are to be estimated by percentile matching using the 50th and 90th
smoothed empirical percentiles.
Calculate the estimate of θ.
(A) Less than 120
(B) At least 120, but less than 125
(C) At least 125, but less than 130
(D) At least 130, but less than 135
(E) At least 135

8.21 (3 points) An insurerʼs total losses are modeled by a lognormal distribution with the following
Value at Risk amounts in millions of dollars:
VaR90% = 2018
VaR95% = 2210
Determine the 99% Value at Risk.
A. 2500 B. 2600 C. 2700 D. 2800 E. 2900

8.22 (2 points) 31% of voters have to wait at most 20 minutes to vote.


83% of voters have to wait at most 60 minutes to vote.
Use the method of percentile matching to fit an Inverse Weibull distribution.

8.23 (3 points) Lives follow the survival function: S(x) = (1 - x/110)c, 0 ≤ x ≤ 110, c > 0.
The median lifetime is 64.
Using percentile matching, determine the mean excess loss (mean residual life) at 74.
(A) 16 (B) 17 (C) 18 (D) 19 (E) 20

8.24 (4, 5/85, Q.52) (3 points) It was determined that the 40th percentile of a sample is 1, and that
the 75th percentile of the sample is 64.
Use percentile matching to estimate the parameter, τ, of a Weibull distribution.
Which of the following represents the value of τ?
A. {ln 0.25 - ln 0.6} / ln 64
B. ln 0.25 / {(ln 64) (ln 0.6)}
C. ln[ln 0.25 / ln 0.6] / ln 64
D. ln[ln 0.25 / {(ln 64) (ln 0.6)}]

8.25 (160, 5/89, Q.16) (2.1 points) A sample of 10 lives was observed from the time of
diagnosis until death. You are given:
(i) Times of death were: 1, 1, 2, 3, 3, 3, 4, 4, 5 and 5.
(ii) The lives were subject to a survival function, S(t) = αt2 + βt + 1, 0 ≤ t ≤ k .
Determine the parameter α by the method of percentiles matching, using the 25th and 75th
percentiles.
(A) -0.04 (B) -0.03 (C) -0.02 (D) -0.01 (E) 0

8.26 (4, 5/90, Q.44) (2 points) A random sample of claims has been drawn from a Loglogistic
distribution, with unknown parameters θ and γ. In the sample, 80% of the claim amounts exceed
$100 and 20% of the claim amounts exceed $400.
Find the estimate of θ by percentile matching.
A. Less than 100
B. At least 100 but less than 150
C. At least 150 but less than 200
D. At least 200 but less than 250
E. At least 250

8.27 (160, 5/91, Q.20) (1.9 points) For a complete study of five lives, you are given:
(i) Deaths occur at times t = 2, 3, 3, 5, 7.
(ii) The underlying survival distribution is S(t) = 4^(-λt), t ≥ 0.
Using percentile matching at the median, calculate the estimate of λ.
(A) 0.111 (B) 0.125 (C) 0.143 (D) 0.167 (E) 0.333

8.28 (4B, 5/96, Q.17) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ and α.
• The 10th percentile of the distribution is θ - k, where k is a constant.
• The 90th percentile of the distribution is 5θ - 3k.
Determine α.
A. Less than 1.25
B. At least 1.25, but less than 1.75
C. At least 1.75, but less than 2.25
D. At least 2.25, but less than 2.75
E. At least 2.75

8.29 (4B, 11/96, Q.3) (2 points) You are given the following:
• Losses follow a Weibull distribution, with parameters θ and τ.
• The 25th percentile of the distribution is 1,000.
• The 75th percentile of the distribution is 100,000.
Determine τ.
A. Less than 0.4
B. At least 0.4, but less than 0.6
C. At least 0.6, but less than 0.8
D. At least 0.8, but less than 1.0
E. At least 1.0

8.30 (Course 160 Sample Exam #1, 1996, Q.13) (1.9 points)
From a sample of 10 lives diagnosed with terminal cancer, you are given:
(i) The deaths occurred at times 4, 6, 6, 6, 7, 7, 9, 9, 9, 14.
(ii) The underlying distribution was Weibull with parameters θ and τ.
Determine τ by the method of percentiles matching, using the 25th and 75th percentiles.
(A) 2 (B) 3 (C) 4 (D) 5 (E) 6

8.31 (4, 5/00, Q.32) (2.5 points) You are given the following information about a sample of data:
(i) Mean = 35,000
(ii) Standard deviation = 75,000
(iii) Median = 10,000
(iv) 90th percentile = 100,000
(v) The sample is assumed to be from a Weibull distribution.
Determine the percentile matching estimate of the parameter τ .
(A) Less than 0.25
(B) At least 0.25, but less than 0.35
(C) At least 0.35, but less than 0.45
(D) At least 0.45, but less than 0.55
(E) At least 0.55

8.32 (4, 11/00, Q.39) (2.5 points)


You are given the following information about a study of individual claims:
(i) 20th percentile = 18.25
(ii) 80th percentile = 35.80
Parameters µ and σ of a lognormal distribution are estimated using percentile matching.
Determine the probability that a claim is greater than 30 using the fitted lognormal distribution.
(A) 0.34 (B) 0.36 (C) 0.38 (D) 0.40 (E) 0.42

8.33 (4, 11/02, Q.37 & 2009 Sample Q. 54) (2.5 points) You are given:
(i) Losses follow an exponential distribution with mean θ.
(ii) A random sample of losses is distributed as follows:
Loss Range Number of Losses
(0 – 100] 32
(100 – 200] 21
(200 – 400] 27
(400 – 750] 16
(750 – 1000] 2
(1000 – 1500] 2
Total 100
Estimate θ by matching at the 80th percentile.
(A) 249 (B) 253 (C) 257 (D) 260 (E) 263

8.34 (4, 11/03, Q.2 & 2009 Sample Q.1) (2.5 points) You are given:
(i) Losses follow a loglogistic distribution with cumulative distribution function:
F(x) = (x/θ)γ / {1 + (x/θ)γ}.
(ii) The sample of losses is:
10 35 80 86 90 120 158 180 200 210 1500
Calculate the estimate of θ by percentile matching, using the 40th and 80th empirically
smoothed percentile estimates.
(A) Less than 77
(B) At least 77, but less than 87
(C) At least 87, but less than 97
(D) At least 97, but less than 107
(E) At least 107

8.35 (4, 11/04, Q.30 & 2009 Sample Q.155) (2.5 points) You are given the following data:
0.49 0.51 0.66 1.82 3.71 5.20 7.62 12.66 35.24
You use the method of percentile matching at the 40th and 80th percentiles to fit an
Inverse Weibull distribution to these data.
Determine the estimate of θ.
(A) Less than 1.35
(B) At least 1.35, but less than 1.45
(C) At least 1.45, but less than 1.55
(D) At least 1.55, but less than 1.65
(E) At least 1.65

8.36 (4, 11/05, Q.3 & 2009 Sample Q.216) (2.9 points) A random sample of claims has been
drawn from a Burr distribution with known parameter α = 1 and unknown parameters θ and γ.
You are given:
(i) 75% of the claim amounts in the sample exceed 100.
(ii) 25% of the claim amounts in the sample exceed 500.
Estimate θ by percentile matching.
(A) Less than 190
(B) At least 190, but less than 200
(C) At least 200, but less than 210
(D) At least 210, but less than 220
(E) At least 220

8.37 (4, 11/06, Q.1 & 2009 Sample Q.246) (2.9 points) You are given:
(i) Losses follow a Burr distribution with α = 2.
(ii) A random sample of 15 losses is:
195 255 270 280 350 360 365 380 415 450 490 550 575 590 615
(iii) The parameters γ and θ are estimated by percentile matching using the smoothed empirical
estimates of the 30th and 65th percentiles.
Calculate the estimate of γ.
(A) Less than 2.9
(B) At least 2.9, but less than 3.2
(C) At least 3.2, but less than 3.5
(D) At least 3.5, but less than 3.8
(E) At least 3.8

8.38 (4, 5/07, Q.24) (2.5 points) For a portfolio of policies, you are given:
(i) Losses follow a Weibull distribution with parameters θ and τ.
(ii) A sample of 16 losses is:
54 70 75 81 84 88 97 105 109 114 122 125 128 139 146 153
(iii) The parameters are to be estimated by percentile matching using the 20th and 70th
smoothed empirical percentiles.
Calculate the estimate of θ.
(A) Less than 100
(B) At least 100, but less than 105
(C) At least 105, but less than 110
(D) At least 110, but less than 115
(E) At least 115

8.39 (4, 5/07, Q.28) (2.5 points)


You are given the following graph of cumulative distribution functions:

Determine the difference between the mean of the lognormal model and the mean of the data.
(A) Less than 50
(B) At least 50, but less than 150
(C) At least 150, but less than 350
(D) At least 350, but less than 750
(E) At least 750

Solutions to Problems:

8.1. C. & 8.2. D. Set the distribution function at 5000 of the LogNormal equal to .85 and at 1000
equal to .15: Φ[(ln(5000) - µ)/σ] = 0.85 and Φ[(ln(1000) - µ)/σ] = 0.15.
Therefore consulting the Normal Table, {ln(5000)-µ}/σ = 1.036 and {ln(1000)-µ}/σ = -1.036.
Solving, σ = {ln(5000) - ln(1000)} / (1.036 - (-1.036)) = 0.7768.
and µ = ln(1000) - (0.7768)(-1.036) = 7.713.

8.3. E., 8.4. D. As shown in Appendix A: VaRp (X) = θ [ -ln(1-p) ]1/τ.

Thus, 10 = θ [ -ln(0.65) ]1/τ, and 20 = θ [ -ln(0.25) ]1/τ.

Dividing these two equations: 2 = [ln(4)/ln(1/0.65)]1/τ = 3.218081/τ.

⇒ τ = ln(3.21808)/ln(2) = 1.686. ⇒ θ = 20 / ln[4]1/1.686 = 16.48.


Alternately, for the Weibull F(x) = 1 - exp(-(x/θ)τ). F(10) = 0.35 and F(20) = 0.75.

Therefore, exp(-(10/θ)τ) = 1 - .35, and exp(-(20/θ)τ) = 1 - 0.75.

(10/θ)τ = -ln(1 - 0.35) = .4308, and (20/θ)τ = -ln(1 - 0.75) = 1.3863.

Dividing the second equation by the first: 2τ = 3.2180. ⇒ τln(2) = ln(3.2180) = 1.1688.

τ = 1.1688 / .6932 = 1.686. (20/θ)1.686 = 1.3863. ⇒ θ = 20/1.38631/1.686 = 16.48.

8.5. A. First order the claims from smallest to largest. The 4th claim is 151.64 and is an estimate of
the 4/(1+ 6) = 0.57 percentile. Set 1 - e-q151.64 = 0.57. Thus q = 0.0056.

8.6. B. & 8.7. C. The 10th percentile is estimated as the (.1)(9+1) = 1st claim, 11.2.
The 80th percentile is estimated as the (.80)(9+1) = 8th claim, which is 12.3.
The Distribution Function is in terms of the Standard Normal: F(x) = Φ((x - µ)/σ).
Set F(11.2) = 0.1 and F(12.3) = 0.8. Thus Φ((11.2 - µ)/σ) = 0.1 and Φ((12.3 - µ)/σ) = 0.8.
Now the Standard Normal has Φ(0.842) = 0.8 and Φ(-1.282) = 1 - 0.9 = 0.1.
Therefore: (12.3 - µ)/σ = 0.842, and (11.2 - µ)/σ = -1.282.
Thus: 12.3 - µ = .842σ, and 11.2 - µ = -1.282σ.
Subtracting the two equations: 1.1 = 2.124σ and thus σ = 0.518.
µ = 11.2 + 1.282σ = 11.2 + (1.282)(.518) = 11.86.

8.8. E. One sets the distribution function of the LogNormal equal to the percentiles:
0.80 = F(115,000) = Φ[(ln(115,000) - µ)/σ], and 0.95 = F(983,000) = Φ[(ln(983,000) - µ)/σ].
Consulting the Normal Distribution Table, this implies that:
0.842 = (11.653 - µ)/σ and 1.645 = (13.798 - µ)/σ.
Dividing the second equation by the first: (13.798 - µ)/(11.653 - µ) = 1.954.
Solving µ = 9.403 and σ = 2.672. For the fitted LogNormal Distribution,

E[X] = exp(µ + σ2/2) = exp(9.403 + 2.6722 /2) = e12.973 = 430,538.


Comment: Similar to 4, 11/00, Q.39.

8.9. 0.661655 = S(70) = exp[-(70/θ)τ]. ⇒ 0.41301 = (70/θ)τ.

0.004049 = S(100) = exp[-(100/θ)τ]. ⇒ 5.509 = (100/θ)τ.

Dividing the two equations: 0.0750 = 0.7τ. ⇒ τ = 7.263. ⇒ θ = 79.06.


Comment: While the Weibull is not a very good model of human mortality, for τ > 1 it is at least a
first approximation which has an increasing hazard rate.

8.10. A. The 75th percentile is the (0.75)(9 + 1) = 7.5th claim: (672 + 1626)/2 = 1149.
0.75 = w(1 - e-1149/500) + (1 - w)(1 - e-1149/1000). ⇒
w = (0.25 - e-1.149)/( e-2.298 - e-1.149) = 0.309.
Comment: For a two-point mixture in which only the weight is unknown, the pth percentile of the
mixture is between the individual pth percentiles. Therefore, in order for the fitted weight, via
percentile matching at the pth percentile, to be between 0 and 1, the empirical pth percentile has to
be between the individual pth percentiles. In this case, the empirical 75th percentile has to be
between -500 ln(.25) = 693 and -1000 ln(.25) = 1386.

8.11. E. .1 = 1 - exp[-B(c40 - 1)/ln(c)]. ⇒ B(c40 - 1)/ln(c) = .105361.

0.6 = 1 - exp[-B(c80 - 1)/ln(c)]. ⇒ B(c80 - 1)/ln(c) = .916291.


Dividing the two equations: (c80 - 1)/(c40 - 1) = 8.69672.
c80 - 8.69672c40 + 7.69672 = 0.
c40 = {8.69672 ± √(8.69672^2 - (4)(7.69672))}/2 = {8.69672 ± 6.69672}/2 = 7.69672 or 1.

c > 1 ⇒ c40 = 7.69672. ⇒ c = 1.05234.

⇒ B = .916291 ln(1.05234)/(1.0523480 - 1) = 0.0008029.


S(85) = exp[-0.0008029(1.0523485 - 1)/ln(1.05234)] = 0.305.

8.12. B. 0.95 = F(184) = Φ[(184 - µ)/σ]. ⇒ (184 - µ)/σ = 1.645. ⇒ 184 - µ = 1.645σ.

0.05 = F(158) = Φ[(158 - µ)/σ]. ⇒ (158 - µ)/σ = -1.645. ⇒ 158 - µ = -1.645σ.

Subtracting the two equations: 3.29σ = 26. ⇒ σ = 7.90 centimeters.


Comment: µ = 158 + 1.645σ = 171 = (158 + 184)/2.

8.13. C. As shown in Appendix A, VaRp [X] = θ {p-1/τ - 1}-1/γ.

Thus, 10 = θ {0.58-1/0.7 - 1}-1/γ = θ 0.84931/γ, and 20 = θ {0.78-1/0.7 - 1}-1/γ = θ 2.34691/γ.

Dividing these two equations: 2 = 2.76331/γ. ⇒ γ = ln(2.7633)/ln(2) = 1.466.

⇒ θ = 10 (1.17751/1.466) = 11.18. F(50) = (1 + (11.18/50)1.466)-0.7 = 92.9%.


Alternately, for the Inverse Burr, F(x) = {(x /θ)γ/(1 + (x /θ)γ)}τ = {1 + (θ /x)γ}−τ = {1 + (θ /x)γ}-0.7.

Therefore, 0.58 = F(10) = {1 + (θ /10)γ}-0.7. ⇒ (θ /10)γ = 1/0.581/.7 - 1 = 1.1775.

Also, 0.78 = F(20) = {1 + (θ /20)γ}-0.7. ⇒ (θ /20)γ = 1/0.781/.7 - 1 = 0.4261.

Dividing the first equation by the second equation: 2γ = 2.763. ⇒ γ = ln(2.763)/ln(2) = 1.466.

⇒ θ = 10 (1.17751/1.466) = 11.18. F(50) = (1 + (11.18/50)1.466)-0.7 = 92.9%.

8.14. E. Since R is the ratio with respect to the statewide average, E[R] = 1.
Therefore, the LogNormal Distribution has mean of 1.
exp[µ + σ2/2] = 1. ⇒ µ = -σ2/2.

0.975 = F(2) = Φ[(ln2 − µ)/σ] . ⇒ (ln2 − µ)/σ = 1.960. ⇒ ln2 − µ = 0.69315 + σ2/2 = 1.960σ.

σ2 - 3.92σ + 1.386 = 0. ⇒ σ = {3.92 ± √(3.92^2 - (4)(1)(1.386))}/2 = (3.92 ± 3.134)/2 =

0.393 or 3.527. Since we are given that σ < 2, σ = 0.393. µ = -σ2/2 = -0.0772.
F(0.5) = Φ[(ln.5 − µ)/σ] = Φ[(-.6932 + .0772)/.393] = Φ[-1.567] = 5.9%.
Comment: Such wage tables are used to price the impact of changes in the laws governing
Workers Compensation benefits.

8.15. B. Out of a total of 200 losses, there are 150 losses less than or equal to 250 and 50 greater
than 250, so 250 is the best estimate of the 75th percentile, given this grouped data.
Matching at the 75th percentile: 1 - exp[-(250/θ)0.7] = 0.75. ⇒ ln(0.25) = -(250/θ)0.7.

⇒ θ = 250/ {-ln(0.25)}1/0.7 = 157.


Comment: Similar to 4, 11/02, Q.37.

8.16. C. We match at two percentiles, getting two equations in two unknowns.


For the Loglogistic, VaRp [X] = θ {p-1 - 1}-1/γ.
At 8 feet the chance of failure is 50%, and at 16 feet the chance of failure is 80%.
Thus we have, 8 = θ {1/0.5 - 1}-1/γ = θ,

and 16 = θ {1/0.8 - 1}-1/γ = θ 41/γ.

Therefore, 16 = (8) (41/γ). ⇒ γ = 2.


The chance of success is: 1 - F(x) = 1 / {1 + (x/8)^2}.

The chance of success at 40 feet is: 1 / {1 + (40/8)^2} = 1/26 = 3.85%.

Comment: Check, the chance of success at 8 feet is: 1 / {1 + (8/8)^2} = 50%,
and the chance of success at 16 feet is: 1 / {1 + (16/8)^2} = 20%.
The chance of failure goes to 1 as x approaches infinity.

8.17. B. 0.50 = Φ[{ln(50,000) - µ}/σ]. ⇒ µ = ln(50,000) = 10.820.

0.95 = Φ[{ln(250,000) - µ}/σ]. ⇒ σ = {ln(250,000) - ln(50,000)} / 1.645 = 0.978.


Thus the 99th percentile is: exp[10.820 + (2.326)(0.978)] = 486,420.
E[X] = exp[10.820 + 0.9782 /2] = 80,680.
E[X ∧ 486,420] = exp(µ + σ2/2) Φ[{ln(486,420) - µ - σ2 }/σ] + (486,420) {1 - Φ[{ln(486,420) - µ}/σ]}
= (80,680) Φ[2.326 - 0.978] + (486,420) {1 - Φ[2.326]} = (80,680)(0.9115) + (486,420)(0.01)
= 78,404.
E[X | X > 486,420] = e(486,420) + 486,420 = (E[X] - E[X ∧ 486,420]) / S(486,420) + 486,420.
= (80,680 - 78,404) / 0.01 + 486,420 = 714,020.
The percentage of total income earned by the top 1% of earners is:
(0.01)E[X | X > 486,420] / E[X] = (0.01)(714,020) / 80,680 = 8.85%.
Comment: The distribution of annual incomes in the United States has a heavier righthand tail than
this LogNormal.

8.18. E. (0.5)(11 + 1) = 6. (0.85)(11 + 1) = 10.2.


The smoothed empirical estimate of the 50th percentile is: 327.
The smoothed empirical estimate of the 85th percentile is: (0.8)(1358) + (0.2)(1625) = 1411.4.
One sets the distribution function of the LogNormal equal to the percentiles:
0.5 = F(327) = Φ[(ln(327) - µ)/σ], and 0.85 = F(1411.4) = Φ[(ln(1411.4) - µ)/σ].
Consulting the bottom of the Normal Distribution Table, this implies that:
0 = (5.78996 - µ)/σ and 1.036 = (7.25234 - µ)/σ.
Solving µ = 5.78996 and σ = 1.41156. For the fitted LogNormal,
E[X2 ] = exp[2µ + 2σ2 ] = exp[(2)(5.78996) + (2)(1.411562 )] = 5.75 million.

8.19. D. Matching the survival functions yields two equations in two unknowns:
1 - 358 / 870 = 0.55 exp[-250/θ1] + 0.45 exp[-250/θ2].

1 - 440 / 870 = 0.55 exp[-500/θ1] + 0.45 exp[-500/θ2].

Let u = exp[-250/θ1] and v = exp[-250/θ2].


Then the two equations can be rewritten as:
0.5885 = 0.55 u + 0.45 v. ⇒ 58.62 = 55u + 45v. ⇒ v = 1.3027 - 1.2222 u.

0.4943 = 0.55 u2 + 0.45 v2 . ⇒ 49.43 = 55 u2 + 45 v2 .

⇒ 49.43 = 55 u2 + 45 (1.3027 - 1.2222 u)2 . ⇒


122.22 u2 - 143.29 u + 26.94 = 0. ⇒ u = {143.29 ± √(143.29^2 - (4)(122.22)(26.94))} / {(2)(122.22)}.

u = 0.9372 or 0.2352.
If u = 0.9372, then v = 1.3027 - (1.2222)(0.9372) = 0.1573.
If u = 0.2352, then v = 1.3027 - (1.2222)(0.2352) = 1.0153.
However, v = exp[-250/θ2] < 1, so the second set of roots is no good.

S(5000) = 0.55 exp[-5000/θ1] + 0.45 exp[-5000/θ2] = 0.55 u20 + 0.45 v20 =

(0.55)(0.937220) + (0.45)(0.157320) = 0.1503.


The estimated number of classes with more than 5000 claims is: (0.1503)(870) = 131.
Comment: The data was taken from “NCCIʼs 2007 Hazard Group Mapping,”
by John P. Robertson, Variance, Vol. 3, Issue 2, 2009.
exp[-250/θ1] = u = 0.9372. ⇒ θ1 = 3855. exp[-250/θ2] = v = 0.1573. ⇒ θ2 = 135.

8.20. B. (0.5)(13 + 1) = 7. (0.9)(13 + 1) = 12.6.


The smoothed empirical estimate of the 50th percentile is: 57.
The smoothed empirical estimate of the 90th percentile is: (0.4)(569) + (0.6)(795) = 704.6.
From Appendix A of Loss Models, for the Weibull: VaRp (X) = θ {-ln(1-p)}1/τ.

⇒ 57 = θ {-ln(1-0.5)}1/τ = θ 0.693151/τ, and 704.6 = θ {-ln(1-0.9)}1/τ = θ 2.302591/τ.

Dividing these two equations: 12.3614 = 3.321921/τ. ⇒ τ = ln[3.32192] / ln[12.3614] = 0.4774.

⇒ θ = 704.6 / 2.302591/0.4774 = 122.8.


Comment: Similar to 4, 5/07, Q.24.

8.21. B. Perform percentile matching to fit the LogNormal.


(ln[2018] - µ)/ σ = 1.282. ⇒ 7.6099 - µ = 1.282σ.

(ln[2210] - µ) / σ = 1.645. ⇒ 7.7007 - µ = 1.645σ.

⇒ 0.0908 = 0.363σ.⇒ σ = 0.250. ⇒ µ = 7.6099 - (1.282)(0.250) = 7.29.


Then the 99th percentile is: exp[7.29 + (2.326)(0.250)] = 2621 (in millions of dollars).
Comment: As discussed in “Mahlerʼs Guide to Risk Measures,” VaR90% is the 90th percentile.

8.22. As shown in Appendix A: VaRp (X) = θ {-ln(p)}−1/τ.

Thus, 20 = θ {ln(1/0.31)}^(-1/τ) = θ 1.1712^(-1/τ), and 60 = θ {ln(1/0.83)}^(-1/τ) = θ 0.1863^(-1/τ).

Dividing the two equations: 3 = 0.1591^(-1/τ). ⇒ τ = 1.673.

⇒ θ = (20)(1.17121/1.673) = 21.98.
Alternately, F(x) = exp[-(θ/x)τ].

0.31 = F(20) = exp[-(θ/20)τ]. ⇒ 1.1712 = (θ/20)τ.

0.83 = F(60) = exp[-(θ/60)τ]. ⇒ 0.1863 = (θ/60)τ.

⇒ 6.287 = 3τ. ⇒ τ = 1.673.


⇒ θ = (20)(1.17121/1.673) = 21.98.
Comment: Similar to 4, 11/04, Q.30 (2009 Sample Q.155).

8.23. E. 1/2 = S(64) = (1 - 64/110)c. ⇒ c = ln(0.5) / ln(0.4182) = 0.7951.


e(74) = [∫ from 74 to 110 of S(x) dx] / S(74) = [∫ from 74 to 110 of (1 - x/110)^0.7951 dx] / (1 - 74/110)^0.7951
= (110/1.7951) (1 - 74/110)^1.7951 / 0.4114 = 20.06.

Comment: Modified De Moivre's Law, with ω = 110.


e(x) = (ω - x) / (c + 1). e(74) = (110 - 74) / 1.7951 = 20.05.

8.24. C. 1 - exp[-(1/θ)τ] = 0.4, and 1 - exp[-(64/θ)τ] = 0.75.

ln(0.6) = -(1/θ)τ, and ln(.25) = -(64/θ)τ. Dividing the two equations: ln(.25)/ln(.6) = 64τ.

ln[ln (0.25) / ln(0.6)] = τln(64). ⇒ τ = ln[ln(0.25) / ln (0.6)] / ln 64 ≅ 0.24.

8.25. D. The smoothed empirical estimate of the 25th percentile is the (0.25)(10 + 1) = 2.75th loss
⇔ (0.25)(1) + (0.75)(2) = 1.75.
The smoothed empirical estimate of the 75th percentile is the (.75)(10 + 1) = 8.25th loss
⇔ (0.75)(4) + (0.25)(5) = 4.25.
Set 0.75 = S(1.75) = α3.0625 + β1.75 + 1. ⇒ 0 = α12.25 + β7 + 1.

Set 0.25 = S(4.25) = α18.0625 + β4.25 + 1. ⇒ 0 = α72.25 + β17 + 3.


Multiplying the first equation by 17 and the second equation by 7 and subtracting:
0 = -297.5α - 4. ⇒ α = -0.0134. ⇒ β = -0.1194.

8.26. D. As shown in Appendix A: VaRp (X) = θ {p-1 - 1}-1/γ.

Thus, 100 = θ {1/0.2 - 1}-1/γ, and 400 = θ {1/0.8 - 1}-1/γ.

Dividing the two equations: 4 = 161/γ. ⇒ γ = 2. ⇒ θ = 400 / 41/γ = 400 / 41/2 = 200.

Alternately, F(x) = (x/θ)γ/{1 + (x/θ)γ}.


Since there are two parameters, we match percentiles at two points.
0.2 = F(100) = (100/θ)γ/(1+(100/θ)γ), and 0.8 = F(400) = (400/θ)γ/(1+(400/θ)γ) .

Therefore, (100/θ)γ = 1/(1-.2) - 1 = 1/4 and (400/θ)γ = 1/(1-.8) - 1 = 4 .


Therefore, γ ln100 - γln(θ) = ln(1/4), and γ ln400 - γ ln(θ)= ln(4).
Subtracting the two equations: {ln(400) - ln(100)}γ = ln(4) - ln(.25) .
Therefore, γ = {ln(4)-ln(.25)}/ {ln(400) - ln(100)} = ln(16)/ln(4) = 2ln(4)/ ln(4) = 2.

Therefore θ = 400 / 41/γ = 400 / 41/2 = 200.

Alternately, one can divide the two equations, and get (400/100)γ = 4/(1/4).

4γ = 16. Therefore, γ = 2 and θ = 200.


Comment: Let ξ1 = p1 / (1 - p1 ) = 0.2/0.8 = 1/4, ξ2 = p2 / (1 - p2 ) = 0.8/0.2 = 4.

Then γ = ln(ξ2 / ξ1 ) / ln(x2 / x1 ) = ln(16) / ln(400/100) = 2 ln(4) / ln(4) = 2,

and θ = x2 / ξ2 1/γ = 400 / 41/2 = 200.

8.27. D. Set 0.5 = S(3) = 4^(-3λ). ⇒ 3λ = 1/2. ⇒ λˆ = 1/6 = 0.167.

8.28. C. For the Pareto Distribution F(x) = 1 - (θ/(θ+x))α . We are given that

0.10 = F(θ−k) = 1 - (θ/(θ+θ−k))α and 0.90 = F(5θ−3k) = 1 - (θ/(θ+5θ−3k))α.

Therefore, 0.90 = (θ/(2θ−k))^α and 0.10 = (θ/(6θ−3k))^α = (θ/(2θ−k))^α 3^(−α).

Dividing these two equations one gets: 9 = 3^α. Therefore α = 2.

8.29. A. One matches at two percentiles:


1 - exp(-(1000 / θ)τ) = 0.25, and 1 - exp(-(100000 / θ)τ) = 0.75.

Therefore, (1000 / θ)τ = -ln(0.75) , and (100000 / θ)τ = -ln(0.25).

Dividing the equations, 100τ = ln(0.25)/ ln(0.75) = 4.8188. τ = ln(4.8188)/ln(100) = 0.341.



8.30. C. The empirical 25th percentile is the (0.25)(11) = 2.75th value, which is 6.
The empirical 75th percentile is the (0.75)(11) = 8.25th value, which is 9.
F(t) = 1 - exp[-(t/θ)τ]. 0.25 = F(6) = 1 - exp[-(6/θ)τ]. ⇒ (6/θ)τ = -ln(0.75) = .2877.

0.75 = F(9) = 1 - exp[-(9/θ)τ]. ⇒ (9/θ)τ = -ln(0.25) = 1.3863.

Dividing the two equations: (9/6)τ = 4.8186. ⇒ τ = ln(4.8186)/ln(1.5) = 3.88.


Comment: θ = 9/1.38631/3.88 = 8.27.

8.31. D. Since the Weibull Distribution has two parameters, we need to match at two percentiles, in
this case the 50th (the median) and the 90th.
1 - exp(-(10000 / θ)τ) = 0.5, and 1 - exp(-(100000 / θ)τ) = 0.90.

Therefore, (10000 / θ)τ = -ln(1-0.5) = 0.693, and (100000 / θ)τ = -ln(1-0.90) = 2.303.

Dividing the equations, 10τ = 2.303/.693 = 3.32. Therefore, τ = log10(3.32) = 0.521.

Alternately, for the Weibull: VaRp (X) = θ {-ln(1-p)}1/τ.

⇒ 10,000 = θ {-ln(1-0.5)}1/τ = θ 0.6931/τ, and 100,000 = θ {-ln(1-0.9)}1/τ = θ 2.3031/τ.


Proceed as before.
Comment: θ^ = 20,166. The given mean and standard deviation are not used.

8.32. A. One sets the distribution function of the LogNormal equal to the percentiles:
0.20 = F(18.25) = Φ[(ln(18.25) - µ)/σ], and 0.80 = F(35.80) = Φ[(ln(35.80) - µ)/σ].
Consulting the bottom of the Normal Distribution Table, this implies that:
-0.842 = (2.904 - µ)/σ, and 0.842 = (3.578 - µ)/σ.
Solving µ = 3.241 and σ = 0.400. For the fitted LogNormal,
S(30) = 1 - Φ[(ln(30) - 3.241)/0.400] = 1 - Φ(0.400) = 1 - 0.6554 = 0.3446.

8.33. A. Out of a total of 100 losses, there are 80 losses less than or equal to 400 and 20 greater
than 400, so 400 is the best estimate of the 80th percentile, given this grouped data.
Matching at the 80th percentile: 1 - e−400/θ = 0.8. ⇒ θ = -400/ln(0.2) = 249.
Comment: When we have grouped data, we do not really know the exact individual sizes.
Therefore, we just look for a place where the appropriate percentage of losses are less than or
equal to X. When we have ungrouped data, as is more commonly the case, then we use the
smoothed empirical estimate of percentiles. The ungrouped case is the one to know well.

8.34. E. The smoothed empirical estimate of the 40th percentile is:


(0.4)(11 + 1) = 4.8th loss ⇔ (86)(0.2) + (90)(0.8) = 89.2.

Similarly, 80th percentile is: (.8)(11 + 1) = 9.6th loss ⇔ (200)(0.4) + (210)(0.6) = 206.

As shown in Appendix A: VaRp (X) = θ {p-1 - 1}-1/γ.

Thus, 89.2 = θ {1/0.4 - 1}-1/γ, and 206 = θ {1/0.8 - 1}-1/γ.

Dividing the two equations: 2.30942 = 61/γ. ⇒ γ = 2.141. ⇒ θ = (1.51/2.141)(89.2) = 107.8.


Alternately, set F(89.2) = 0.4 and F(206) = 0.8.
(89.2/θ)γ / {1 + (89.2/θ)γ} = 0.4. ⇒ 1.5 = θγ/ 89.2γ.

(206/θ)γ / {1 + (206/θ)γ} = 0.8. ⇒ 0.25 = θγ/ 206γ.

⇒ 1.5/0.25 = (206/89.2)γ. ⇒ γ = ln(6)/ln(206/89.2) = 2.141. ⇒ θ = (1.51/2.141)(89.2) = 107.8.


Comment: Check: (89.2/107.8)2.141 / {1 + (89.2/107.8)2.141} = 0.6666/(1 + 0.6666) = 0.4.

(206/107.8)2.141 / {1 + (206/107.8)2.141} = 4/(1 + 4) = 0.8.

8.35. D. (0.4)(9 + 1) = 4. ⇒ The 40th percentile is 1.82.

(0.8)(9 + 1) = 8. ⇒ The 80th percentile is 12.66.

As shown in Appendix A: VaRp (X) = θ {-ln(p)}−1/τ.

Thus, 1.82 = θ ln(1/0.4)-1/τ = θ 1.091361/τ, and 12.66 = θ ln(1/0.8)-1/τ = θ 4.481421/τ.

Dividing the two equations: 6.95604 = 4.106271/τ. ⇒ τ = 0.728.

⇒ θ = (1.82)(0.91631/.728) = 1.61.
Alternately, F(x) = exp[-(θ/x)τ].

0.4 = F(1.82) = exp[-(θ/1.82)τ]. ⇒ 0.91629 = (θ/1.82)τ.

0.8 = F(12.66) = exp[-(θ/12.66)τ]. ⇒ 0.22314 = (θ/12.66)τ.

⇒ 4.106 = (12.66/1.82)τ = 6.956τ. ⇒ 1.412 = τ 1.9396. ⇒ τ = 0.728.


⇒ θ = (1.82)(.91631/.728) = 1.61.

8.36. E. As shown in Appendix A: VaRp (X) = θ {(1-p)-1/α - 1}1/γ = θ {1/(1-p) - 1}1/γ.

100 = θ (1/.75 - 1)1/γ = θ 0.33331/γ, and 500 = θ (1/.25 - 1)1/γ = θ 31/γ.

Dividing the two equations: 5 = 91/γ. ⇒ γ = ln9/ln5 = 1.3652. ⇒ θ = (100)(31/1.3652) = 223.6.


Alternately, F(x) = 1 - {θ^γ / (θ^γ + x^γ)}^α = 1 - θ^γ / (θ^γ + x^γ) = x^γ / (θ^γ + x^γ).

0.25 = F(100) = 100^γ / (θ^γ + 100^γ). ⇒ (0.75) 100^γ = 0.25 θ^γ.

0.75 = F(500) = 500^γ / (θ^γ + 500^γ). ⇒ (0.25) 500^γ = 0.75 θ^γ.

Dividing the two equations: 5γ / 3 = 3. ⇒ γ = ln9/ln5 = 1.3652. ⇒ θ = (100)(31/1.3652) = 223.6.


Comment: A Burr Distribution with α = 1 is a Loglogistic Distribution.

8.37. E. (15 + 1)(0.3) = 4.8. ⇔ (280)(0.2) + (350)(0.8) = 336.
(15 + 1)(0.65) = 10.4. ⇔ (450)(0.6) + (490)(0.4) = 466.
As shown in Appendix A: VaRp(X) = θ {(1-p)^(-1/α) - 1}^(1/γ) = θ {1/(1-p)^0.5 - 1}^(1/γ).
336 = θ (1/0.7^0.5 - 1)^(1/γ) = θ (0.19523)^(1/γ), and 466 = θ (1/0.35^0.5 - 1)^(1/γ) = θ (0.69031)^(1/γ).
Dividing the two equations: 1.3869 = 3.5359^(1/γ). ⇒ γ = 3.86. ⇒ θ = 513.
Alternately, matching at the 30th percentile: 0.30 = 1 - {1 + (336/θ)^γ}^(-2). ⇒ (336/θ)^γ = 0.1952.
Matching at the 65th percentile: 0.65 = 1 - {1 + (466/θ)^γ}^(-2). ⇒ (466/θ)^γ = 0.6903.
Dividing the two equations: 1.387^γ = 3.536. ⇒ γ = 3.86. ⇒ θ = 513.



8.38. E. (0.2)(16 + 1) = 3.4. (0.7)(16 + 1) = 11.9.
The smoothed empirical estimate of the 20th percentile is: (0.6)(75) + (0.4)(81) = 77.4.
The smoothed empirical estimate of the 70th percentile is: (0.1)(122) + (0.9)(125) = 124.7.
From Appendix A, for the Weibull: VaRp(X) = θ {-ln(1-p)}^(1/τ).
⇒ 77.4 = θ {-ln(1-0.2)}^(1/τ) = θ (0.22314)^(1/τ), and 124.7 = θ {-ln(1-0.7)}^(1/τ) = θ (1.20397)^(1/τ).
Dividing these two equations: 1.6111 = 5.3956^(1/τ). ⇒ τ = 3.534. ⇒ θ = 118.3.
Alternately, percentile matching gives two equations in two unknowns:
0.2 = 1 - exp[-(77.4/θ)^τ]. ⇒ (77.4/θ)^τ = 0.22314.
0.7 = 1 - exp[-(124.7/θ)^τ]. ⇒ (124.7/θ)^τ = 1.20397.
Dividing these two equations: 1.6111^τ = 5.3956. ⇒ τ = 3.534. ⇒ θ = 118.3.
Comment: Check: 1 - exp[-(77.4/118.3)^3.534] = 0.200. 1 - exp[-(124.7/118.3)^3.534] = 0.700.
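If you wish to confirm such answers numerically, a short Python sketch (my own illustration, using the two smoothed percentiles above) is:

import math

p1, x1 = 0.2, 77.4    # smoothed empirical 20th percentile
p2, x2 = 0.7, 124.7   # smoothed empirical 70th percentile
# Dividing theta out of x = theta * (-ln(1-p))^(1/tau) determines tau directly.
tau = math.log(-math.log(1 - p2) / -math.log(1 - p1)) / math.log(x2 / x1)
theta = x1 / (-math.log(1 - p1)) ** (1 / tau)
print(tau, theta)     # approximately 3.534 and 118.3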

8.39. B. From the graph, F(10) = 20% and F(100) = 60%.
Therefore, matching percentiles:
Φ[(ln10 - µ)/σ] = 0.2. ⇒ (ln10 - µ)/σ = -0.842. ⇒ ln10 - µ = -0.842σ.
Φ[(ln100 - µ)/σ] = 0.6. ⇒ (ln100 - µ)/σ = 0.25. ⇒ ln100 - µ = 0.25σ.
⇒ σ = {ln(100) - ln(10)} / (0.25 + 0.842) = 2.109. ⇒ µ = ln(100) - (0.25)(2.109) = 4.078.
Mean of the LogNormal is: exp[4.078 + 2.109^2/2] = 546.
From the empirical distribution function, 20% of the data is 10, 40% of the data is 100, and 40% of the data is 1000. The mean is: (20%)(10) + (40%)(100) + (40%)(1000) = 442.
The difference between the means is: 546 - 442 = 104.
Comment: A somewhat unusual question and a little long.
If one does not round the 60th percentile of the Standard Normal Distribution to 0.25, then:
Φ[(ln10 - µ)/σ] = 0.2. ⇒ (ln10 - µ)/σ = -0.842. ⇒ ln10 - µ = -0.842σ.
Φ[(ln100 - µ)/σ] = 0.6. ⇒ (ln100 - µ)/σ = 0.253. ⇒ ln100 - µ = 0.253σ.
⇒ σ = {ln(100) - ln(10)} / (0.253 + 0.842) = 2.103. ⇒ µ = ln(100) - (0.253)(2.103) = 4.073.
Mean of the LogNormal is: exp[4.073 + 2.103^2/2] = 536.
The difference between the means is: 536 - 442 = 94, resulting in the same letter solution.

Section 9, Method of Moments

One can fit a type of distribution to data via the Method of Moments, by finding that set of
parameters such that the moments (about the origin) of the given distribution match the observed
moments. If one has a single parameter, such as in the case of the Exponential Distribution, then one
matches the observed mean to the theoretical mean of the loss distribution.

Fitting the Exponential to the ungrouped data set in Section 2 using the Method of Moments:
θ = E[X] = 312,675.

Note that for the Exponential, the Method of Maximum Likelihood applied to ungrouped data gives the same result as the Method of Moments. Applying either one of these to fit an Exponential is commonly asked on exams. Know how to do this!
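As a minimal computational sketch (in Python; the sample values below are only placeholders, not the Section 2 data):

losses = [410, 1924, 2635, 4548, 6142]   # any sample of ungrouped losses
theta_hat = sum(losses) / len(losses)    # for the Exponential, theta = the sample mean
print(theta_hat)

The same one line is also the maximum likelihood estimate for ungrouped data.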

Ungrouped Data Fitted Exponential


Mean 312,675 312,675
Coefficient of Variation 2.01 1.00
Skewness 4.83 2.00

Thus we expect the Exponential has much too thin a righthand tail in order to properly fit the
ungrouped data in Section 2. Fortunately there are other distributions to choose from, such as those
in Appendix A of Loss Models.

In general, given a distribution with n parameters, one can try to match the first n moments.79 For
many of the two parameter distributions in the Appendix A of Loss Models, formulas are given for
fitted parameters via the method of moments in terms of the first two moments.80 If one has a
distribution with two parameters, then one can either match the first two moments or match the mean
and the variance, whichever is easier.81

So for example, if we try to fit a Gamma distribution to the data in Section 2, then we can match the
first two moments, since the Gamma has two parameters alpha and theta:
E[X] = 3.12674 x 10^5 = αθ.
E[X^2] = 4.9284598 x 10^11 = α(α+1)θ^2.

79
If one has three parameters then one attempts to match the first three moments. You are extremely unlikely to be
asked a method of moments question involving more than two parameters.
80
These formulas are not included in the Appendix attached to the exam. Formulas are given for the Pareto, Gamma,
Inverse Gamma, LogNormal, Inverse Gaussian, and Beta (for fixed scale parameter). In their formulas in Appendix A
of Loss Models, m stands for the first moment and t stands for the second moment.
81
One gets the same answer either way.

Therefore, E[X^2] - E[X]^2 = αθ^2 = 3.951 x 10^11.
Therefore, θ = (E[X^2] - E[X]^2) / E[X] = 1.264 x 10^6.
Thus α = E[X]^2 / (E[X^2] - E[X]^2) = 0.2475.82
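A short Python sketch of this Gamma calculation (my own illustration, using the moments quoted above):

m = 3.12674e5       # first moment of the ungrouped data in Section 2
t = 4.9284598e11    # second moment
var = t - m**2      # equals alpha * theta^2
theta = var / m     # about 1.264 x 10^6
alpha = m**2 / var  # about 0.2475
print(alpha, theta)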

Exercise: Fit the Pareto Distribution to the data in Section 2.
[Solution: Set the first moments equal: 3.12674 x 10^5 = θ/(α-1),
and set the second moments equal: 4.9284598 x 10^11 = 2θ^2/{(α-1)(α-2)}.
Dividing the second equation by the square of the first eliminates θ:
4.9284598 x 10^11 / (3.12674 x 10^5)^2 = 5.0411 = 2(α-1)/(α-2). ⇒ 3.0411α = 2(4.0411).
⇒ α = 2.658. ⇒ θ = (3.12674 x 10^5)(2.658 - 1) ≅ 518,000.
Alternately, using the formulas for the method of moments in Appendix A of Loss Models:83
m = E[X] = 3.12674 x 10^5. t = E[X^2] = 4.9284598 x 10^11.
α = 2(t - m^2) / (t - 2m^2) = 2(3.951 x 10^11) / (2.973159 x 10^11) = 2.658.
θ = mt / (t - 2m^2) = 1.541001 x 10^17 / (2.973159 x 10^11) = 518,304.]
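The same Pareto fit as a Python sketch (my own illustration of the Appendix A formulas quoted in the solution):

m = 3.12674e5
t = 4.9284598e11
alpha = 2 * (t - m**2) / (t - 2 * m**2)   # about 2.658
theta = m * t / (t - 2 * m**2)            # about 518,304
print(alpha, theta)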

Exercise: Set up the equations to be solved in order to fit a Weibull Distribution to the data in Section 2 using the Method of Moments.
[Solution: One matches the first two moments.
θ Γ[1 + 1/τ] = 3.12674 x 10^5. θ^2 Γ[1 + 2/τ] = 4.9284598 x 10^11.
One could eliminate theta and get an equation to solve numerically for tau:84
Γ[1 + 2/τ] / Γ[1 + 1/τ]^2 = 4.9284598 x 10^11 / (3.12674 x 10^5)^2 = 5.0411.
Comment: One can numerically solve for τ = 0.5406 and then
θ = 312,674 / Γ[1 + 1/τ] = 312,674/1.74907 = 178,765.]
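A sketch of that numerical solution in Python, using scipy (my own illustration; the target ratio 5.0411 is the second moment divided by the square of the first moment, as above):

from scipy.optimize import brentq
from scipy.special import gamma

ratio = 5.0411   # E[X^2] / E[X]^2 for the ungrouped data in Section 2

def moment_ratio_error(tau):
    # For the Weibull, E[X^2]/E[X]^2 = Gamma(1 + 2/tau) / Gamma(1 + 1/tau)^2,
    # which depends only on the shape parameter tau.
    return gamma(1 + 2 / tau) / gamma(1 + 1 / tau) ** 2 - ratio

tau = brentq(moment_ratio_error, 0.1, 5)   # about 0.5406
theta = 312674 / gamma(1 + 1 / tau)        # about 178,765
print(tau, theta)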

82
This matches the formulas in Appendix A of Loss Models, with m = µ1 and t = µ2 ′.
83
The formulas for method of moments will not be attached to your exam in the abridged version of Appendix A.
84
Note that in the case of the Weibull one can reduce to one equation in one unknown to be solved numerically. In
more complicated cases, the method of moments could be performed by solving numerically two equations in two
unknowns. However, this may be wasted effort due to the inherent lack of accuracy of the method of moments.

Exercise: Fit the LogNormal Distribution to the ungrouped data in Section 2.
[Solution: Set the first moments equal: 3.12674 x 10^5 = exp(µ + σ^2/2),
and set the second moments equal: 4.9284598 x 10^11 = exp(2µ + 2σ^2).
Dividing the second equation by the square of the first eliminates mu:
4.9284598 x 10^11 / (3.12674 x 10^5)^2 = 5.0411 = exp(σ^2). ⇒ σ = 1.272.
µ = ln(3.12674 x 10^5) - (1.272^2)/2 ≅ 11.84.
Alternately, using the method of moments in Appendix A of Loss Models:
m = E[X] = 3.12674 x 10^5. t = E[X^2] = 4.9284598 x 10^11.
σ = √(ln(t) - 2 ln(m)) = √1.6176 = 1.2718. µ = ln(m) - σ^2/2 = 11.844.]
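In Python, the LogNormal version of this calculation is just a couple of lines (a sketch using the same sample moments):

import math

m = 3.12674e5
t = 4.9284598e11
sigma = math.sqrt(math.log(t) - 2 * math.log(m))   # about 1.2718
mu = math.log(m) - sigma**2 / 2                    # about 11.844
print(mu, sigma)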

It is relatively easy to check the results of fitting via the method of moments. For example, for a LogNormal with parameters µ = 11.844 and σ = 1.2718, the first moment is:
exp(11.844 + 1.2718^2/2) = 312,618, and the second moment is:
exp[2(11.844) + 2(1.2718^2)] = e^26.923 = 4.9259 x 10^11. These do indeed match, subject to rounding, the first two moments of the ungrouped data in Section 2.

Exercise: Using the fitted LogNormal Distribution, estimate the median of the distribution from which
the ungrouped data in Section 2 was drawn.
[Solution: The median of a LogNormal Distribution is e^µ. e^11.844 = 139,246.
Comment: This differs from the smoothed empirical estimate of the median, the average of the 65th
and 66th values: (119,300 + 122,000)/2 = 120,650.]

Exercise: Fit via Method of Moments the Inverse Gaussian Distribution to the ungrouped data in
Section 2.
[Solution: µ = E[X], and µ^3/θ = E[X^2] - E[X]^2. Thus θ = E[X]^3 / (E[X^2] - E[X]^2).
E[X] = 3.12674 x 10^5. E[X^2] = 4.9284598 x 10^11. µ = 312,674, and θ = 77,373.]
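And the Inverse Gaussian fit as a Python sketch (same sample moments; my own illustration):

m = 3.12674e5
t = 4.9284598e11
mu = m                       # 312,674
theta = m**3 / (t - m**2)    # about 77,373, subject to rounding
print(mu, theta)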

Parameters of curves fit to the ungrouped data in Section 2 by the Method of Moments:85
Pareto: α = 2.658, θ = 518,304
Weibull: θ = 178,765, τ = 0.5406
Gamma: α = 0.2475, θ = 1.264 x 10^6
LogNormal: µ = 11.844, σ =1.2718
Inverse Gaussian µ = 312,674, θ = 77,373.
85
The Weibull had to be fit via computer, since the formula for its moments involves Gamma functions.

Coefficient of Variation:

CV^2 = Variance / E[X]^2 = (E[X^2] - E[X]^2) / E[X]^2 = E[X^2]/E[X]^2 - 1.

1 + CV^2 = E[X^2] / E[X]^2.

The coefficient of variation depends on the shape parameter(s), not the scale parameter. In many
examples of method of moments with two parameters, we divide the second moment by the
square of the first moment, eliminating the scale parameter and thus solving for the shape parameter.

Grouped Data:

Similarly, one can use the method of moments to fit curves to the grouped data in Section 3.

Fitting the Exponential to the grouped data set in Section 3 using the Method of Moments, one
matches the first moment to the single parameter: θ = E[X] = 15,738.

Grouped
Data Fitted Exponential
Mean 15738 15738
Coefficient of Variation ≈1 1
Skewness ≈3 2

Comparing skewness, we expect the exponential has a righthand tail that is a little too thin in order to
properly fit the grouped data in Section 3.

In order to fit distributions with two parameters or more, one has to use estimates for the moments,
such as those that were made in a previous section.

For this particular grouped data set, one will run into a problem trying to fit a Pareto, since the Pareto
has too heavy a tail to fit this data. For the Pareto:

E[X] = θ/(α-1) = 1.57 x 10^4. E[X^2] = 2θ^2 / {(α-2)(α-1)} = 4.88 x 10^8.

Solving for alpha and theta:
θ = E[X] E[X^2] / (E[X^2] - 2E[X]^2) = 7.66 x 10^12 / (-0.0498 x 10^8) = -1.53 x 10^6.
α = 2(E[X^2] - E[X]^2) / (E[X^2] - 2E[X]^2) = 4.83 x 10^8 / (-0.0498 x 10^8) = -97.



This is not a viable solution, since both alpha and theta are supposed to be positive. In general, one
will run into difficulty trying to fit a Pareto to a data set with a coefficient of variation less than or equal to
one.86

Parameters of curves fit to grouped data in Section 3 by the Method of Moments:87

Weibull: θ = 15774, τ = 1.011


Gamma: α = 1.021 , θ = 15,385
LogNormal: µ = 9.320, σ = 0.8264
Inverse Gaussian: µ = 15,700 and θ = 16,024.

Exercise: A set of data has first moment of 1302 and second moment 4,067,183.
Fit a Pareto Distribution via the method of moments.
[Solution: E[X] = θ/(α-1) = 1302. E[X^2] = 2θ^2 / {(α-2)(α-1)} = 4,067,183.
Solving for alpha and theta:
θ = E[X] E[X^2] / (E[X^2] - 2E[X]^2) = 5,295,472,266 / 676,775 = 7825.
α = 2(E[X^2] - E[X]^2) / (E[X^2] - 2E[X]^2) = 4,743,958 / 676,775 = 7.01.


Comment: These are the same formulas as in Appendix A of Loss Models,88
with m = 1302 and t = 4,067,183.]

86
The Pareto has a coefficient of variation greater than 1. As alpha gets very large, the Pareto approaches an
exponential distribution and the coefficient of variation approaches 1.
87
Using estimated first moment of 1.57 x 104 and estimated second moment of 4.88 x 108 .
88
The formulas for method of moments will not be attached to your exam in the abridged version of Appendix A.

Problems:

Use the following information to answer the following four questions:


You observe the following five claims: 410, 1924, 2635, 4548, 6142.

9.1 (1 point) Using the method of moments, a LogNormal distribution is fit to this data.
What is the value of the fitted µ parameter?
A. less than 7.8
B. at least 7.8 but less than 7.9
C. at least 7.9 but less than 8.0
D. at least 8.0 but less than 8.1
E. at least 8.1

9.2 (2 points) Using the method of moments, a LogNormal distribution is fit to this data.
What is the value of the fitted σ parameter?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

9.3 (1 point) Using the method of moments, a Normal distribution is fit to the natural logarithms of this
data. What is the value of the fitted µ parameter?
A. less than 7.8
B. at least 7.8 but less than 7.9
C. at least 7.9 but less than 8.0
D. at least 8.0 but less than 8.1
E. at least 8.1

9.4 (2 points) Using the method of moments, a Normal distribution is fit to the natural logarithms of
this data. What is the value of the fitted σ parameter?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

9.5 (1 point) An exponential distribution F(x) = 1 - e^(-x/θ) is fit to the following size of claim data by

the method of moments. What is the value of the fitted parameter θ?


Range($) # of claims loss ($000)
0-100 6300 300
100-200 2350 350
200-300 850 200
300-400 320 100
400-500 110 50
over 500 70 50

10000 1050
A. less than 100
B. at least 100 but less than 110
C. at least 110 but less than 120
D. at least 120 but less than 130
E. at least 130

The following information should be used to answer the next two questions:
10 Claims have been observed: 1500, 5500 3000, 3300, 2300, 6000, 5000, 4000, 3800, 2500.
The underlying distribution is assumed to be Gamma, with parameters α and θ unknown.

9.6 (2 points) In what range does the method of moments estimator of θ fall?
A. less than 400
B. at least 400 but less than 500
C. at least 500 but less than 600
D. at least 600 but less than 700
E. at least 700

9.7 (1 point) In what range does the method of moments estimator of α fall?
A. less than 4
B. at least 4 but less than 5
C. at least 5 but less than 6
D. at least 6 but less than 7
E. at least 7

9.8 (1 point) An insurer writes a health insurance policy with a coinsurance factor of 80%.
The insurer makes 623 payments for a total of $184,013.
The insurer assumes the losses, prior to the effect of the coinsurance factor, follow a distribution:
F(x) = 1 - 1 / {1 + (x/θ)^2}^2, with E[X] = θπ/4.
What is the method of moments fitted value of θ?
A. 410 B. 430 C. 450 D. 470 E. 490

9.9 (2 points) You observe the following 6 claims: 162.22, 151.64, 100.42, 174.26, 20.29, 34.36.
A Distribution: F(x) = 1 - e^(-qx), x > 0, is fit to this data via the Method of Moments.
Determine the value of q.
A. less than 0.006
B. at least 0.006 but less than .007
C. at least 0.007 but less than .008
D. at least 0.008 but less than .009
E. at least 0.009

Use the following information in the next six questions:


You observe the following 10 claims:
1729, 101, 384, 121, 880, 3043, 205, 132, 214, 82.

9.10 (2 points) You fit this data via the method of moments to a Pareto Distribution.
Determine α.
A. 4.0 B. 4.5 C. 5.0 D. 5.5 E. 6.0

9.11 (1 point) You fit this data via the method of moments to a Pareto Distribution. Determine θ.
A. 2400 B. 2500 C. 2600 D. 2700 E. 2800

9.12 (1 point) You fit this data via the method of moments to an Inverse Gaussian Distribution.
In which of the following intervals is µ?
A. less than 400
B. at least 400 but less than 500
C. at least 500 but less than 600
D. at least 600 but less than 700
E. at least 700

9.13 (1 point) You fit this data via the method of moments to an Inverse Gaussian Distribution.
In which of the following intervals is θ?
A. less than 350
B. at least 350 but less than 360
C. at least 360 but less than 370
D. at least 370 but less than 380
E. at least 380

9.14 (1 point) You fit this data via the method of moments to an Inverse Gamma Distribution.
In which of the following intervals is α?
A. less than 1
B. at least 1 but less than 2
C. at least 2 but less than 3
D. at least 3 but less than 4
E. at least 4

9.15 (1 point) You fit this data via the method of moments to an Inverse Gamma Distribution.
In which of the following intervals is θ?
A. less than 1000
B. at least 1000 but less than 1100
C. at least 1100 but less than 1200
D. at least 1200 but less than 1300
E. at least 1300

9.16 (3 points) The following five losses have been observed:


$500, $1,000, $1,500, $2,500, $4,500.
Use the method of moments to fit a LogNormal Distribution.
Use this LogNormal Distribution to estimate the probability that a loss will exceed $4,500.
A. Less than 5%
B. At least 5% but less than 6%
C. At least 6% but less than 7%
D. At least 7% but less than 8%
E. At least 8%

Use the following information in the next two questions:


You observe the following 10 values: 0.21, 0.40, 0.14, 0.65, 0.53, 0.92, 0.30, 0.44, 0.76, 0.07.
The underlying distribution is a Beta Distribution as per Loss Models, with θ = 1.
The parameters a and b are fit to this data via the Method of Moments.

9.17 (2 points) In which interval is the fitted a?


A. less than 1.0
B. at least 1.0 but less than 1.2
C. at least 1.2 but less than 1.4
D. at least 1.4 but less than 1.6
E. at least 1.6

9.18 (2 points) In which interval is the fitted b?


A. less than 1.0
B. at least 1.0 but less than 1.2
C. at least 1.2 but less than 1.4
D. at least 1.4 but less than 1.6
E. at least 1.6

9.19 (3 points) You are given the following:


• The random variable X has the density function
f(x) = 0.4 exp(-x/δ1)/δ1 + 0.6 exp(-x/δ2)/δ2 , 0 < x < ∞, 0 < δ1 < δ2.
• A random sample taken of the random variable X has mean 4 and variance 27.
Determine the method of moments estimate of δ2.
A. Less than 4.5
B. At least 4.5, but less than 5.0
C. At least 5.0, but less than 5.5
D. At least 5.5, but less than 6.0
E. At least 6.0

9.20 (3 points) You are given:


(i) Claim amounts follow a shifted exponential distribution with probability density
function: f(x) = e^(-(x-δ)/θ)/θ, δ < x < ∞.
(ii) A random sample of claim amounts X1 , X2 ,..., X10:
15 16 18 20 24 34 41 52 66 75
Estimate δ by matching both the mean of the shifted exponential distribution to the empirical mean,
and the median of the shifted exponential distribution to the smoothed empirical estimate of the
median.
(A) 12.5 (B) 13.0 (C) 13.5 (D) 14.0 (E) 14.5

9.21 (1 point)
You are modeling a claim process as a mixture of two independent distributions A and B.
Distribution A is exponential with mean 1.
Distribution B is exponential with mean 10.
Positive weight p is assigned to distribution A.
The sample mean is 3.
Determine p using the method of moments.
A. Less than 0.80
B. At least 0.80, but less than 0.85
C. At least 0.85, but less than 0.90
D. At least 0.90, but less than 0.95
E. At least 0.95

9.22 (3 points) The following data have been collected:


Year Number Of Claims Average Claim Size
1 1732 22,141
2 2007 22,703
3 1920 24,112
4 1851 24,987
Inflation is assumed to be 4% per year.
A LogNormal distribution with parameters σ = 3 and µ is used to model the claim size
distribution. Estimate µ for Year 6 using the Method of Moments.
(A) 5.66 (B) 5.67 (C) 5.68 (D) 5.69 (E) 5.70

9.23 (3 points) The portion of games won during a season by each of the 14 baseball teams in the
American League were: 0.407, 0.426, 0.426, 0.444, 0.463, 0.469, 0.488, 0.512, 0.543, 0.543,
0.580, 0.580, 0.593, 0.593.
Fit via the method of moments a Beta Distribution with θ = 1.

9.24 (2 points) The parameter of the Inverse Exponential distribution is to be estimated using the
method of moments based on the following data:
2 5 11 28 65 143
Estimate θ by matching the kth moment with k = -1.
(A) Less than 10
(B) At least 10, but less than 15
(C) At least 15, but less than 20
(D) At least 20, but less than 25
(E) At least 25

9.25 (3 points) You are modeling a claim process as a mixture of two independent distributions A
and B.
You are given:
(i) Distribution A is exponential.
(ii) Distribution B is exponential.
(iii) Weight 0.8 is assigned to distribution A.
(iv) The mean of the mixture is 10.
(iv) The variance of the mixture is 228.
Determine the mean of Distribution B using the method of moments.
(A) 22 (B) 24 (C) 26 (D) 28 (E) 30 

9.26 (3 points) If Y follows a Poisson Distribution with parameter λ, then for c > 0, cY follows an
“Over-dispersed Poisson” Distribution with parameters c and λ.
The distribution of Aggregate Losses has a mean of 10 and variance of 200.
Fit via the method of moments an Over-dispersed Poisson Distribution to the distribution of
aggregate losses. Estimate the probability that the aggregate losses are less than their mean.
A. Less than 55%
B. At least 55%, but less than 60%
C. At least 60%, but less than 65%
D. At least 65%, but less than 70%
E. At least 70%

9.27 (2 points) From a complete mortality study of 7 laboratory animals, you are given:
(i) The times of death, in weeks, are: 2, 3, 4, 5, 6, 8, 14.
(ii) The operative survival model is assumed to be Exponential with mean θ.
(iii) θ1 is the estimate of the parameter θ using percentile matching at the median.

(iv) θ2 is the estimate of the parameter θ using the method of moments.

Calculate the absolute difference in the estimated survival functions at 10 using θ1 and θ2.
(A) 0.02 (B) 0.03 (C) 0.04 (D) 0.05 (E) 0.06

9.28 (2 points) A set of data has a sample mean and median of 830 and 410, respectively.
You fit a LogNormal Distribution by matching these two sample quantities to the corresponding
population quantities. What is the fitted value of the parameter σ?
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

9.29 (2 points) The following claim data were generated from a Pareto distribution:
303 30 35 78 12
Using the method of moments to estimate the parameters of a Pareto distribution,
calculate the loss elimination ratio at 10.
(A) 2% (B) 4% (C) 6% (D) 8% (E) 10%

9.30 (5 points) You are given following data for 3048 professional liability insurance claims:
Claim Size Interval Number of Claims
0 to 5000 1710
5001 to 25,000 968
25,001 to 100,000 343
100,001 to 250,000 23
250,001 to 500,000 4
More than 500,000 0
Assume that the sizes are uniformly distributed on each interval.
Fit a LogNormal Distribution via the Method of Moments.

9.31 (2 points) We have the following data from Pareto Distribution with α = 6.
50, 100, 200, 250, 600, 1200.
Using the method of moments, estimate E[(X - 500)+].
A. 120 B. 130 C. 140 D. 150 E. 160

9.32 (3 points) You are given:


(i) Losses in Year i follow a gamma distribution with parameters αi and θi.

(ii) αi = α, for i = 1, 2, 3,…

(iii) The parameters θi vary in such a way that there is an annual inflation rate of 8% for losses.
(iv) The following is a sample of six losses:
Year 1: 100 300
Year 2: 200 500
Year 3: 100 1000
Using trended losses, determine the method of moments estimate of θ5.
(A) 300 (B) 400 (C) 500 (D) 600 (E) 700

9.33 (3 points) The parameters of an Inverse Pareto distribution are to be estimated based on the
following data by matching kth moments with k = -1 and k = -2:
10 30 100 300 800
Determine S(1000) for the fitted Inverse Pareto.
A. 3% B. 4% C. 5% D. 6% E. 7%

9.34 (3 points) You are given the following data on the sizes of 9156 policies.
X, Policy Premium ($000) Number of Policies
2 8000
5 1000
25 150
100 5
250 1
Let X be the policy premium in thousands of dollars.
You assume ln[X] follows a Gamma Distribution.
Fit the Gamma Distribution to ln[X] via method of moments.

9.35 (2 points) You are given the following:


• The random variable X has the density function
f(x) = α 10^α / (x + 10)^(α+1), 0 < x < ∞, α > 0.
• A random sample of size 7 is taken of the variable X: 1, 3, 7, 10, 18, 21, 37.
Determine the method of moments estimate of α.
A. 1.72 B. 1.74 C. 1.76 D. 1.78 E. 1.80

9.36 (2 points) You observe the following 5 values: 12, 41, 35, 3, 67.
The underlying distribution is a Beta Distribution with θ = 100.
The parameters a and b are fit to this data via the Method of Moments.
Determine the value of the fitted b.
(A) 1.8 (B) 2.0 (C) 2.2 (D) 2.4 (E) 2.6

9.37 (2 points) Four losses are observed from a Gamma distribution.


The observed losses are: 600, 1100, 1500, and 2500.
Find the method of moments estimate for θ.
A. 320 B. 340 C. 360 D. 380 E. 400

9.38 (2 points) The following six losses have been observed:


5, 20, 35, 60, 85, 180
Use the method of moments to fit a LogNormal Distribution.
Use this LogNormal Distribution in order to estimate E[X3 ].
A. Less than 1.1 million
B. At least 1.1 million but less than 1.3 million
C. At least 1.3 million but less than 1.5 million
D. At least 1.5 million but less than 1.7 million
E. At least 1.7 million

9.39 (2 points) X is a two-parameter Pareto random variable with parameters θ and α.


You are given a sample of size 1000 and the following information:
Σ xi = 198,000 and Σ xi^2 = 148,322,000, where the sums run over i = 1 to 1000.

Find the Method of Moments estimate for Prob[300 < X < 800].
(A) 11% (B) 13% (C) 15% (D) 17% (E) 19%

9.40 (2 points) X is a two-parameter Pareto random variable with parameters θ and α.


e(1000) = 1000, and e(4000) = 2000.
Determine θ and α.

9.41 (4, 5/86, Q.55) (2 points) X1 , X2 ,..., Xn is an independent sample drawn from a lognormal

distribution with parameters µ and σ2.


Let X̄ = (Σ Xi)/n and S^2 = Σ (Xi - X̄)^2 / (n - 1), where the sums run from i = 1 to n.

In terms of X̄ and S^2, obtain estimators for µ and σ^2 using the method of moments.

A. µ = X̄; σ^2 = S^2/X̄     B. µ = ln X̄; σ^2 = ln S^2
C. µ = X̄; σ^2 = ln(S^2 + X̄^2)     D. µ = 0.5 ln[X̄^3 / (X̄ + S^2)]; σ^2 = ln(S^2 - X̄^2)
E. µ = ln X̄ - 0.5 ln(1 + S^2/X̄^2); σ^2 = ln(1 + S^2/X̄^2)

9.42 (160, 11/86, Q.12) (2.1 points) A random sample of death records yields the following exact
ages at death: 30, 50, 60, 60, 70, 90.
The age at death of the population from which the sample is drawn follows a gamma distribution.
The parameters α and θ are estimated using the method of moments.
Determine the estimate of α.
(A) 6.0 (B) 7.2 (C) 9.0 (D) 10.8 (E) 12.2

9.43 (4, 5/87, Q.60) (1 point) Using the method of moments what is an estimate of the mean of a
lognormal distribution given the sample: 3, 4.5, 6, 6.25, 6.5, 6.75, 7, 7.5, 8.5, 10
A. Less than 6
B. At least 6, but less than 6.25
C. At least 6.25, but less than 6.50
D. At least 6.50, but less than 6.75
E. 6.75 or more.

9.44 (4, 5/88, Q.53) (2 points) Given the distribution f(x) = axa-1, 0 < x < 1, 0 < a < ∞,
and the sample 0.7, 0.14, 0.8, 0.9, 0.65; what is the method of moments estimate for a?
A. Less than 1.0
B. At least 1.0, but less than 1.3
C. At least 1.3, but less than 1.6
D. At least 1.6, but less than 1.9
E. 1.9 or more

9.45 (160, 5/89, Q.14) (2.1 points) You are given:


(i) Five lives are observed from time t = 0 until death.
(ii) Deaths occur at t = 3, 4, 4, 11 and 18.
Assume the lives are subject to the probability density function f(t) = t e^(-t/c) / c^2, t > 0.
Determine c by the method of moments.
(A) 1/4 (B) 1/2 (C) 1 (D) 2 (E) 4

9.46 (2, 5/90, Q. 31) (1.7 points) Let X be a continuous random variable with density function
f(x; θ) = x^((1-θ)/θ) / θ for 0 < x < 1, where θ > 0. What is the method-of-moments estimator of θ?
A. (1 - X̄)/X̄     B. (X̄ - 1)/X̄     C. X̄/(1 - X̄)     D. X̄/(X̄ - 1)     E. 1/(1 + X̄)

9.47 (4, 5/90, Q.34) (2 points) The following observations: 1000, 850, 750, 1100, 1250, 900,
are a random sample taken from a Gamma distribution with unknown parameters α and θ.
In what range does the method of moments estimators of α fall?
A. Less than 30
B. At least 30, but less than 40
C. At least 40, but less than 50
D. At least 50, but less than 60
E. 60 or more

9.48 (160, 11/90, Q.19) (1.9 points)


From a complete mortality study of 10 laboratory mice, you are given:
(i) The times of death, in days, are 2, 3, 4, 5, 5, 6, 8, 10, 11, 11.
(ii) The operative survival model is assumed to be uniform from 0 to ω.
(iii) ω1 is the estimate of the uniform parameter ω using percentile matching at the median.

(iv) ω2 is the estimate of the uniform parameter ω using the method of moments.

Calculate ω1 - ω2.
(A) -2 (B) -1 (C) 0 (D) 1 (E) 2

9.49 (4, 5/91, Q.41) (2 points) A large sample of claims has an observed average claim size of
$2,305 with a variance of 989,544. Assuming the claim severity distribution to be lognormal,
estimate the probability that a particular claim exceeds $3,000.
(Use the moments of the lognormal distribution.)
A. Less than 0.14
B. At least 0.14 but less than 0.18
C. At least 0.18 but less than 0.22
D. At least 0.22 but less than 0.26
E. At least 0.26

9.50 (4, 5/91, Q.46) (2 points) The following sample of 10 claims is observed:
1500 6000 3500 3800 1800 5500 4800 4200 3900 3000.
The underlying distribution is assumed to be Gamma, with parameters α and θ unknown.
In what range does the method of moments estimators of θ fall?
A. Less than 250
B. At least 250, but less than 300
C. At least 300, but less than 350
D. At least 350, but less than 400
E. 400 or more

9.51 (4B, 5/92, Q.10) (3 points) You are given the following information:
• Losses follow a LogNormal distribution with parameters µ and σ.
• The following five losses have been observed: $500, $1,000, $1,500, $2,500, $4,500.
Use the method of moments to fit a Normal Distribution to the natural logs of the loss sizes.
Use the corresponding LogNormal Distribution to estimate the probability that a loss will exceed
$4,500.
A. Less than 0.01
B. At least 0.01 but less than 0.05
C. At least 0.05 but less than 0.09
D. At least 0.09 but less than 0.13
E. At least 0.13

9.52 (4B, 5/92, Q.26) (1 point) The random variable X has the density function with parameter β
given by f(x; β) = (1/β^2) x exp[-0.5(x/β)^2]; x > 0, β > 0,
where E[X] = (β/2)√(2π) and the variance of X is: 2β^2 - (π/2)β^2.
You are given the following observations of X: 4.9, 1.8, 3.4, 6.9, 4.0.
Determine the method of moments estimate of β.
A. Less than 3.00
B. At least 3.00 but less than 3.15
C. At least 3.15 but less than 3.30
D. At least 3.30 but less than 3.45
E. At least 3.45

9.53 (Course 160 Sample Exam #3, 1994, Q.13) (1.9 points)
From a complete mortality study of five lives, you are given:
(i) The underlying survival function is exponential with hazard rate λ.
(ii) Deaths occur at times 1, 2, t3 , t4 , 9, where 2 < t3 < t4 < 9.
(iii) The estimate of λ using the Method of Moments is 0.21.
(iv) The estimate of λ using the Percentile Matching at the median is 0.21.
Calculate t4 .
(A) 6.5 (B) 7.0 (C) 7.5 (D) 8.0 (E) 8.5

9.54 (4B, 5/95, Q.5) (2 points) You are given the following:
• The random variable X has the density function
f(x) = α x^(α-1), 0 < x < 1, α > 0.
• A random sample of three observations of X yields the values
0.40, 0.70, 0.90.
Determine the method of moments estimate of α.
A. Less than 0.5
B. At least 0.5, but less than 1.5
C. At least 1.5, but less than 2.5
D. At least 2.5, but less than 3.5
E. At least 3.5

9.55 (4B, 5/96, Q.4) (2 points) You are given the following:
• The random variable X has the density function f(x) = 2(θ - x)/θ^2, 0 < x < θ.
• A random sample of two observations of X yields the values 0.50 and 0.90.
Determine the method of moments estimator of θ.
A. Less than 0.45
B. At least 0.45, but less than 0.95
C. At least 0.95, but less than 1.45
D. At least 1.45, but less than 1.95
E. At least 1.95

9.56 (4B, 5/98, Q.5) (2 points) You are given the following:
• The random variables X has the density function
f(x) = α(x + 1)^(-(α+1)), 0 < x < ∞, α > 0.
• A random sample of size n is taken of the random variable X.
• The values in the random sample totals nµ.
Assuming α > 1, determine the method of moments estimator of α.
A. µ B. µ / (µ - 1) C. µ / (µ + 1) D. (µ - 1) / µ E. (µ + 1) / µ

Use the following information for the next two questions:


• The random variable X has the density function
f(x) = 0.5 exp(-x/λ1)/λ1 + 0.5 exp(-x/λ2)/λ2, 0 < x < ∞, 0 < λ1 ≤ λ2.
• A random sample taken of the random variable X has mean 1 and variance k.

9.57 (4B, 11/98, Q.25) (3 points) If k is 3/2, determine the method of moments estimate of λ1.
A. Less than 1/5
B. At least 1/5, but less than 2/5
C. At least 2/5, but less than 3/5
D. At least 3/5, but less than 4/5
E. At least 4/5

9.58 (4B, 11/98, Q.26) (2 points) Determine the values of k for which method of
moments estimates of λ1, and λ2 exist.
A. 0 < k B. 0 < k < 3 C. 0 < k < 2 D. 1 ≤ k E. 1 ≤ k < 3

9.59 (4B, 11/99, Q.21) (2 points) You are given the following:
• The random variable X has the density function
f(x) = wf1 (x) + (1-w)f2 (x), 0 < x < ∞, 0 ≤ w ≤ 1.
• A single observation of the random variable X , yields the value 1.

•  ∫ x f1 (x) dx = 1
0

•  ∫ x f2 (x) dx = 2
0

• f2 (x) = 2f1 (x) ≠ 0


Determine the method of moments estimate of w.
A. 0 B. 1/3 C. 1/2 D. 2/3 E. 1

9.60 (Course 160 Sample Exam #1, 1999, Q.12) (1.9 points)
From a laboratory study of ten lives, you are given:
(i) From the observed data, the following values of S(t) are estimated:
t      Ŝ(t)
5      0.8
8      0.6
9      0.4
12     0.2
19     0.0
(ii) A Weibull distribution is to be fitted to the sample data by using percentile matching at the
20th and 60th percentiles.
Calculate S(8), the estimated probability of surviving to time 8, using the fitted Weibull.
(A) 0.45 (B) 0.50 (C) 0.55 (D) 0.60 (E) 0.65

9.61 (Course 4 Sample Exam 2000, Q.8) Summary statistics of 100 losses are:
Interval Number of Losses Sum Sum of Squares
(0,2000] 39 38,065 52,170,078
(2000,4000] 22 63,816 194,241,387
(4000,8000] 17 96,447 572,753,313
(8000, 15000] 12 137,595 1,628,670,023
(15,000, ∞) 10 331,831 17,906,839,238
Total 100 667,754 20,354,674,039
A Pareto Distribution is fit to this data using the method of moments.
Determine the parameter estimates.

9.62 (4, 5/00, Q.36) (2.5 points) You are given the following sample of five claims:
4 5 21 99 421
You fit a Pareto distribution using the method of moments.
Determine the 95th percentile of the fitted distribution.
(A) Less than 380
(B) At least 380, but less than 395
(C) At least 395, but less than 410
(D) At least 410, but less than 425
(E) At least 425
9.63 (4, 11/00, Q.2) (2.5 points)
The following data have been collected for a large insured:
Year Number Of Claims Average Claim Size
1 100 10,000
2 200 12,500
Inflation increases the size of all claims by 10% per year.
A Pareto distribution with parameters α = 3 and θ is used to model the claim size
distribution.
Estimate θ for Year 3 using the method of moments.
(A) 22,500 (B) 23,333 (C) 24,000 (D) 25,850 (E) 26,400

9.64 (4, 5/01, Q.39) (2.5 points) You are modeling a claim process as a mixture of two
independent distributions A and B. You are given:
(i) Distribution A is exponential with mean 1.
(ii) Distribution B is exponential with mean 10.
(iii) Positive weight p is assigned to distribution A.
(iv) The standard deviation of the mixture is 2.
Determine p using the method of moments.
(A) 0.960 (B) 0.968 (C) 0.972 (D) 0.979 (E) 0.983

9.65 (4, 11/01, Q.33 & 2009 Sample Q.75) (2.5 points) You are given:
(i) Claim amounts follow a shifted exponential distribution with probability density function:
f(x) = e^(-(x-δ)/θ)/θ, δ < x < ∞.
(ii) A random sample of claim amounts X1 , X2 ,..., X10:
5 5 5 6 8 9 11 12 16 23

(iii) Σ Xi = 100 and Σ Xi2 = 1306

Estimate δ using the method of moments.


(A) 3.0 (B) 3.5 (C) 4.0 (D) 4.5 (E) 5.0
9.66 (4, 11/03, Q.8 & 2009 Sample Q.6) (2.5 points)
For a sample of dental claims x1 , x2 ,..., x10, you are given:

(i) Σ xi = 3860 and Σ xi2 = 4,574,802.

(ii) Claims are assumed to follow a lognormal distribution with parameters µ and σ.
(iii) µ and σ are estimated using the method of moments.
Calculate E[X ∧ 500] for the fitted distribution.
(A) Less than 125
(B) At least 125, but less than 175
(C) At least 175, but less than 225
(D) At least 225, but less than 275
(E) At least 275

9.67 (4, 11/03, Q.24 & 2009 Sample Q.19) (2.5 points) You are given:
(i) A sample x1 , x2 ,..., x10 is drawn from a distribution with probability density function:

{e^(-x/θ)/θ + e^(-x/σ)/σ}/2, 0 < x < ∞.


(ii) θ > σ

(iii) Σxi = 150 and Σxi2 = 5000

Estimate θ by matching the first two sample moments to the corresponding population quantities.
(A) 9 (B) 10 (C) 15 (D) 20 (E) 21

9.68 (4, 11/04, Q.14 & 2009 Sample Q.143) (2.5 points)
The parameters of the inverse Pareto distribution
F(x) = {x/(x + θ)}^τ
are to be estimated using the method of moments based on the following data:
15 45 140 250 560 1340
Estimate θ by matching kth moments with k = -1 and k = -2.
(A) Less than 1
(B) At least 1, but less than 5
(C) At least 5, but less than 25
(D) At least 25, but less than 50
(E) At least 50

9.69 (2 points) Using the data in the prior question, 4, 11/04, Q.14, the parameters of the
Inverse Gamma distribution are to be estimated using the method of moments.
Estimate θ by matching kth moments with k = -1 and k = -2.
(A) 26 (B) 28 (C) 30 (D) 32 (E) 34
9.70 (CAS3, 5/05, Q.19) (2.5 points) Four losses are observed from a Gamma distribution.
The observed losses are: 200, 300, 350, and 450.
Find the method of moments estimate for α.
A. 0.3 B. 1.2 C. 2.3 D. 6.7 E. 13.0

9.71 (4, 5/05, Q.24 & 2009 Sample Q.193) (2.9 points)
The following claim data were generated from a Pareto distribution:
130 20 350 218 1822.
Using the method of moments to estimate the parameters of a Pareto distribution, calculate
the limited expected value at 500.
(A) Less than 250
(B) At least 250, but less than 280
(C) At least 280, but less than 310
(D) At least 310, but less than 340
(E) At least 340

9.72 (4, 11/05, Q.21 & 2009 Sample Q.232) (2.9 points) You are given:
(i) Losses on a certain warranty product in Year i follow a lognormal distribution
with parameters µi and σi.

(ii) σi = σ, for i = 1, 2, 3,…

(iii) The parameters µi vary in such a way that there is an annual inflation rate of 10%
for losses.
(iv) The following is a sample of seven losses:
Year 1: 20 40 50
Year 2: 30 40 90 120
Using trended losses, determine the method of moments estimate of µ3.
(A) 3.87 (B) 4.00 (C) 30.00 (D) 55.71 (E) 63.01

9.73 (4, 5/07, Q.10) (2.5 points) A random sample of observations is taken from a shifted
exponential distribution with probability density function:
f(x) = e^(-(x-δ)/θ)/θ, δ < x < ∞.
The sample mean and median are 300 and 240, respectively.
Estimate δ by matching these two sample quantities to the corresponding population quantities.
(A) Less than 40
(B) At least 40, but less than 60
(C) At least 60, but less than 80
(D) At least 80, but less than 100
(E) At least 100
9.74 (CAS3, 11/07, Q.5) (2.5 points)
X is a two-parameter Pareto random variable with parameters θ and α.
A random sample from this distribution produces the following four claims:
• x1 = 2,000
• x2 = 17,000
• x3 = 271,000
• x4 = 10,000
Find the Method of Moments estimate for α.
A. Less than 2
B. At least 2, but less than 3
C. At least 3, but less than 4
D. At least 4, but less than 5
E. At least 5

9.75 (CAS3L, 5/09, Q.17) (2.5 points) A random variable, X, follows a lognormal distribution.
You are given a sample of size n and the following information:
Σ xi / n = 1.8682
Σ xi2 / n = 4.4817
Use the method of moments to estimate the lognormal parameter σ.
A. Less than 0.4
B. At least 0.4, but less than 0.8
C. At least 0.8, but less than 1.2
D. At least 1.2 but less than 1.6
E. At least 1.6
9.76 (CAS3L, 11/10, Q.20) (2.5 points) You are given the following information:
• A gamma distribution has mean αθ and variance αθ2.
• Five observations from this distribution are:
2 10 12 8 8
Calculate the method of moments estimate for α.
A. Less than 4.0
B. At least 4.0, but less than 5.0
C. At least 5.0, but less than 6.0
D. At least 6.0, but less than 7.0
E. At least 7.0

9.77 (CAS3L, 11/12, Q.18) (2.5 points) You are given the following information:
• E[X2 ] = 10
• f(x) = α x where 0 ≤ x ≤ θ
Estimate α using the method of moments.
A. Less than 0.15
B. At least 0.15, but less than 0.30
C. At least 0.30, but less than 0.45
D. At least 0.45, but less than 0.60
E. At least 0.60
Solutions to Problems:

9.1. B & 9.2. A. The observed mean is (410 + 1924 + 2635 + 4548 + 6142)/5 = 3131.8.
The second moment is: (410^2 + 1924^2 + 2635^2 + 4548^2 + 6142^2)/5 = 13,844,314.
For the LogNormal Distribution the mean is exp[µ + 0.5σ^2], while the second moment is exp[2µ + 2σ^2]. With 2 parameters, the method of moments consists of matching the first 2 moments.
Thus setting exp[µ + 0.5σ^2] = 3131.8 and exp[2µ + 2σ^2] = 13,844,314, we can solve by dividing the square of the 1st equation into the 2nd equation: exp[σ^2] = 13,844,314 / 3131.8^2 = 1.4115.
Thus σ = 0.5871 and thus µ = 7.877.
Comment: It is only a coincidence that e^µ = e^7.877 = 2636, equal to the sample median of 2635, subject to rounding.
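A quick numeric check of this solution in Python (my own sketch, not part of the original answer):

import math

claims = [410, 1924, 2635, 4548, 6142]
m = sum(claims) / 5                    # 3131.8
t = sum(x**2 for x in claims) / 5      # 13,844,314
sigma = math.sqrt(math.log(t / m**2))  # about 0.5871
mu = math.log(m) - sigma**2 / 2        # about 7.877
print(mu, sigma)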

9.3. A. & 9.4. E. µ = (Σ ln xi)/5 = 7.72. Σ(ln xi)^2/5 = 60.49. σ = √(60.49 - 7.72^2) = 0.94.
Comment: The mean and variance of the logs of the claim sizes have been matched to those of a
Normal Distribution. The method of moments applied to the Normal Distribution underlying the
LogNormal, such as in these two questions, is not the same as applying the method of moments
directly to the LogNormal, such as in the previous two questions. Applying the method of moments
to the Normal Distribution underlying the LogNormal turns out to be the same as applying Maximum
Likelihood to the LogNormal Distribution.

9.5. B. For the exponential the mean is θ.


Therefore for the method of moments θ = observed mean = 105.
9.6. C. & 9.7. E. Compute the observed first two moments.
Claim Size Square of Claim Size
1500 2,250,000
5500 30,250,000
3000 9,000,000
3300 10,890,000
2300 5,290,000
6000 36,000,000
5000 25,000,000
4000 16,000,000
3800 14,440,000
2500 6,250,000
Average 3690 15,537,000
Match the observed and theoretical means and variances:
αθ = 3690, and αθ^2 = 15,537,000 - 3690^2 = 1,920,900.
Therefore, α = 3690^2 / 1,920,900 = 7.088 and θ = 3690/7.088 = 521.

9.8. D. The total losses prior to the effect of the coinsurance factor are: 184013/.8 = 230016. Thus
the average loss is 230016/623 = 369.2. Set the observed mean equal to the theoretical mean:
369.2 = θπ/4. θ = (4/π)(369.2) = 470.
Comment: This is a ParaLogistic Distribution as per Loss Models, with α = 2.

9.9. E. The average of the 6 claims is 107.2. For this Exponential Distribution, the mean is 1/q.
Set mean = 1/q = 107.2. Solve for q = 1 / 107.2 = 0.0093.

9.10. B. & 9.11. A.
First moment = (1729 + 101 + 384 + 121 + 880 + 3043 + 205 + 132 + 214 + 82)/10 = 689.1.
Second moment = (1729^2 + 101^2 + 384^2 + 121^2 + 880^2 + 3043^2 + 205^2 + 132^2 + 214^2 + 82^2)/10 = 13,307,957/10 = 1,330,796.
Matching first moments: θ/(α-1) = 689.1.
Matching 2nd moments: 2θ^2 / {(α-1)(α-2)} = 1,330,796.
Dividing the second equation by the square of the first equation:
2(α-1)/(α-2) = 1,330,796/689.1^2 = 2.8025.
Solving, α = 2(1 - 2.8025)/(2 - 2.8025) = 4.49. Then, θ = 689.1(4.49 - 1) = 2406.

9.12. D. & 9.13. E. Matching first moments: µ = 689.1.
Matching variances: µ^3/θ = 1,330,796 - 689.1^2 = 855,937.
Then, θ = 689.1^3 / 855,937 = 382.
9.14. C. & 9.15. B. Matching first moments: θ/(α-1) = 689.1.
Matching second moments: θ^2 / {(α-1)(α-2)} = 1,330,796.
Then, dividing the second equation by the square of the first equation:
(α-1)/(α-2) = 1,330,796/689.1^2 = 2.8025.
Solving, α = {(2)(2.8025) - 1}/(2.8025 - 1) = 2.555. θ = (2.555 - 1)(689.1) = 1072.

9.16. B. The observed mean is: (500 + 1000 + 1500 + 2500 + 4500)/5 = 2000.
The second moment is: (500^2 + 1000^2 + 1500^2 + 2500^2 + 4500^2)/5 = 6,000,000.
For the LogNormal Distribution the mean is exp[µ + 0.5σ^2], while the second moment is exp[2µ + 2σ^2].
With 2 parameters, the method of moments consists of matching the first 2 moments.
Thus set exp[µ + 0.5σ^2] = 2000 and exp[2µ + 2σ^2] = 6,000,000.
Divide the square of the 1st equation into the 2nd equation:
exp[2µ + 2σ^2]/exp[2µ + σ^2] = exp[σ^2] = 6,000,000 / 2000^2 = 1.5. ⇒ σ = 0.637. ⇒ µ = 7.398.
1 - F(4500) = 1 - Φ[(ln(4500) - 7.398)/0.637] = 1 - Φ[1.59] = 5.6%.

9.17. B. & 9.18. D. first moment = 0.442, 2nd moment = 0.26396.


Size Square of Size
0.21 0.0441
0.40 0.1600
0.14 0.0196
0.65 0.4225
0.53 0.2809
0.92 0.8464
0.30 0.0900
0.44 0.1936
0.76 0.5776
0.07 0.0049
Average 0.442 0.26396
This is a Beta Distribution as in Loss Models, with θ = 1.
Matching first moments: a/(a+b) = 0.442.
Matching second moments: a(a+1) / {(a+b)(a+b+1)} = .26396.
From the first equation: b = 1.262a.
Dividing the second equation by the first equation: (a+1)/(a+b+1) = .5972.
Therefore, (a+1) = (a + 1.262a +1)(.5972).⇒ a = 1.15 and b = 1.45.
9.19. D. For an Exponential with mean δ, the second moment is 2δ^2.
The moments of the mixed distribution are the mixture of those of the individual distributions.
Therefore, the mixed distribution has mean: 0.4δ1 + 0.6δ2, and second moment: 2(0.4δ1^2 + 0.6δ2^2).
Using the method of moments with two parameters, we match the first two moments.
4 = 0.4δ1 + 0.6δ2. ⇒ 20 = 2δ1 + 3δ2. ⇒ δ1 = 10 - 1.5δ2.
27 + 4^2 = 2(0.4δ1^2 + 0.6δ2^2). ⇒ 215 = 4δ1^2 + 6δ2^2.
Substituting into the second equation: 215 = 4(10 - 1.5δ2)^2 + 6δ2^2. ⇒ 15δ2^2 - 120δ2 + 185 = 0.
δ2 = {120 ± √(120^2 - (4)(15)(185))} / {(2)(15)} = 4 ± 1.915 = 5.915 or 2.085.
Note that δ2 > δ1. If δ2 = 5.915, then δ1 = (185 - 3δ2^2)/(12δ2) = 1.128 < δ2.
If instead, δ2 = 2.085, then δ1 = (185 - 3δ2^2)/(12δ2) = 6.873 > δ2.
Comment: Similar to 4B, 11/98, Q.25.
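A short Python sketch (my own check of the arithmetic above) that solves the same two moment equations:

import math

# First moment: 0.4*d1 + 0.6*d2 = 4, so d1 = 10 - 1.5*d2. The second moment
# 2*(0.4*d1^2 + 0.6*d2^2) = 27 + 4^2 = 43 then gives 15*d2^2 - 120*d2 + 185 = 0.
a, b, c = 15, -120, 185
disc = math.sqrt(b**2 - 4 * a * c)
for d2 in [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]:
    d1 = 10 - 1.5 * d2
    if d1 < d2:               # the parameterization requires 0 < delta1 < delta2
        print(d1, d2)         # prints about 1.128 and 5.915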

9.20. B. The empirical mean is: 361/10 = 36.1.


The smoothed empirical estimate of the median is: (24 + 34)/2 = 29.
If X follows a shifted Exponential, then Y = X - δ follows an Exponential.
Y has mean θ, and median: -θln(0.5) = 0.693θ.
Therefore, X = Y + δ, has mean: θ + δ, and median: 0.693θ + δ.

We want: θ + δ = 36.1 and 0.693θ + δ = 29. ⇒ θ = (36.1 - 29)/(1 - 0.693) = 23.1. ⇒ δ = 13.0.

9.21. A. Mean of the mixed distribution is: (1)(p) + (10)(1-p) = 10 - 9p.


Setting the mean of the mixture equal to 3: 10 - 9p = 3. p = 7/9 = 0.778.
Comment: We do not use the fact that the individual distributions are Exponential.
For a two-point mixture in which only the weight is unknown, the mean of the mixture is between the
individual means. Therefore, the sample mean has to be between the individual means for the fitted
weight via method of moments to be between zero and one.
In this case, the empirical mean has to be between 1 and 10.
9.22. E. Since we want to estimate µ for Year 6, inflate all of the data to the cost level of Year 6.
For Year 1, (22,141)(1.04)^5 = 26,937.9.
So for Year 1, the inflated losses are: (1732)(26937.9) = 46.656 million.
Year Number Average Size Inflated Average Inflated
of Claims of Claim Size of claim Dollars of Loss
1 1732 $22,141 $26,937.9 $46,656,463
2 2007 $22,703 $26,559.3 $53,304,513
3 1920 $24,112 $27,122.7 $52,075,624
4 1851 $24,987 $27,025.9 $50,025,013
Sum 7510 $202,061,614
The total inflated losses are: 202.062 million.
Average inflated claim size = 202.062 million /7510 = 26,906.
Set the observed and theoretical means equal:
exp[µ + 0.5σ^2] = exp[µ + 4.5] = 26,906. ⇒ µ = ln(26,906) - 4.5 = 5.70.
Comment: Similar to 4, 11/00, Q. 2.
9.23. The mean of the data is: 0.5048.
The second moment of the data is: 0.2590.
0.5048 = a/(a + b). ⇒ b = 0.9810a.

0.2590 = a(a+1)/ {(a + b)(a + b + 1)}. ⇒ 0.2590(1.9810a)(1.9810a + 1) = a(a + 1).

⇒ 1.0164a + 0.5131 = a + 1. ⇒ a = 29.7. ⇒ b = 29.1.


Comment: The winning percentages do not average to 50%, since some games were played
against the National League. In this year, which was 2007, the American League did better than
average in its games against the National League.
[A graph of the fitted Beta Distribution, with support from 0 to 1, appears here; the density is concentrated around x = 0.5.]

9.24. A. E[1/X] = (1/2 + 1/5 + 1/11 + 1/28 + 1/65 + 1/143)/6 = 0.1415.
For the Inverse Exponential Distribution, E[X^-1] = θ^-1 Γ[1 + 1] = Γ[2]/θ = 1/θ.
1/θ = 0.1415. ⇒ θ = 7.067.
Comment: Similar to 4, 11/04, Q.14.


Comment: Similar to 4, 11/04, Q.14.

9.25. C. Mean of the mixed distribution is: (θA)(0.8) + (θB)(0.2) = 10.
Second moment of the mixed distribution is: (2θA^2)(0.8) + (2θB^2)(0.2) = 228 + 10^2 = 328.
θA = 12.5 - 0.25θB. 1.6(12.5 - 0.25θB)^2 + 0.4θB^2 = 328.
0.5θB^2 - 10θB - 78 = 0. θB = {10 + √(10^2 + (4)(0.5)(78))} / {(2)(0.5)} = 26.
Comment: θA = 12.5 - 0.25θB = 6.


9.26. C. The mean of c times a Poisson is cλ. The variance of c times a Poisson is c^2 λ.
cλ = 10. c^2 λ = 200. ⇒ c = 20. ⇒ λ = 0.5. 20N < 10. ⇔ N < 1/2. ⇔ N = 0.
Density at zero for the Poisson is: e^-λ = e^-0.5 = 60.6%.
Comment: Since Var[cX]/E[cX] = cVar[X]/E[X], for c > 1, the Over-dispersed Poisson Distribution has a variance greater than its mean. See for example “A Primer on the Exponential Family of Distributions”, by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program.

9.27. E. Using percentile matching at the median, set F(5) = 0.5: 0.5 = 1 - e^(-5/θ).
⇒ θ1 = 7.2135. S(10) = e^(-10/7.2135) = 0.250.
Using the method of moments: θ2 = (2 + 3 + 4 + 5 + 6 + 8 + 14)/7 = 6. S(10) = e^(-10/6) = 0.189.
|0.250 - 0.189| = 0.061.

9.28. B. The LogNormal Distribution has mean = exp[µ + σ^2/2], and median = exp[µ].
410 = exp[µ]. ⇒ µ = 6.016.
830 = exp[µ + σ^2/2]. ⇒ µ + σ^2/2 = 6.721. ⇒ σ = 1.19.
Comment: One can solve for the median by setting 0.5 = Φ[(ln(x) - µ)/σ].

9.29. E. E[X] = (303 + 30 + 35 + 78 + 12)/5 = 91.6 = θ/(α-1).
E[X^2] = (303^2 + 30^2 + 35^2 + 78^2 + 12^2)/5 = 20,032.4 = 2θ^2 / {(α-1)(α-2)}.
Dividing the second equation by the square of the first equation:
2(α-1)/(α-2) = 2.387. ⇒ α = 7.17. ⇒ θ = 565.2.
E[X] = θ/(α-1). E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}.
LER(x) = E[X ∧ x]/E[X] = 1 - {θ/(θ+x)}^(α-1).
LER(10) = 1 - {565.2/(565.2 + 10)}^6.17 = 10.3%.
Comment: Similar to 4, 5/05, Q.24.
9.30. For each interval from a to b, the mean of the uniform is: (a + b)/2.
For example, (5 + 25)/2 = 15.
For each interval from a to b, the second moment of the uniform is: (b^3 - a^3) / {(b - a)(3)}.
For example, (25^3 - 5^3) / {(25 - 5)(3)} = 258.333.
Lower Upper Number First Second
Endpoint Endpoint of Claims Moment Moment
0 5 1710 2.500 8.333
5 25 968 15.000 258.333
25 100 343 62.500 4,375.000
100 250 23 175.000 32,500.000
250 500 4 375.000 145,833.333
3048 15.012 1,015.674
Then weight the moments by the number of claims.
{(1710)(8.33) + (968)(258.33) + (343)(4375) + (23)(32,500) + (4)(145,833.33)}/3048 =
1015.7.
Set the first and second moments of the LogNormal equal to their estimates:
exp[µ + σ^2/2] = 15,012. exp[2µ + 2σ^2] = 1,015,674,000.
Dividing the second equation by the square of the first equation:
exp[σ^2] = 1,015,674,000/15,012^2 = 4.5069. ⇒ σ = 1.227. ⇒ µ = 8.864.
Comment: Data summarized from Table 2 of Sheldon Rosenbergʼs discussion of “On the Theory
of Increased Limits and Excess of Loss Pricing”, PCAS 1977.
Estimating the moments for grouped data is discussed in “Mahlerʼs Guide to Loss Distributions.”
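A Python sketch of this grouped-moment calculation (my own illustration; like the solution, it assumes losses are uniform on each interval):

import math

# (lower, upper, number of claims) for each interval
groups = [(0, 5000, 1710), (5000, 25000, 968), (25000, 100000, 343),
          (100000, 250000, 23), (250000, 500000, 4)]
n = sum(c for _, _, c in groups)
m = sum(c * (a + b) / 2 for a, b, c in groups) / n                     # about 15,012
t = sum(c * (b**3 - a**3) / (3 * (b - a)) for a, b, c in groups) / n   # about 1.0157 x 10^9
sigma = math.sqrt(math.log(t / m**2))                                  # about 1.227
mu = math.log(m) - sigma**2 / 2                                        # about 8.864
print(mu, sigma)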

9.31. B. θ/(α-1) = θ/5 = X̄ = (50 + 100 + 200 + 250 + 600 + 1200)/6 = 400. ⇒ θ = 2000.
E[X ∧ 500] = (2000/5) {1 - (2000/2500)^5} = 268.93. E[X] = 2000/5 = 400.
E[(X - 500)+] = E[X] - E[X ∧ 500] = 400 - 268.93 = 131.07.
9.32. A. Since we wish to estimate theta for year 5, inflate all of the losses to the year 5 level:
(100)(1.08^4) = 136.0. (300)(1.08^4) = 408.1.
(200)(1.08^3) = 251.9. (500)(1.08^3) = 629.9.
(100)(1.08^2) = 116.6. (1000)(1.08^2) = 1166.4.
First moment is: (136.0 + 408.1 + 251.9 + 629.9 + 116.6 + 1166.4)/6 = 451.5.
Second moment is: (136.0^2 + 408.1^2 + 251.9^2 + 629.9^2 + 116.6^2 + 1166.4^2)/6 = 336,559.
Variance is: 336,559 - 451.5^2 = 132,707.
Matching the mean and variance results in two equations in two unknowns:
αθ = 451.5. αθ^2 = 132,707.
Divide the second equation by the first equation: θ = 132,707/451.5 = 294.
Comment: Similar to 4, 11/05, Q.21.

9.33. B. For an Inverse Pareto, E[1/X] = θ^-1 / (τ - 1), and E[1/X^2] = 2θ^-2 / {(τ - 1)(τ - 2)}.
The negative first moment for the data is:
{1/10 + 1/30 + 1/100 + 1/300 + 1/800}/5 = 0.02958.
The negative second moment for the data is:
{1/10^2 + 1/30^2 + 1/100^2 + 1/300^2 + 1/800^2}/5 = 0.002245.
Matching moments results in two equations in two unknowns:
θ^-1 / (τ - 1) = 0.02958. 2θ^-2 / {(τ - 1)(τ - 2)} = 0.002245.
Divide the second equation by the square of the first equation:
2(τ - 1)/(τ - 2) = 0.002245 / 0.02958^2 = 2.566. ⇒ τ = 5.534.
⇒ θ = 1 / {(0.02958)(5.534 - 1)} = 7.456. S(1000) = 1 - (1000/1007.456)^5.534 = 4.0%.
Comment: Similar to 4, 11/04, Q.14.

9.34. The first moment is:
{8000 ln[2] + 1000 ln[5] + 150 ln[25] + 5 ln[100] + ln[250]} / 9156 = 0.83726.
The second moment is:
{8000 ln[2]^2 + 1000 ln[5]^2 + 150 ln[25]^2 + 5 ln[100]^2 + ln[250]^2} / 9156 = 0.88735.
αθ = 0.83726. αθ^2 = 0.88735 - 0.83726^2 = 0.18635.
⇒ θ = 0.22257. ⇒ α = 3.7618.
Comment: If ln[X] follows a Gamma Distribution, then X follows a LogGamma Distribution, a distribution which is not discussed in Loss Models.
9.35. A. This is a Pareto distribution with θ = 10, and mean: 10/(α - 1).
X̄ = (1 + 3 + 7 + 10 + 18 + 21 + 37)/7 = 13.857.
10/(α - 1) = 13.857. ⇒ α = 1.722.

9.36. C. First moment = (12 + 41 + 35 + 3 + 67)/5 = 31.6.
2nd moment = (12^2 + 41^2 + 35^2 + 3^2 + 67^2)/5 = 1509.6.
Matching first moments: 100 a/(a + b) = 31.6.
Matching second moments: 100^2 a(a + 1) / {(a + b)(a + b + 1)} = 1509.6.
From the first equation: b = 2.1646a.
Dividing the second equation by the first equation: 100(a + 1)/(a + b + 1) = 47.772.
Therefore, a + 1 = (a + 2.1646a + 1)(0.47772). ⇒ a + 1 = 1.51179a + 0.47772.
⇒ a = 1.0205, and b = 2.209.

9.37. B. αθ = (600 + 1100 + 1500 + 2500)/4 = 1425.
Second moment = α(α+1)θ^2 = (600^2 + 1100^2 + 1500^2 + 2500^2)/4 = 2,517,500.
Divide the second equation by the square of the first:
(α+1)/α = 2,517,500/1425^2 = 1.2398. ⇒ α = 1/0.2398 = 4.171. ⇒ θ = 1425/4.171 = 342.

9.38. D. The observed mean is: (5 + 20 + 35 + 60 + 85 + 180)/6 = 64.17.
The second moment is: (5^2 + 20^2 + 35^2 + 60^2 + 85^2 + 180^2)/6 = 7479.
For the LogNormal Distribution the mean is exp[µ + 0.5σ^2], while the second moment is exp[2µ + 2σ^2].
With 2 parameters, the method of moments consists of matching the first 2 moments.
Thus set exp[µ + 0.5σ^2] = 64.17, and exp[2µ + 2σ^2] = 7479.
Divide the square of the 1st equation into the 2nd equation:
exp[2µ + 2σ^2] / exp[2µ + σ^2] = exp[σ^2] = 7479 / 64.17^2 = 1.816. ⇒ σ = 0.7725. ⇒ µ = 3.863.
E[X^3] = exp[3µ + 4.5σ^2] = exp[(3)(3.863) + (4.5)(0.7725^2)] = exp[14.27] = 1.58 million.


9.39. C. Set the moments of the Pareto equal to those of the data:
198 = θ/(α - 1).
148,322 = 2θ^2 / {(α - 1)(α - 2)}.
Divide the second equation by the square of the first equation: 3.7833 = 2(α-1)/(α-2).
⇒ α = 3.122. ⇒ θ = 420.
Prob[300 < X < 800] = S(300) - S(800) = (420/720)^3.122 - (420/1220)^3.122 = 15.00%.

9.40. For the Pareto Distribution: e(x) = (θ+x)/(α-1).


1000 = e(1000) = (θ+1000)/(α-1).
2000 = e(4000) = (θ+4000)/(α-1).
⇒ 2 = (θ+4000)/(θ+1000). ⇒ θ = 2000. ⇒ α = 4.

9.41. E. The method of moments consists of matching the first two moments, since we have two parameters. For the LogNormal Distribution the mean is exp[µ + 0.5σ^2], while the second moment is exp[2µ + 2σ^2]. Thus the variance is: exp[2µ + σ^2]{exp[σ^2] - 1}.
Taking X̄ as an estimate of the mean and S^2 as an estimate of the variance, we set:
exp[µ + 0.5σ^2] = X̄ and exp[2µ + σ^2]{exp[σ^2] - 1} = S^2.
We get: exp[σ^2] - 1 = S^2/X̄^2. Thus σ^2 = ln(1 + S^2/X̄^2).
Therefore, µ = ln(X̄) - 0.5σ^2 = ln(X̄) - 0.5 ln(1 + S^2/X̄^2).
9.42. D. αθ = (30 + 50 + 60 + 60 + 70 + 90)/6 = 60.
α(α+1)θ^2 = (30^2 + 50^2 + 60^2 + 60^2 + 70^2 + 90^2)/6 = 3933.33.
Dividing the second equation by the square of the first equation: 1 + 1/α = 1.09259. α = 10.8.
Comment: θ = 5.56. One could instead set: αθ^2 = 3933.33 - 60^2 = 333.33.
9.43. D. The observed mean is: (3 + 4.5 + 6 + 6.25 + 6.5 + 6.75 + 7 + 7.5 + 8.5 + 10)/10 =
6.60. Since the method of moments consists of matching the first two moments, the mean of the
fitted distribution will also be 6.60.
Comment: If one were to carry through the method of moments, the second moment is:
(32 +4.52 + 62 +6.252 +6.52 +6.752 + 72 +7.52 +8.52 +102 )/10 = 46.96. For the LogNormal
Distribution the mean is exp[µ + 0.5 σ2], while the second moment is exp[2µ + 2σ2]. Thus setting

exp[µ + 0.5 σ2] = 6.6 and exp[2µ + 2σ2] = 46.96, we can solve exp[σ2] = 46.96 / 6.62 = 1.078.
Thus σ = 0.274 and therefore µ = 1.850.

9.44. D. Integrating x f(x) = a x^a from zero to one, the mean is: a/(1+a).
The observed mean is 0.638. Since we have one parameter, the method of moments consists of
matching the first moment. Setting a/(1+a) = 0.638, we get a = 1/{(1/0.638) -1} = 1.762.
Comment: A Beta distribution with b =1 and θ = 1, therefore with mean: θa/(a+b) = a/(1+a).

9.45. E. This is a Gamma Distribution with α = 2 and θ = c. X = (3 + 4 + 4 + 11 + 18)/5 = 8.

Set 8 = αθ = 2c. ⇒ c = 4.

9.46. A. ∫₀¹ x f(x) dx = ∫₀¹ x^(1/θ) / θ dx = [x^(1 + 1/θ) / (θ + 1)] evaluated from x = 0 to x = 1 = 1/(θ+1).

Set the theoretical mean equal to the sample mean: 1/(θ+1) = X̄. ⇒ θ = (1 - X̄)/X̄.

9.47. B. The observed first moment = m = (1000+850+750+1100+1250+900)/6 = 975.


The second moment = t = (10002 + 8502 + 7502 + 11002 + 12502 + 9002 )/6 = 977,917.
Match the observed and theoretical means and variances:
αθ = 975 and αθ2 = 977,917 - 9752 = 27292.
Therefore, α = 9752 / 27292 = 34.8 and θ = 975/34.8 = 28.0.

9.48. A. The median of the uniform distribution is ω/2.

The empirical median is: (5+6)/2 = 5.5. Setting ω/2 = 5.5, ⇒ ω1 = 11.

The mean of the uniform distribution is ω/2. The empirical mean is 6.5.

Setting ω/2 = 6.5, ⇒ ω2 = 13. ω1 - ω2 = 11 - 13 = -2.


9.49. C. For the LogNormal Distribution: Mean = exp(µ + 0.5 σ2),

second moment = exp(2µ + 2σ2). Setting these equal to the observed moments:

exp(µ + 0.5σ²) = 2305, and exp(2µ + 2σ²) = 989,544 + 2305² = 6,302,569. Therefore,

µ + 0.5σ² = ln(2305) = 7.7428 and 2µ + 2σ² = ln(6,302,569) = 15.6565.

Therefore, σ² = 15.6565 - (2)(7.7428) = 0.1709 and µ = 7.7428 - (0.5)(0.1709) = 7.657.

For the LogNormal Distribution: F(x) = Φ[{ln(x) - µ} / σ].

1 - F(3000) = 1 - Φ[{ln(3000) - 7.657} / √0.1709] = 1 - Φ[0.85] = 20%.

9.50. E. first moment = 3800. second moment = 16,332,000.


Claim Size Square of Claim Size
1500 2,250,000
6000 36,000,000
3500 12,250,000
3800 14,440,000
1800 3,240,000
5500 30,250,000
4800 23,040,000
4200 17,640,000
3900 15,210,000
3000 9,000,000
Average 3800 16,332,000
Match the observed and theoretical means and variances:
αθ = 3800, and αθ2 = 16,332,000 - 38002 = 1,892,000.
Therefore, α = 38002 / 1,892,000 = 7.632 and θ = 3800/7.632 = 498.
Comment: α = mean2 / (2nd moment - mean2 ) = 7.63.

9.51. C.
(1/5)Σ ln xj = (1/5)(ln 500 + ln 1000 + ln 1500 + ln 2500 + ln 4500) = 36.6715 /5 = 7.3343.

(1/5)Σ (ln xj)² = (1/5){(ln 500)² + (ln 1000)² + (ln 1500)² + (ln 2500)² + (ln 4500)²}
= 271.7963/5 = 54.3593.
Then matching the moments of the log claim sizes to a Normal Distribution:
µ = 7.3343. σ² = 54.3593 - 7.3343². ⇒ σ = 0.7532.
For the LogNormal Distribution: F(x) = Φ[(ln x - µ )/σ].
1 - F(4500) = 1 - Φ[(ln 4500 − 7.3343 )/0.7532] = 1 - Φ[1.43] = 1 - 0.9236 = 0.0764.
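Here is a brief Python sketch (an illustration only, not part of the original solution) of matching the moments of the log claim sizes, as above; Φ is computed from the error function rather than read from the Normal table, so the final answer is about 0.076 rather than the rounded table value 0.0764.

import math

claims = [500, 1000, 1500, 2500, 4500]
logs = [math.log(x) for x in claims]

mu = sum(logs) / len(logs)                                        # 7.3343
sigma = math.sqrt(sum(l * l for l in logs) / len(logs) - mu**2)   # 0.7532

def std_normal_cdf(z):
    # Phi(z) expressed through the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Probability that a claim exceeds 4500 under the fitted LogNormal
print(1 - std_normal_cdf((math.log(4500) - mu) / sigma))          # about 0.076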
9.52. D. This distribution has one parameter, so under the method of moments one sets the
observed mean of (4.9 + 1.8 + 3.4 + 6.9 + 4.0)/5 = 4.2, equal to the theoretical mean.
(β/2)√(2π) = 4.2. ⇒ β = (4.2)(2) / √(2π) = 3.35.

Comment: This is a Weibull Distribution, with parameters θ = β√2 and τ = 2. Thus this is an example of a
single parameter special case of a two parameter distribution in which one parameter is fixed,
although this fact is not particularly helpful in solving this problem.
The mean is: θ Γ(1 + 1/τ) = Γ(3/2) β√2 = {√π / 2} β√2 = (β/2)√(2π).

The second moment is: θ² Γ(1 + 2/τ) = Γ(2) (β√2)² = 2β².

9.53. E. The empirical median is t3 . 1/2 = S(t3 ) = exp[-.21t3 ]. ⇒ t3 = 3.3.


The empirical mean is: (1 + 2 + 3.3 + t4 + 9)/5 = 3.06 + t4 /5.

From the method of moments result: 1/.21 = 3.06 + t4 /5. ⇒ t4 = 8.51.

9.54. C. Since this is a one parameter distribution the method of moments involves matching the
mean. The observed mean is (.4+.7+.9) / 3 = .667 = 2 / 3.
The mean of this distribution is the integral from zero to one of xf(x), which is:
∫₀¹ α x^α dx = [α x^(α + 1) / (α + 1)] evaluated from x = 0 to x = 1 = α / (α+1).

Setting this equal to the observed mean of 2/3, gives 2α + 2 = 3α. Therefore α = 2.
Comment: A Beta distribution with a = α, b =1 and θ = 1, therefore with mean θa/(a+b) = α/(1+α).

9.55. E. The mean of the given density is:


∫₀^θ x f(x) dx = ∫₀^θ 2x(θ - x)/θ² dx = (2/θ²)(θx²/2 - x³/3) evaluated from x = 0 to x = θ
= (2/θ²)(θ³/6) = θ/3.

Since one has a single parameter, in order to apply the Method of Moments one sets the mean of
the fitted distribution equal to the observed mean: (0.5 + 0.9) /2 = θ / 3. ⇒ θ = (3)(0.7) = 2.1.
9.56. E. In order to apply the method of moments one sets the observed mean equal to the
theoretical mean and solve for the single parameter α. The theoretical mean is by integration by
parts: ∫₀^∞ x f(x) dx = ∫₀^∞ x α(x + 1)^(-α-1) dx = [-x(x + 1)^(-α)] evaluated from x = 0 to x = ∞
+ ∫₀^∞ (x + 1)^(-α) dx = 0 + 1/(α-1) = 1/(α-1).
Setting µ = 1/(α-1) and solving: α = (µ + 1) / µ.
Comment: This is a Pareto Distribution, with θ = 1 fixed, with mean θ/(α-1) = 1/(α-1). Alternately, the

tail probability S(x) = (x+1)−α can be obtained by integrating the density function from x to infinity.
Then the mean is the integral from 0 to infinity of S(x).

9.57. C. The mean of these Exponentials is: λ.
The variance of these Exponentials is: λ².
Therefore, the second moment of these Exponentials is: λ² + λ² = 2λ².
The mean and second moment of the mixed distribution are the weighted average of those of the
individual distributions. Therefore, the mixed distribution has
mean: 0.5(λ1 + λ2) and second moment: 0.5(2λ1² + 2λ2²).
Using the method of moments with two parameters, we match the first two moments.
1 = 0.5(λ1 + λ2) or 2 = λ1 + λ2. k + 1² = 0.5(2λ1² + 2λ2²) or k + 1 = λ1² + λ2².

Squaring the first equation gives: 4 = λ1² + λ2² + 2λ1λ2.

Subtracting the second equation gives: 3 - k = 2λ1λ2.

Thus λ2 = (3 - k)/(2λ1). Substituting back into the first equation:

2 = λ1 + (3 - k)/(2λ1). Thus λ1² - 2λ1 + (3 - k)/2 = 0.

Letting k = 3/2 as given, this equation becomes: λ1² - 2λ1 + 3/4 = 0.

λ1 = {2 ± √(2² - (4)(1)(3/4))} / 2 = 1 ± 1/2 = 1/2 or 3/2. (Note λ2 was defined to be the larger one.)

Check: 0.5(λ1 + λ2) = 0.5(1/2 + 3/2) = 1. 0.5(2λ1² + 2λ2²) = 1/4 + 9/4 = 5/2 = 1 + k.
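A short Python sketch of the same calculation for a general value of k; the function name fit_exponential_mixture is mine, and the quadratic is the one derived above.

import math

def fit_exponential_mixture(k):
    """Given mean 1 and second moment 1 + k, return (lambda1, lambda2) with lambda1 <= lambda2."""
    # From above: lambda1^2 - 2*lambda1 + (3 - k)/2 = 0, so the discriminant is (k - 1)/2.
    disc = (k - 1) / 2
    if disc < 0:
        raise ValueError("no real solution: need k >= 1")
    return 1 - math.sqrt(disc), 1 + math.sqrt(disc)

print(fit_exponential_mixture(1.5))   # (0.5, 1.5), as in the solution above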
9.58. E. The solution to the previous question reduces to a quadratic equation:
λ1² - 2λ1 + (3 - k)/2 = 0. The solutions are: 1 ± √(1 - (3 - k)/2) = 1 ± √((k - 1)/2).
The solutions are complex for k < 1.
The solutions are positive (and real) for k < 3.
For k ≥ 3, one solution is negative (or zero), which is not allowed since for the given densities both
lambdas are positive. (For k > 3, √((k - 1)/2) > √((3 - 1)/2) = 1.)
Thus we require 1 ≤ k < 3.

9.59. E. The mean of the mixed distribution is (1)(w) + (2)(1 - w) = 2 - w.


Set the theoretical mean equal to the observed mean: 2 - w = 1. w = 1.
Comment: One has to assume that f1 has no parameters and that f2 has no parameters; for
example f1 might be an Exponential Distribution with mean 1, while f2 might be a Pareto

Distribution with α = 3 and θ = 4, and thus mean of 2. Here we have only one remaining parameter
w. Otherwise, the question makes no sense. Commonly both f1 and f2 would have non-fixed
parameters and thus the means would depend on these unknown parameters; in that case, one
would have to match more than the first moment in order to apply the method of moments. One
does not commonly apply the method of moments to a single observation. The answer would
have been the same if the observed mean of many observations had been 1. If for example, the
observed mean had been instead 1.4, then w = .6. If the observed mean had been outside
[1, 2], then we would not have gotten an answer; w would have been outside [0, 1].

9.60. B. The empirical 20th percentile is 5, where Ŝ(t) = 0.8.
The empirical 60th percentile is 9.
F(t) = 1 - exp[-(t/θ)^τ]. 0.20 = F(5) = 1 - exp[-(5/θ)^τ]. ⇒ (5/θ)^τ = -ln(0.8) = 0.2231.

0.6 = F(9) = 1 - exp[-(9/θ)^τ]. ⇒ (9/θ)^τ = -ln(0.4) = 0.9163.

Dividing the two equations: (9/5)^τ = 4.107. ⇒ τ = ln(4.107)/ln(1.8) = 2.40.

θ = 9 / 0.9163^(1/2.40) = 9.33.
S(8) = exp[-(8/9.33)^2.40] = 0.50.
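Here is a small Python sketch of this percentile matching calculation; it is only an illustration of the arithmetic above.

import math

# Smoothed empirical percentiles from the problem: 20th percentile = 5, 60th percentile = 9.
p1, x1 = 0.20, 5.0
p2, x2 = 0.60, 9.0

# F(x) = 1 - exp[-(x/theta)^tau], so (x/theta)^tau = -ln(1 - p).
tau = math.log(math.log(1 - p2) / math.log(1 - p1)) / math.log(x2 / x1)   # about 2.40
theta = x2 / (-math.log(1 - p2)) ** (1 / tau)                             # about 9.33

print(tau, theta)
print(math.exp(-(8 / theta) ** tau))                                      # S(8), about 0.50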
9.61. The observed mean is: 667,754/100 = 6677.54.
The observed second moment is: (sum of squared loss sizes)/(number of losses) =
20,354,674,039/100 = 203,546,740.39.
The first moment of a Pareto is: θ/(α-1), and the second moment of a Pareto is: 2θ2 / {(α-1)(α-2)}.
Matching first moments θ/(α-1) = 6677.54.
Matching second moments 2θ2 / {(α-1)(α-2)} = 203,546,740.39.
Dividing the second equation by the square of the first equation:
2(α-1)/(α-2) = 203,546,740.39/6677.542 = 4.5649.
Solving, α = 2(1 - 4.5649)/(2 - 4.5649) = 2.780.
Then, θ = 6677.54(2.780 - 1) = 11,886.
Comment: In this case, when it comes to the first two moments, we have enough information to
proceed in exactly the same manner as if we had ungrouped data.

9.62. A. The first moment is: (4 + 5 + 21 + 99 + 421)/5 = 110.


The second moment is: (4² + 5² + 21² + 99² + 421²)/5 = 37,504.8.
Match the first two moments: θ/(α-1) = 110 and 2θ²/{(α-1)(α-2)} = 37,504.8.
Solving, α = 2 (37,504.8/110² - 1) / (37,504.8/110² - 2) = 3.819.

θ = (3.819 - 1)(110) = 310.1. 0.95 = 1 - (1 + x/θ)^(-α). Therefore, x = θ(0.05^(-1/α) - 1) = 369.

Comment: VaR_p(X) = θ {(1 - p)^(-1/α) - 1}.
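The following Python sketch reproduces this method of moments fit and the resulting 95th percentile; it is an illustration only.

data = [4, 5, 21, 99, 421]
n = len(data)
m1 = sum(data) / n                    # 110
m2 = sum(x * x for x in data) / n     # 37,504.8

r = m2 / m1**2                        # the ratio of the 2nd moment to the square of the 1st
alpha = 2 * (r - 1) / (r - 2)         # about 3.819
theta = (alpha - 1) * m1              # about 310.1

# 95th percentile of the fitted Pareto: F(x) = 1 - (theta/(theta + x))^alpha
print(alpha, theta, theta * (0.05 ** (-1 / alpha) - 1))   # the last value is about 369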

9.63. E. Since we want to estimate θ for Year 3, inflate all of the data to the cost level of Year 3:
10000(1.1)2 = 12100. 12500(1.1) = 13750.
The total inflated losses are: (100)(12100) + (200)(13750) = 3,960,000.
Average claim size = 3,960,000 /(100+200) = 13,200.
Set the observed and theoretical means equal:
θ/(α-1) = 13,200. θ = (3-1)(13200) = 26,400.

Alternately, in Year 1, set θ/(α-1) = 10,000. ⇒ θ = 20,000.

in Year 2, set θ/(α-1) = 12,500. ⇒ θ = 25,000.


Inflate up to year 3: (20000)(1.12 ) = 24,200, and (25000)(1.1) = 27,500.
Weight the two inflated values by the number of claims in each year:
{(100)(24,200) + (200)(27,500)} / (100 + 200) = 26,400.
9.64. E. Mean of the mixed distribution is: (1)(p) + (10)(1-p) = 10 - 9p.
Second Moment of the mixed distribution is: {(2)(1²)}(p) + {(2)(10²)}(1-p) = 200 - 198p.
Variance of the mixed distribution is: 200 - 198p - (10 - 9p)² = 100 - 18p - 81p².
Setting the variance of the mixture equal to 2² = 4:
81p² + 18p - 96 = 0. p = {-18 ± √(18² + (4)(96)(81))} / {(2)(81)} = -0.1111 ± 1.0943
= -1.2054 or 0.9832. p = 0.9832, since p is stated to be positive.
Comment: The moments of a mixed distribution are a weighted average of the moments of the
individual distributions. An Exponential Distribution has mean θ and second moment 2θ2.
Usually one would not apply the method of moments to a one parameter situation by matching the
standard deviation or variance. Rather, usually one would match the mean, but one is not given
enough information to do so in this case.

9.65. D. Solve for the two parameters, θ and δ, by matching the first two moments.
E[X] = ∫_δ^∞ x f(x) dx = ∫_δ^∞ x e^(-(x-δ)/θ) / θ dx = ∫₀^∞ (y + δ) e^(-y/θ) / θ dy = θ + δ.

E[X²] = ∫_δ^∞ x² f(x) dx = ∫_δ^∞ x² e^(-(x-δ)/θ) / θ dx = ∫₀^∞ (y + δ)² e^(-y/θ) / θ dy
= ∫₀^∞ (y² + 2yδ + δ²) e^(-y/θ) / θ dy = 2θ² + 2θδ + δ².

Observed first moment = 100/10 = 10 = theoretical 1st moment = θ + δ.

Observed 2nd moment = 1306/10 = 130.6 = theoretical 2nd moment = 2θ² + 2θδ + δ².

Therefore, θ = 10 - δ. 2(10 - δ)² + 2(10 - δ)δ + δ² = 130.6.

δ² - 20δ + 69.4 = 0. ⇒ δ = {20 ± √(400 - (4)(69.4))}/2 = 4.5 or 15.5.

15.5 is inappropriate since we observe claim amounts < 15.5. Thus δ = 4.5.
Comment: θ = 5.5. If y = x - δ, y follows an Exponential Distribution. θ = E[Y] = E[X] - δ.

2θ2 = E[Y2 ] = E[(X-δ)2 ] = E[X2 ] - 2δE[X] + δ2 = E[X2 ] - 2δ(δ+θ) + δ2 = E[X2 ] - 2δθ - δ2. ⇒

E[X2 ] = 2θ2 + 2θδ + δ2. The shifted Exponential is a special case of the shifted Gamma. See the
“Translated Gamma Distribution”, page 388 of Actuarial Mathematics, not on the Syllabus.
9.66. D. The observed mean is: 3860/10 = 386.
The second moment is: 4,574,802/10 = 457,480.2.
With 2 parameters, the method of moments consists of matching the first 2 moments.

Thus set exp[µ + 0.5σ²] = 386 and exp[2µ + 2σ²] = 457,480.2.
Divide the square of the 1st equation into the 2nd equation:
exp[2µ + 2σ²] / exp[2µ + σ²] = exp[σ²] = 457,480.2 / 386² = 3.070. ⇒ σ = 1.059. ⇒ µ = 5.395.

E[X ∧ x] = exp(µ + σ²/2) Φ[(ln x − µ − σ²)/σ] + x{1 - Φ[(ln x − µ)/σ]}.

E[X ∧ 500] = exp(5.395 + 1.059²/2) Φ[(ln 500 − 5.395 − 1.059²)/1.059] + 500{1 - Φ[(ln 500 - 5.395)/1.059]}
= 386 Φ[-0.29] + (500){1 - Φ[0.77]} = (386)(1 - 0.6141) + (500)(1 - 0.7794) = 259.
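Here is a Python sketch of this calculation, using the Limited Expected Value formula for the LogNormal quoted above; Φ is computed via the error function, so the result is about 259, consistent with the table-based value.

import math

def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def lognormal_limited_expected_value(x, mu, sigma):
    """E[X ∧ x] for a LogNormal, per the formula in Appendix A of Loss Models."""
    a = (math.log(x) - mu - sigma**2) / sigma
    b = (math.log(x) - mu) / sigma
    return math.exp(mu + sigma**2 / 2) * std_normal_cdf(a) + x * (1 - std_normal_cdf(b))

# Method of moments fit from the solution above.
print(lognormal_limited_expected_value(500, mu=5.395, sigma=1.059))   # about 259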

9.67. D. This a 50-50 mixture of two Exponentials, with means θ and σ.

E[X] = (θ + σ)/2. E[X2 ] = mixture of the 2nd moments = (2θ2 + 2σ2)/2 = θ2 + σ2.

Set (θ + σ)/2 = 150/10 = 15, and θ2 + σ2 = 5000/10 = 500.

⇒ σ = 30 - θ. ⇒ 2θ2 - 60θ + 400 = 0. ⇒ θ = 10 or 20.


However, we want θ > σ. ⇒ θ = 20 and σ = 10.
Comment: A two point mixture of Exponentials is the most common mixture; you should be able to
recognize and work with it, even when unusual letters such as σ are used for a mean.
9.68. C. For an Inverse Pareto, E[1/X] = θ−1 / (τ - 1), and E[1/X2 ] = 2θ−2 / {(τ - 1)(τ - 2)}.
The negative first moment for the data is:
{1/15 + 1/45 + 1/140 + 1/250 + 1/560 + 1/1340}/6 = 0.017093.
The negative second moment for the data is:
{1/15² + 1/45² + 1/140² + 1/250² + 1/560² + 1/1340²}/6 = 0.0008348.
Matching moments results in two equations in two unknowns:
θ⁻¹ / (τ - 1) = 0.017093. 2θ⁻² / {(τ - 1)(τ - 2)} = 0.0008348.
Divide the second equation by the square of the first equation:
2(τ - 1)/(τ - 2) = 0.0008348/0.017093² = 2.857. ⇒ τ = 4.334.

⇒ θ = 1 / {(.017093)(4.334 - 1)} = 17.55.


Alternately, if X follows an Inverse Pareto with parameters τ and θ, then 1/X follows a Pareto with
parameters τ and 1/θ. So we can match the corresponding first and second moments for the Pareto:

θ/(α-1) = 0.017093, and 2θ2 / {(α-1)(α-2)} = 0.0008348.

⇒ α = 4.334 and θ = 0.05699.


Translating back to the parameters of the Inverse Pareto: τ = 4.334 and θ = 1/0.05699 = 17.55.

9.69. D. For an Inverse Gamma, E[1/X] = θ−1Γ(α+1)/Γ(α) = θ−1α, and

E[1/X2 ] = θ−2Γ(α+2) / Γ(α) = θ−2α(α + 1).


Matching moments results in two equations in two unknowns:
θ⁻¹α = 0.017093. θ⁻²α(α + 1) = 0.0008348.
Divide the second equation by the square of the first equation:
(α + 1)/α = 0.0008348/0.017093² = 2.857. ⇒ α = 0.539. ⇒ θ = 0.539/0.017093 = 31.5.
Alternately, if X follows an Inverse Gamma with parameters α and θ, then 1/X follows a Gamma with
parameters α and 1/θ. So we can match the corresponding first and second moments for the

Gamma: θα = 0.017093, and θ2α(α+1) = 0.0008348.

⇒ (α+1)/α = 2.857. ⇒ α = 0.539. ⇒ θ = 0.017093/0.539 = 0.0317.


Translating back to the parameters of the Inverse Gamma: α = 0.539 and θ = 1/0.0317 = 31.5.
9.70. E. X = 1300/4 = 325 = αθ.

E[X2 ] = (2002 + 3002 + 3502 + 4502 )/4 = 113,750 = α(α + 1)θ2.

Dividing the 2nd equation by the square of the first equation: 1 + 1/α = 1.0769. ⇒ α = 13.0.

Alternately, one can match the variances: αθ2 = 113,750 - 3252 = 8125.

⇒ θ = 8125/325 = 25. ⇒ α = 325/25 = 13.0.

9.71. C. E[X] = (130 + 20 + 350 + 218 + 1822)/5 = 508 = θ/(α-1).

E[X2 ] = (1302 + 202 + 3502 + 2182 + 18222 )/5 = 701401.6 = 2θ2/{(α-1)(α-2)}.


Dividing the second equation by the the square of the first equation:
2(α-1)/(α-2) = 2.718. ⇒ α = 4.786. ⇒ θ = 1923.

E[X ∧ x] = {θ/(α−1)}{1 - (θ/(θ+x))α−1}.


E[X ∧ 500] = {1923/(4.786 - 1)}{1 - (1923/(1923 + 500))4.786-1} = 296.

9.72. B. Put all the losses on a common level, by inflating them all to year 3.
(20)(1.12 ) = 24.2, (40)(1.12 ) = 48.4, (50)(1.12 ) = 60.5,
(30)(1.1) = 33, (40)(1.1) = 44, (90)(1.1) = 99, (120)(1.1) = 132.
Mean of the inflated losses is: 441.1/7 = 63.01.
Second moment of the inflated losses is: 36,838.45/7 = 5262.6.
Match the first two moments: exp[µ + σ²/2] = 63.01. exp[2µ + 2σ²] = 5262.6.

Divide the second equation by the square of the first: exp[σ²] = 5262.6/63.01² = 1.326. ⇒ σ = 0.531.

⇒ µ = ln(63.01) - 0.5312 /2 = 4.00.


Comment: Since we have inflated all of the losses to the year 3 level, the resulting estimate of µ is
estimate of the mu parameter for year 3, what they have called µ3 .
9.73. E. By integrating, F(x) = 1 - e-(x-δ)/θ, δ < x < ∞.

Solve for the median: 0.5 = 1 - e-(x-δ)/θ. median = θ ln2 + δ.


If x follows a shifted exponential distribution then x - δ follows an Exponential Distribution.

Therefore E[X - δ] = θ. ⇒ E[X] = θ + δ.

Set 300 = θ + δ, and 240 = θ ln2 + δ. ⇒ θ = 60/(1 - ln2) = 195.5. ⇒ δ = 300 - 195.5 = 104.5.
Comment: Matching one percentile and matching the first moment.
Sort of a combination of percentile matching and the method of moments.
One can compute the mean of the distribution via integration, with y = x - δ:
∫_δ^∞ x f(x) dx = ∫_δ^∞ x e^(-(x-δ)/θ) / θ dx = ∫₀^∞ (y + δ) e^(-y/θ) / θ dy
= ∫₀^∞ y e^(-y/θ) / θ dy + δ ∫₀^∞ e^(-y/θ) / θ dy
= θ + δ, where the first of the final two integrals is the mean of an Exponential Distribution.

9.74. C. First moment = (2,000 + 17,000 + 271,000 + 10,000)/4 = 75,000.


Second moment = (2,000² + 17,000² + 271,000² + 10,000²)/4 = 18,458,500,000.
Set the moments of the Pareto equal to that of the data:
75,000 = θ/(α-1).

18,458,500,000 = 2θ²/{(α-1)(α-2)}.
Divide the second equation by the square of the first equation: 3.2815 = 2(α-1)/(α-2).
⇒ α = 3.561. ⇒ θ = (3.561 - 1)(75,000) = 192,075.
9.75. B. For the LogNormal Distribution the mean is exp[µ + 0.5 σ2],

while the second moment is exp[2µ + 2σ2].


With 2 parameters, the method of moments consists of matching the first 2 moments.

Thus set exp[µ +.5 σ2] = 1.8682, and exp[2µ + 2σ2] = 4.4817.
Divide the square of the 1st equation into the 2nd equation:
exp[2µ + 2σ2] / exp[2µ + σ2] = exp[σ2] = 4.4817 / 1.86822 = 1.2841. ⇒ σ = 0.5. ⇒ µ = 0.5.

9.76. C. E[X] = 8. E[X2 ] = (22 + 102 + 122 + 82 + 82 )/5 = 75.2.


Match the mean and variance. Set αθ = 8, and αθ2 = 75.2 - 82 = 11.2.

⇒ θ = 1.4. ⇒ α = 8/1.4 = 5.714.


Alternately, match the first and second moments: αθ = 8, and α(α+1)θ2 = 75.2.
Solving, θ = 1.4, and α = 5.714.
9.77. A. The density must integrate to one over its support:
1 = ∫₀^θ α x dx = αθ²/2. ⇒ α = 2/θ².
E[X²] = ∫₀^θ x² f(x) dx = ∫₀^θ x² (2/θ²) x dx = (θ⁴/4)(2/θ²) = θ²/2 = 1/α.
Matching the second moments:
10 = 1/α. ⇒ α = 0.1.
Comment: It is unusual to perform method of moments on a one parameter distribution by matching
second moments rather than first moments.

Section 10, Fitting to Ungrouped Data by Maximum Likelihood

For ungrouped data {x1 , x2 , ... , xn } define:


Likelihood = Π f(xi) Loglikelihood = Σ ln f(xi)

In order to fit a chosen type of size of loss distribution by maximum likelihood, you maximize the
likelihood or equivalently maximize the loglikelihood. In other words, for ungrouped data you find
the set of parameters such that either Π f(xi) or Σ ln f(xi) is maximized.

For single parameter distributions one can usually solve for the parameter value by taking the
derivative and setting it equal to zero.
For example, take f(x) = α 7^α / (7 + x)^(α + 1).89 Then, ln f(x) = ln(α) + α ln(7) - (α+1)ln(7 + x).

If you observe five losses of size: 1, 3, 6, 10, 25, then the


loglikelihood = ln f(1) + ln f(3) + ln f(6) + ln f(10) + ln f(25), and is a function of α:

[Graph of the loglikelihood as a function of α, for α from 1.2 to 2; the loglikelihood ranges from about -16.5 to -16.75.]

For α = 2, ln f(1) = ln(2) + 2 ln(7) - (3)ln(8) = -1.65.


For α = 2, loglikelihood = -1.65 - 2.32 - 3.11 - 3.91 - 5.81 = -16.8.

Graphically, the maximum likelihood corresponds to α ≅ 1.4.


89
This is a Pareto Distribution, but with θ = 7 fixed, leaving α as the sole parameter.

Here is how to algebraically find the value of α which maximizes the loglikelihood.
ln f(x) = ln(α) + α ln(7) - (α+1)ln(7 + x).
Σ ln[f(xi)] = Σ {ln(α) + α ln(7) - (α + 1) ln[7 + xi]}, with the sums running over the 5 observations.

The derivative of the loglikelihood with respect to α is:
Σ {1/α + ln(7) - ln[7 + xi]} = 5/α - Σ ln[(7 + xi) / 7].

Setting this derivative equal to zero: 0 = 5/α - Σ ln[(7 + xi) / 7].

Solving for alpha: α = 5 / Σ ln[(7 + xi) / 7] = 5 / {ln(8/7) + ln(10/7) + ln(13/7) + ln(17/7) + ln(32/7)}
= 5 / {0.1335 + 0.3567 + 0.6190 + 0.8873 + 1.5198} = 1.422.


1.422 is the Maximum Likelihood α.
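The closed-form answer is easy to check numerically. The following Python sketch (an illustration only; no computer is available on the exam) computes the closed-form estimate and also locates the maximum of the loglikelihood by a crude grid search over α.

import math

data = [1, 3, 6, 10, 25]

# Closed form from setting the derivative of the loglikelihood equal to zero:
alpha_hat = len(data) / sum(math.log((7 + x) / 7) for x in data)
print(alpha_hat)    # about 1.422

# Check by brute force: the loglikelihood peaks at about the same alpha.
def loglikelihood(alpha):
    return sum(math.log(alpha) + alpha * math.log(7) - (alpha + 1) * math.log(7 + x)
               for x in data)

best = max((a / 1000 for a in range(1000, 2001)), key=loglikelihood)
print(best)         # also about 1.42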

Exercise: For an Inverse Exponential Distribution,
determine the maximum likelihood θ, fit to: 1, 3, 6, 10, 25.

[Solution: f(x) = θ e^(-θ/x) / x². ln f(x) = ln(θ) - θ/x - 2 ln(x).

∂ ln f(x) / ∂θ = 1/θ - 1/x. 0 = ∂ Σ ln f(xi) / ∂θ = n/θ - Σ 1/xi.

⇒ θ = n / Σ (1/xi) = 5 / (1/1 + 1/3 + 1/6 + 1/10 + 1/25) = 5/1.64 = 3.049.]

Exercise: For the Gamma Distribution with α fixed, write down the equation that needs to be solved
in order to maximize the likelihood.
[Solution: ln f(x) = -αlnθ + (α−1)lnx - (x/θ) - ln[Γ(α)].
Setting equal to zero the partial derivative of loglikelihood with respect to theta:
0 = -nα/θ + Σ xi / θ². ⇒ θ = {(1/n) Σ xi} / α.

Comment: This is the same equation as for the Method of Moments for the Gamma Distribution with
α fixed. Note that for α = 1 fixed, one would have an Exponential Distribution.]

Exercise: You observe ten claims of size: 10, 20, 30, 50, 70, 90, 120, 150, 200, 250.
Fit a Gamma Distribution with α = 4 fixed, to this data using the method of Maximum Likelihood.
[Solution: θ = {(1/n) Σ xi} / α = (990/10)/4 = 24.75.]

For distributions with more than one parameter, one is still able to set all the partial derivatives equal
to zero, but is unlikely to be able to solve for the parameters in closed form.

Exercise: For the Gamma Distribution write down two equations that need to be solved in order to
maximize the loglikelihood. ψ(α) is the digamma function, ψ(α) = d ln[Γ(α)] / dα.

[Solution: f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ(α). ln f(x) = -α lnθ + (α-1) lnx - (x/θ) - ln[Γ(α)].

∂ Σ ln[f(xi)] / ∂α = -n lnθ + Σ ln[xi] - n ψ(α) = 0. ⇒ ψ(α) = (1/n) Σ ln[xi] - lnθ.

∂ Σ ln[f(xi)] / ∂θ = -nα/θ + Σ xi / θ² = 0. ⇒ θ = (1/n) Σ xi / α. ⇒ lnθ = ln[(1/n) Σ xi] - ln(α).

Substituting the second equation into the first: ψ(α) - ln α = (1/n) Σ ln[xi] - ln[(1/n) Σ xi].

Comment: Beyond what you should be asked on your exam.]

For most distributions with more than one parameter, one usually maximizes the loglikelihood by
standard numerical algorithms, rather than by solving the equations in closed form.90 91 Quite often
one uses percentile matching or the method of moments to obtain a good starting point for such a
numerical algorithm. In these cases, one should still be able to calculate the likelihood or loglikelihood
for a given set of parameters.

Exercise: For a Gamma Distribution with α = 3 and θ = 10, what is the loglikelihood for the following
set of data: 20, 30, 40?
[Solution: ln f(x) = -αlnθ + (α-1)lnx - (x/θ) - ln[Γ(α)] = -3 ln(10) + 2 ln(x) - x/10 - ln2.
ln f(20) = -3.609. ln f(30) = -3.799. ln f(40) = -4.223.
loglikelihood = -3.609 - 3.799 - 4.223 = -11.631.]
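A short Python sketch of this loglikelihood computation, using the standard library's lgamma function for ln Γ(α); it is an illustration only.

import math

def gamma_loglikelihood(data, alpha, theta):
    """Sum of ln f(x) for a Gamma: -alpha*ln(theta) + (alpha-1)*ln(x) - x/theta - ln Gamma(alpha)."""
    return sum(-alpha * math.log(theta) + (alpha - 1) * math.log(x) - x / theta
               - math.lgamma(alpha) for x in data)

print(gamma_loglikelihood([20, 30, 40], alpha=3, theta=10))   # about -11.63, as in the exercise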

Given a list of sets of values for the parameters, one could be asked to see which set produces the
largest loglikelihood.92
90
Many commercial software packages will maximize or minimize functions.
As discussed in Appendix C of the first edition of Loss Models, one can use the Nelder-Mead simplex algorithm to
minimize the negative loglikelihood. Having been given a starting value, this algorithm searches n-dimensional
space iteratively finding points where the function of interest is smaller and smaller.
91
The lack of access to a computer, restricts the variety of possible exam questions.
92
For example, five sets of values for the parameters could be given in an exam question.

Pareto Distribution Fit to the Ungrouped Data in Section 2:

For the Pareto Distribution: f(x) = α θ^α / (θ + x)^(α + 1), and ln[f(x)] = ln(α) + α ln(θ) - (α+1) ln(θ + x).
For a particular pair of values of alpha and theta one can compute ln[f(x)] for each observed size of
claim and add up the results. For alpha = 1.5 and theta = 100,000, one gets
ln[f(x)] = ln(α) + αln(θ) - (α+1)ln(θ + x) = 17.68 - (2.5) ln(100,000 + x).

For five of the loss sizes in Section 2, this is:


x ln[f(x)] for Pareto with alpha = 1.5 and theta = 100,000
300 -11.11
37300 -11.90
86600 -12.67
150300 -13.40
423200 -15.24

Exercise: With the aid of a computer, for the data in Section 2, compute the log likelihood for
a Pareto distribution with alpha = 1.5 and theta = 100,000.
[Solution: Computing ln[f(x)] for each of the 130 losses and summing one gets a log likelihood of
-1762.25.]

Note that once the type of curve (Pareto) and the data set (Section 2) have been selected, the
loglikelihood is a function of the parameters. Here is a chart of the log likelihood for the data in Section
2, for various values of the Pareto parameters alpha and theta:

Theta Alpha
(100 thousand) 1.5 1.6 1.7 1.8 1.9
1.0 -1762.25 -1766.59 -1771.44 -1776.74 -1782.45
1.5 -1750.55 -1752.32 -1754.59 -1757.32 -1760.44
2.0 -1748.03 -1748.19 -1748.87 -1749.00 -1751.53
2.5 -1749.30 -1748.36 -1747.93 -1747.94 -1748.36
3.0 -1752.38 -1750.61 -1749.35 -1748.54 -1748.14
3.5 -1756.36 -1753.95 -1752.05 -1750.60 -1749.55
4.0 -1760.79 -1757.87 -1755.45 -1753.49 -1751.92

As seen in the above table and the following graph, the values of alpha and theta which maximize
the loglikelihood are close to α =1.7, θ = 250,000.93 Small differences in loglikelihood are significant;
over this whole chart the loglikelihood only varies by 2%. For values of alpha and theta near these
values, the loglikelihood is near the maximum.
93
A more exact computation yields α =1.702 , θ = 240,151, with loglikelihood of - 1747.87.
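In practice such a two-parameter fit would be done by a numerical optimizer rather than by hand. Below is an illustrative Python sketch using SciPy's Nelder-Mead routine (SciPy is assumed to be available). Since the full ungrouped data set of Section 2 is not reproduced here, the five loss sizes shown in the earlier table merely stand in for it, so the fitted values will not equal α = 1.702 and θ = 240,151.

import math
from scipy.optimize import minimize     # any numerical maximizer would do

def pareto_negative_loglikelihood(params, data):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return math.inf                 # keep the search inside the valid parameter region
    return -sum(math.log(alpha) + alpha * math.log(theta)
                - (alpha + 1) * math.log(theta + x) for x in data)

losses = [300, 37300, 86600, 150300, 423200]    # stand-in for the Section 2 data
start = (1.5, 100000)                           # e.g. from the method of moments
fit = minimize(pareto_negative_loglikelihood, start, args=(losses,), method="Nelder-Mead")
print(fit.x)                                    # fitted (alpha, theta) for the supplied data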

Here is a graph of this loglikelihood as a function of the parameters α and θ, with the maximum
loglikelihood marked with a dot:

As shown below, for values along the straight line corresponding to an estimated mean of
θ / (α-1) which is close to 340 thousand, the loglikelihood is large.

Pareto Estimated Mean ($ thousand)


Theta Alpha
(100 thousand) 1.5 1.6 1.7 1.8 1.9
1.0 200 167 143 125 111
1.5 300 250 214 188 167
2.0 400 333 286 250 222
2.5 500 417 357 312 278
3.0 600 500 429 375 333
3.5 700 583 500 438 389
4.0 800 667 571 500 444

This example has some features which are quite common when fitting curves by maximum
likelihood. The loglikelihood surface in three dimensions has a ridge. The slope along that
ridge is shallow; values of the loglikelihood along that ridge do not vary quickly. Thus there is a
considerable uncertainty around the fitted set of parameters. This will be discussed further in
later sections.

Distributions Fit to the Ungrouped Data in Section 2:

Parameters of curves fit to the ungrouped data in Section 2 by the method of Maximum Likelihood:94

Distribution Parameters Fit via Maximum Likelihood


to Ungrouped Data in Section 2
Exponential θ = 312,675
Pareto α = 1.702 θ = 240151
Weibull θ = 231158 τ = 0.6885
Gamma α = 0.5825 θ = 536,801
LogNormal µ = 11.5875 σ = 1.60326
Inverse Gaussian µ = 312675 θ = 15226
Transformed Gamma α = 4.8365 θ = 816 τ = 0.30089
Generalized Pareto α = 1.7700 θ = 272,220 τ = 0.94909
Burr α = 1.8499 θ = 272,939 γ = 0.97036
LogLogistic γ = 1.147 θ = 115,737
ParaLogistic α = 1.125 θ = 134,845

The means, coefficients of variation and skewness are as follows:

Ungrouped Maximum Likelihood Fitted Curves


Data Expon. Weibull Gamma TGam Pareto Burr InvGaus GenPar LogNorm
Mean ($000) 313 313 298 313 302 342 334 313 336 389
Coef. Var. 2.01 1.00 1.49 1.31 1.85 N.D. N.D. 4.53 N.D. 3.47
Skewness 4.83 2.00 3.60 2.62 6.80 N.D. N.D. 13.59 N.D. 52.3

Some of these distributions fail to have a finite variance and skewness. It is not uncommon for
higher moments of fitted size of loss distributions to fail to exist (as finite quantities.) In contrast,
any actual finite sample of claims has finite calculated moments. However, samples taken from
distributions without finite variance, will tend to have large estimated coefficients of variation.
Distributions with infinite skewness or even infinite variance are used in many actual
applications, but one should always do so with appropriate caution.

94
Note that in the case of the LogNormal, the Maximum Likelihood fit is equivalent to fitting a Normal Distribution to
the log claim sizes via the Method of Moments; µ = the mean of the log claim sizes, while σ is the standard deviation
of the log claim sizes.

The survival functions for the data (thick), Gamma Distribution, and Transformed Gamma Distribution,
fit by maximum likelihood, are shown below:95

[Graph: S(x) versus x (in millions), on logarithmic scales, for the data, the Gamma, and the Transformed Gamma.]

The tail of the Gamma is too light.96 The Transformed Gamma is closer to the data.
The survival functions for the data (thick), Pareto Distribution, and LogNormal Distribution:

[Graph: S(x) versus x (in millions), for the data, the Pareto, and the LogNormal.]

95
Note that both axes are on a log scale.
96
The tails of the Exponential and Weibull are also too light.

Both the Pareto and the LogNormal do a much better job of fitting the data than the Gamma, with
the Pareto somewhat better than the LogNormal.

Comparing Loglikelihoods:

The values of the loglikelihood can be used to compare the maximum likelihood curves:
Negative # Negative #
Distribution Loglikelihood Pars. Distribution Loglikelihood Pars.

Generalized Pareto 1747.82 3 LogLogistic 1749.20 2


Burr 1747.85 3 LogNormal 1752.20 2
Pareto 1747.87 2 Weibull 1753.04 2
ParaLogistic 1748.71 2 Gamma 1759.05 2
Transformed Gamma 1748.98 3 Exponential 1774.88 1
Inverse Gaussian 1808.02 2

Of the three parameter distributions, the one with the best loglikelihood is the Generalized Pareto.
Of the two parameter distributions, the one with the best loglikelihood is the Pareto.

Note how the small difference in loglikelihood from about -1748 to -1759 takes you from a
distribution that fits well to one that doesnʼt. Small differences in the loglikelihood are important. Also
note the closeness of the values for the first three distributions. The Burr and Generalized Pareto
each have the Pareto as a special case.97

Adding additional parameters always allows one to fit the data better,98 but one should only add
parameters when they provide a significant gain in accuracy. “The principle of parsimony” states
that no more causes should be assumed than will account for the effect.99 100 As applied here, the
principle of parsimony, states that one should use the minimum number of parameters that get the
job done.101

A simpler model has a number of advantages, including: may smooth irregularities in the data, is
more likely to apply over time or in similar situations, each value may be more accurately estimated.
A more complex model will closely fit the observed data.

97
In this case the fitted Burr and Generalized Pareto curves are both very close to a Pareto.
98
For example, the Gamma is a special case of the Transformed Gamma, with τ = 1. Therefore, the Maximum
Likelihood Gamma is one of the Transformed Gammas we will look at. Therefore Maximum Likelihood over all
Transformed Gammas is ≥ that of the Maximum Likelihood Gamma.
99
This principle is also referred to as Occam's Razor.
100
"Everything should be made as simple as possible, but not one bit simpler," Albert Einstein.
101
For example, one can always fit data better with a quadratic curve rather than a linear curve. One can fit exactly any
10 points with a 9th degree polynomial. However, that does not imply that it would be preferable to use the fitted 9th
degree polynomial.

For example, an Exponential Distribution with one parameter is simpler than its generalization the
Gamma Distribution with two parameters. A Gamma Distribution will always fit the data better.
However, it may be that the Gamma is just picking up peculiarities of the data set, which will not be
reproduced in the future. One must balance goodness of fit versus simplicity.

In this case, the 2 parameter Pareto Distribution is preferable to either of the 3 parameter
distributions: the Burr or the Generalized Pareto. The loglikelihood could be used to decide whether
the 3 parameter Burr or Generalized Pareto curves provide a statistically significant better fit
compared to the 2 parameter special case the Pareto.
Taking twice the difference of the loglikelihoods, or twice the log of the ratio of likelihoods, we get
0.04 or 0.10. Comparing to the Chi-Square distribution for 1 degree of freedom we find that the
difference is insignificant.

Similarly, twice the difference between the loglikelihood of the


Transformed Gamma (3 parameters) and the Gamma (2 parameters) is: (2) (10) = 20, which is
significant. As will be discussed, the Likelihood Ratio Test can be used to determine whether such
an improvement in loglikelihood is statistically significant.102

Based on the loglikelihood the Pareto fits a little better than the Transformed Gamma;
the Pareto has fewer parameters and a better loglikelihood than the Transformed Gamma.

You should compare the graphs of the distributions versus the ungrouped data in Section 2 to verify
that the Pareto and Transformed Gamma fit well, while the Gamma does not fit well. One can also
compare various statistics such as Kolmogorov-Smirnov Statistic as is done in a subsequent
section. In addition, comparing the mean excess losses and/or the Limited Expected Values
provide useful information on the fit of the curves as shown subsequently.

102
The Likelihood Ratio Test will be discussed in a subsequent section.

Limited Expected Values, Comparison of Empirical and Fitted:

Using the formulas in Appendix A of Loss Models, the values of the Limited Expected Value for
various limits for the distributions fit by maximum likelihood to the ungrouped data in Section 2 are:103

Limited Expected Value ($000)


Distribution 10 K 100 K 1m 2.5 m 5m 10 m
Data 9.7 75.0 236.2 283.7 312.7 312.7

Pareto 9.7 74.2 234.1 280.3 302.9 317.7


Weibull 9.3 72.5 258.4 293.1 297.3 297.6
Gamma 9.3 74.4 280.8 311.4 312.9 312.9
Trans. Gamma 9.6 73.1 245.4 288.5 300.5 303.5
Gen. Pareto 9.6 73.9 235.0 280.4 302.0 315.4
Burr 9.6 73.9 235.0 280.5 301.9 315.2
LogNormal 9.7 71.2 244.5 312.1 348.3 370.7
Data excluding 9.7 74.8 230.2 266.5 277.9 277.9
largest claim
Data duplicating 9.7 75.2 242.0 300.6 346.9 346.9
largest claim

One can usefully compare the fitted versus observed Limited Expected Values, in order to check
the goodness of fit of various distributions. The Pareto does a good job of matching the
observed Limited Expected Values. (For the ungrouped data, the Generalized Pareto and Burr
fitted curves are very close to the fitted Pareto curve and thus also fit well.) The Transformed
Gamma seems next best, followed by the LogNormal. The Gamma and Weibull donʼt seem to
match well.

It should be noted that for a small data set the observed Limited Expected Values are subject
to considerable fluctuation at higher limits. This is illustrated by computing Limited Expected
Values with the largest loss either eliminated from the ungrouped data set or duplicated. Given
the large difference that results at higher limits, one should be cautious about rejecting a curve
based on comparing to the observed Limited Expected Values at the upper limits. (In fact one
of the reasons for fitting curve is because at higher limits the data is thinner.)

103
The parameters of these fitted distributions are: Exponential: θ = 312,675; Pareto: α = 1.702 and
θ = 240,151; Weibull: θ = 231,158 and τ = 0.6885; Gamma: α = 0.5825 and θ = 536,801; Transformed Gamma:
α = 4.8365, θ = 816, and τ = 0.30089, Generalized Pareto: α = 1.7700, θ = 272,220, and τ =0.94909;
Burr: α = 1.8499, θ = 272,939, and γ = 0.97036; LogNormal: µ = 11.59 and σ = 1.603.

Shown below are the Limited Expected Values for the data (thick), Pareto, and the LogNormal fit via
Maximum Likelihood to the ungrouped data.

[Graph: Limited Expected Value (in thousands) versus limit (in thousands), for the data, the Pareto, and the LogNormal.]

Mean Excess Losses, Comparison of Empirical and Fitted:

The values of the Mean Excess Loss for various limits for the distributions fit by maximum likelihood
to the ungrouped data in Section 2 are :

Mean Excess Loss ($000)


Distribution 10 K 100 K 1m 5m 10 m
Data 322.9 423.3 1421.0

Exponential 312.7 312.7 312.7 312.7 312.7


Pareto 356.5 484.7 1766.7 7464.7 14587.2
Weibull 323.4 394.8 606.5 922.0 1124.0
Gamma 341.0 393.6 477.4 518.1 527.0
Trans. Gamma 318.4 418.6 890.0 1965.4 2894.3
Gen. Pareto 351.1 472.8 1644.8 6839.6 13332.3
Burr 349.4 470.9 1621.5 6669.3 12965.7
LogNormal 408.8 614.8 1767.3 5024.4 8298.2

Below are the empirical mean excess loss (thick), the mean excess loss for the maximum likelihood
Pareto and the maximum likelihood Transformed Gamma:

[Graph: e(x) (in millions) versus x (in millions), for the data, the Pareto, and the Transformed Gamma.]

The Transformed Gamma is perhaps somewhat too light-tailed, while the Pareto is perhaps
somewhat too heavy-tailed. If one were extrapolating out to large loss sizes, which distribution is
used is very important. Below are compared the Mean Excess Losses estimated from the fitted
Pareto, LogNormal, and Transformed Gamma, out to $20 million:

[Graph: e(x) (in millions) versus x (in millions) out to $20 million, for the fitted Pareto, LogNormal, and Transformed Gamma.]

I have found the Mean Excess Losses (Mean Residual Lives) particularly useful at
distinguishing between the tails of the different distributions when interested in using the
curves to estimate Excess Ratios. Comparing the Limited Expected Values seems particularly
useful when the distributions are to be used for estimating increased limit factors.

Loss Elimination Ratios and Excess Ratios, Comparison of Empirical and Fitted:

Here are Loss Elimination Ratios and Excess (Pure Premium) Ratios for the curves fit via maximum
likelihood to the ungrouped data in Section 2:

Excess Ratios Loss Elimination Ratios


Distribution $1 million $2.5 million $5 million $10 K $100 K $1 million
Data 0.245 0.093 0.000 0.031 0.240 0.755
Pareto 0.316 0.181 0.115 0.028 0.217 0.684
Weibull 0.132 0.015 0.001 0.031 0.243 0.868
Gamma 0.103 0.005 0.000 0.030 0.238 0.897
Trans. Gamma 0.193 0.051 0.011 0.032 0.241 0.807
Gen. Pareto 0.300 0.164 0.100 0.029 0.220 0.700
Burr 0.297 0.161 0.097 0.029 0.221 0.703
LogNormal 0.374 0.200 0.108 0.025 0.182 0.626

Even though the empirical excess ratio at $5 million is 0, those for most of the fitted distributions are
significantly positive. While the largest loss reported in our ungrouped data set of 130 losses is
about $5 million, this does not imply that if the experiment were repeated we couldnʼt get a loss of
$10 million, $20 million, or more.

Even if very rare, large losses can still have a significant impact on the expected value of the excess
ratio. The loss dollars from hurricanes are a good example. Storms of an intensity such that they
occur less frequently than once in several decades can have a significant impact on the total
expected loss as well as the expected value of the excess ratios.

Below are the excess ratios for the data (solid), Pareto (dotted), and LogNormal (dashed):
[Graph: Excess Ratio versus size (in millions), for the empirical data, the Pareto, and the LogNormal.]

In our example, if the losses are actually being drawn from the Pareto distribution fit by maximum
likelihood, then 10% of the expected total dollars are coming from dollars in the layer excess of $5
million! This is in spite of the fact that for the particular sample we observe no dollars in that layer. If
we had added a loss of $10 million to the observed ungrouped data, then the observed excess
ratio would have been about 10% for this layer. (Remember that a loss of size $10 million only
contributes $5 million to the layer excess of $5 million.) For this Pareto distribution, 7.5% of the total
expected losses come from losses larger than $20 million, 4% from losses greater than $50 million,
2.5% from losses greater than $100 million, and half a percent from losses greater than $1 billion!
This is what is meant by a very heavy tailed distribution.

If instead the losses follow the fitted Transformed Gamma, then about 1% of the total loss dollars are
expected to come from the layer excess of $5 million, rather than 10% as for the fitted Pareto. The
observed excess ratio would be about 1% if we added a single loss of $5.5 million to the
ungrouped data set. In comparison to the Pareto, for the Transformed Gamma, 0.6% of the total
expected losses come from losses larger than $20 million, and only 4 × 10⁻⁶ from losses greater
than $50 million. Thus while the fitted Transformed Gamma has a heavy tail, it is not nearly as heavy
as the fitted Pareto.

There is no way we can distinguish on a statistical basis between these two distributions solely
based on 130 observed claims. Yet they produce substantially different estimates of the
expected losses in higher layers. This illustrates the difficulties actuaries have estimating high
layers of loss in real world applications. Nevertheless, the techniques in Loss Models form at
least the starting point from which most such estimates are made.

Below are compared the Excess Ratios estimated from the fitted Pareto, LogNormal (thick), and
Transformed Gamma, out to $20 million:

[Graph: Excess Ratio versus size (in millions) out to $20 million, for the fitted Pareto, LogNormal, and Transformed Gamma.]

Linear Exponential Families:

For Linear Exponential Families, the Methods of Maximum Likelihood and Moments produce the
same result when applied to ungrouped data.104 Thus there are many cases where one can apply
the method of maximum likelihood to ungrouped data by instead performing the simpler method of
moments: Exponential, Poisson, Normal for fixed σ, Binomial for m fixed (including the special case
of the Bernoulli), Negative Binomial for r fixed (including the special case of the Geometric), the
Gamma for α fixed, and the Inverse Gaussian for θ fixed.

104
This useful fact is demonstrated in "Mahler's Guide to Conjugate Priors".

Creating Single Parameter Distributions:

It is possible to reduce the number of parameters in any distribution, by assuming one or more of
the parameters is a constant. This can be done with any of the distributions
in Loss Models. For example, as has already been mentioned, the two parameter Gamma
distribution with a shape parameter of unity is the one parameter Exponential distribution.

Each of the two parameter distributions discussed above can be made a one parameter distribution
by setting either parameter equal to a constant. This is not only useful in the real world applications,
but it is likely to be used in exam questions.105

For example, assume we take the Pareto distribution, and set θ = 300,000. To fit via the method of
Maximum Likelihood the resulting one parameter distribution to the ungrouped data in Section 2, we
set the derivative of the log likelihood with respect to α, equal to zero.
The loglikelihood is: Σ {ln(α) + α ln(300,000) - (α + 1) ln(300,000 + xi)}.

The derivative of the loglikelihood with respect to α is: Σ {1/α + ln(300,000) - ln(300,000 + xi)}.

Setting this derivative equal to zero and solving for alpha:
α = n / Σ ln[(300,000 + xi) / 300,000] = 1.963.

Note that this is not the same value obtained for α when the two parameter Pareto distribution was
fit by maximum likelihood to the ungrouped data. Remember that in that case the value of both α
and θ were allowed to vary freely. The fitted parameters α =1.702, θ = 240 thousand produce the
maximum likelihood over all possible pairs of parameters. α = 1.963 only maximizes the likelihood
for all possible values of α when θ = 300,000.

Invariance of the Method of Maximum Likelihood Under Change of Variables:

The Method of Maximum Likelihood is unaffected by changes of variables that are one to one
and monotonic, such as: x², √x, 1/x, e^x, e^(-x), and ln(x). This important result often lets one
reduce the method to a simpler case.
105
For example, many of the exam questions on maximum likelihood involve this idea.

For example, assume one is using Maximum Likelihood to fit a Weibull Distribution with θ unknown
and τ = 3 fixed. Then f(x) = 3x² exp(-(x/θ)³)/θ³.
The loglikelihood is: Σ ln f(xi) = -θ⁻³ Σxi³ + 2Σ ln(xi) - 3n ln(θ) + n ln(3).

Set the partial derivative with respect to θ equal to zero: 0 = 3θ⁻⁴ Σxi³ - 3n/θ. Thus θ = {Σxi³/n}^(1/3).

Alternately, one could use the change of variables y = x³, which transforms this Weibull106 into an
Exponential with mean θ³. Then when applied to ungrouped data, the maximum likelihood fit to an
Exponential is equal to that of the Method of Moments: θ³ = Σyi / n.

Thus transforming back via y = x³, θ = {Σxi³/n}^(1/3).

Now the reason this works is that the loglikelihood of the Exponential and the Weibull differ only by
terms which do not depend on the parameter θ.
The density of the Exponential with mean θ³ is: g(y) = exp(-y/θ³)/θ³.
The log density of the Exponential is: ln g(y) = -y/θ³ - ln(θ³) = -x³/θ³ - 3 ln(θ).

This differs by: 2 ln(x) + ln(3), from the log density of the Weibull (for τ = 3) of:
-θ⁻³x³ + 2 ln(x) - 3 ln(θ) + ln(3).

Thus the value of θ that maximizes one of the loglikelihoods, also maximizes the other.
This is what is meant by the Maximum Likelihood being invariant under change of variables.
The key is that the change of variables can not depend on any of the parameters.107
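As a numerical illustration of this invariance (with a hypothetical sample, not taken from the text), the following Python sketch compares the estimate obtained through the transformed Exponential with a direct grid-search maximization of the Weibull loglikelihood.

import math

data = [2.0, 3.5, 5.0, 8.0]     # hypothetical sample
n = len(data)

# Route 1: y = x^3 transforms the Weibull (tau = 3) into an Exponential with mean theta^3.
theta_from_transform = (sum(x**3 for x in data) / n) ** (1 / 3)

# Route 2: maximize the Weibull (tau = 3) loglikelihood directly by a crude grid search.
def weibull_loglikelihood(theta):
    return sum(math.log(3) + 2 * math.log(x) - (x / theta)**3 - 3 * math.log(theta)
               for x in data)

theta_grid = max((t / 1000 for t in range(3000, 8001)), key=weibull_loglikelihood)

print(theta_from_transform, theta_grid)   # the two estimates agree (to grid precision)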

Exercise: Let x1, x2 , ... , xn be fit to a Normal Distribution via Maximum Likelihood.
Determine µ and σ.
[Solution: f(x) = (1/σ) exp(-0.5{(x-µ)/σ}²) / √(2π).
ln f(x) = -0.5{(x-µ)²/σ²} - ln(σ) - (1/2)ln(2π). Σ ln f(xi) = -0.5{Σ(xi-µ)²/σ²} - n ln(σ) - (n/2)ln(2π).
Set the partial derivatives of the loglikelihood equal to zero.
∂ Σ ln[f(xi)] / ∂σ = Σ(xi-µ)²/σ³ - n/σ = 0. ∂ Σ ln[f(xi)] / ∂µ = Σ(xi-µ)/σ² = 0. Therefore Σ(xi-µ) = 0.

µ = (1/n)Σxi. Therefore σ = √{Σ(xi - µ)² / n}.]
106
The Weibull can be obtained from an Exponential via a power transformation.
107
Note that in the example, τ was fixed at 3 and thus was not a parameter.

Note that the fitted µ and σ are the usual estimates for the mean and variance.108 Thus in the case of
the Normal, applied to ungrouped data, the Method of Maximum Likelihood produces the same
result as the Method of Moments.

Exercise: Let x1, x2 , ... , xn be fit to a LogNormal Distribution via Maximum Likelihood.
Determine µ and σ.

[Solution: f(x) = exp[-0.5 ({ln(x) − µ}/σ)²] / {xσ√(2π)}.

ln f(x) = -0.5{(ln(x)-µ)²/σ²} - ln(σ) - ln(x) - (1/2)ln(2π).
Σ ln f(xi) = -0.5{Σ(ln(xi)-µ)²/σ²} - n ln(σ) - Σ ln(xi) - (n/2)ln(2π).
Set the partial derivatives of the loglikelihood equal to zero.
∂ Σ ln[f(xi)] / ∂σ = Σ(ln[xi] - µ)²/σ³ - n/σ = 0. ∂ Σ ln[f(xi)] / ∂µ = Σ(ln[xi] - µ)/σ² = 0.

Therefore, Σ(ln[xi] - µ) = 0. ⇒ µ = Σ ln(xi) / N. ⇒ σ² = Σ (ln(xi) - µ)² / N.]

Notice that for the LogNormal Distribution, the Method of Maximum Likelihood gives the same result
as the Method of Maximum Likelihood applied to the Normal Distribution and the log of the claim
sizes. This is the case because the Method of Maximum Likelihood is invariant under changes of
variables such as y = ln(x).109 If a set of parameters maximizes the likelihood of a LogNormal
Distribution, then they also maximize the likelihood of the corresponding Normal.

In addition, since in the case of the Normal the Method of Maximum Likelihood and the Method
of Moments (applied to ungrouped data) produce the same result, applying the Method of
Maximum Likelihood to the LogNormal is the same as applying the Method of Moments to the
underlying Normal.

Another important example of a change of variables is the effect of uniform inflation.


If, for example, we have 5% annual inflation over 3 years, then if X is the loss amount in the year
2000, Y = 1.05³ X is the loss amount in the year 2003. We could fit a distribution using maximum
likelihood applied to the year 2000 data, and then adjust this distribution for the effects of uniform
inflation.110 Alternately, we could fit a distribution using maximum likelihood applied to the data
adjusted to a year 2003 level. Due to the invariance of maximum likelihood under change of
variables, the results would be the same in the two alternatives.
108
Note that this estimate of the variance with n rather than n-1 in the denominator is biased.
109
If x follows a LogNormal, then ln(x) follows a Normal.
110
How to do this is discussed in "Mahler's Guide to Loss Distributions." For example for a Pareto Distribution, α
stays the same and θ is multiplied by the inflation factor, in this case 1.05³.

Demonstration of the Invariance of the Method of Maximum Likelihood:

The Method of Maximum Likelihood is unaffected by changes of variables that are one to one
and monotonic, such as: x², √x, and ln(x).

Let y = g(x), where g does not depend on any of the parameters to be fitted, and is one-to-one
monotonic.111 Then the Distribution Functions are related via FY(g(x)) = FX(x), while the density
functions are related via fY(g(x)) g'(x) = fX(x).

The loglikelihood in terms of x is: Σ ln fX(xi) = Σ ln fY(yi) + Σ ln g'(xi).


However, the second term on the right hand side of the equation does not depend on the
parameters of the loss distributions fX and fY. Thus if a set of parameters maximizes the loglikelihood
of fX, then it also maximizes the loglikelihood of fY. Thus in general, the Method of Maximum
Likelihood is invariant under such changes of variables.

Uniform Distribution:

For a uniform distribution from 0 to ω.


If we have data: 65, 72, 80.
Then ω ≥ 80. We could not have ω = 79.8.
In general ω ≥ maximum of the sample.

For a uniform distribution from d to 20.


If we have data: 5, 7, 11.
Then d ≤ 5. We could not have d = 5.1.
In general d ≤ minimum of the sample.

These are the reasons why for a uniform distribution from a to b, the maximum likelihood fit is:
a = minimum of the sample, and b = maximum of the sample.

111
y = g(x) is one-to-one if each value of y corresponds to no more than one value of x.
y = g(x) is monotonic increasing if as x increases y does not decrease.
y = g(x) is monotonic decreasing if as x increases y does not increase.

Beta Distribution:

As shown in Appendix A of Loss Models, the Beta Distribution has support from 0 to θ.112

f(x) = {1/β(a, b)} (x/θ)^a (1 - x/θ)^(b-1) / x = {Γ(a + b) / (Γ(a) Γ(b))} (x/θ)^a (1 - x/θ)^(b-1) / x
= {(a + b - 1)! / ((a - 1)! (b - 1)!)} (x/θ)^(a-1) (1 - x/θ)^(b-1) / θ, 0 ≤ x ≤ θ.

For a = 1, b = 1, the Beta Distribution is the uniform distribution from [0, θ].

For various special cases, one can fit a Beta Distribution via maximum Iikelihood on the exam.

For b = 1 and θ fixed and known: f(x) = a x^(a-1) / θ^a.
ln[f(x)] = ln[a] + (a-1) ln[x] - a ln[θ].
The loglikelihood is: n ln[a] + (a-1) Σ ln[xi] - n a ln[θ].
Setting the partial derivative with respect to a equal to zero:
0 = n/a + Σ ln[xi] - n ln[θ].
â = n / {n ln[θ] - Σ ln[xi]} = -n / Σ ln[xi / θ].113

Exercise: For a = 1 and θ fixed and known, fit a Beta Distribution via maximum likelihood.
[Solution: f(x) = b (1 - x/θ)^(b-1) / θ. ln[f(x)] = ln[b] + (b-1) ln[1 - x/θ] - ln[θ].
The loglikelihood is: n ln[b] + (b-1) Σ ln[1 - xi/θ] - n ln[θ].
Setting the partial derivative with respect to b equal to zero:
0 = n/b + Σ ln[1 - xi/θ]. ⇒ b̂ = -n / Σ ln[1 - xi/θ].
Comment: The formula for b̂ follows from that for â, and the change of variables: y = θ - x.
Note that S(x) = (1 - x/θ)^b, 0 < x < θ, which is a Modified DeMoivreʼs Law.]
112
I discuss the Beta Distribution in “Mahlerʼs Guide to Loss Distributions.”
113
See CAS3, 11/05, Q.18, and 4, 11/04, Q. 6.

Modified DeMoivreʼs Law: S(x) = (1 - x/ω)^α, 0 ≤ x ≤ ω, α > 0.
In general, for ω fixed, when fitting the exponent α via maximum likelihood: α̂ = -n / Σ ln[1 - xi/ω].114

For b = 2 and θ fixed and known: f(x) = a(a+1) x^(a-1) (1 - x/θ) / θ^a.
ln[f(x)] = ln[a] + ln[a+1] + (a-1) ln[x] + ln[1 - x/θ] - a ln[θ].

The loglikelihood is: n ln[a] + n ln[a+1] + (a-1) Σ ln[xi] + Σ ln[1 - xi/θ] - n a ln[θ].

Setting the partial derivative with respect to a equal to zero:
n/a + n/(a+1) + Σ ln[xi] - n ln[θ] = 0. ⇔ n/a + n/(a+1) + Σ ln[xi / θ] = 0.
Exercise: You observe 5 values: 12, 33, 57, 70, 81.
For b = 2 and θ = 100, fit a Beta Distribution via maximum likelihood.

[Solution: 5/a + 5/(a+1) + ln[12/100] + ln[33/100] + ln[57/100] + ln[70/100] + ln[81/100] = 0 ⇒

5/a + 5/(a+1) - 4.358 = 0. ⇒ 4.358a² - 5.642a - 5 = 0.

Taking the positive root of this quadratic equation: â = {5.642 + √(5.642² - (4)(4.358)(-5))} / {(2)(4.358)} = 1.90.]

Exercise: You observe 5 values: 21, 33, 47, 60, 71.
For a = 2 and θ = 100, fit a Beta Distribution via maximum likelihood.

[Solution: f(x) = b(b+1) x (1 - x/θ)^(b-1) / θ².

ln[f(x)] = ln[b] + ln[b+1] + (b-1) ln[1 - x/θ] + ln[x] - 2 ln[θ].
Setting the partial derivative of the loglikelihood with respect to b equal to zero:
n/b + n/(b+1) + Σ ln[1 - xi/θ] = 0. ⇒
5/b + 5/(b+1) - 3.425 = 0. ⇒ 3.425b² - 6.575b - 5 = 0.
Taking the positive root of this quadratic equation: b̂ = {6.575 + √(6.575² - (4)(3.425)(-5))} / {(2)(3.425)} = 2.50.]

114
See CAS 3L, 11/11, Q. 18.

Formulas for Some Examples of Maximum Likelihood fits to Ungrouped Data:115

Distribution  Parameters

Exponential:  θ = Σxi/N = X̄, same as the method of moments

Inverse Exponential:  θ = N / Σ(1/xi)

Single Parameter Pareto:  α = N / Σ ln[xi / θ]

Gamma, α fixed:  θ = X̄/α, same as the method of moments

Pareto, θ fixed:  α = N / Σ ln[(θ + xi) / θ]

Weibull, τ fixed:  θ = (Σ xi^τ / N)^(1/τ)

Normal:  µ = X̄, σ² = Σ(xi - X̄)² / N, same as the method of moments

LogNormal:  µ = Σ ln(xi) / N, σ² = Σ(ln(xi) - µ)² / N

Inverse Gaussian, θ fixed:  µ = X̄

Inverse Gaussian:  µ = X̄, θ = N / Σ(1/xi - 1/X̄)

Inverse Gamma, α fixed:  θ = Nα / Σ(1/xi)

Uniform on [0, b]:  b = maximum of the xi
Uniform on [a, 0]:  a = minimum of the xi
Uniform on [a, b]:  a = minimum of the xi, b = maximum of the xi

Inverse Weibull, τ fixed:  θ = (N / Σ xi^(-τ))^(1/τ)

Inverse Pareto, θ fixed:  τ = N / Σ ln[1 + θ/xi]

Beta, b = 1, θ fixed:  a = -n / Σ ln[xi / θ]

Beta, a = 1, θ fixed:  b = -n / Σ ln[1 - xi / θ]

115
In the absence of truncation and/or censoring, as well as in the absence of grouping.
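Several of these closed-form estimators are one-liners in code. The Python sketch below (an illustration only; the function names are mine) implements a few of them and checks two against worked examples earlier in this section.

import math

def mle_exponential(xs):
    return sum(xs) / len(xs)

def mle_inverse_exponential(xs):
    return len(xs) / sum(1 / x for x in xs)

def mle_pareto_alpha(xs, theta):          # Pareto with theta fixed
    return len(xs) / sum(math.log((theta + x) / theta) for x in xs)

def mle_weibull_theta(xs, tau):           # Weibull with tau fixed
    return (sum(x**tau for x in xs) / len(xs)) ** (1 / tau)

print(mle_inverse_exponential([1, 3, 6, 10, 25]))   # about 3.049, as in the earlier exercise
print(mle_pareto_alpha([1, 3, 6, 10, 25], 7))       # about 1.422, as in the Pareto example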

Maximum Likelihood versus Method of Moments and Percentile Matching:

Method of Moments and Percentile Matching each match one or more statistics of the data to the
fitted distribution. They have the advantage of being relatively simple to perform. While Method of
Moments and Percentile Matching do a fairly good job for lighter tailed data sets, they generally do
not perform as well for data sets with heavy right hand tails, such as are common in casualty
insurance. Also the choice of percentiles at which to perform the matching can be somewhat
arbitrary.

The method of maximum likelihood uses all the information contained in the data set, and thus as will
be discussed in a subsequent section has many desirable statistical properties when applied to
large samples. Also as will be discussed in subsequent sections, the method of maximum likelihood
can be applied to situations involving data combined from different policies with different deductibles
and/or different maximum covered losses.

In a majority of cases, maximum likelihood requires the use of a computer. The computer program
will usually require a starting set of parameters in order to numerically maximize the loglikelihood.
Percentile Matching or Method of Moments are often used to provide such a starting set of
parameters.
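For example, here is one possible sketch (in Python, using SciPy's general-purpose optimizer; the data, routine choice and starting values are illustrative assumptions) of numerically maximizing a Gamma loglikelihood, with the method of moments supplying the starting parameters:

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

x = np.array([1500.0, 5500.0, 3000.0, 3300.0, 2300.0])   # illustrative data

# Negative loglikelihood of a Gamma Distribution with parameters (alpha, theta):
# ln f(x) = (alpha - 1) ln x - x/theta - alpha ln theta - ln Gamma(alpha).
def negloglik(params):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    return -np.sum((alpha - 1.0) * np.log(x) - x / theta
                   - alpha * np.log(theta) - gammaln(alpha))

# Method of moments starting values: alpha = mean^2/variance, theta = variance/mean.
m, v = x.mean(), x.var()
fit = minimize(negloglik, x0=[m * m / v, v / m], method="Nelder-Mead")
print(fit.x)   # maximum likelihood estimates of (alpha, theta)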
2016-C-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/22/15, Page 252

Restricted Maximum Likelihood:116

If one has two or more data sets, one can fit separate distributions to each data set. Instead, one can
fit similar distributions to all of the data combined, with restrictions on the relationships of the
parameters of the distributions of the related data sets, such as in the following example.

We have two regions of a state, A and B.

Region Number of Claims Total Average


A 50 $20,000 $400
B 80 $72,000 $900

You assume that the distribution of the size of loss is Exponential in each region.

Exercise: Using the data for Region A alone, fit an Exponential Distribution via maximum likelihood.
What is the corresponding maximum loglikelihood?
[Solution: For an Exponential Distribution, ln f(x) = -x/θ - ln(θ).

The loglikelihood for A is: Σ{-xi /θ - ln(θ)} = (-1/θ)Σxi - n ln(θ) = -20,000/θ - 50 ln(θ).

Setting the partial derivative of the loglikelihood with respect to θ equal to zero:

0 = 20,000/θ^2 - 50/θ. ⇒ θ = 20,000/50 = 400. (The same result as the method of moments.)
The maximum loglikelihood is: -20,000/400 - 50 ln(400) = -349.573.]

Similarly, if we separately estimate θ for Region B, θ = 900.


The corresponding maximum loglikelihood is: -72,000/900 - 80 ln(900) = -624.192.117

Let us assume based on some outside information that the expected size of loss in Region B is
twice that in Region A.
Using maximum likelihood applied to all of the data, estimate θ for Region A restricted by this
assumption. What is the corresponding maximum loglikelihood?

116 See for example 4, 11/00, Q.34.
117 In general, the maximum loglikelihood for an Exponential and n ungrouped data points is: -n(1 + ln( x )), where x denotes the sample mean.

[Solution: For the Exponential Distribution, f(x) = e-x/θ/θ. ln f(x) = -x/θ - ln(θ).

Assuming θB = 2θA, then the loglikelihood is: Σ_A {-xi / θA - ln(θA)} + Σ_B {-xi / (2θA) - ln(2θA)}
= -20,000/θA - 50 ln(θA) - 72,000/(2θA) - 80 ln(2θA) = -56,000/θA - 130 ln(θA) - 80 ln(2).

Setting the partial derivative of the loglikelihood with respect to θA equal to zero:

0 = 56,000/θA^2 - 130/θA. ⇒ θA = 56,000 / 130 = 430.77. ⇒ θB = 2θA = 861.54.


The maximum loglikelihood is:
-20,000/430.77 - 50 ln(430.77) - 72,000/861.54 - 80 ln(861.54) = -973.976.]

The unrestricted maximum loglikelihood is: -349.573 - 624.192 = -973.765, somewhat better than
the restricted maximum loglikelihood of -973.976.
It is not surprising that without the restriction we can do a somewhat better job of fitting the data.
The unrestricted model involves two Exponentials, while the restricted model is a special case in
which one of the Exponentials has twice the mean of the other.118

There is a shortcut for restricted maximum likelihood for this Exponential case.
Put the Region B losses on the assumed Region A by dividing by 2: 72,000/2 = 36,000.
Then one can apply the method of moments: θA = (20,000 + 72,000/2) / (50 + 80) = 430.77.

While this shortcut will also work for a Gamma Distribution with α fixed, it does not work in
general. For example, it will not work for a Single Parameter Pareto Distribution.119

118 The restricted model has one parameter, while the unrestricted model has two parameters.
As will be discussed in a subsequent section, one can apply the Likelihood Ratio Test to this situation.
119 See 4, 11/04, Q.18 (2009 Sample Q.146).
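A quick numerical check of the restricted fit above, using only the claim counts and totals given for the two regions, is sketched below (Python with SciPy assumed; illustrative only, not part of the syllabus):

import numpy as np
from scipy.optimize import minimize_scalar

# Restricted model: theta_B = 2 * theta_A.
def negloglik(theta_a):
    loglik_a = -20000.0 / theta_a - 50.0 * np.log(theta_a)
    loglik_b = -72000.0 / (2.0 * theta_a) - 80.0 * np.log(2.0 * theta_a)
    return -(loglik_a + loglik_b)

fit = minimize_scalar(negloglik, bounds=(100.0, 2000.0), method="bounded")
print(fit.x)      # about 430.77
print(-fit.fun)   # maximum loglikelihood, about -973.976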

Problems:

10.1 (3 points) You observe 4 claims of sizes: 2, 5, 7 and 10. For this data, which of the following
Pareto distributions has the largest likelihood?
A. Pareto with α = 1, θ = 10 B. Pareto with α = 1.5, θ = 12
C. Pareto with α = 2, θ = 15 D. Pareto with α = 2.5, θ = 18
E. Pareto with α = 3, θ = 20

10.2 (3 points) A random variable X is given by the density function:


f(x) = (q+2)(q+1) xq (1-x), 0 ≤ x ≤ 1.
A random sample of three observations of X yields: 0.2, 0.3 and 0.6.
Determine the maximum likelihood estimator of q.
A. less than 0
B. at least 0 but less than 0.5
C. at least 0.5 but less than 1.0
D. at least 1.0 but less than 1.5
E. at least 1.5

10.3 (2 points) You observe the following 6 claims: 162.22, 151.64, 100.42, 174.26, 20.29,
34.36. A distribution: F(x) = 1 - e-qx, x > 0, is fit to this data via the Method of Maximum Likelihood.
Determine the value of q.
A. less than 0.006
B. at least 0.006 but less than 0.007
C. at least 0.007 but less than 0.008
D. at least 0.008 but less than 0.009
E. at least 0.009

10.4 (3 points) 10 Claims have been observed: 1500, 5500 3000, 3300, 2300, 6000, 5000,
4000, 3800, 2500. The underlying distribution is assumed to be Gamma, with parameters
α = 8 and θ unknown. Determine the maximum likelihood estimate of θ.
A. 430 B. 440 C. 450 D. 460 E. 470

10.5 (3 points) 0, 3, and 8 are three independent random draws from Normal Distributions.
Each Normal Distribution has the same mean.
The variances of the Normal Distributions are respectively: 1/θ, 1/(2θ), and 1/(3θ).
Determine the maximum likelihood estimate of θ.
A. 0.03 B. 0.04 C. 0.05 D. 0.06 E. 0.07

10.6 (2 points) Losses are uniformly distributed on [1, b]. You observe 4 losses: 1.7, 3.1, 3.4, 4.6.
What is the maximum likelihood estimate of b?
A. less than 4
B. at least 4 but less than 5
C. at least 5 but less than 6
D. at least 6 but less than 7
E. at least 7

Use the following information to answer the following two questions:


You observe the following five claims: 6.02, 7.56, 7.88, 8.42, 8.72.
The normal distribution has its probability density function given by:
f(x) = exp[-(x - µ)^2 / (2σ^2)] / (σ √(2π)).

10.7 (1 point) Using the method of maximum likelihood, a Normal distribution is fit to this data.
What is the value of the fitted µ parameter?
A. less than 7.8
B. at least 7.8 but less than 7.9
C. at least 7.9 but less than 8.0
D. at least 8.0 but less than 8.1
E. at least 8.1

10.8 (2 points) Using the method of maximum likelihood, a Normal distribution is fit to this data.
What is the value of the fitted σ parameter?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

10.9 (3 points) You have the following data from three states:
State Number of Claims Dollars of Loss
Bay 200 200,000
Empire 400 500,000
Granite 100 75,000
You assume that the mean claim size for Empire State is 1.4 times that for Bay State and 1.7 times
that for Granite State. You assume the size of claim distribution for each state is Exponential.
Estimate the mean claim size for Empire State via the method of maximum likelihood applied to the
data of all three states.
(A) 1100 (B) 1150 (C) 1200 (D) 1250 (E) 1300

Use the following information to answer the following two questions:


You observe the following five claims: 410, 1924, 2635, 4548, 6142.

10.10 (1 point) Using the method of maximum likelihood, a LogNormal distribution is fit to this data.
What is the value of the fitted µ parameter?
A. less than 7.8
B. at least 7.8 but less than 7.9
C. at least 7.9 but less than 8.0
D. at least 8.0 but less than 8.1
E. at least 8.1

10.11 (2 points) Using the method of maximum likelihood, a LogNormal distribution is fit to this data.
What is the value of the fitted σ parameter?
A. less than 0.6
B. at least 0.6 but less than 0.7
C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9
E. at least 0.9

Use the following information to answer each of the following two questions.

A Pareto distribution has been fit to a set of data xi, i = 1 to n, using the method of maximum
likelihood.
For the fitted parameter θ, let:

v = Σ ln(1 + xi / θ) / n,    w = Σ 1/(1 + xi / θ) / n.

10.12 (2 points) Which of the following equations is satisfied?


A. w + 1/v = 1 B. (1/w) + v = 1 C. w - v = 1 D. (1/w ) - (1/v) = 1 E. None of A,B, C, or D.

10.13 (2 points) For the fitted parameters, α is equal to which of the following?
A. v/(v+w) B. w/(w+1) C. 1/v D. w/(w-1) E. v/(v-w)

10.14 (3 points) You observe the following 10 claims:


1729, 101, 384, 121, 880, 3043, 205, 132, 214, 82
You fit to this data via the method of maximum likelihood the Distribution Function
F(x) = 1 - {1000 / (1000 + x)}^α. In which of the following intervals is α?
A. less than 2.5
B. at least 2.5 but less than 2.6
C. at least 2.6 but less than 2.7
D. at least 2.7 but less than 2.8
E. at least 2.8

Use the following information for the next three questions:

•  5 Claims have been observed: 1500, 5500, 3000, 3300, 2300.


•  An Inverse Gaussian Distribution with parameters µ and θ is fit to this data via
maximum likelihood.

10.15 (2 points) Determine the value of the fitted µ.


A. less than 3200
B. at least 3200 but less than 3300
C. at least 3300 but less than 3400
D. at least 3400 but less than 3500
E. at least 3500

10.16 (2 points) Determine the value of the fitted θ.


A. less than 15,000
B. at least 15,000 but less than 16,000
C. at least 16,000 but less than 17,000
D. at least 17,000 but less than 18,000
E. at least 18,000

10.17 (2 points) If µ is fixed as 4000, determine the value of the fitted θ.


A. less than 13,000
B. at least 13,000 but less than 14,000
C. at least 14,000 but less than 15,000
D. at least 15,000 but less than 16,000
E. at least 16,000

Use the following information to answer the next two questions:


• An insurer writes a policy with a coinsurance factor of 90%.
• There are 7 payments: 3236, 3759, 10769, 22832, 28329, 36703, 72369.
• The losses prior to the effect of the coinsurance factor are assumed to follow a
LogNormal Distribution.
• A LogNormal Distribution is fit via the Method of Maximum Likelihood.

10.18 (2 points) What is the fitted µ parameter?


A. less than 9.7
B. at least 9.7 but less than 9.8
C. at least 9.8 but less than 9.9
D. at least 9.9 but less than 10.0
E. at least 10.0

10.19 (2 points) What is the fitted σ parameter?


A. less than 0.7
B. at least 0.7 but less than 0.8
C. at least 0.8 but less than 0.9
D. at least 0.9 but less than 1.0
E. at least 1.0

10.20 (3 points) A sample of 200 losses has the following statistics:


Σ xi^-2 = 0.0045641        Σ xi = 130,348
Σ xi^-1.5 = 0.046162        Σ xi^1.5 = 3,832,632
Σ xi^-1 = 0.59041        Σ xi^2 = 120,252,097
(Each sum is over i = 1 to 200.)

You assume that the losses come from a Weibull distribution with τ = 1.5.
Determine the maximum likelihood estimate of the Weibull parameter θ.
(A) Less than 700
(B) At least 700, but less than 800
(C) At least 800, but less than 900
(D) At least 900, but less than 1000
(E) At least 1000

Use the following information for the next two questions:


• You observe 1000 claims of sizes xi, i =1, 2, 3.... 1000.

• Σ xi = 150,000.  Σ ln(xi) = 4,800.  Σ (ln xi)^2 = 24,000.  Σ xi^2 = 30 million.
(Each sum is over i = 1 to 1000.)

• A Gamma Distribution is fit to this data via the method of maximum likelihood
• Where ψ(y) = d (ln Γ(y) ) / dy is the digamma function:
y ln(y) - ψ(y) y ln(y) - ψ(y)
2.0 0.270 2.6 0.204
2.1 0.257 2.7 0.196
2.2 0.244 2.8 0.189
2.3 0.233 2.9 0.182
2.4 0.223 3.0 0.176
2.5 0.213

10.21 (3 points) What is the fitted value of α?


A. less than 2.5
B. at least 2.5 but less than 2.6
C. at least 2.6 but less than 2.7
D. at least 2.7 but less than 2.8
E. at least 2.8

10.22 (2 points) What is the fitted value of θ?


A. less than 35
B. at least 35 but less than 45
C. at least 45 but less than 55
D. at least 55 but less than 65
E. at least 65

10.23 (3 points) Two friends Bert and Ernie work at different insurers. They are each analyzing
similar data at their insurers. They have each calculated the following Negative Loglikelihoods for
Weibull Distributions using the data at their own insurer.

Negative Loglikelihoods for Bertʼs data:


Theta τ = 0.3 τ = 0.5 τ = 0.7 τ = 0.9 τ = 1.1
3000 1473.07 1447.75 1477.03 1562.76 1728.75
5000 1473.95 1443.55 1455.49 1500.83 1583.94
7000 1476.27 1446.16 1454.12 1487.66 1545.46
9000 1478.81 1450.43 1458.22 1488.05 1536.25
11000 1481.31 1455.09 1464.11 1493.31 1537.83

Negative Loglikelihoods for Ernieʼs data:


Theta τ = 0.3 τ = 0.5 τ = 0.7 τ = 0.9 τ = 1.1
3000 1100.98 1061.03 1046.93 1052.74 1079.22
5000 1098.15 1051.90 1026.09 1012.43 1008.35
7000 1097.72 1050.28 1022.01 1004.09 993.23
9000 1098.07 1050.96 1022.82 1004.60 992.77
11000 1098.72 1052.51 1025.44 1008.27 997.36

If they were to fit a Weibull Distribution to their combined data via maximum likelihood, what would
be the survival function at 22,000?
A. 1% B. 5% C. 10% D. 15% E. 20%

10.24 (3 points) Let x1 , x2 , ..., xn and y1 , y2 , ..., ym denote independent random samples of
severities from Region 1 and Region 2, respectively.
Pareto distributions with θ = 1, but different values of α, are used to model severities in these
regions. Past experience indicates that the average severity in Region 2 is half the
average severity in Region 1. You intend to calculate the maximum likelihood
estimate of α for Region 1, using the data from both regions.
Which of the following equations must be solved?
(A) n/α - Σln(xi) + 2m/(2α - 1) - 2Σln(yi) = 0.

(B) n/α - Σln(xi) + m/(α - 1) - Σln(yi) = 0.

(C) n/α - Σln(1 + xi) + 2m/(2α - 1) - 2Σln(1 + yi) = 0.

(D) n/α - Σln(1 + xi) + m/(α - 1) - Σln(1 + yi) = 0.


(E) None of the above

10.25 (3 points) You are given:


(i) Claim counts follow a Poisson distribution with mean µ.
(ii) Claim sizes follow an Exponential distribution with mean 10µ.
(iii) Claim counts and claim sizes are independent, given µ.
For a given policyholder you observe the following claims:
Year 1: 10, 70.
Year 2: No claims.
Year 3: 20, 30, 50.
Estimate µ for this policyholder, using maximum likelihood.
A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0

10.26 (3 points) You observe the following five sizes of loss:
29 55 61 182 270
Fit via maximum likelihood an Inverse Weibull Distribution with τ = 4.
What is the fitted value of θ?
A. 40 B. 42 C. 44 D. 46 E. 48

10.27 (3 points) Slippery Elm, Ent and expert treasure finder, searches for treasure in either the
ruins of Orthanc or the ruins of Minas Morgul.
He has made 12 trips to Orthanc and found a total of 8200 worth of treasure.
He has made 7 trips to Minas Morgul and found a total of 3100 worth of treasure.
The value of treasure that Slippery Elm finds on a trip to either location has a Gamma Distribution
with α = 3. However, the expected value of treasure found on a trip to Minas Morgul is assumed to
be 50% more than that on a trip to Orthanc.
Determine the maximum likelihood estimate of θ for a trip to Orthanc.
A. 160 B. 165 C. 170 D. 175 E. 180

10.28 (3 points) You observe the following five sizes of loss:


11 17 23 38 54
Fit via maximum likelihood a LogNormal Distribution with σ = 0.6 and µ unknown.
Use the fitted distribution in order to estimate the survival function at 75.
A. 1% B. 2% C. 3% D. 4% E. 5%

10.29 (3 points) You are given:


(i) Low-hazard risks have an exponential claim size distribution with mean 0.8θ.
(ii) Medium-hazard risks have an exponential claim size distribution with mean θ.
(iii) High-hazard risks have an exponential claim size distribution with mean 1.5θ.
(iv) Two claims from low-hazard risks are observed, of sizes 100 and 200.
(v) Two claims from medium-hazard risks are observed, of sizes 50 and 300.
(vi) Two claims from high-hazard risks are observed, of sizes 150 and 400.
Determine the maximum likelihood estimate of θ.
(A) 180 (B) 190 (C) 200 (D) 210 (E) 220

10.30 (3 points) You observe the following six sizes of loss:
9 15 25 34 56 90
Fit via maximum likelihood an Inverse Gamma Distribution with α = 3 and θ unknown.
What is mean of the fitted distribution?
A. less than 30
B. at least 30 but less than 35
C. at least 35 but less than 40
D. at least 40 but less than 45
E. at least 45

10.31 (4 points)
You observe the following ten losses: 27, 98, 21, 219, 195, 33, 316, 11, 247, 45.
Σ xi = 1212.  Σ xi^2 = 260,860.  Σ ln[xi] = 42.5536.  Σ (ln[xi])^2 = 193.948.
(Each sum is over the ten losses.)

Using the method of maximum likelihood, a LogNormal distribution is fit to this data.
Using the method of moments, another LogNormal distribution is fit to this data.
Each LogNormal distribution is used to estimate the probability that a loss will exceed 100.
What is the absolute difference in these two estimates?
A. 0.01 B. 0.03 C. 0.05 D. 0.07 E. 0.09

10.32 (2 points) You observe the following five sizes of loss:
19 45 64 186 370
Fit via maximum likelihood an Inverse Exponential Distribution.
What is the fitted value of θ?
A. 30 B. 35 C. 40 D. 45 E. 50

10.33 (2 points) You are given:


(i) Annual claim counts for a policyholder follow a Poisson distribution with mean λ.
(ii) Claim sizes follow an Exponential distribution with mean θ.
(iii) Claim counts and claim sizes are independent.
For 100 policyholders you observe four claims in a year, with the following sizes:
1000, 3000, 6000, 8000.
Using maximum likelihood, estimate λ and θ.

10.34 (3 points) The following data have been collected for a large insured:
Year Number Of Claims Average Claim Size
1 100 10,000
2 200 12,500
Inflation increases the size of all claims by 10% per year.
A Gamma distribution with parameters α = 3 and θ is used to model the claim size
distribution.
Estimate θ for Year 3 using the method of maximum likelihood.
(A) 4000 (B) 4400 (C) 4800 (D) 5200 (E) 5600

10.35 (3 points) You observe the following five sizes of loss:
32 45 71 120 178
Fit via maximum likelihood an Inverse Pareto Distribution with θ = 30.
What is the fitted value of τ?
A. 2.6 B. 2.8 C. 3.0 D. 3.2 E. 3.4

10.36 (2 points) You are given the following three observations:


1 2 3
You fit a distribution with the following density function to the data:
f(x) = (p+1) x^2 {1 - (x/6)^3}^p / 72, 0 < x < 6, p > -1
Determine the maximum likelihood estimate of p.
(A) 12 (B) 14 (C) 16 (D) 18 (E) 20

10.37 (3 points) Assume that the heights of maize plants are Normally Distributed.
You measure the heights of 10 mature self-fertilized maize plants:
35, 39, 45, 47, 48, 50, 51, 52, 54, 60.
You measure the heights of 10 mature cross-fertilized maize plants:
63, 64, 66, 67, 70, 72, 73, 74, 76, 82.
You assume that the two Normal distributions have the same coefficient of variation but the height of
cross-fertilized plants is on average 1.4 times the height of self-fertilized plants.
You fit via maximum likelihood using the data from both samples.
What is the fitted value of σ for the cross-fertilized plants?
A. 7 B. 8 C. 9 D. 10 E. 11

10.38 (3 points) You observe the following n sizes of loss: x1, x2, ... , xn.
You assume a survival function of the form: S(x) = (1 + 5βx)^-3, x > 0, β > 0.
Determine an equation that must be satisfied by the maximum likelihood β.

10.39 (2 points) You are given:


(i) Annual claim counts follow a Poisson distribution with mean θ.
(ii) Claim sizes follow an Exponential distribution with mean θ.
(iii) Claim counts and claim sizes are independent, given θ.
For a given policyholder you observe the following six claim sizes in a year:
2, 3, 5, 8, 9, 15.
Estimate θ for this policyholder, using maximum likelihood.
A. 5.0 B. 5.5 C. 6.0 D. 6.5 E. 7.0

10.40 (3 points) You are given a data set of size 80:


Σ xj = 4400.    Σ xj^2 = 274,000.    (Each sum is over j = 1 to 80.)

For a Normal Distribution, determine the maximum loglikelihood.


(A) -390 (B) -370 (C) -350 (D) -330 (E) -310

10.41 (2 points) f(x) = 1/(3θ), -θ < x < 2θ, θ > 0.

Determine the maximum likelihood value of θ for a sample of size n.

10.42 (3 points) You are given the following two observations: 100, 300.
You fit a Pareto Distribution with α = 4.
Determine the maximum likelihood estimate of θ.
A. 550 B. 600 C. 650 D. 700 E. 750

10.43 (3 points) Let x1 , x2 ,..., x50 and y1 , y2 ,..., y80 denote independent random samples of
losses from State X and State Y, respectively.
Weibull distributions with the same θ, but different values of τ, are used to model losses in these
states. τ = 1 for State X, and τ = 2 for State Y.
Σ xi = 10,000 (i = 1 to 50).    Σ yi = 17,000.    Σ yi^2 = 4,000,000 (i = 1 to 80).

Calculate the maximum likelihood estimate of θ, using the data from both states.
A. 200 B. 210 C. 220 D. 230 E. 240

10.44 (2, 5/83, Q.26) (1.5 points) Let X1 , X2 , . . . , Xn be a random sample from a distribution with
density function f(x) = √(2/π) exp[-(x - θ)^2 / 2], for x ≥ θ.

What is the maximum likelihood estimator for θ?


A. X B. min(X1 , X2 , . . . , Xn ) C. max(X1 , X2 , . . . , Xn ) D. X /2 E. 2 X

10.45 (2, 5/83, Q.32) (1.5 points) Let X1 , X2 , X3 , and X4 be a random sample from a distribution
with density function f(x) = exp[-(x - 4)/β] / β, for x > 4, where β > 0.

If the data from this random sample are 8.2, 9.1, 10.6, and 4.9, respectively, what is the maximum
likelihood estimate of β?
A. 4.2 B. 7.2 C. 8.2 D. 12.2 E. 28.8

10.46 (2, 5/85, Q.49) (1.5 points) A random sample X1 , . . . , Xn is taken from a distribution with

density function f(x) = (θ + 1)xθ for 0 < x < 1, where θ > 0.


What is the maximum likelihood estimator of θ?
A. -1 - n/Σ ln xi    B. n/Σ ln xi    C. Σ ln xi / n    D. 1 + n/Σ ln xi    E. 1 - Σ ln xi / n
(Each sum is over i = 1 to n.)

10.47 (4, 5/85, Q.54) (3 points) Fit via maximum likelihood a Weibull distribution with probability
density function: f(x; q) = q exp(-q x^(1/2)) / (2 x^(1/2)), x > 0, to the observations 1, 4, 9 and 64.
In which of the following ranges is the maximum likelihood estimate of q?
A. Less than 0.28
B. At least 0.28, but less than 0.29
C. At least 0.29, but less than 0.30
D. At least 0.30, but less than 0.31
E. At least 0.31

10.48 (2, 5/88, Q. 27) (1.5 points) Let X1 , . . ., Xn be a random sample of size n from a continuous
distribution with density function f(x) = θ exp[-θ √x] / (2√x) for 0 < x, where 0 < θ.

What is the maximum likelihood estimate of θ?


n n n n n
A. n / ∑ xi B. n / ∏ xi C. ∑ (1/ xi ) D. 2n/ ∑ (1/ xi ) E. ∑ xi / n
i=1 i=1 i=1 i=1 i=1

10.49 (4, 5/88, Q.55) (2 points) The following six observations came from a gamma distribution
that has its first parameter, α, equal to 3.0: 1.0, 2.0, 2.2, 2.8, 3.0, 4.1.
What is the maximum likelihood estimate of the second parameter, θ, assuming that α is 3.0?
A. Less than 0.80
B. At least 0.80 but less than 0.85
C. At least 0.85 but less than 0.90
D. At least 0.90 but less than 0.95
E. 0.95 or more

10.50 (4, 5/89, Q.48) (2 points) A sample of n independent observations with values x1 ,...,xn
came from a distribution with a probability density function of:
f(x; q) = 2 q x exp[-q x2 ], x ≥ 0.
What is the maximum likelihood estimator for the unknown parameter q?
n n
∑ xi ∑ xi2
n n
A. i=1 B. n C. n D. i=1
n ln2
n
∑ xi ∑ xi2
i=1 i=1

E. None of the above



10.51 (2, 5/90, Q.7) (1.7 points) Let X1 , . . . , X4 be a random sample from a normal distribution

with mean 3 and unknown variance σ2 > 0. If the sample values are 4, 8, 5, and 3, what is the value

of the maximum likelihood estimate of σ2?


A. 7/2 B. 9/2 C. 14/3 D. 5 E. 15/2

10.52 (4, 5/90, Q.45) (1 point) Let x1 , x2 , ... , xn be a random sample taken from a normal
distribution with mean µ = 0 and variance σ2 .
The normal distribution has its probability density function given by:
f(x) = exp[-(x - µ)^2 / (2σ^2)] / (σ √(2π)).

Which of the following is the maximum likelihood estimator of σ?


n n n
∑ xi2 ∑ xi2 ∑ xi2
i=1 i=1
A. B. C. i=1
n n -1 n

n n
∑ xi ∑ xi2
i=1
D. E. i=1
n -1 n -1

10.53 (4, 5/91, Q.36) (2 points) Given the cumulative distribution function:
F(x) = xp for 0 ≤ x ≤ 1, and a sample of n observations, x1 , x2 , ... xn ,
what is the maximum likelihood estimator of p?
-n n ⎪⎧ n ⎪⎫ n
A. n B. n C. ⎨∏xi⎬
⎪⎩ i=1 ⎪⎭
∑ lnxi ∑ lnxi
i=1 i=1

n n
∑ lnxi ∑ xi
i=1 i=1
D. E.
n n

10.54 (4, 5/91, Q.47) (2 points) The following sample of 10 claims is observed: 1500, 6000,
3500, 3800, 1800, 5500, 4800, 4200, 3900, 3000. The underlying distribution is assumed to be
Gamma, with parameters α = 12 and θ unknown.
In what range does the maximum likelihood estimator of θ fall?
A. less than 300
B. at least 300 but less than 310
C. at least 310 but less than 320
D. at least 320 but less than 330
E. at least 330

10.55 (2, 5/92, Q. 41) (1.7 points) Let X be a single observation from a continuous distribution
with density function f(x) = 3/(2θ) - x/(2θ^2), θ ≤ x ≤ 3θ, where θ > 0.
What is the maximum likelihood estimator of θ?
A. X/3 B. 3X/5 C. 2X/3 D. X E. 3X

10.56 (4B, 5/92, Q.21) (2 points) A random sample of n claims, x1 , x2 , ..., xn , is taken from the

following exponential distribution: f(x) = e-x/θ/θ, x > 0.


Determine the maximum likelihood estimator for θ.
n n n n n
∑ lnxi ∑ xi ∑ lnxi ∑ xi ∑ exp[xi]
A. i=1 B. i=1 C. i=1 D. i=1 E. i=1
n n n n n

10.57 (4B, 5/92, Q.27) (2 points) The random variable X has the density function with parameter
β given by f(x; β) = (x/β^2) exp[-(x/β)^2 / 2]; x > 0, β > 0.
Where E[X] = (β/2)√(2π), and the variance of X is: 2β^2 - (π/2)β^2.

You are given the following observations of X: 4.9, 1.8, 3.4, 6.9, 4.0.
Determine the maximum likelihood estimate of β.
A. Less than 3.00
B. At least 3.00 but less than 3.15
C. At least 3.15 but less than 3.30
D. At least 3.30 but less than 3.45
E. At least 3.45

10.58 (4B, 5/93, Q.7) (2 points) A random sample of n claims x1 , x2 , ..., xn , is taken from the
probability density function f(xi) = exp[-(xi - 1000)^2 / (2θ)] / √(2πθ), -∞ < xi < ∞.

Determine the maximum likelihood estimator of θ.


n n n
∑ (xi -1000)2 ∑ (xi -1000)2 ∑ ln[(xi - 1000)2 ]
i=1 i=1 i=1
A. B. C.
n n n

n n
∑ ln[(xi - 1000)2 ] ∑ ln[(xi - 1000)2 ]
i=1
D. E. i=1
n n

10.59 (4B, 11/93, Q.8) (2 points) A random sample of 5 claims x1 ,..., x5 is taken from the
probability density function
f(xi) = α λ^α / (λ + xi)^(α+1), α, λ, xi > 0.
In ascending order the observations are: 43, 145, 233, 396, 775.
Given that λ = 1000, determine the maximum likelihood estimate of α.
A. Less than 2.2
B. At least 2.2, but less than 2.7
C. At least 2.7, but less than 3.2
D. At least 3.2, but less than 3.7
E. At least 3.7

10.60 (4B, 5/94, Q.1) (1 point) You are given the following:
• The random variable X has the exponential distribution given by
f(x) = λe-λx, x > 0.
• A random sample of three observations of X yields the values 0.30, 0.55, 0.80.
Determine the value of the maximum likelihood estimator of λ.
A. Less than 0.5
B. At least 0.5, but less than 1.0
C. At least 1.0, but less than 1.5
D. At least 1.5, but less than 2.0
E. At least 2.0

10.61 (4B, 11/95, Q.4) (3 points) You are given the following:
• The random variable X has the density function
f(x) = 2 (θ - x) / θ2 , 0 < x < θ.
• A random sample of two observations of X yields the values 0.50 and 0.90.
Determine the maximum likelihood estimator of θ.
A. Less than 0.45
B. At least 0.45, but less than 0.95
C. At least 0.95, but less than 1.45
D. At least 1.45, but less than 1.95
E. At least 1.95

10.62 (2, 2/96, Q.12) (1.7 points) Let X1 , . . . , Xn be a random sample from a continuous

distribution with density function f(x) = α 2^α / x^(α+1), x ≥ 2, where α > 0.

Determine the maximum likelihood estimator of α.
A. min(X1, ..., Xn)    B. Σ Xi / n    C. n / Σ ln[Xi]
D. max(X1, ..., Xn)    E. n / {Σ ln[Xi] - n ln[2]}
(Each sum is over i = 1 to n.)

10.63 (2, 2/96, Q.33) (1.7 points) Let X1 ,..., Xn be a random sample from a continuous

distribution with density f(x) = e-x /(1 - e−θ), for 0 < x < θ, where 0 < θ < ∞.
Determine the maximum likelihood estimator of θ.
n
A. ( ∏ Xi)1/ n
i=1

B. X
C. -In(1 - e-x)
D. minimum(X1 ,..., Xn )
E. maximum(X1 ,..., Xn )

10.64 (4B, 5/96, Q.14) (1 point) You are given the following:
• The random variable X has the density function f(x) = (1/θ) e-x/θ , x > 0.
• A random sample of three observations of X yields the values x1 , x2 , and x3 .
Determine the maximum likelihood estimator of θ.
A. (x1 + x2 + x3 )/3
B. (ln x1 + ln x2 + ln x3 )/3
C. (1/x1 + 1/x2 + 1/x3 )/3
D. exp(-(x1 + x2 + x3 )/3)
E. (x1 + x2 + x3 )1/3

10.65 (4B, 5/96, Q.26) (1 point) Which of the following statements regarding loss distribution
models are true?
1. Method of moments estimators provide good starting values for iterative maximum likelihood
estimation.
2. A weight function may be used with minimum distance estimation.
3. A two-parameter model may be preferable to a three-parameter model in some cases.
A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3

10.66 (4B, 11/96, Q.5) (3 points) You are given the following:
• The random variable X has the density function
f(x) = β exp[-β^2 / (2x)] / √(2π x^3), 0 < x < ∞, β > 0.

• A random sample of three observations of X yields the values 100, 150, and 200.
Determine the maximum likelihood estimate of β .

A. Less than 11.5


B. At least 11.5, but less than 12.0
C. At least 12.0, but less than 12.5
D. At least 12.5, but less than 13.0
E. At least 13.0

10.67 (4B, 11/97, Q.6) (2 points) You are given the following:
• The random variable X has one of the following three density functions:
f1 (x) = 1, 0 < x < 1
f2 (x) = 2x, 0< x < 1
f3 (x) = 3x2 , 0 < x < 1
• A random sample of two observations of X yields the values 0.50 and 0.60.
Using the likelihood function, rank the three density functions from most likely to least likely based on
the two observations.
A. f1 (x), f2 (x), f3 (x) B. f1 (x), f3 (x), f2 (x) C. f2 (x), f1 (x), f3 (x)
D. f2 (x), f3 (x), f1 (x) E. f3 (x), f2 (x), f1 (x)

10.68 (4B, 5/98, Q.6) (2 points) You are given the following:
• The random variables X has the density function
f(x) = α / (x + 1)^(α+1), 0 < x < ∞, α > 0.
• A random sample of size n is taken of the random variable X.
Determine the limit of the maximum likelihood estimator of α, as the sample mean goes to infinity.
A. 0 B. 1/2 C. 1 D. 2 E. ∞

10.69 (4B, 5/98, Q.20) (1 point) You are given the following:
• The random variable X has the density function
f(x) = e-x/θ /θ , 0 < x < ∞ , θ > 0.
• θ is estimated by maximum likelihood based on a large random sample of size n.
• p is the proportion of the observations in the sample that are greater than 1.
• The probability that X is greater than 1 is estimated by the estimator exp(-1/θ) .
Determine the estimator for the probability that X is greater than 1.
A. Σ xi / n    B. exp[-n / Σ xi]    C. p    D. -ln p    E. -1/ln p
(Each sum is over i = 1 to n.)

10.70 (4B, 5/99, Q.14) (2 points)You are given the following:


• Claim sizes follow a distribution with density function f(x) = e-x/θ / θ , 0 < x < ∞, θ > 0.
• A random sample of 100 claims yields total aggregate losses of 12,500.
Using the maximum likelihood estimate of θ, estimate the proportion of claims that are greater than
250.
A. Less than 0.11
B. At least 0.11, but less than 0.12
C. At least 0.12, but less than 0.13
D. At least 0.13, but less than 0.14
E. At least 0.14

10.71 (4B, 11/99, Q.22) (2 points) You are given the following:
• The random variable X has the density function
f(x) = wf1 (x) + (1-w)f2 (x), 0 < x < ∞, 0 ≤ w ≤1.
• A single observation of the random variable X, yields the value 1.

• ∫ x f1(x) dx = 1, where the integral is from 0 to ∞.
• ∫ x f2(x) dx = 2, where the integral is from 0 to ∞.

• f2 (x) = 2f1 (x) ≠ 0


Determine the maximum likelihood estimate of w.
A. 0 B. 1/3 C. 1/2 D. 2/3 E. 1

10.72 (3, 5/00, Q.28) (2.5 points) For a mortality study on college students:
(i) Students entered the study on their birthdays in 1963.
(ii) You have no information about mortality before birthdays in 1963.
(iii) Dick, who turned 20 in 1963, died between his 32nd and 33rd birthdays.
(iv) Jane, who turned 21 in 1963, was alive on her birthday in 1998, at which time she left the study.
(v) All lifetimes are independent.
(vi) Likelihoods are based upon the Illustrative Life Table in Appendix 2A of Actuarial Mathematics.
Selected values of lx are as follows:
l20 = 9,617,802 l21 = 9,607,896 l22 = 9,597,695 l23 = 9,587,169
l30 = 9,501,381 l31 = 9,486,854 l32 = 9,471,591 l33 = 9,455,522
l55 = 8,640,861 l56 = 8,563,435 l57 = 8,479,908 l58 = 8,389,826
Calculate the likelihood for these two students.
(A) 0.00138 (B) 0.00146 (C) 0.00149 (D) 0.00156 (E) 0.00169

10.73 (4, 11/00, Q.6) (2.5 points) You have observed the following claim severities:
11.0 15.2 18.0 21.0 25.8
You fit the following probability density function to the data:
f(x) = exp[-(x - µ)^2 / (2x)] / √(2πx), x > 0, µ > 0.

Determine the maximum likelihood estimate of µ.


(A) Less than 17
(B) At least 17, but less than 18
(C) At least 18, but less than 19
(D) At least 19, but less than 20
(E) At least 20

10.74 (4, 11/00, Q.34) (2.5 points) Phil and Sylvia are competitors in the light bulb business.
Sylvia advertises that her light bulbs burn twice as long as Philʼs. You were able to test 20 of Philʼs
bulbs and 10 of Sylviaʼs. You assumed that the distribution of the lifetime (in hours) of a light bulb is
exponential, and separately estimated Philʼs parameter as θ^P = 1000 and Sylviaʼs parameter as
θ^S = 1500 using maximum likelihood estimation.

Determine θ*, the maximum likelihood estimate of θP restricted by Sylviaʼs claim that

θS = 2 θP.
(A) Less than 900
(B) At least 900, but less than 950
(C) At least 950, but less than 1000
(D) At least 1000, but less than 1050
(E) At least 1050

10.75 (4, 5/01, Q.16) (2.5 points)


A sample of ten losses has the following statistics:
Σ X^-2 = 0.00033674        Σ X^0.5 = 488.97
Σ X^-1 = 0.023999        Σ X = 31,939
Σ X^-0.5 = 0.34445        Σ X^2 = 211,498,983
(Each sum is over the ten losses, i = 1 to 10.)

You assume that the losses come from a Weibull distribution with τ = 0.5.
Determine the maximum likelihood estimate of the Weibull parameter θ .
(A) Less than 500
(B) At least 500, but less than 1500
(C) At least 1500, but less than 2500
(D) At least 2500, but less than 3500
(E) At least 3500

10.76 (2 points) In the previous question, 4, 5/01, Q.16, instead assume that the losses come from
an Inverse Gaussian distribution with θ = 4000.
Determine the maximum likelihood estimate of the Inverse Gaussian parameter µ .
(A) Less than 500
(B) At least 500, but less than 1500
(C) At least 1500, but less than 2500
(D) At least 2500, but less than 3500
(E) At least 3500

10.77 (4, 5/01, Q.30) (2.5 points)


The following are ten ground-up losses observed in 1999:
18 78 125 168 250 313 410 540 677 1100
You are given:
(i) The sum of the ten losses equals 3679.
(ii) Losses are modeled using an exponential distribution with maximum likelihood estimation.
(iii) 5% inflation is expected in 2000 and 2001.
(iv) All policies written in 2001 have an ordinary deductible of 100 and a maximum covered loss of
1000. (The maximum payment per loss is 900.)
Determine the expected amount paid per loss in 2001.
(A) 256 (B) 271 (C) 283 (D) 306 (E) 371

10.78 (4, 11/01, Q.40 & 2009 Sample Q.79) (2.5 points) Losses come from a mixture of an
exponential distribution with mean 100 with probability p and an exponential distribution with mean
10,000 with probability 1- p. Losses of 100 and 2000 are observed.
Determine the likelihood function of p.
(A) [{p e^-1 / 100} {(1-p) e^-0.01 / 10,000}] × [{p e^-20 / 100} {(1-p) e^-0.2 / 10,000}]

(B) [{p e^-1 / 100} {(1-p) e^-0.01 / 10,000}] + [{p e^-20 / 100} {(1-p) e^-0.2 / 10,000}]

(C) [p e^-1 / 100 + (1-p) e^-0.01 / 10,000] × [p e^-20 / 100 + (1-p) e^-0.2 / 10,000]

(D) [p e^-1 / 100 + (1-p) e^-0.01 / 10,000] + [p e^-20 / 100 + (1-p) e^-0.2 / 10,000]

(E) p [e^-1 / 100 + e^-0.01 / 10,000] + (1-p) [e^-20 / 100 + e^-0.2 / 10,000]

10.79 (4, 11/02, Q.10 & 2009 Sample Q. 37) (2.5 points) A random sample of three claims from
a dental insurance plan is given below: 225 525 950
Claims are assumed to follow a Pareto distribution with parameters θ = 150 and α.
Determine the maximum likelihood estimate of α.
(A) Less than 0.6
(B) At least 0.6, but less than 0.7
(C) At least 0.7, but less than 0.8
(D) At least 0.8, but less than 0.9
(E) At least 0.9

10.80 (4, 11/03, Q.34 & 2009 Sample Q.26) (2.5 points) You are given:
(i) Low-hazard risks have an exponential claim size distribution with mean θ.
(ii) Medium-hazard risks have an exponential claim size distribution with mean 2θ.
(iii) High-hazard risks have an exponential claim size distribution with mean 3θ.
(iv) No claims from low-hazard risks are observed.
(v) Three claims from medium-hazard risks are observed, of sizes 1, 2 and 3.
(vi) One claim from a high-hazard risk is observed, of size 15.
Determine the maximum likelihood estimate of θ.
(A) 1 (B) 2 (C) 3 (D) 4 (E) 5

10.81 (4, 11/04, Q.6 & 2009 Sample Q.137) (2.5 points)
You are given the following three observations:
0.74 0.81 0.95
You fit a distribution with the following density function to the data:
f(x) = (p+1) xp , 0 < x < 1, p > -1
Determine the maximum likelihood estimate of p.
(A) 4.0 (B) 4.1 (C) 4.2 (D) 4.3 (E) 4.4

10.82 (CAS3, 5/05, Q.18) (2.5 points)


The following sample is taken from the distribution f(x, θ) = (1/θ)e-x/θ.
Observation 1 2 3 4 5 6 7
x 0.49 1.00 0.47 0.91 2.47 5.03 16.09
Determine the Maximum Likelihood Estimator of c, where P(X > c) = 0.75.
A. Less than 1.0
B. At least 1.0 but less than 1.2
C. At least 1.2 but less than 1.4
D. At least 1.4 but less than 1.6
E. 1.6 or more

10.83 (CAS3, 11/05, Q.1) (2.5 points) The following sample was taken from a distribution with
probability density function f(x) = θxθ−1, where 0 < x < 1 and θ > 0.
0.21 0.43 0.56 0.67 0.72
Let R and S be the estimators of θ using the maximum likelihood and method of moments,
respectively. Calculate the value of R - S.
A. Less than 0.3
B. At least 0.3, but less than 0.4
C. At least 0.4, but less than 0.5
D. At least 0.5, but less than 0.6
E. At least 0.6

10.84 (CAS3, 11/06, Q.2) (2.5 points)


Call center response times are described by the cumulative distribution function
F(x) = xθ+1, where 0 ≤ x ≤ 1 and θ > -1.
A random sample of response times is as follows:
0.56 0.83 0.74 0.68 0.75
Calculate the maximum likelihood estimate of θ.
A. Less than 1.4
B. At least 1.4, but less than 1.6
C. At least 1.6, but less than 1.8
D. At least 1.8, but less than 2.0
E. At least 2.0

10.85 (CAS3, 5/07, Q.10) (2.5 points) Let Y1 , Y2 , Y3 , Y4 , ... ,Yn , represent a random sample
from the following distribution with p.d.f.
f(x) = e-x+θ, θ < x < ∞, -∞ < θ < ∞.
Which one of the following is a maximum likelihood estimator for θ?
n
A. ∑Yi
1
n
B. ∑Yi2
1

n
C. ∏ Yi
1

D. Minimum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ]


E. Maximum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ]

10.86 (CAS3, 5/07, Q.11) (2.5 points) The proportion of allotted time a student takes to
complete an exam, x, is described by the following distribution:
f(x) = (θ + 1) xθ, 0 ≤ x ≤ 1 and θ > -1.
A random sample of five students produced the following observations:
Student Proportion of Allotted Time
1 0.92
2 0.79
3 0.90
4 0.65
5 0.86
Using the sample data, calculate the maximum likelihood estimate of θ.
A. Less than 0
B. At least 0, but less than 1.0
C. At least 1.0, but less than 2.0
D. At least 2.0, but less than 3.0
E. At least 3.0

10.87 (CAS3, 11/07, Q.6) (2.5 points)


Waiting times at a bank follow an exponential distribution with a mean equal to θ.
The first five people in line are observed to have had the following waiting times: 10, 5, 21, 10, 7.
• θ^ A = Maximum Likelihood Estimator of θ
• θ^ B = Method of Moments Estimator of θ
Calculate θ^ A - θ^ B.
A. Less than -0.6
B. At least -0.6, but less than -0.2
C. At least -0.2, but less than 0.2
D. At least 0.2, but less than 0.6
E. At least 0.6

10.88 (CAS3L, 5/08, Q.3) (2.5 points) You are given the following:
• A random sample of claim amounts:
8,000 10,000 12,000 15,000
• Claim amounts follow an inverse exponential distribution, with parameter θ.
Calculate the maximum likelihood estimator for θ.
A. Less than 9,000
B. At least 9,000, but less than 10,000
C. At least 10,000, but less than 11,000
D. At least 11,000, but less than 12,000
E. At least 12,000

10.89 (CAS3L, 11/08, Q.4) (2.5 points) You are given the following information:
• A random variable X has probability density function: f(x; θ) = θ xθ−1, where 0 < x < 1 and θ > 0.
• A random sample of five observations from this distribution is shown below:
0.25 0.50 0.40 0.80 0.65
Calculate the maximum likelihood estimator for θ.
A. Less than 1.00
B. At least 1.00, but less than 1.10
C. At least 1.10, but less than 1.20
D. At least 1.20, but less than 1.30
E. At least 1.30

10.90 (CAS3L, 5/09, Q.19) (2.5 points) You are given the following:
• A random variable, X, has the following probability density function:
f(x, θ) = θ xθ−1, 0 < x < 1, 0 < θ < ∞.

• A random sample from this distribution is shown below:


0.10 0.25 0.50 0.60 0.70
Calculate the maximum likelihood estimate of θ.
A. Less than 0.4
B. At least 0.4, but less than 0.6
C. At least 0.6, but less than 0.8
D. At least 0.8, but less than 1.0
E. At least 1.0

10.91 (CAS3L, 11/09, Q.18) (2.5 points)


You are given the following five observations from an inverse exponential distribution:
3 9 13 33 51
The probability density function of the inverse exponential distribution is:
f(x) = θ e^(-θ/x) / x^2.
Calculate the maximum likelihood estimate for θ.
A. Less than 10
B. At least 10, but less than 15
C. At least 15, but less than 20
D. At least 20, but less than 25
E. At least 25

10.92 (CAS3L, 5/11, Q.18) (2.5 points) You are given the following information:
• A distribution has density function:
f(x) = (θ + 1)(1 - x)θ for 0 < x < 1

• You observe the following four values from this distribution:


0.05 0.10 0.20 0.50
Calculate the maximum likelihood estimate of the parameter θ.
A. Less than 0.5
B. At least 0.5, but less than 1.5
C. At least 1.5, but less than 2.5
D. At least 2.5, but less than 3.5
E. At least 3.5

10.93 (CAS3L, 11/11, Q.18) (2.5 points) You are given the following information:
• Mortality follows the survival function S(x) = (1 - x/90)^k, 0 ≤ x ≤ 90, k > 0.
• For a sample size of two, the deaths are recorded as one at age 10 and one at age 50.
Calculate the maximum likelihood estimate of k.
A. Less than 1.0
B. At least 1.0, but less than 1.5
C. At least 1.5, but less than 2.0
D. At least 2.0, but less than 2.5
E. At least 2.5

10.94 (2 points) In the previous question, CAS3L, 11/11, Q.18, calculate the method of moments
estimate of k.

10.95 (CAS3L, 11/11, Q.19) (2.5 points) You are given the following five observations:
2.3 3.3 1.2 4.5 0.7
A uniform distribution on the interval [a,b] is fit to these observations using maximum likelihood
estimation.
This produces parameter estimates a^ and b^ .

Calculate b^ - a^ .
A. Less than 4.0
B. At least 4.0, but less than 4.2
C. At least 4.2, but less than 4.4
D. At least 4.4, but less than 4.6
E. At least 4.6

10.96 (CAS3L, 5/12, Q.18) (2.5 points)


You are given a distribution with the following probability density function where α is unknown:
f(x; α) = (1 + 1/α) x^(1/α), 0 < x < 1, α > 0.
You are also given a random sample of four observations:
0.2 0.5 0.6 0.8
Estimate α by the maximum likelihood method.
A. Less than 2.9
B. At least 2.9, but less than 3.0
C. At least 3.0, but less than 3.1
D. At least 3.1, but less than 3.2
E. At least 3.2

10.97 (CAS3L, 11/12, Q.20) (2.5 points) You are given the following random sample:
0.15 0.25 0.55 0.60 1.10
The probability density function given below is selected to be fit to the random sample:
f(x) = 1/2 for 0 ≤ x ≤ a, and f(x) = 1 for a < x ≤ 1 + a/2,

where 0 ≤ a ≤ 2.
Select the range of the maximum likelihood estimate of a.
A. Less than 0.15
B. At least 0.15, but less than 0.20
C. At least 0.20, but less than 0.25
D. At least 0.25, but less than 0.30
E. At least 0.30

10.98 (CAS3L, 5/13, Q.18) (2.5 points) You are given a sample from a random process:
5 10 7 4
The underlying distribution is assumed to be an Inverse Weibull distribution,
where τ = 2 and θ > 0 is unknown.
Determine the maximum likelihood estimate of θ for this distribution using this data.
A. Less than 3.0
B. At least 3.0, but less than 4.0
C. At least 4.0, but less than 5.0
D. At least 5.0, but less than 6.0
E. At least 6.0

10.99 (CAS ST, 5/14, Q.6) (2.5 points) You are given the following probability distribution:
X P(X=x)
0 θ
1 2θ
2 1 - 3θ
A random sample was taken with the following results:
• The value 0 was observed 2 times.
• The value 1 was observed 5 times.
• The value 2 was observed 3 times.
Calculate the maximum likelihood estimate of θ.
A. Less than 0.15
B. At least 0.15, but less than 0.18
C. At least 0.18, but less than 0.21
D. At least 0.21, but less than 0.24
E. At least 0.24

Solutions to Problems:

10.1. E. For the Pareto, f(x) = α θ^α (θ + x)^-(α+1).


For each given set of parameters, compute the likelihood for each of the four values of claim sizes
observed and multiply the result:
Size Pareto Pareto Pareto Pareto Pareto
1 1.5 2 2.5 3
10 12 15 18 20
2 0.0694 0.0850 0.0916 0.0961 0.1025
5 0.0444 0.0523 0.0563 0.0589 0.0614
7 0.0346 0.0396 0.0423 0.0440 0.0452
10 0.0250 0.0275 0.0288 0.0296 0.0296
0.00000267 0.00000484 0.00000627 0.00000736 0.00000842
Comment: Note that one can work with the sum of the loglikelihoods instead.

10.2. B. Maximize the loglikelihood. ln f(x) = ln(q+2) + ln(q+1) + q ln(x) + ln(1-x).
Σ ln f(x) = 3 ln(q+2) + 3 ln(q+1) + q {ln(0.2) + ln(0.3) + ln(0.6)} + ln(1-0.2) + ln(1-0.3) + ln(1-0.6).
Setting the partial derivative with respect to q of the sum of the loglikelihoods equal to zero:
0 = 3/(q+2) + 3/(q+1) + {ln(0.2) + ln(0.3) + ln(0.6)}.
⇒ 0 = 3(q+1) + 3(q+2) - 3.32421(q+1)(q+2) ⇒ q^2 + 1.195q - 0.7074 = 0.
Thus q = {-1.195 ± √[1.195^2 + (4)(0.7074)]} / 2 = 0.434 or -1.629.
However, for q ≤ -1, the density function doesn't integrate to unity over [0,1];
for q ≤ -1 this integral is infinite.
Therefore q = 0.434 rather than -1.629.
Comment: This is a Beta Distribution, with a = q+1, b = 2, θ =1.
One can verify numerically that q = 0.434 corresponds to the maximum likelihood:
q f(.2) f(.3) f(.6) Likelihood
-0.100 1.6069 1.3502 0.7198 1.5617
-0.050 1.6062 1.3772 0.7602 1.6815
0.000 1.6000 1.4000 0.8000 1.7920
0.050 1.5889 1.4187 0.8393 1.8919
0.100 1.5733 1.4336 0.8780 1.9802
0.150 1.5537 1.4448 0.9160 2.0564
0.200 1.5307 1.4525 0.9534 2.1199
0.250 1.5047 1.4570 0.9901 2.1707
0.300 1.4759 1.4585 1.0261 2.2088
0.350 1.4449 1.4571 1.0612 2.2344
0.400 1.4120 1.4531 1.0956 2.2480
0.434 1.3887 1.4489 1.1185 2.2506
0.450 1.3775 1.4466 1.1292 2.2500
0.500 1.3416 1.4378 1.1619 2.2413
0.500 1.3416 1.4378 1.1619 2.2413
0.550 1.3048 1.4269 1.1938 2.2224
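The numerical verification in the comment can be reproduced with a short sketch (Python with SciPy assumed; the code is illustrative only):

import numpy as np
from scipy.optimize import minimize_scalar

obs = np.array([0.2, 0.3, 0.6])
# loglikelihood of f(x) = (q+2)(q+1) x^q (1-x):
def negloglik(q):
    return -np.sum(np.log(q + 2.0) + np.log(q + 1.0)
                   + q * np.log(obs) + np.log(1.0 - obs))

fit = minimize_scalar(negloglik, bounds=(-0.99, 5.0), method="bounded")
print(fit.x)   # about 0.434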

10.3. E. The Method of Maximum Likelihood is equal to the Method of Moments for the
Exponential Distribution fit to ungrouped data. Thus 1/q = 107.2, q = 1 / 107.2 = 0.0093.
Applying the Method of Maximum Likelihood, the density function for this Exponential Distribution is
f(x) = qe-qx. The loglikelihood is Σ lnf(xi) = Σ {lnq - qxi}.
To maximize this, set the partial derivative with respect to q equal to zero.
Σ{1/q - xi} = 0. ⇒ N/q - Σxi = 0. ⇒ q = N / Σxi = 6 / 643.19 = 0.0093.

10.4. D. Maximize the loglikelihood. f(x) = θ−αxα−1 e−x/θ / Γ(α) = θ−8x7 e−x/θ / Γ(8).

ln f(x) = -8ln(θ) +7ln(x) -x/θ - ln(7!).

loglikelihood = Σ ln f(xi) = -80ln(θ) + 7Σln(xi) - (1/θ)Σxi - 10ln(5040).

Setting the partial derivative with respect to θ of the loglikelihood equal to zero:

0 = 80/θ - (1/θ2)Σxi. Therefore, θ = Σxi / 80 = 36,900 / 80 = 461.

Comment: For the Gamma distribution with α known, the maximum likelihood estimate of θ is:

{Σxi /n} / α = (observed mean) / α = method of moments estimator.

10.5. C. With variance v, the density of the Normal Distribution is:
f(x) = exp(-0.5(x - µ)^2 / v) / √(2πv).
With variance 1/(θmi), the density of the Normal Distribution is:
f(xi) = exp(-0.5(xi - µ)^2 θmi) √(θmi / (2π)).
ln f(xi) = -0.5(xi - µ)^2 θmi + 0.5 ln(θ) + 0.5 ln(mi) - 0.5 ln(2π).
∂/∂µ Σ ln[f(xi)] = θ Σ(xi - µ)mi = 0. ⇒ µ = Σximi/Σmi = {(0)(1) + (3)(2) + (8)(3)}/(1 + 2 + 3) = 5.
∂/∂θ Σ ln[f(xi)] = -0.5 Σ(xi - µ)^2 mi + 0.5 Σ(1/θ) = 0.
⇒ θ = n/{Σ(xi - µ)^2 mi} = 3/{(0 - 5)^2 (1) + (3 - 5)^2 (2) + (8 - 5)^2 (3)} = 3/60 = 0.05.
Comment: See Exercise 13.60 in Loss Models.

10.6. B. The density is 1/(b - 1). The likelihood is: f(1.7)f(3.1)f(3.4)f(4.6) = 1/(b - 1)4 .
Since we observe a loss of size 4.6, b ≥ 4.6. For b ≥ 4.6, the likelihood is maximized for b = 4.6.

10.7. A. & 10.8. E. f(x) = (1/σ)(1/√(2π)) exp(-0.5{(x-µ)/σ}^2).
ln f(x) = -0.5{(x-µ)^2/σ^2} - ln(σ) - (1/2)ln(2π).
Σ ln f(xi) = -0.5{Σ(xi-µ)^2/σ^2} - n ln(σ) - (n/2)ln(2π).
Set the partial derivatives of the sum of loglikelihoods equal to zero.
∂/∂σ Σ ln[f(xi)] = Σ(xi-µ)^2/σ^3 - n/σ = 0.
∂/∂µ Σ ln[f(xi)] = Σ(xi-µ)/σ^2 = 0.
Therefore Σ(xi-µ) = 0. µ = (1/n)Σxi = (6.02 + 7.56 + 7.88 + 8.42 + 8.72) / 5 = 7.72.
Therefore σ = √[Σ(xi - µ)^2 / n] = √0.886 = 0.94.

Comment: Notice that for the Normal Distribution the Method of Moments and the Method of
Maximum Likelihood applied to ungrouped data give the same result.

10.9. E. For an Exponential Distribution, ln f(x) = -x/θ - ln(θ).


Assuming θB = θE/1.4, and θG = θE/1.7, then the loglikelihood is:
Σ_Bay {-1.4 xi/θE - ln(θE/1.4)} + Σ_Empire {-xi/θE - ln(θE)} + Σ_Granite {-1.7 xi/θE - ln(θE/1.7)} =
-200,000(1.4)/θE - 200 ln(θE/1.4) - 500,000/θE - 400 ln(θE) - 75,000(1.7)/θE - 100 ln(θE/1.7) =
-907,500/θE - 700 ln(θE) + 200 ln(1.4) + 100 ln(1.7).
Setting the partial derivative of the loglikelihood with respect to θE equal to zero:
0 = 907,500/θE^2 - 700/θE. ⇒ θE = 907,500/700 = 1296.


Comment: Similar to 4, 11/00, Q. 34. One could just multiply the losses observed for
Bay State by 1.4 and those for Granite State by 1.7, in order to get them up to the level of Empire
State. Then for the Exponential Distribution, the method of maximum likelihood equals the method
of moments: ((1.4)(200000) + 500000 + (1.7)(75000))/(200 + 400 + 100) = 1296.
Instead of states with different cost levels, it could have been years with different cost levels due to
inflation.
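A sketch of the same calculation in Python (illustrative only; it works directly with the state totals given in the problem):

import numpy as np
from scipy.optimize import minimize_scalar

# theta_Bay = theta_E / 1.4 and theta_Granite = theta_E / 1.7.
def negloglik(theta_e):
    loglik  = -1.4 * 200000.0 / theta_e - 200.0 * np.log(theta_e / 1.4)   # Bay
    loglik += -500000.0 / theta_e - 400.0 * np.log(theta_e)               # Empire
    loglik += -1.7 * 75000.0 / theta_e - 100.0 * np.log(theta_e / 1.7)    # Granite
    return -loglik

fit = minimize_scalar(negloglik, bounds=(500.0, 5000.0), method="bounded")
print(fit.x)   # about 1296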

10.10. A. & 10.11. E. f(x) = exp[-(ln x - µ)^2 / (2σ^2)] / (x σ √(2π)).
ln f(x) = -0.5{(ln(x)-µ)^2/σ^2} - ln(σ) - ln(x) - (1/2)ln(2π).
Σ ln f(xi) = -0.5{Σ(ln(xi)-µ)^2/σ^2} - n ln(σ) - Σln(xi) - (n/2)ln(2π).
Set the partial derivatives of the sum of loglikelihoods equal to zero.
∂/∂σ Σ ln[f(xi)] = Σ(ln(xi)-µ)^2/σ^3 - n/σ = 0.
∂/∂µ Σ ln[f(xi)] = Σ(ln(xi)-µ)/σ^2 = 0.
Therefore Σ(ln(xi)-µ) = 0. µ = (1/n)Σln(xi) = (6.02 + 7.56 + 7.88 + 8.42 + 8.72) / 5 = 7.72.
Therefore σ = √[Σ(ln(xi) - µ)^2 / n] = √0.886 = 0.94.

Comment: Notice that for the LogNormal Distribution the Method of Maximum Likelihood gives the
same result as the Method of Maximum Likelihood applied to the Normal Distribution and the log of
the claim sizes. In general, the Method of Maximum Likelihood is invariant under such changes of
variables. In particular, if a set of parameters maximizes the likelihood of a LogNormal Distribution,
then they also maximize the likelihood of the corresponding Normal, as seen for example in this pair
and the previous pair of questions. In addition, since in the case of the Normal the Method of
Maximum Likelihood and the Method of Moments (applied to ungrouped data) produce the same
result, applying the Method of Maximum Likelihood to the LogNormal is the same as applying the
Method of Moments to the underlying Normal.

10.12. E. and 10.13. C. Setting up the maximum likelihood equations for the Pareto in terms of:
v = { Σ ln(1 +xi / θ) }/n, and w = { Σ 1/ (1+xi / θ) }/n.

The first step is to write down the likelihood function: f(xi) = α θ^α / (xi + θ)^(α+1).

Then write the sum of the log likelihoods: Σ {ln(α) + α ln(θ) - (α+1) ln(xi +θ)}
Then take the partial derivatives with respect to the two parameters α and θ and set them equal to
zero: Σ {1/α + ln(θ) - ln (xi +θ) } = 0, and Σ {α/θ − (α+1) / (xi +θ) } = 0.

The second equation becomes: nα/θ = (α+1) Σ{1 / (xi +θ)}.

α/(α + 1) = (1/n) Σ{1/(1 + xi / θ)} = w. Thus 1 + 1/α = 1/w

The first equation becomes: Σ1/α = Σ { ln (xi +θ) - ln(θ)}.

1/α = (1/n)Σ ln {(1 + xi / θ} = v. Thus α = 1/v.

So the solution to the second question is C.


Putting this into the second equation: 1+ v = 1/w.
So the solution to the first question is E (none of the above).
Comment: In the case of the ungrouped data in Section 2, the maximum likelihood fit was
determined to be α = 1.702 and θ = 240,151. Given θ = 240,151, one can calculate v = 0.5877
and w = 0.6298, and verify that in this case, α = 1/v and 1 + v = 1/w.

10.14. A. This is a Pareto Distribution with θ fixed at 1000.

The density function is: f(x) = α θ^α (θ + x)^-(α+1) = α 1000^α (1000 + x)^-(α+1).

The log likelihood is: Σ { ln(α) + α ln(1000) − (α+1)ln(1000 + xi)}.

The derivative with respect to α is: Σ { (1/α) + ln(1000) - ln(1000 + xi)}.

Setting this derivative equal to zero: 0 = (n/α) - Σ ln{(1000 +xi)/1000}.

Solving for alpha: α = n / Σ ln {(1000 +xi)/1000} = 10 / 4.151 = 2.41.


x = Size of Claim 1 +(x/1000) ln [1 +(x/1000)]
1729 2.729 1.004
101 1.101 0.096
384 1.384 0.325
121 1.121 0.114
880 1.880 0.631
3043 4.043 1.397
205 1.205 0.186
132 1.132 0.124
214 1.214 0.194
82 1.082 0.079
SUM 4.151
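The closed-form estimate above is easy to reproduce; a minimal Python sketch (illustrative only):

import numpy as np

x = np.array([1729, 101, 384, 121, 880, 3043, 205, 132, 214, 82], dtype=float)
theta = 1000.0                                     # fixed
alpha = len(x) / np.sum(np.log(1.0 + x / theta))   # Pareto with theta fixed
print(alpha)   # about 2.41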

10.15. A. & 10.16. B. f(x) = {θ/ (2πx3 )}0.5 exp[- θ({x − µ} / µ)2 / 2x].

ln f(x) = 0.5 ln(θ) - 0.5ln(2π) - 1.5ln(x) - θ({x − µ} / µ)2 / 2x.

Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ /2) Σ(xi /µ2 - 2/µ + 1/ xi).
Set the partial derivatives of the loglikelihood equal to zero:
∂/∂µ Σ ln[f(xi)] = -(θ/2) Σ(-2xi/µ3 + 2/µ2) = 0.
∂/∂θ Σ ln[f(xi)] = n/(2θ) - (1/2) Σ(xi/µ2 - 2/µ + 1/xi) = 0.

The first equation is: Σ 2/µ2 = Σ2xi /µ3. Therefore, n µ = Σ xi. ⇒ µ = Σ xi /n.

The second equation is: θ = n / Σ(xi/µ2 - 2/µ + 1/xi) = n / {nE[X]/µ2 - 2n/µ + nE[1/X]} =
1 / (E[X]/E[X]2 - 2/E[X] + E[1/X]) = 1 / (E[1/X] - 1/E[X]).

X 1/X
1500 0.00066667
5500 0.00018182
3000 0.00033333
3300 0.00030303
2300 0.00043478
Average 3120 0.00038393
Therefore, µ = 3120. θ = 1 / (E[1/X] - 1/E[X]) = 1 / (0.00038393 - 1/3120) = 15,769.
Comment: For the Inverse Gaussian, one can solve for the maximum likelihood parameters in
closed form. The parameter µ, which is equal to the mean of Inverse Gaussian, is set equal to the
observed mean. The fitted parameter θ is a function of the observed mean and the observed
negative first moment.
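A minimal Python sketch (my own illustration) of the closed-form Inverse Gaussian fit, using the five claim sizes above:

claims = [1500, 5500, 3000, 3300, 2300]
n = len(claims)
mu = sum(claims) / n                          # = 3120
mean_recip = sum(1 / x for x in claims) / n   # E[1/X] = 0.00038393
theta = 1 / (mean_recip - 1 / mu)             # = 15,769 (approximately)
print(mu, round(theta))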

10.17. A. Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ /2) Σ(xi /µ2 - 2/µ + 1/ xi).

Set the partial derivative of the loglikelihood with respect to θ equal to zero:
∂/∂θ Σ ln[f(xi)] = n/(2θ) - (1/2) Σ(xi/µ2 - 2/µ + 1/xi) = 0.

θ = n / (Σxi /µ2 - 2n/µ + Σ1/xi) = 5 / (15600/40002 - 10/4000 + 0.001920) = 12,670.

Comment: When the parameter µ is fixed, the fitted value of θ is different than when both µ and θ
are fit.

10.18. B. & 10.19. E. Convert the payments to the losses prior to the effect of the coinsurance
factor of 90%: (3236, 3759, 10769, 22832, 28329, 36703, 72369) / .9 =
3595, 4177, 11965, 25369, 31477, 40781, 80410.
f(x) = exp[-(ln(x) - µ)2 / (2σ2)] / (x σ √(2π)).

ln f(x) = -0.5{(ln(x)-µ)2 /σ2} - ln(σ) - ln(x) - (1/2)ln(2π).

Σ ln f(xi) = -0.5{Σ(ln(xi)-µ)2 /σ2} - nln(σ) - Σln(xi) - (n/2)ln(2π).


Set the partial derivatives of the sum of loglikelihoods equal to zero:
∂/∂σ Σ ln[f(xi)] = Σ(ln(xi)-µ)2/σ3 - n/σ = 0.
∂/∂µ Σ ln[f(xi)] = Σ(ln(xi)-µ)/σ2 = 0. ⇒ Σ(ln(xi)-µ) = 0. ⇒ µ = (1/n)Σln(xi) =

{ln(3595)+ln(4177)+ln(11965)+ln(25369)+ln(31477)+ln(40781) + ln(80410)} / 7 = 9.76.


Therefore, σ = ( Σ(ln(xi)-µ)2 / n )0.5 = √1.175 = 1.08.
Comment: In general, fitting a LogNormal Distribution via Maximum Likelihood is equivalent to fitting
a Normal Distribution to the log sizes via the Method of Moments.
The logs of the original losses are all -ln(0.9) more than the logs of the payments. Therefore, if one
fits a Normal Distribution via the Method of Moments to the log payments, getting a mean of 9.655,
one needs to add -ln(0.9) in order to get the mean of the logs of the original losses. (Adding a constant to a Normal
Distribution gives another Normal Distribution; one adds a constant to the mean and leaves the
standard deviation the same.)
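A minimal Python sketch (my own illustration) of this fit, dividing the payments by the 90% coinsurance factor and then fitting the LogNormal by maximum likelihood:

import math

payments = [3236, 3759, 10769, 22832, 28329, 36703, 72369]
logs = [math.log(p / 0.9) for p in payments]      # logs of the ground-up losses
n = len(logs)
mu = sum(logs) / n                                        # = 9.76
sigma = math.sqrt(sum((y - mu)**2 for y in logs) / n)     # = 1.08
print(round(mu, 2), round(sigma, 2))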

10.20. B. f(x) = τxτ−1θ−τ exp(-(x/θ)τ). ln f(x) = lnτ + (τ - 1) ln x - τlnθ - (x/θ)τ.


Set the partial derivative with respect to θ of the loglikelihood equal to zero:
∂/∂θ Σ ln[f(xi)] = Σ{-τ/θ + τxiτ/θτ+1} = 0.

Nτ/θ = (τ/θτ+1)Σ xτ. ⇒ θ = {(1/N)Σ xτ }1/τ = (3832632/200)1/1.5 = 716.


Comment: Similar to 4, 5/01, Q.16.
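A quick check in Python (my own illustration), using the given Σ xi1.5 = 3,832,632 over N = 200 claims:

N, tau, sum_x_tau = 200, 1.5, 3_832_632
theta = (sum_x_tau / N) ** (1 / tau)    # = 716 (approximately)
print(round(theta))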

10.21. B. & 10.22. D. For the Gamma Distribution, the log density is:
ln f(x) = -αlnθ + (α−1)lnxi - (x/θ) - ln(Γ(α)).

Setting equal to zero the partial derivative of Σ ln (f(xi)) with respect to α one obtains:

0 = Σ {-lnθ + lnxi - ψ(α) } = -n lnθ - n ψ(α) + Σ lnxi.

Setting equal to zero the partial derivative of Σ ln (f(xi)) with respect to θ one obtains:

0 = Σ{-α/θ + xi θ−2} = -nα/θ + θ−2 Σ xi.

This implies that: θ = {(1/n)Σxi} / α.

Substituting for θ in the first equation one obtains: ψ(α) - lnα = (1/n)Σ ln xi - ln ((1/n)Σxi).
Substituting in the particular values for this data set of 1000 points:
ψ(α) - ln(α) = (1/n) Σln(xi) - ln( (1/n) Σxi ) = 4800/1000 - ln(150000/1000) = -.211.

Interpolating in the given table of values of: ln(y) - ψ(y) gives α = 2.52.
Therefore, θ = (Σxi / n)/ α = 150 / 2.52 = 59.5.
Comment: Beyond what you are likely to be asked on the exam.
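If a computer is available, the equation ψ(α) - ln(α) = -0.211 can be solved numerically rather than by interpolating in a table. A minimal sketch (my own illustration, assuming SciPy is available):

import math
from scipy.special import digamma
from scipy.optimize import brentq

target = 4800 / 1000 - math.log(150000 / 1000)                 # = -0.211
alpha = brentq(lambda a: digamma(a) - math.log(a) - target, 0.1, 50.0)
theta = (150000 / 1000) / alpha                                # sample mean / alpha
print(round(alpha, 2), round(theta, 1))                        # roughly 2.52 and 59.5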

10.23. C. The loglikelihood for the combined data is the sum of the loglikelihoods for the individual
sets of data:
Negative Loglikelihoods for combined data:
Theta τ = 0.3 τ = 0.5 τ = 0.7 τ = 0.9 τ = 1.1
3000 2574.05 2508.78 2523.96 2615.50 2807.97
5000 2572.10 2495.45 2481.58 2513.26 2592.29
7000 2573.99 2496.44 2476.13 2491.75 2538.79
9000 2576.88 2501.39 2481.04 2492.65 2529.02
11000 2580.03 2507.60 2489.55 2501.58 2535.19
The best loglikelihood is for θ = 7000 and τ = 0.7. S(22000) = Exp[-(22000/7000).7] = 10.8%.
Comment: Since we are only shown a grid of parameter values, we only know that the actual
maximum likelihood would occur somewhere near θ = 7000 and τ = 0.7.
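A minimal Python sketch (my own illustration) of picking the best cell from such a grid and then computing S(22000); only a few cells of the table, including the smallest value, are keyed in here:

import math

neg_loglik = {(5000, 0.5): 2495.45, (5000, 0.7): 2481.58,
              (7000, 0.5): 2496.44, (7000, 0.7): 2476.13, (7000, 0.9): 2491.75,
              (9000, 0.7): 2481.04, (9000, 0.9): 2492.65}   # (theta, tau) -> negative loglikelihood

theta, tau = min(neg_loglik, key=neg_loglik.get)             # (7000, 0.7)
s_22000 = math.exp(-(22000 / theta) ** tau)                  # = 10.8%
print(theta, tau, round(s_22000, 3))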

10.24. C. f(x) = αθα / (θ + x)α+1 = α / (1 + x)α+1. ln[f(x)] = ln(α) - (α+1)ln(1 + x).
Let α be the parameter for Region 1, and a be the parameter for Region 2.
Mean for region 1 is 1/(α-1). Mean for region 2 is 1/(a-1).

We are given: 1/(a-1) = 0.5/(α-1). ⇒ a = 2α - 1.

Loglikelihood is: n ln(α) − (α+1)Σln(1 + xi) + m ln(a) − (a+1)Σln(1 + yi) =

n ln(α) - (α+1)Σln(1 + xi) + m ln(2α - 1) − (2α)Σln(1 + yi).

Set the derivative with respect to α equal to zero:

n/α - Σln(1 + xi) + 2m/(2α - 1) - 2Σln(1 + yi) = 0.


Comment: Similar to 4, 11/04, Q.18, involving a Single Parameter Pareto Distribution.

10.25. D. The density of the Poisson is: e−µµn /n!.

The density of the Exponential is: e-x/(10µ)/(10µ).


Therefore, the contributions to the likelihood are:
Year 1: {e−µµ2 /2!} {e-10/(10µ)/(10µ)} { e-70/(10µ)/(10µ)}.

Year 2: e−µ.

Year 3: { e−µµ3 /3!} {e-20/(10µ)/(10µ)} {e-30/(10µ)/(10µ)} {e-50/(10µ)/(10µ)}.

Therefore, the likelihood is proportional to: e−3µµ5 e-18/µ/µ5 = e−3µ e-18/µ.


Therefore, ignoring constants, the loglikelihood is: -3µ - 18/µ.

Setting the derivative equal to zero: 0 = -3 + 18/µ2. ⇒ µ = √6 = 2.45.


Comment: Setup taken from 4, 11/03, Q.11, a Buhlmann Credibility Question.

10.26. B. f(x) = (τθτ / xτ+1) exp[-(θ/x)τ].
lnf(x) = ln[τ] + τ ln[θ] - (τ+1)ln[x] - (θ/x)τ.
∂lnf(x)/∂θ = τ/θ - τθτ−1/xτ.
Setting the partial derivative of the loglikelihood equal to zero: τN/θ = τθτ−1 Σ xi−τ. ⇒
θ = (N / Σ xi−τ)1/τ = {5 / (1/29⁴ + 1/55⁴ + 1/61⁴ + 1/182⁴ + 1/270⁴)}1/4 = 42.07.

10.27. E. f(x) = x2 e-x/θ / (2θ3).


ln f(x) = 2ln(x) - x/θ - 3ln(θ) - ln(2).

Mean for Minas Morgul is 1.5 times that for Orthanc. ⇒ 3θM = 3(1.5θO). ⇒ θM = 1.5θO.
Loglikelihood is:
Σ_Orthanc {2ln(xi) - xi/θO} - 12{3ln(θO)} - 12ln(2) + Σ_M.M. {2ln(xi) - xi/(1.5θO)} - 7{3ln(1.5θO)} - 7ln(2),
where the first sum runs over the 12 Orthanc values and the second over the 7 Minas Morgul values.
Set the derivative of the loglikelihood with respect to θO equal to zero:
0 = Σ_Orthanc xi/θO2 - 36/θO + Σ_M.M. xi/(1.5θO2) - 21/θO.

57/θO = 8200 / θO2 + 3100 / (1.5θO2).

θO = {8200 + (3100 / 1.5)}/57 = 180.1.


Alternately, bring the value of the treasure from Minas Morgul to the Orthanc level: 3100/1.5 =
2066.7. For the Gamma with alpha fixed, method of moments is equal to the method of maximum
likelihood: 3θ = (8200 + 2066.7)/(12 + 7). ⇒ θ = 180.1.
Comment: This trick of adjusting the losses applies to the Gamma with alpha fixed, including its
special case the Exponential. It does not apply to severity distributions in general.

10.28. C. f(x) = exp[-0.5 ({ln(x)−µ}/σ)2] / {xσ√(2π)}.
ln f(x) = -0.5 ({ln(x)−µ}/σ)2 - ln(x) - ln(σ) - ln(2π)/2 = -1.3889{ln(x)−µ}2 - ln(x) - ln(0.6) - ln(2π)/2.
Setting the partial derivative of the loglikelihood with respect to µ equal to zero:

0 = 2.7778Σ{ln(xi)−µ}. µ = Σln(xi)/5 = {ln(11) + ln(17) + ln(23) + ln(38) + ln(54)}/5 = 3.199.

S(75) = 1 - Φ[(ln(75) - 3.199)/.6] = 1 - Φ[1.86] = 3.1%.


Alternately, take the logs of the losses and fit to a Normal, via maximum likelihood which is the same
as the method of moments.
µ = average of log losses = {ln(11) + ln(17) + ln(23) + ln(38) + ln(54)}/5 = 3.199.
S(75) = 1 - Φ[(ln(75) - 3.199)/.6] = 1 - Φ[1.86] = 3.1%.

10.29. A. For an Exponential with mean µ, f(x) = e-x/µ/µ and ln f(x) = -x/µ - lnµ.
Thus the loglikelihood is:
{-100/(0.8θ) - ln(0.8θ)} + {-200/(.8θ) - ln(0.8θ)} + {-50/θ - ln(θ)} + {-300/θ - ln(θ)} +
{-150/(1.5θ) - ln(1.5θ)} + {-400/(1.5θ) - ln(1.5θ)}=
-1091.67/θ - 6lnθ - 2ln0.8 - 2ln1.5.

Setting the derivative equal to zero: 0 = 1091.67/θ2 - 6/θ. ⇒ θ = 1091.67/6 = 182.


Alternately, convert all of the data to the medium-hazard level (which has mean θ.)

100 and 200 from 0.8θ. ⇔ 100/0.8 = 125 and 250 from θ.

150 and 400 from 1.5θ. ⇔ 150/1.5 = 100 and 266.67 from θ.
For the Exponential, Maximum Likelihood = Method of Moments:
θ = (125 + 250 + 50 + 300 + 100 + 266.67)/6 = 182.
Comment: Similar to 4, 11/03 Q.34.
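A minimal Python sketch (my own illustration) of the rescaling shortcut, converting every loss to the medium-hazard level and averaging:

data = [(100, 0.8), (200, 0.8), (50, 1.0), (300, 1.0), (150, 1.5), (400, 1.5)]  # (loss, relativity)
rescaled = [x / rel for x, rel in data]
theta = sum(rescaled) / len(rescaled)     # = 182 (approximately)
print(round(theta))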

10.30. B. f(x) = (θ/x)3 e-θ/x / (2x).


ln f(x) = 3ln(θ) - 4ln(x) - θ/x - ln(2).
Setting the partial derivative of the loglikelihood with respect to θ equal to zero:

0 = N 3/θ - Σ1/xi. θ = (6)(3)/Σ1/xi = 18/(1/9 + 1/15 + 1/25 + 1/34 + 1/56 + 1/90) = 65.2.

mean = θ/(α - 1) = 65.2/2 = 32.6.

10.31. D. Method of moments: Mean = 1212/10 = 121.2 = exp[µ + σ2/2].

Second Moment = 260,860/10 = 26,086 = exp[2µ + 2σ2].


Dividing the second equation by the square of the first equation:
exp[2µ + 2σ2]/exp[2µ + σ2] = exp[σ2] = 26,086/121.22 = 1.7758.

⇒ σ = 0.758. ⇒ µ = 4.510. S(100) = 1 - Φ[(ln(100) - 4.510)/0.758] = 1 - Φ[0.13] = 44.83%.


Fitting the LogNormal via the method of maximum likelihood is equivalent to fitting a Normal
Distribution via the method of moments to ln(xi): µ = E[ln(xi)] = 42.5536/10 = 4.255.

σ = √( E[ln(xi)2] - E[ln(xi)]2 ) = √(193.948/10 - 4.25536²) = 1.134.


S(100) = 1 - Φ[(ln(100) - 4.255)/1.134] = 1 - Φ[0.31] = 37.83%.
Absolute difference in the two estimates of S(100) is: |44.83% - 37.83%| = 7.00%.
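A minimal Python sketch (my own illustration) of the two estimates of S(100), using the summary statistics given in the problem:

import math
from statistics import NormalDist

n, sum_x, sum_x2 = 10, 1212, 260_860      # moments of the data
sum_ln, sum_ln2 = 42.5536, 193.948        # moments of ln(x)
phi = NormalDist().cdf

# Method of moments applied to x:
m1, m2 = sum_x / n, sum_x2 / n
sigma_mom = math.sqrt(math.log(m2 / m1**2))
mu_mom = math.log(m1) - sigma_mom**2 / 2
s_mom = 1 - phi((math.log(100) - mu_mom) / sigma_mom)

# Maximum likelihood (= method of moments applied to ln(x)):
mu_ml = sum_ln / n
sigma_ml = math.sqrt(sum_ln2 / n - mu_ml**2)
s_ml = 1 - phi((math.log(100) - mu_ml) / sigma_ml)

print(round(abs(s_mom - s_ml), 4))        # about 0.07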

10.32. E. f(x) = θx-2e-θ/x, x > 0. ln f(x) = ln(θ) - 2ln(x) - θ/x.

The loglikelihood is equal to: N ln(θ) - 2 Σ lnxi - θ Σ 1/xi.

Setting the partial derivative of the loglikelihood with respect to θ equal to zero:

0 = N/θ - Σ1/xi. ⇒ θ = N/Σ1/xi = 5/(1/19 + 1/45 + 1/64 + 1/186 + 1/370) = 50.7.

Alternately, if X is Inverse Exponential with parameter θ, then 1/X is Exponential with mean 1/θ.
Fitting an Exponential via maximum likelihood is the same as the method of moments.
Fitting an Exponential distribution with mean 1/θ via the method of moments to 1/X:

1/θ = (1/19 + 1/45 + 1/64 + 1/186 + 1/370)/5. ⇒ θ = 50.7.

10.33. For frequency, maximum likelihood is equal to method of moments: λ^ = 4/100 = 0.04.
For severity, maximum likelihood is equal to method of moments:
θ^ = (1000 + 3000 + 6000 + 8000) / 4 = 4500.
Comment: The terms in the likelihood involving lambda, multiply those involving theta.
⇒ In the loglikelihood, there is a sum of terms, some involving lambda and others involving theta.
When we set equal to zero the partial derivative with respect to lambda, we just get the usual result
for a Poisson.
When we set equal to zero the partial derivative with respect to theta, we just get the usual result for
an Exponential.

10.34. B. Since we want to estimate θ for Year 3, inflate all of the data to the cost level of Year 3:
10000(1.1)2 = 12100. 12500(1.1) = 13750.
The total inflated losses are: (100)(12100) + (200)(13750) = 3,960,000.
Average claim size = 3,960,000 /(100+200) = 13,200.
For the Gamma with alpha fixed, maximum likelihood is equal to method of moments.
Set the observed and theoretical means equal:
3θ = 13,200. ⇒ θ = 13,200/3 = 4,400.

Alternately, f(x) = x2 e-x/θ /(2 θ3). ln f(x) = 2 lnx - x/θ - ln(2) - 3 ln(θ).
loglikelihood is: 2 Σ lnxi - Σxi/θi - N ln(2) - 3 Σ ln(θi), where the thetas differ by year.
Let θ = the year 3 theta. Let yi be the loss xi inflated to the year 3 level.

Then xi/θi = yi/θ.

Also, θi = θ/1.1c, where c is either 1 or 2. ⇒ ln(θi) = ln(θ/1.1c) = ln(θ) + constants.

loglikelihood is: 2 Σ lnxi - Σyi/θ - N ln(2) - 3 Σ ln(θi) = -3,960,000/θ - (3)(300) ln(θ) + constants.

Setting the derivative of the loglikelihood with respect to θ equal to zero:

0 = 3,960,000/θ2 - 900/θ. ⇒ θ = 3,960,000 / 900 = 4,400.



10.35. A. f(x) = τ θ xτ−1 / (x + θ)τ+1.
lnf(x) = ln[τ] + ln[θ] + (τ-1)ln[x] - (τ+1)ln[x+θ].
∂lnf(x)/∂τ = 1/τ + ln[x] - ln(x+θ).
Setting the partial derivative equal to zero: N/τ = Σ( ln[xi + θ] - ln[xi] ) = Σ ln[1 + θ/xi]. ⇒
τ = N / Σ ln[1 + θ/xi]
= 5 / {ln[1 + 30/32] + ln[1 + 30/45] + ln[1 + 30/71] + ln[1 + 30/120] + ln[1 + 30/178]} = 2.63.
Comment: For a Pareto Distribution, with θ fixed: α = N / Σ ln[(θ + xi)/θ].
10.36. C. ln f(x) = ln[p+1] + 2 ln[x] + p ln[1 - (x/6)3] - ln[72].
∂ln[f(x)]/∂p = 1/(p+1) + ln[1 - (x/6)3].
0 = Σ ∂ln[f(xi)]/∂p = 3/(p+1) + ln[1 - 1/216] + ln[1 - 1/27] + ln[1 - 1/8]. ⇒
3/(p+1) = 0.1759. ⇒ p = 16.05.


Comment: A Generalized Beta Distribution as per Appendix A of Loss Models,
with a = 1, b = p+1, θ = 6, and τ = 3.

10.37. B. For the Normal Distribution, f(x) = exp[-(x - µ)2 / (2σ2)] / (σ√(2π)).
ln f(x) = -(x - µ)2/(2σ2) - ln(σ) - ln(2π)/2.
The first Normal for the self-fertilized plants has parameters: µ/1.4 and σ/1.4.
The second Normal for the cross-fertilized plants has parameters: µ and σ.
(The Normals have the same CV, but the mean of the second is 1.4 times the mean of the first.)
Let the two samples be xi and yi.
Then the loglikelihood over the two samples combined is:
-Σ (xi - µ/1.4)2 / {2(σ/1.4)2} - Σ (yi - µ)2 / (2σ2) - 20ln(σ) + constants,
where each sum runs over the ten observations in its sample.
Setting the partial derivative with respect to µ equal to zero:
0 = Σ (1.4xi - µ)/σ2 + Σ (yi - µ)/σ2. ⇒ 20µ = Σ 1.4xi + Σ yi = (1.4)(481) + 707. ⇒ µ = 69.02.
Setting the partial derivative with respect to σ equal to zero:
0 = Σ (1.4xi - µ)2/σ3 + Σ (yi - µ)2/σ3 - 20/σ. ⇒
20σ2 = Σ (1.4xi - µ)2 + Σ (yi - µ)2 = (63 - 69.02)2 + (64 - 69.02)2 + ... + {(1.4)(60) - 69.02}2.
σ2 = 64.4796. ⇒ σ = 8.03.
Alternately, put the self-fertilized heights on the cross-fertilized level by multiplying by 1.4:
49, 54.6, 63, 65.8, 67.2, 70, 71.4, 72.8, 75.6, 84. Now fit to the combined sample.
For the Normal Distribution, method of moment is equal to maximum likelihood.
µ = (63 + 64 + ... + 82 + 49 + 54.6 + ... + 84) / 20 = 69.02.

σ2 = {(63 - 69.02)2 + (64 - 69.02)2 + .. + (84 - 69.02)2 } / 20 = 64.4796. ⇒ σ = 8.03.


Comment: One can get the mean and variance of the combined sample using the stat functions of
the calculator. However, be careful; here we want the biased estimator of the variance rather than the
sample variance.

10.38. f(x) = 15β (1 + 5βx)-4. ln[f(x)] = ln(15) + ln(β) - 4 ln[1 + 5βx].


∂ln[f(x)]/∂β = 1/β - 20x / (1 + 5βx).
Setting equal to zero the partial derivative of the loglikelihood with respect to beta:
n/β = 20 Σ xi / (1 + 5βxi).

Comment: A Pareto Distribution with α = 3, and θ = 1/(5β).

10.39. D. The density of the Poisson is: e−θ θn /n!.

The density at six of the Poisson is: e−θ θ6 /6!

The density of the Exponential is: e-x/θ/θ.


Therefore, the likelihood:
(e−θ θ6 /6!) (e-2/θ/θ) (e-3/θ/θ) (e-5/θ/θ) (e-8/θ/θ) (e-9/θ/θ) (e-15/θ/θ) = exp[-θ - 42/θ] / 720.
Therefore, the loglikelihood is: -θ - 42/θ - ln[720].

Setting the derivative with respect to theta equal to zero: 0 = -1 + 42/θ2. ⇒ θ = √42 = 6.48.

10.40. C. For a Normal Distribution, maximum likelihood is equal to the method of moments.

X = 4400/80 = 55. ⇒ µ = 55.

Second moment is: 274,000/80 = 3425. ⇒ σ = √(3425 - 55²) = 20.
For a Normal Distribution, f(x) = exp[-(x - µ)2 / (2σ2)] / (σ√(2π)). ln[f(x)] = -(x - µ)2/(2σ2) - ln[σ] - ln[2π]/2.

Thus in this case with n = 80, the loglikelihood is:


-(1/(2σ2)) Σ (xi - µ)2 - 80 ln[σ] - 40 ln[2π]
= -(1/(2σ2)) Σ xj2 + (µ/σ2) Σ xj - 40µ2/σ2 - 80 ln[σ] - 40 ln[2π]
= -137,000/σ2 + 4400µ/σ2 - 40µ2/σ2 - 80 ln[σ] - 40 ln[2π].


For the maximum likelihood fit, the loglikelihood is:
-137,000/202 + (4400)(55) / 202 - (40)(552 ) / 202 - 80 ln[20] - 40 ln[2π] = -353.174.
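A minimal Python sketch (my own illustration) that evaluates this loglikelihood from the summary statistics:

import math

n, sum_x, sum_x2 = 80, 4400, 274_000
mu = sum_x / n                               # = 55
sigma = math.sqrt(sum_x2 / n - mu**2)        # = 20
loglik = (-sum_x2 / (2 * sigma**2) + mu * sum_x / sigma**2
          - n * mu**2 / (2 * sigma**2) - n * math.log(sigma)
          - (n / 2) * math.log(2 * math.pi))
print(round(loglik, 3))                      # about -353.174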

10.41. The likelihood is: 1/(3θ)n. This is a decreasing function of θ, for θ > 0.

Thus we would like θ as small as possible. However we must have: -θ < x < 2θ.

Therefore, 2θ ≥ maximum of sample. ⇒ θ ≥ Maximum / 2.

Also, -θ ≤ minimum of sample. ⇒ θ ≥ -Minimum.

Therefore, θ^ = Max[-Minimum, Maximum / 2].

10.42. E. The log density of the Pareto is: ln(4) + 4 ln(θ) - 5 ln(x + θ).
Thus the loglikelihood is: 2ln(4) + 8ln(θ) - 5 ln(100 + θ) - 5 ln(300 + θ).
Setting the partial derivative of the loglikelihood with respect theta equal to zero:
0 = 8/θ - 5/(100 + θ) - 5/(300 + θ). ⇒

0 = 8(100 + θ) (300 + θ) - 5 (θ)(300 + θ) - 5(θ)(100 +θ) = -2θ2 + 1200θ + 240,000. ⇒


θ2 - 600θ - 120,000 = 0. ⇒ θ = {600 + √(600² - (4)(-120,000))} / 2 = 758.

10.43. C. For the Weibull, ignoring constants: ln f(x) = -τ ln[θ] - (x/θ)τ.


loglikelihood is: -nτ ln[θ] - Σ xiτ / θτ.

State X contribution to the loglikelihood is: -(50)(1) ln[θ] - 10,000/θ.

State Y contribution to the loglikelihood is: -(80)(2) ln[θ] - 4,000,000/θ2.

Thus the total loglikelihood is: -(50)(1) ln[θ] - 10,000/θ - (80)(2) ln[θ] - 4,000,000/θ2.
Set equal to zero the derivative with to theta of the loglikelihood:
-210/θ + 10,000/θ2 + 8,000,000/θ3 = 0. ⇒ 210θ2 - 10,000θ - 8,000,000 = 0. ⇒
θ = {10,000 + √(10,000² - (4)(210)(-8,000,000))} / {(2)(210)} = 220.44.

10.44. B. f(x) = √(2/π) exp[-(x - θ)2/2], for x ≥ θ. ln f(x) = -(x - θ)2/2 + ln(2/π)/2.
loglikelihood is: -Σ(xi - θ)2/2 + n ln(2/π)/2.

Set the partial derivative with respect to θ equal to zero: Σ(xi - θ) = 0. ⇒ θ = X .

Since x ≥ θ, θ ≤ min(X1 , X2 , . . . , Xn ) ≤ X .

The loglikelihood increases as θ increases towards X .

⇒ The maximum likelihood occurs for the largest possible θ, min(X1 , X2 , . . . , Xn ).


Comment: A case where one has to carefully check the endpoints, in order to find the maximum. If y
has a Normal Distribution with mean 0 and variance 1, then x = θ + |y| has the given distribution.

10.45. A. f(x) = exp[-(x-4)/β]/β. ln f(x) = -(x-4)/β - lnβ.

Loglikelihood is: -Σ(xi - 4)/β - n lnβ. Set the partial derivative with respect to β equal to zero:

Σ(xi - 4)/β2 - n/β = 0. ⇒ β^ = Σ(xi - 4)/n = {(8.2 - 4) + (9.1 - 4) + (10.6 - 4) + (4.9 - 4)}/4 = 4.2.
Comment: Let y = x - 4, then y follows an Exponential Distribution.

⇒ Maximum likelihood equals the Method of Moments: β^ = Y = 4.2.

10.46. A. ln f(x) = ln(θ + 1) + θ ln x. loglikelihood is: n ln(θ + 1) + θ Σln xi.

Set the partial derivative of the loglikelihood with respect to θ equal to zero:
n n n
0 = n/(θ + 1) + ∑ ln(xi ) ⇒ n/(θ + 1) = - ∑ ln(xi ). ⇒ (θ + 1)/n = -1/ ∑ ln(xi ).
i=1 i=1 i=1

n
⇒ θ = -1 - n/
^
∑ ln(xi ).
i=1

10.47. B. The log density is: ln f(x;q) = ln(q) - qx1/2 - ln(2) - ln(x1/2).
Thus the loglikelihood for the four observed values is:
ln f(1) + ln f(4) + ln f(9) + ln f(64) = 4 ln(q) - q(1 + 2 + 3 + 8) - 4ln(2) - ln((1)(2)(3)(8)).
To maximize the loglikelihood, set the partial derivative with respect to q equal to zero.
4/q - 14 = 0. Thus q = 4/14 = 0.286.

10.48. A. f(x) = θ exp[-θ√x] / (2√x). ln f(x) = lnθ - θ√x - ln(2√x).
loglikelihood is: n lnθ - θ Σ √xi - Σ ln(2√xi).
Setting the partial derivative with respect to theta equal to zero: 0 = n/θ - Σ √xi. ⇒ θ^ = n / Σ √xi.

10.49. B. f(x) = θ−αxα−1 e−x/θ / Γ(α) = θ−3x2 e−x/θ / 2.


ln f(x) = -3 ln(θ) + 2 ln(x) - x/θ - ln(2). ∂ln[f(x)]/∂θ = -3/θ + x/θ2.
Setting the derivative of the loglikelihood equal to zero: 0 = -3n/θ + (1/θ2) Σ xi.
Therefore θ = Σ xi / (3n) = (1 + 2 + 2.2 + 2.8 + 3 + 4.1) / {(3)(6)} = 0.839.

Comment: For the Gamma with fixed α, the method of maximum likelihood applied to ungrouped
data is equal to the method of moments.

10.50. C. ln f(x;q) = ln(2) + ln(q) + ln(x) - qx2. ∂ln[f(x)]/∂q = 1/q - x2.
Setting the partial derivative with respect to q of the loglikelihood equal to zero:
0 = n/q - Σ xi2. Thus q = n / Σ xi2.

Comment: This is a Weibull Distribution with τ = 2 fixed and θ = q-1/2.

10.51. E. f(x) = exp[-0.5(x-3)2/σ2] / {σ√(2π)}.

loglikelihood = -0.5Σ(xi - 3)2 /σ2 - n ln(σ) - n(1/2)ln(2π).

Set the partial derivative with respect to σ of the loglikelihood equal to 0:

0 = Σ(xi - 3)2 /σ3 - n/σ. σ2 = (1/n)Σ(xi - 3)2 = {(4-3)2 + (8-3)2 + (5-3)2 + (3-3)2 }/4 = 7.5.
Comment: In this case, the maximum likelihood estimate is equal to the method of moments.

10.52. A. For µ = 0, f(x) = (1/σ) (1/√(2π)) exp(-0.5{x2/σ2}).

ln f(x) = -0.5{x2 /σ2} - ln(σ) - (1/2)ln(2π). Σ ln f(xi) = -0.5{Σxi2 /σ2} - nln(σ) - (n/2)ln(2π).

∂/∂σ Σ ln[f(xi)] = Σxi2/σ3 - n/σ = 0. Therefore σ = (Σxi2 / n)0.5.

Comment: We get for our estimate of σ2 the usual estimate for the variance of a distribution, which is
biased due to the use of n rather than n-1 in the denominator.

10.53. A. F(x) = xp , therefore f(x) = pxp-1. ln f(x) = ln p +(p-1)ln(x).

Set 0 = ∂/∂p Σ ln[f(xi)] = Σ {1/p + ln(xi)}. ⇒ p = -n / Σ ln(xi).

Comment: Note that this is a Beta Distribution with a = p, b = 1, and θ = 1.

10.54. C. For the Gamma Distribution f(x) = θ−αxα−1 e−x/θ / Γ(α) .


ln f(x) = -α ln(θ) + (α−1) ln(x) -x/θ - ln Γ(α). Setting the partial derivative with respect to theta of the
loglikelihood equal to zero will maximize the likelihood:

0 = ∂/∂θ Σ ln[f(xi)] = Σ {-α/θ + xi/θ2}. ⇒ θ = Σ xi / (nα) = 38000 / {(10)(12)} = 317.

Comment: For the Gamma Distribution with fixed α, the method of maximum likelihood is equal to
the method of moments.

10.55. C. For a single observation, likelihood = f(x) = 3/(2θ) - x/(2θ2), θ ≤ x ≤ 3θ.


θ ≤ x. The largest possible value of θ is x, in which case the likelihood is: 3/(2x) - x/(2x2 ) = 1/x.
x ≤ 3θ. The smallest value of θ is x/3, in which case the likelihood is: 9/(2x) - 9x/(2x2 ) = 0.
∂f(x)/∂θ = -3/(2θ2) + x/θ3 = 0. ⇒ θ = 2x/3.
In which case the likelihood is: 9/(4x) - 9/(8x) = 9/(8x) > 1/x.
This is indeed the maximum, so the maximum likelihood estimator of θ is 2X/3.

Comment: Choice E is not possible, since then θ = 3X ⇒ X = θ/3, which is outside the given
domain. One can try the other choices and see which one produces the largest likelihood.
For example, if X = 3, then 1 ≤ θ ≤ 3, and here is a graph of the likelihood as a function of θ:
[Graph omitted: the likelihood as a function of θ, rising to a single peak and then declining.]
The maximum likelihood occurs at: 2X/3 = 2.

10.56. B. One maximizes the likelihood by maximizing the loglikelihood.


The loglikelihood is: Σ ln[f(xi)] = -Σ {ln(θ) + xi/θ} = -n ln(θ) - (1/θ) Σ xi.

We set the partial derivative of the loglikelihood with respect to θ equal to zero:
0 = -n/θ + (1/θ2) Σ xi. ⇒ θ = (1/n) Σ xi.

Comment: In this case, maximum likelihood is equivalent to the method of moments.



10.57. C. The log density is: ln f(x) = (-2 lnβ) + lnx - .5 (x/β)2 . The partial derivative with respect to
β of the loglikelihood is: Σ{(-2/β) + xi2 / β3 } = (-2n/β) + Σ xi2 / β3 .

Setting this equal to zero will maximize the loglikelihood: β = {Σ xi2 / (2n) }1/2.

For the observed data: Σ xi2 = 24.01 + 3.24 + 11.56 + 47.62 + 16 = 102.43, and the number of

points n = 5. Therefore the estimated β = {102.43 / 10 }1/2 = 3.20


Comment: A Weibull Distribution, with τ = 2 fixed.

10.58. B. The loglikelihood function is: Σ ln[f(xi)] = (-0.5n)ln(2πθ) - Σ (xi - 1000)2/(2θ)
= (-n/2)ln(2π) + (-n/2)ln(θ) - Σ (xi - 1000)2/(2θ).

The partial derivative with respect to θ is: (-n/2)(1/θ) + Σ (xi - 1000)2/(2θ2).
Setting this partial derivative equal to zero and solving for theta: θ = Σ (xi - 1000)2 / n.

Comment: Maximizing the Loglikelihood is equivalent to maximizing the likelihood.


This is a Normal distribution with fixed mean 1000 and with the variance rather than the standard
deviation as the parameter. The maximum likelihood estimate of θ is the usual estimate of the
variance from a sample.

10.59. E. Set equal to zero the partial derivative of the loglikelihood with respect to the single
parameter alpha; then solve for alpha.

f(xi) = α1000α(1000 + xi)−α−1. ln f(xi) = ln(α) + αln(1000) - (α+1)ln(1000 + xi).


Σ ∂ln[f(xi)]/∂α = Σ {(1/α) + ln(1000) - ln(1000 + xi)} = 0.
Therefore, α = n / Σ ln[(1000 + xi) / 1000] = 5/1.294 = 3.86.

xi (1000 + xi)/1000 ln((1000 + xi)/1000)


43 1.043 0.042
145 1.145 0.135
233 1.233 0.209
396 1.396 0.334
775 1.775 0.574
SUM 1.294
Comment: Note that this is a Pareto distribution with the scale parameter fixed at 1000.

10.60. D. For the exponential distribution with ungrouped data, the method of maximum likelihood
equals the method of moments. The mean of the exponential is in this case 1/λ.
Therefore λ = 1/ X = 3 / (0.3 + 0.55 + 0.80) = 1.82.
Comment: One can add up the log densities of ln(λ) - λx, set the partial derivative with respect to λ
equal to zero and solve for λ. This use of the method of maximum likelihood takes a little longer, but
yields the same solution.

10.61. D. Note that for the single parameter distribution defined in this question, θ appears in the
support (0, θ). Thus we desire θ > .9, so that the observed values will both be less than θ and thus
within the support of the distribution.
One can maximize the likelihood by maximizing the loglikelihood.
The loglikelihood is: Σ ln[f(xi)] = Σ {ln(2) + ln(θ - xi) - 2ln(θ)}.

Σ ∂ln[f(xi)]/∂θ = Σ {1/(θ - xi) - 2/θ} = 1/(θ - 0.5) + 1/(θ - 0.9) - 4/θ.

Setting the partial derivative of the loglikelihood equal to zero:


1/ (θ - 0.5) + 1/ (θ - 0.9) = 4 / θ. Thus, θ (θ - 0.9) + θ (θ - 0.5) = 4 (θ - 0.5) (θ - 0.9).

Thus, 2θ2 - 4.2 θ + 1.8 = 0. Thus, θ = {4.2 ± {4.22 - (4)(2)(1.8)}.5} / {(2)(2)} = 1.05 ± 0.45.

Thus, θ = 1.50 or .60. However, as discussed above θ > .9, so we reject θ = .6 and the maximum
likelihood estimate for θ is 1.50.
Comment: Note that if θ = 0.6, then the density at 0.9 is in fact zero, so that the likelihood, which is
f(0.5)f(0.9), would be zero rather than a maximum. For θ = 1.50 the likelihood is: f(0.5)f(0.9) =
(0.888)(0.533) = 0.473. One can check numerically that this is in fact the maximum likelihood:
θ f(.5) f(.9) Likelihood
1.0 1.0000 0.2000 0.2000
1.1 0.9917 0.3306 0.3278
1.2 0.9722 0.4167 0.4051
1.3 0.9467 0.4734 0.4482
1.4 0.9184 0.5102 0.4686
1.5 0.8889 0.5333 0.4741
1.6 0.8594 0.5469 0.4700
1.7 0.8304 0.5536 0.4598
1.8 0.8025 0.5556 0.4458
1.9 0.7756 0.5540 0.4297
2.0 0.7500 0.5500 0.4125

10.62. E. ln f(x) = ln(α) + α ln(2) - (α+1)ln(x).


Loglikelihood = Σ ln[f(xi)] = n ln(α) + nα ln(2) - (α+1) Σ ln(xi).

Set equal to zero the derivative of the loglikelihood with respect to alpha:
0 = n/α + n ln(2) - Σ ln(xi). ⇒ α = n / {Σ ln(xi) - n ln(2)}.

10.63. E. e−θ is a decreasing function of θ. ⇒ 1 - e−θ is an increasing function of θ.

⇒ e-x /(1 - e−θ) is a decreasing function of θ.


The likelihood is a decreasing function of θ, so we want θ to be as small as possible.
We know each Xi < θ, and therefore θ ≥ maximum(X1 ,..., Xn ).

The best θ is maximum(X1 ,..., Xn ).


Comment: An example of where it is important to check the endpoints in order to find the maximum.

10.64. A. f(x) = (1/θ)e-x/θ. ⇒ Σ ln[f(xi)] = Σ(-lnθ - xi/θ). ⇒ Σ ∂ln[f(xi)]/∂θ = Σ(-1/θ + xi/θ2).
Setting the partial derivative with respect to θ of the loglikelihood equal to zero:
0 = Σ (-1/θ + xi/θ2) = -n/θ + (1/θ2) Σ xi. Therefore θ = (1/n) Σ xi = (x1 + x2 + x3)/3.

Comment: For the Exponential Distribution, the Method of Maximum Likelihood applied to
ungrouped data is the same as the Method of Moments.

10.65. E. 1. True. 2. True. 3. True.


Comment: Adding additional parameters always allows one to fit the data better, but one should
only do so when they provide a significant gain in accuracy. The “principle of parsimony” states that
one should use the minimum number of parameters that get the job done. Thus in some cases a
two-parameter model such as a Gamma may be preferable to a three-parameter model such as a
Transformed Gamma.

10.66. B. ln f(x) = ln β - (1/2) ln(2π) - (3/2) ln(x) - β2/(2x).


∂ln[f(x)]/∂β = 1/β - β/x. 0 = Σ ∂ln[f(xi)]/∂β = n/β - β Σ 1/xi. Therefore, β2 = n / Σ 1/xi =
3/{(1/100) + (1/150) + (1/200)} = 3/(0.01 + 0.00667 + 0.005) = 138.44. ⇒ β = 11.77.

Comment: This is an Inverse Gamma Distribution, with α = 1/2 and θ = β2/2.

10.67. C. For ungrouped data, the likelihood function is the product of the density function at the
observed points. In this case the likelihood function is f(0.5)f(0.6).
Density f(0.5) f(0.6) Likelihood = f(0.5)f(0.6)
f1 (x) = 1 1 1 1
f2 (x) = 2x 1 1.2 1.2
f3 (x) = 3x2 0.75 1.08 0.81
Thus ranking the likelihoods from most likely to least likely: f2 (x), f1 (x), f3 (x). The second density is
the most likely with the largest likelihood of 1.2, while the third density is least likely with the smallest
likelihood of 0.81.
Comment: Note that 0.5 and 0.6 are both in the support of each of the density functions.

10.68. A. ln f(x) = lnα - (α+1)ln(x+1). Σ ∂ln[f(xi)]/∂α = Σ {1/α - ln(xi + 1)} = n/α - Σ ln(xi + 1).
Setting the partial derivative of the loglikelihood equal to zero and solving:
α = n / Σ ln(xi + 1). Since xi > 0, ln(xi + 1) > 0.
Therefore, Σ ln(xi + 1) > ln(1 + largest claim) ≥ ln(1 + sample mean) > ln(sample mean).
⇒ Σ ln(xi + 1)/n > ln(sample mean)/n. ⇒ As the sample mean goes to ∞, so does Σ ln(xi + 1)/n.

Thus the denominator of α goes to infinity and α goes to 0.


Comment: The denominator of α is the log of the geometric average of one plus the claim sizes:
Σ ln(xi + 1)/n = ln[{Π (xi + 1)}1/n]. As the sample mean goes to infinity, so does the geometric

average of 1 + claim sizes, and therefore so does the log of the geometric average of 1 + claim
sizes. Thus the denominator goes to infinity and α goes to 0.
This old exam question could have been worded better. If the actual α were less than or equal to
one, then the actual mean is infinite. If that were the case, then for many finite samples, the sample
mean would be very big. We are asked what happens to our estimate of α as we look at those
samples in which the sample mean is very big.

10.69. B. For ungrouped data, the maximum likelihood estimator for the exponential is equal to the
method of moments. The theoretical mean is set equal to the observed mean: θ = Σ xi / n.
The survival function is S(x) = exp(-x/θ). S(1) = exp(-1/θ).
Therefore the estimate of S(1) is: exp(-n/Σ xi).

Comment: The answer is a probability and therefore should always be in the interval [0, 1].
Only choices B and C have that property.

10.70. D. Since for ungrouped data for the exponential distribution the Method of Maximum
Likelihood is the same as the Method of Moments, and since the mean is θ:

θ = 12500 / 100 = 125. F(x) = 1 - e-x/θ. 1 - F(250) = e-250/125 = 0.135.

10.71. A. The likelihood is wf1 (1) + (1-w)f2 (1) = wf1 (1) + (1-w)(2f1 (1)) = f1 (1)(2-w).
For w in [0,1], the likelihood is maximized for w = 0.
Comment: One has to assume that f1 has no parameters and that f2 has no parameters; we have
only one (remaining) parameter w. Since there is only a single observation, the likelihood is
maximized by applying all the weight to whichever density is larger at that observation. In this case,
the second density is larger at 1, so all the weight is applied to the second density.

10.72. C. The likelihood associated with Dick is:


(l32 - l33)/ l20 = (9,471,591 - 9,455,522)/9,617,802 = 0.001671.
The likelihood associated with Jane is: l56/ l21 = 8,563,435 /9,607,896 = 0.8913.
Therefore, the combined likelihood is: (.001671)(.8913) = 0.00149.

10.73. A. ln f(x) = -(x-µ)2 / (2x) - ln(x)/2 - ln(2π)/2.


∂ln[f(x)]/∂µ = (x - µ)/x = 1 - µ/x. Σ ∂ln[f(xi)]/∂µ = Σ {1 - µ/xi} = n - µ Σ 1/xi.
Setting the partial derivative equal to zero and solving for µ:
µ = n / Σ 1/xi = 5 / (1/11 + 1/15.2 + 1/18 + 1/21 + 1/25.8) = 16.74.

Comment: This is a Reciprocal Inverse Gaussian Distribution. See Insurance Risk Models by
Panjer and Willmot. [Graph omitted: the loglikelihood as a function of µ, peaking at the maximum
likelihood estimate µ = 16.74.]

10.74. B. Let TP be the total lifetimes for Philʼs and TS be the total lifetimes for Sylviaʼs bulbs. 20
bulbs for Phil and 10 bulbs for Sylvia.
For the Exponential Distribution applied to ungrouped data, the Method of Moments equals the
Method of Maximum Likelihood. Therefore, TP/20= 1000 and TS/10 = 1500.
So TP = (20)(1000) = 20000 and TS = (10)(1500) = 15000.

For the Exponential Distribution, f(x) = e-x/θ/θ. ln f(x) = -x/θ - ln(θ).


Assuming θS = 2θP, the loglikelihood is:
Σ_Phil {-xi/θP - ln(θP)} + Σ_Sylvia {-xi/(2θP) - ln(2θP)} = -TP/θP - 20ln(θP) - TS/(2θP) - 10{ln(θP) + ln(2)}.

Setting the partial derivative of the loglikelihood with respect to θ equal to zero:

0 = TP/θP2 - 20/θP + TS/ 2θP2 - 10/θP.

θP = (TP + TS/ 2)/(20 + 10) = (20000 + 15000/2)/(20 + 10) = 917.


Comment: One could just halve the time observed for Sylviaʼs bulbs; 15,000 hours would have only
been 7500 hours if the bulbs had been Philʼs. Then one can apply the method of moments:
(20,000 + 15,000/2)/(20 + 10) = 917.
[Graph omitted: the loglikelihood as a function of θP, peaking at the maximum likelihood estimate θP = 917.]

10.75. C. f(x) = τxτ−1θ−τ exp(-(x/θ)τ). ln f(x) = lnτ + (τ - 1) ln x - τlnθ - (x/θ)τ.


Σ ∂ln[f(xi)]/∂θ = Σ {-τ/θ + τxiτ/θτ+1} = 0. ⇒ Nτ/θ = (τ/θτ+1) Σ xiτ.
θ = {(1/N) Σ xiτ}1/τ = {(1/10) Σ xi0.5}1/0.5 = (488.97/10)² = 2391.

Comment: For Maximum Likelihood for a Weibull Distribution with τ fixed,

θτ = the τth moment of the observed data.

10.76. D. For the Inverse Gaussian distribution with θ fixed, maximum likelihood is equal to the
method of moments. µ = X = 31,939/10 = 3194.

Alternately, f(x) = (θ/ 2πx3 ).5 exp[- θ({x − µ} / µ)2 / 2x].

ln f(x) = 0.5 ln(θ) - 0.5ln(2π) - 1.5ln(x) - θ({x − µ} / µ)2 / 2x.


Σ ln[f(xi)] = (n/2) ln(θ) - (n/2) ln(2π) - 1.5 Σ ln(xi) - (θ/2) Σ (xi/µ2 - 2/µ + 1/xi).

Set the partial derivative of the loglikelihood with respect to µ equal to zero:
Σ ∂ln[f(xi)]/∂µ = -(θ/2) Σ (-2xi/µ3 + 2/µ2) = 0.
⇒ nµ = Σ xi. ⇒ µ = Σ xi/n = X = 31,939/10 = 3194.

10.77. C. For the Exponential Distribution, the method of maximum likelihood applied to
ungrouped data is the same as the method of moments: θ = mean.
Apply the method of maximum likelihood in 1999: θ1999 = 3679/10 = 367.9.

A 1000 maximum covered loss in 2001 is equivalent to a 1000/1.052 = 907.03 maximum covered
loss in 1999. Deflating the deducible and maximum covered loss, the average payment per loss is:
E[ X ∧ 907.03] - E[X ∧ 90.70] = θ(1 - e−907.03/θ) - θ(1 - e−90.70/θ) =
367.9(e-90.70/367.9 - e-907.03/367.9) = 256.25.
Inflating to year 2001 dollars: (256.25)(1.052 ) = 282.5.
Alternately, work in the year 2001. All of the loss amounts are multiplied by 1.052 .
Thus applying the method of maximum likelihood: θ2001 = (1.052 )(3679)/10 = 405.61.
Then the average payment per loss is: E[ X ∧ 1000] - E[X ∧ 100] =
θ(1 - e−1000/θ) - θ(1 - e−100/θ) = 405.61(e-100/405.61 - e-1000/405.61) = 282.5.
Alternately, if θ1999 = 367.9, this scale parameter is multiplied by the inflation factor over 2 years of

1.052 . Therefore, θ2001 = (1.052 )(367.9) = 405.61. Proceed as above.



10.78. C. f(x) = p e-x/100/100 + (1-p) e-x/10000/10000. The likelihood is: f(100) f(2000) =
{p e-1/100 + (1-p) e-0.01/10000} {p e-20/100 + (1-p) e-0.2/10000}.
Comment: The Maximum Likelihood occurs for p = 0.486.
[Graphs omitted: the loglikelihood as a function of p over (0, 1), and a close-up near its peak at p ≈ 0.486.]

10.79. B. f(x) = α θα (θ+x)−(α+1). Σ ln[f(xi)] = Σ {ln(α) + α ln(θ) - (α+1)ln(θ + xi)}.
The derivative with respect to α is: Σ {1/α + ln(θ) - ln(θ + xi)} = N/α - Σ ln[(θ + xi)/θ].
Setting this derivative equal to zero: 0 = N/α - Σ ln[(θ + xi)/θ].
Solving for alpha: α = N / Σ ln[(θ + xi)/θ] =
3/{ln((150 + 225)/150) + ln((150 + 525)/150) + ln((150 + 950)/150)} =
3/{ln(2.5) + ln(4.5) + ln(7.333)} = 0.68.

10.80. B. For an Exponential with mean µ, f(x) = e-x/µ/µ and ln f(x) = -x/µ - lnµ.
The claim of size 1 from the medium hazard level contributes: -1/(2θ) - ln(2θ).
The claim of size 2 from the medium hazard level contributes: -2/(2θ) - ln(2θ).
The claim of size 3 from the medium hazard level contributes: -3/(2θ) - ln(2θ).
The claim of size 15 from the high hazard level has µ = 3θ and x = 15, and contributes:
-15/(3θ) - ln(3θ). Thus the loglikelihood is:
{-1/(2θ) - ln(2θ)} + {-2/(2θ) - ln(2θ)} + {-3/(2θ) - ln(2θ)} + {-15/(3θ) - ln(3θ)} = -8/θ - 4lnθ - 3ln2 - ln3.

Setting the derivative equal to zero: 0 = 8/θ2 - 4/θ. ⇒ θ = 2.


Alternately, convert all of the data to the low-hazard level.
1, 2, and 3 from 2θ ⇔ 1/2, 1, 3/2 from θ. 15 from 3θ ⇔ 15/3 = 5 from θ.
For the Exponential, Maximum Likelihood equals Method of Moments:
θ = (1/2 + 1 + 3/2 + 5)/4 = 2.
Comment: The absence of any reported claims from the low hazard level, contributes nothing to the
loglikelihood. In the log density, x is the size of claim, not the aggregate loss. When estimating
severity, the number of claims is the amount of data; no claims provides no data for estimating
severity.

10.81. D. ln f(x) = ln(p+1) + p ln(x).


Σ ∂ln[f(xi)]/∂p = Σ {1/(p + 1) + ln(xi)} = 0.
⇒ p = -1 - n/Σ ln(xi) = -1 - 3/{ln(0.74) + ln(0.81) + ln(0.95)} = 4.327.

Comment: A Beta Distribution with a = p+1, b = 1, and θ = 1.

10.82. B. For the Exponential, maximum likelihood is equal to the method of moments.

θ^ = X = 26.46/7 = 3.78. 0.75 = Prob[X > c] = e-c/3.78. ⇒ c = 1.087.

10.83. A. This is a Beta Distribution with a = θ, b = 1, and what Loss Models calls θ = 1.
The mean is: a/(a + b) = θ/(θ + 1). X = (.21 + .43 + .56 + .67 + .72)/5 = 0.518.

θ/(θ + 1) = 0.518. ⇒ θ = 1.0747.


This is the method of moments estimator of θ, so that what the question calls S is 1.0747.
ln f(x) = lnθ + (θ - 1)ln(x). Loglikelihood = n lnθ + (θ - 1) Σ ln(xi).
Set the derivative of the loglikelihood with respect to θ equal to zero: n/θ + Σ ln(xi) = 0.
⇒ θ = -n/Σ ln(xi) = -5/(ln 0.21 + ln 0.43 + ln 0.56 + ln 0.67 + ln 0.72) = 1.3465.

This is the maximum likelihood estimator of θ, so that what the question calls R is 1.3465.
R - S = 1.3465 - 1.0747 = 0.272.
Comment: One can calculate the mean of the given density by integrating x f(x) from 0 to 1.

10.84. D. F(x) = xθ+1. f(x) = (θ + 1)xθ. lnf(x) = ln(θ + 1) + θln(x).


loglikelihood is: n ln(θ + 1) + θ Σ ln(xi).
Set the partial derivative with respect to theta equal to zero: 0 = n/(θ + 1) + Σ ln(xi).
θ + 1 = -n/Σ ln(xi) = -5/{ln(0.56) + ln(0.83) + ln(0.74) + ln(0.68) + ln(0.75)} = 2.873. ⇒ θ = 1.873.

Comment: A Beta Distribution with a = θ + 1 and b = 1.

10.85. D. ln f(x) = θ - x, θ < x. The loglikelihood is: nθ - Σ xi, θ < xi.

The loglikelihood is an increasing function of θ, therefore we want the largest possible value of θ.

θ < xi. ⇒ θ < Minimum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ].

Therefore, the largest possible value of θ is Minimum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ].

Comment: One needs to be careful, since θ is part of the support of the density.
If any xi ≤ θ, then f(xi) = 0, and the likelihood is zero.
A shifted exponential distribution.

10.86. E. ln f(x) = ln(θ + 1) + θ ln x. The loglikelihood is: n ln(θ + 1) + θ Σ ln(xi).
Set the partial derivative with respect to θ equal to zero: 0 = n/(θ + 1) + Σ ln(xi).
θ = -n/Σ ln(xi) - 1 = -5/{ln(0.92) + ln(0.79) + ln(0.90) + ln(0.65) + ln(0.86)} - 1 = 3.97.

Comment: A Beta Distribution with a = θ + 1 and b = 1.



10.87. C. For the Exponential Distribution, method of moments ⇔ maximum likelihood.

θ^A = θ^B. ⇒ θ^A - θ^B = 0.

Comment: θ^ = (10 + 5 + 21 + 10 + 7)/5 = 10.6.

10.88. C. f(x) = θe-θ/x/x2 . ln f(x) = ln(θ) - θ/x - 2ln(x).


∂ln[f(x)]/∂θ = 1/θ - 1/x.
Set the partial derivative of the loglikelihood equal to zero:
0 = Σ ∂ln[f(xi)]/∂θ = n/θ - Σ 1/xi.
θ = n/Σ 1/xi = 4/(1/8000 + 1/10,000 + 1/12,000 + 1/15,000) = 10,667.

Alternately, if X is Inverse Exponential with parameter θ, then 1/X is Exponential with mean 1/θ.
Fitting an Exponential via maximum likelihood is the same as the method of moments.
Fitting an Exponential distribution with mean 1/θ via the method of moments to 1/X:

1/θ = (1/8000 + 1/10,000 + 1/12,000 + 1/15,000)/4. ⇒ θ = 10,667.

10.89. E. ln f(x) = ln(θ) + (θ - 1) ln(x). Loglikelihood is: n ln(θ) + (θ - 1) Σ ln(xi).
Setting the partial derivative of the loglikelihood with respect to θ equal to zero:
n/θ + Σ ln(xi) = 0.
θ = -n/Σ ln(xi) = -5/{ln(0.25) + ln(0.5) + ln(0.4) + ln(0.8) + ln(0.65)} = 1.37.

Comment: This is a Beta Distribution as per Appendix A of Loss Models,


with a = θ, b = 1, and θ = 1.

10.90. D. ln f(x) = ln(θ) + (θ - 1)lnx. Therefore, for the given sample, the loglikelihood is:
5ln(θ) + (θ - 1) {ln0.1 + ln0.25 + ln0.5 + ln0.6 + ln0.7} = 5ln(θ) - 5.2495(θ - 1).
Setting the partial derivative with respect to theta equal to zero:
5/θ - 5.2495 = 0. ⇒ θ = 0.952.
Comment: A Beta Distribution as per Loss Models with a = θ, b = 1, and θ = 1.

10.91. A. lnf(x) = ln(θ) - θ/x - 2ln(x).


loglikelihood is: n ln(θ) - θ Σ 1/xi - 2 Σ ln[xi].
Set the partial derivative of the loglikelihood with respect to theta equal to zero:
0 = n/θ - Σ 1/xi. ⇒ θ^ = n / Σ 1/xi = 5/(1/3 + 1/9 + 1/13 + 1/33 + 1/51) = 8.75.

Alternately, let Y = 1/X. Then Y follows an Exponential Distribution with mean 1/θ.
The maximum likelihood fit to this Exponential is the same as the method of moments:
1/ θ^ = (1/3 + 1/9 + 1/13 + 1/33 + 1/51)/5. ⇒ θ^ = 8.75.

10.92. D. ln f(x) = ln(θ + 1) + θ ln(1-x).


∂lnf(x)/∂θ = 1/(θ+1) + ln(1 - x).
Σ ∂lnf(x)/∂θ = n/(θ+1) + Σ ln(1 - xi).
Setting equal to zero the partial derivative of the loglikelihood with respect to theta:
0 = n/(θ+1) + Σ ln(1 - xi).
⇒ θ = -n/Σ ln(1 - xi) - 1 = -4/{ln(0.95) + ln(0.9) + ln(0.8) + ln(0.5)} - 1 = 2.728.

Comment: A Beta Distribution, with parameters a = 1, b = θ + 1, and θ = 1.



10.93. D. f(x) = -dS(x)/dx = (k/90)(1 - x/90)k-1.
ln f(x) = ln[k] - ln[90] + (k-1) ln[1 - x/90].
∂ln f(x)/∂k = 1/k + ln[1 - x/90].
Setting equal to zero the partial derivative of the loglikelihood with respect to k:
0 = 1/k + ln[1 - 10/90] + 1/k + ln[1 - 50/90].
⇒ k = -2/{ln[8/9] + ln[4/9]} = 2.154.
Comment: This is a Modified DeMoivre's Law, with ω = 90.
This is a Beta Distribution with parameters a = 1, b = k, and θ = ω = 90.
In general, for ω fixed, when fitting the exponent k via maximum likelihood: k^ = -n / Σ ln[1 - xi/ω].

10.94. This is a Beta Distribution with parameters a = 1, b = k, and θ = 90.

The mean is: θ a / (a+b) = 90 / (1 + k). Set (10 + 50)/2 = 90 / (1 + k). ⇒ k = 2.


Alternately, mean = ∫0 to 90 S(x) dx = ∫0 to 90 (1 - x/90)k dx = [-{90/(k+1)}(1 - x/90)k+1] from x = 0 to x = 90
= 90/(1 + k).

Set (10 + 50)/2 = 90 / (1 + k). ⇒ k = 2.

10.95. A. For the uniform distribution on [a, b], the estimate of b is the maximum of the sample,
and the estimate of a is the minimum of the sample.
Thus, b^ - a^ = 4.5 - 0.7 = 3.8.
Alternately, if a > 0.7, then the likelihood is zero.
If b < 4.5, then the likelihood is zero.
For a ≤ 0.7 and b ≥ 4.5, the likelihood is: (b-a)-5.
The likelihood is a decreasing function of b, so we want to take the smallest possible b,
which is 4.5.
The likelihood is an increasing function of a, so we want to take the largest possible a,
which is 0.7.
Thus, b^ - a^ = 4.5 - 0.7 = 3.8.

10.96. D. ln f(x) = ln[1 + 1/α] + ln[x]/α.


Loglikelihood is: 4 ln[1 + 1/α] + (ln[0.2] + ln[0.5] + ln[0.6] + ln[0.8])/α = 4 ln[1 + 1/α] + ln[0.048]/α.
Setting the partial derivative of the loglikelihood with respect to alpha equal to zero:
-{4/(1 + 1/α)}/α2 - ln[0.048]/α2 = 0. ⇒ 1 + 1/α = -4/ln[0.048] = 1.3173. ⇒ α = 3.15.

Comment: This is a Beta distribution with b = 1, θ = 1, and a = 1 + 1/α.


a^ = -n / Σ ln[xi/θ] = -4/{ln[0.2] + ln[0.5] + ln[0.6] + ln[0.8]} = 1.317. ⇒ α^ = 1/0.317 = 3.15.

10.97. C. In order for the sample to contain 1.1, we must have 1.10 ≤ 1 + a/2.
Thus a ≥ 0.2; otherwise f(1.1) = 0.
The likelihood is: f(0.15) f(0.25) f(0.55) f(0.60) f(1.10).
If for example a = 0.2, then the likelihood is: (1/2)(1)(1)(1)(1) = 1/2.
If a = 0.25, then the likelihood is: (1/2)(1/2)(1)(1)(1) = 1/4.
If a = 0.55, then the likelihood is: (1/2)(1/2)(1/2)(1)(1) = 1/8.
The maximum likelihood occurs for: 0.2 ≤ a < 0.25.
Comment: An unusual question.
One can verify that the given density does integrate to one over its support.
In general, be very careful when given a density you have never seen before, particularly if the
parameter appears in the definition of the support and/or the support is finite.
The likelihood is: 0 for a < 0.2; 1/2 for 0.2 ≤ a < 0.25; 1/4 for 0.25 ≤ a < 0.55; 1/8 for 0.55 ≤ a < 0.6;
1/16 for 0.6 ≤ a < 1.1; and 1/32 for 1.1 ≤ a ≤ 2.

10.98. D. For the Inverse Weibull: f(x) = (τθτ / xτ+1) exp[-(θ/x)τ].
ln f(x) = ln[τ] + τ ln[θ] - (τ+1)ln[x] - (θ/x)τ.
∂lnf(x)/∂θ = τ/θ - τθτ−1/xτ.
Setting the partial derivative of the loglikelihood equal to zero: τN/θ = τθτ−1 Σ xi−τ. ⇒
θ = (N / Σ xi−τ)1/τ = {4 / (1/5² + 1/10² + 1/7² + 1/4²)}1/2 = 5.486.

Comment: One could plug in τ = 2 and/or the data earlier than I did.

10.99. D. loglikelihood is: 2 ln(θ) + 5 ln(2θ) + 3 ln(1-3θ) = 7 ln(θ) + 5 ln(2) + 3 ln(1-3θ).


Set the partial derivative of the loglikelihood with respect to theta equal to zero:
0 = 7/θ + 3(-3)/(1 - 3θ). ⇒ 9θ = 7 - 21θ. ⇒ θ = 7/30 = 0.233.

Section 11, Fitting to Grouped Data by Maximum Likelihood

In order to fit a chosen type of size of loss distribution by maximum likelihood to grouped data, you
maximize the likelihood or equivalently you maximize the loglikelihood.
For grouped data, the likelihood for an interval [ai, bi] with ni claims is: {F(bi) - F(ai)} n i .
The likelihood is the product of terms from each interval.
The contribution to the loglikelihood from an interval is: ni ln[F(bi) - F(ai)].
For grouped data, the loglikelihood is a sum of terms over the intervals:
(number of observations in the interval) ln(probability covered by the interval).

To fit a distribution via maximum likelihood to grouped data you find the set of parameters such
that either Π{F(bi) - F(ai) } n i or Σ ni ln[F(bi) - F(ai)] is maximized.120

For example for the Weibull Distribution: F(x) = 1 - exp[-(x/θ)τ].

ln[F(bi) - F(ai)] = ln[exp(-(ai/θ)τ) - exp(-(bi/θ)τ)].

For a particular pair of values of θ and τ one can compute ni ln[F(bi) -F(ai)] for each of the observed
intervals and add up the results. For example, the second interval for the grouped data in
Section 3 contributes: 2247 ln[ F(10000) - F(5000) ].
For a Weibull with θ = 15,000 and τ = 1.1, this is: 2247 ln[0.4728 - 0.2582] = -3458.
Adding up the contributions from all the intervals gives a loglikelihood: -18,638.

Bottom of Interval ($ Thous.), Top of Interval ($ Thous.), # Claims in the Interval, F(lower), F(upper), Probability for the Interval, Negative Loglikelihood
0 5 2208 0.0000 0.2582 0.2582 2990
5 10 2247 0.2582 0.4728 0.2146 3458
10 15 1701 0.4728 0.6321 0.1593 3124
15 20 1220 0.6321 0.7465 0.1143 2646
20 25 799 0.7465 0.8269 0.0805 2013
25 50 1481 0.8269 0.9767 0.1498 2812
50 75 254 0.9767 0.9972 0.0205 988
75 100 57 0.9972 0.9997 0.0025 342
100 Infinity 33 0.9997 1.0000 0.0003 266
10000 1.0000 18,638
120
ni is just the number of observed claims for the ith interval. Loss Models has the ith interval from (ci-1, ci].
10,000 might be the top of one interval and then 10,000 would not be included in the next interval; rather 10,001
would be the smallest size included in the next interval. For ease of exposition, I have not been that precise here. In
practical applications, one would need to know in which interval a loss of size 10,000 is placed.
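A minimal Python sketch (my own illustration) of this grouped-data calculation for the Weibull with θ = 15,000 and τ = 1.1, reproducing the table above:

import math

# (lower bound, upper bound, number of claims); None marks the open-ended top interval
intervals = [(0, 5000, 2208), (5000, 10000, 2247), (10000, 15000, 1701),
             (15000, 20000, 1220), (20000, 25000, 799), (25000, 50000, 1481),
             (50000, 75000, 254), (75000, 100000, 57), (100000, None, 33)]

def weibull_cdf(x, theta, tau):
    return 1 - math.exp(-(x / theta) ** tau)

def neg_loglik(theta, tau):
    total = 0.0
    for lower, upper, count in intervals:
        upper_cdf = 1.0 if upper is None else weibull_cdf(upper, theta, tau)
        total -= count * math.log(upper_cdf - weibull_cdf(lower, theta, tau))
    return total

print(round(neg_loglik(15000, 1.1)))   # about 18,638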

Note that since we do not know the size of each loss, we could not work with the density.
Rather we work with the differences in the Distribution Function at the end points of intervals.

For a selected set of parameter values, the loglikelihoods for the Weibull for the ungrouped data in
Section 3 are:

Tau
Theta 0.9 1.0 1.1 1.2
14,000 -18,864 -18,721 -18,731 -18,881
15,000 -18,843 -18,669 -18,638 -18,738
16,000 -18,857 -18,663 -18,605 -18,670
17,000 -18,899 -18,694 -18,618 -18,659
18,000 -18,962 -18,752 -18,667 -18,694

The values of θ and τ which maximize the loglikelihood are close to θ = 16,000 and τ = 1.1.121

It is worthy of note that this process makes no use of any information on dollars of loss in each
interval. This is in contrast to the method of moments. Thus for grouped data the method of
moments and the method of maximum likelihood do not produce the same fitted exponential
distribution, unlike the case with ungrouped data.

Usually one maximizes the loglikelihood by standard numerical algorithms,122 rather than by solving
the equations in closed form.123

121
A more exact computation yields θ = 16,148 and τ = 1.100.
122
For grouped data (as well as discrete frequency data) one can use the Method of Scoring.
123
The lack of access to a computer restricts the variety of possible exam questions.

Examples of Fitted Distributions:

The calculated parameters of other distributions fit to the grouped data in Section 3 by the Method
of Maximum Likelihood are:

Distribution Parameters Fit via Maximum Likelihood


to the Grouped Data in Section 3
Exponential θ = 15,636
Pareto N.A. N.A.
Weibull θ = 16,184 τ = 1.0997
Gamma α = 1.2286 θ = 12,711
LogNormal µ = 9.2801 σ = 0.91629
Inverse Gaussian µ = 15,887 θ = 14,334
Transformed Gamma α = 3.2536 θ = 1861 τ = 0.59689
Generalized Pareto α = 8.0769 θ = 73,885 τ = 1.5035
Burr α = 3.9913 θ = 40,467 γ = 1.3124

As discussed with respect to the method of moments, the Pareto distribution has too heavy of a
righthand tail to fit the grouped data set from Section 3. The mean, coefficient of variation and
skewness for the maximum likelihood curves are as follows:

Grouped Maximum Likelihood Fitted Curves


Data Expon. Weibull Gamma TGamma Burr GenPar InvGauss LogNor
Mean ($000) 15.7 15.6 15.6 15.6 15.7 15.7 15.7 15.9 16.3
CV ≈1 1.00 0.91 0.90 0.95 0.98 0.97 1.05 1.15
Skewness ≈3 2.00 1.73 1.80 2.39 3.19 2.73 3.16 4.95

Just as with ungrouped data, the values of loglikelihoods can be usefully looked at in order to
compare the maximum likelihood curves. The negative loglikelihoods fit to the grouped data in
Section 3 are:

Distribution Neg. Log Likelihood Distribution Neg. Log Likelihood


Burr 18533.9 Weibull 18604.3
Generalized Pareto 18536.0 LogNormal 18627.4
Transformed Gamma 18541.0 Inverse Gaussian 18656.4
Gamma 18580.5 Exponential 18660.9

Based on this criterion the Burr is the best fit to the grouped data in Section 3.

Below are shown the survival functions for some of these distributions fit via maximum likelihood, as
well as the empirical survival function.

Survival Functions at x times 1000


5 10 15 20 25 50 75 100 150

Data 0.7792 0.5545 0.3844 0.2624 0.1825 0.0344 0.0090 0.0033

Exponential 0.7263 0.5275 0.3832 0.2783 0.2021 0.0409 0.0083 0.0017 0.0001
Weibull 0.7597 0.5549 0.3986 0.2830 0.1993 0.0315 0.0045 0.0006 0.0000
Gamma 0.7701 0.5584 0.3971 0.2795 0.1953 0.0308 0.0047 0.0007 0.0000
Trans. Gam. 0.7809 0.5478 0.3801 0.2648 0.1859 0.0360 0.0082 0.0021 0.0002
Burr 0.7798 0.5536 0.3830 0.2636 0.1825 0.0348 0.0091 0.0030 0.0005

Data 0.7792 0.5545 0.3844 0.2624 0.1825 0.0344 0.0090 0.0033

It appears that the Burr distribution fits best. This will be seen more clearly when the mean excess
losses are compared.

Limited Expected Value, Empirical versus Fitted Values:

The empirical Limited Expected Values at the endpoints of the intervals were computed previously
for the grouped data in Section 3. For the curves fit by maximum likelihood to this grouped data, the
Limited Expected Values are as follows:

Limited Expected Value ($000)


x ($ Thous.)   Data   Expon.   Weibull   Gamma   Trans. Gamma   Gen. Pareto   Burr   LogNormal
5 4.5 4.3 4.4 4.4 4.5 4.5 4.5 4.6
10 7.8 7.4 7.7 7.7 7.8 7.8 7.8 7.9
15 10.1 9.6 10.0 10.1 10.1 10.1 10.1 10.1
20 11.7 11.3 11.7 11.8 11.7 11.7 11.7 11.6
25 12.8 12.5 12.9 13.0 12.9 12.8 12.8 12.7
50 15.0 15.0 15.2 15.2 15.1 15.0 15.0 15.0
75 15.5 15.6 15.5 15.6 15.6 15.5 15.5 15.7
100 15.6 15.6 15.6 15.6 15.7 15.6 15.6 16.0
150 15.6 15.6 15.6 15.7 15.7 15.7 16.2
200 15.6 15.6 15.6 15.7 15.7 15.7 16.3
250 15.6 15.6 15.6 15.7 15.7 15.7 16.3
Infinity 15.7 15.6 15.6 15.6 15.7 15.7 15.7 16.3

For example, for the LogNormal distribution,


E[X ∧ x] = exp(µ + σ2/2) Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}.
With µ = 9.28 and σ = 0.916, E[X ∧ 25000] = exp(9.6995)Φ[0.00833] + 25000 {1 - Φ[0.9243]} =
(16310)(0.5033) + (25000)(0.1777) = 12,651.
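A minimal Python sketch (my own illustration) of this limited expected value calculation, using the maximum likelihood parameters quoted above:

import math
from statistics import NormalDist

mu, sigma, x = 9.28, 0.916, 25_000
phi = NormalDist().cdf
lev = (math.exp(mu + sigma**2 / 2) * phi((math.log(x) - mu - sigma**2) / sigma)
       + x * (1 - phi((math.log(x) - mu) / sigma)))
print(round(lev))    # about 12,650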

Mean Excess Loss, Empirical versus Fitted Values:

The empirical Mean Excess Losses at the endpoints of the intervals were computed previously for
the grouped data in Section 3. For the curves fit by maximum likelihood to this grouped data, the
mean excess losses are as follows:

e(x) $ Thousand
x ($ Thousand)   Data   Expon.   Weibull   Gamma   Trans. Gamma   Gen. Pareto   Burr   LogNormal
0 15.8 15.6 15.6 15.6 15.7 15.7 15.7 16.3
5 14.4 15.6 14.7 14.5 14.3 14.3 14.4 14.6
10 14.3 15.6 14.3 14.1 14.4 14.3 14.2 15.8
15 14.6 15.6 14.0 13.8 14.7 14.6 14.5 17.4
20 15.2 15.6 13.7 13.7 15.0 15.1 15.0 19.0
25 15.9 15.6 13.5 13.6 15.4 15.6 15.7 20.6
50 21.2 15.6 12.8 13.2 17.2 18.7 20.0 28.1
75 27.1 15.6 12.4 13.1 18.8 22.0 25.0 35.0
100 30.2 15.6 12.1 13.0 20.1 25.5 30.3 41.3
150 15.6 11.7 12.9 22.5 32.4 41.3 53.2
200 15.6 11.4 12.9 24.5 39.5 52.6 64.3
250 15.6 11.1 12.8 26.2 46.5 64.1 74.8

For example, for the LogNormal distribution,


e(x) = exp(µ + σ²/2) {1 - Φ[(lnx − µ − σ²)/σ]} / {1 - Φ[(lnx − µ)/σ]} - x.
With µ = 9.28, σ = 0.916, e(25000) = exp(9.6995) (1 - Φ[0.00833]) / {1 - Φ[0.9243]} - 25000
= (16310)(0.4967)/0.1777 - 25000 = 20,589.
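
As a rough check of these last two computations, here is a minimal Python sketch (nothing beyond the standard library is assumed; µ = 9.28 and σ = 0.916 are the fitted LogNormal parameters used above):

from math import exp, log, sqrt, erf

def Phi(z):                      # standard Normal distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 9.28, 0.916
x = 25000.0
EX = exp(mu + sigma ** 2 / 2)    # mean of the LogNormal, about 16,310

lev = EX * Phi((log(x) - mu - sigma ** 2) / sigma) + x * (1 - Phi((log(x) - mu) / sigma))
ex = EX * (1 - Phi((log(x) - mu - sigma ** 2) / sigma)) / (1 - Phi((log(x) - mu) / sigma)) - x
print(round(lev), round(ex))     # roughly 12,650 and 20,590, matching the values above up to rounding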

The following graph compares the mean excess losses for the Burr (solid), the Transformed
Gamma (dashed) and the data (points):

[Graph: e(x) on the vertical axis, roughly 15,000 to 35,000, versus loss size x on the horizontal axis, 20,000 to 120,000.]



With Only Two Groups:

Exercise: You observe that out of 50 claims, 5 are of size greater than 1000.
An Exponential distribution is fit to this data via the method of maximum likelihood.
What is the fitted value of θ?

[Solution: The likelihood is: (1 - e-1000/θ)45 (e-1000/θ)5 .

Let y = e-1000/θ. Then the likelihood is: (1 - y)45 y5 .


Set the derivative with respect to y equal to zero: 0 = -45(1 - y)44 y5 + 5(1 - y)45 y4 .
⇒ 0 = -45y + 5(1 - y). ⇒ y = 5/50 = 0.1. ⇒ θ = -1000/ln0.1 = 434.]

Notice that for the fitted θ = 434, F(1000) = 1 - e-1000/434 = 90.0% = 45/50
= empirical distribution function at 1000.

In general, when we have data grouped into only two intervals, one can fit a single parameter via
maximum likelihood by setting the theoretical and empirical distribution functions equal at the
boundary between the two intervals. This is mathematically equivalent to percentile matching.
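
For instance, for the Exponential exercise above, a short numerical sketch (plain Python, with a grid search used only for illustration) finds the same θ whether one maximizes the two-interval likelihood directly or simply matches F(1000) to the empirical 45/50:

from math import exp, log

n_below, n_above, b = 45, 5, 1000.0

def loglik(theta):                       # two-interval grouped loglikelihood
    p = 1 - exp(-b / theta)              # F(1000) for the Exponential
    return n_below * log(p) + n_above * log(1 - p)

grid = [200 + 0.1 * i for i in range(10000)]             # candidate thetas from 200 to 1200
theta_mle = max(grid, key=loglik)
theta_matched = -b / log(n_above / (n_below + n_above))  # percentile matching
print(round(theta_mle, 1), round(theta_matched, 1))      # both approximately 434.3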

Assume there are n1 claims from 0 to b and n2 claims from b to ∞.


Then the likelihood is: F(b)^n1 {1 - F(b)}^n2. Let y = F(b).
Then the likelihood is y^n1 (1 - y)^n2, proportional to the density of a Beta Distribution with a = n1 + 1, b = n2 + 1, and θ = 1.
The mode of this Beta Distribution is: (a - 1)/(a + b - 2) = n1/(n1 + n2).124
Thus the likelihood is maximized for F(b) = y = n1/(n1 + n2) = the empirical distribution function at b.

124
See “Mahlerʼs Guide to Loss Distributions.”

Fitting a Two Parameter Distribution to Three Groups:

Assume we have data grouped into three intervals.


Assume there are n1 claims from 0 to a, n2 claims from a to b, and n3 claims from b to ∞.

Make the change of variables: β = F(a) and γ = S(b).


The loglikelihood is: n1 lnF(a) + n2 ln[F(b) - F(a)] + n3 lnS(b) = n1 lnβ + n2 ln[1 - β - γ] + n3 lnγ.

Setting the partial derivatives with respect to β and γ of the loglikelihood equal to zero:

0 = n1 /β - n2 /(1 - β - γ). ⇒ β/(1 - β - γ) = n1 /n2 .

0 = -n2 /(1 - β - γ) + n3 /γ. ⇒ γ/(1 - β - γ) = n3 /n2 .

Solving: β = n1 /( n1 + n2 + n3 ), and γ = n3 /( n1 + n2 + n3 ).

In other words, set F(a) = n1/(n1 + n2 + n3) = observed proportion from 0 to a, and
S(b) = n3/(n1 + n2 + n3) = observed proportion from b to ∞.

This is mathematically equivalent to percentile matching.

Exercise: Fit a Weibull Distribution via maximum likelihood to the following grouped data.
Interval Number of Losses
(0, 200) 32
[200, 500) 50
[500, ∞) 18
[Solution:
1 - exp[-(200/θ)τ] = F(200) = 32/100. ⇒ (200/θ)τ = 0.3857. ⇒ τ ln200 - τlnθ = -0.9523.

exp[-(500/θ)τ] = S(500) = 18/100. ⇒ (500/θ)τ = 1.7148. ⇒ τ ln500 - τlnθ = 0.5393.


Therefore, τ = (0.5393 + 0.9523) / {ln(500) - ln(200)} = 1.628. ⇒ θ = 359.]
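
The two matched percentiles can also be solved by computer; a brief sketch (standard library only) reproduces the fitted Weibull parameters:

from math import log

a1 = -log(1 - 0.32)          # (200/theta)^tau, from F(200) = 0.32
a2 = -log(0.18)              # (500/theta)^tau, from S(500) = 0.18

tau = log(a2 / a1) / log(500 / 200)
theta = 200 / a1 ** (1 / tau)
print(round(tau, 3), round(theta))   # about 1.628 and 359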

Problems:

11.1 (3 points) You observe that out of 60 claims, 10 are of size greater than 5.
A Weibull distribution assuming τ = 3, is fit to this data via the method of maximum likelihood.
What is the fitted value of θ?
A. less than 3.7
B. at least 3.8 but less than 3.9
C. at least 3.9 but less than 4.0
D. at least 4.0 but less than 4.1
E. at least 4.1

11.2 (3 points) The following 100 losses are observed in the following intervals:
Interval Number of Losses
[0, 1] 40
[1, 2] 20
[2, 5] 25
[5, ∞) 15
You wish to fit a Pareto Distribution to this data via the method of maximum likelihood.
Which of the following functions should be maximized?
A. (1 - {θ/(θ+1)}^α)^40 ({θ/(θ+1)}^α - {θ/(θ+2)}^α)^60 ({θ/(θ+2)}^α - {θ/(θ+5)}^α)^85

B. (1 - {θ/(θ+1)}^α)^0.4 ({θ/(θ+1)}^α - {θ/(θ+2)}^α)^0.6 ({θ/(θ+2)}^α - {θ/(θ+5)}^α)^0.85 {θ/(θ+5)}^α

C. (1 - {θ/(θ+1)}^α)^40 ({θ/(θ+1)}^α - {θ/(θ+2)}^α)^20 ({θ/(θ+2)}^α - {θ/(θ+5)}^α)^25 {θ/(θ+5)}^(15α)

D. {θ/(θ+1)}^(40α) {θ/(θ+2)}^(20α) {θ/(θ+5)}^(25α)
E. None of the above.

11.3 (4 points).
Ten observed losses have been recorded in thousands of dollars and are grouped as follows:
Interval [100,150] [150,200] [200, ∞]
Number of claims 6 1 3
The random variable underlying the observed losses has the Distribution function
F(x) = 1 - (x/100000)^(-q), x > 100,000. The value of the parameter q is to be determined via the
method of maximum likelihood. Which of the following equations should be solved (numerically)?
A. 0 = ln(3) (2/3)^q / {1 - (2/3)^q} + {(1/2)^q - ln(3) (2/3)^q} / {(2/3)^q - (1/2)^q} - 1.

B. 0 = (2/3)^q / {1 - (2/3)^q} + ln(2) (1/2)^q / {(2/3)^q - (1/2)^q} - 1.

C. 0 = 6 ln(1.5) (2/3)^q / {1 - (2/3)^q} + {ln(2) (1/2)^q - ln(1.5) (2/3)^q} / {(2/3)^q - (1/2)^q} - 3 ln(2).

D. 0 = 6 (2/3)^q / {1 - (2/3)^q} + (1/2)^q / {(2/3)^q - (1/2)^q} - 3.
E. None of the above.

11.4 (3 points) An exponential distribution F(x) = 1 - e-λx is fit to the following size of claim data by
the method of maximum likelihood. Which of the following functions should be maximized?
Range # of claims
0-1 6300
1-2 2350
2-3 850
3-4 320
4-5 110
over 5 70
Total 10000
A. {1 - e-λ}3000 {e-λ - e-2λ}3500 {e-2λ - e-3λ}2000 {e-3λ - e-4λ}1000 {e-4λ - e-5λ}500 e-2500λ

B. {1 - e-λ}2.1 {e-λ - e-2λ}1.49 {e-2λ - e-3λ}2.35 {e-3λ - e-4λ}3.13 {e-4λ - e-5λ}4.55 e-35.7λ

C. {1 - e-λ}6300 {e-λ - e-2λ}4700 {e-2λ - e-3λ}2550 {e-3λ - e-4λ}1280 {e-4λ - e-5λ}550 e-2100λ

D. {1 - e-λ} {e-λ - e-2λ} {e-2λ - e-3λ} {e-3λ - e-4λ} {e-4λ - e-5λ} e-5λ

E. {1 - e-λ}6300 {e-λ - e-2λ}2350 {e-2λ - e-3λ}850 {e-3λ - e-4λ}320 {e-4λ - e-5λ}110 e-350λ

11.5 (2 points) In the previous question what is the fitted value of λ?


(A) .95 (B) 1.00 (C) 1.05 (D) 1.10 (E) 1.15

11.6 (3 points) There are 45 losses of size less than 10. There are 65 losses of size at least 10.
You fit a Pareto Distribution with θ = 20 via maximum likelihood.
What is the fitted value of α?
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

11.7 (3 points) Losses follow a uniform distribution on [0, b].


The following 13 losses have been grouped into intervals:
Interval Number of Losses
(0, 1) 6
[1, 2) 4
[2, 3) 2
[3, 4) 1
[4, ∞) 0
What is the maximum likelihood estimate of b?
A. 3.00 B. 3.25 C. 3.50 D. 3.75 E. 4.00

11.8 (3 points) There are 50 losses of size less than 1000. There are 20 losses of size at least
1000. You fit a Pareto Distribution with α = 3 via maximum likelihood.
What is the fitted value of θ?
(A) Less than 1900
(B) At least 1900, but less than 2000
(C) At least 2000, but less than 2100
(D) At least 2100, but less than 2200
(E) At least 2200

11.9 (3 points) You are given:


(i) Losses follow an Inverse Exponential distribution, as per Loss Models.
(ii) A random sample of 400 losses is distributed as follows:
Loss Range Frequency
[0, 10000] 140
(10000, 20000] 100
(20000, ∞) 160
Calculate the maximum likelihood estimate of θ.
(A) Less than 7000
(B) At least 7000, but less than 8000
(C) At least 8000, but less than 9000
(D) At least 9000, but less than 10,000
(E) At least 10,000

11.10 (3 points)
Fit a Loglogistic Distribution via maximum likelihood to the following grouped data.
Interval Number of Losses
(0, 10) 248
[10, 100) 427
[100, ∞) 325
Use the fitted distribution to estimate S(500).
A. 6% B. 8% C. 10% D. 12% E. 14%

11.11 (4B, 5/94, Q.12) (3 points) You are given the following:
• Four observations have been made of a random variable having
the density function
f(x) = 2λx exp(-λx2 ), x > 0.
• Only one of the observations was less than 2.
Determine the maximum likelihood estimate of λ.
A. Less than 0.05
B. At least 0.05, but less than 0.06
C. At least 0.06, but less than 0.07
D. At least 0.07
E. Cannot be determined from the given information

11.12 (4B, 11/95, Q.16) (2 points) You are given the following:
• Six losses have been recorded in thousands of dollars and are grouped as follows:
Interval Number of Losses
(0,2) 2
[2,5) 4
[5,∞) 0
• The random variable X underlying the losses, in thousands, has the density function
f(x) = λe-λx, x > 0, λ > 0.
Which of the following functions must be maximized to find the maximum likelihood estimate of λ?

A. (1- e-2λ)2 (e-2λ - e-5λ)4


B. (1- e-2λ)2 (e-2λ - e-5λ)4 ( e-5λ)6

C. (1- e-2λ)2 (e-2λ - e-5λ)4 (1 - e-5λ)6

D. (1- e-2λ)2 (e-2λ - e-5λ)4 ( e-5λ)-6

E. (1- e-2λ)2 (e-2λ - e-5λ)4 (1 - e-5λ)-6



11.13 (4B, 5/97, Q.8) (2 points) You are given the following:
• The random variable X has a uniform distribution on the interval [0, θ].
• A random sample of three observations of X has been recorded and grouped as follows:
Number of
Interval Observations
[0, k) 1
[k, 5) 1
[5, θ] 1
Determine the maximum likelihood estimate of θ.
A. 5 B. 7.5 C. 10 D. 5+k E. 10-k

11.14 (4B, 5/97, Q.10) (2 points) You are given the following:
• Forty (40) observed losses from a long-term disability policy with a one-year
elimination period have been recorded and are grouped as follows;
Years of Disability Number of Losses
(1, 2) 10
[2, ∞) 30
• You wish to shift the observations by one year and fit them to a Pareto distribution,
with parameters θ (unknown) and α = 1.
Determine the maximum likelihood estimate of θ.
A. 1/3 B. 1/2 C. 1 D. 2 E. 3

11.15 (4B, 5/99, Q.3) (2 points) You are given the following:
• The random variable X has a uniform distribution on the interval (0, θ) , θ > 2.
• A random sample of four observations of X has been recorded and grouped as follows:
Number of
Interval Observations
(0,1] 1
(1,2] 2
(2, θ) 1
Determine the maximum likelihood estimate of θ.
A. 8/3 B. 11/4 C. 14/5 D. 20/7 E. 3

11.16 (Course 4 Sample Exam 2000, Q.37)


Twenty widgets are tested until they fail. Failure times are distributed as follows:
Interval Number Failing
(0, 1] 2
(1, 2] 3
(2, 3] 8
(3, 4] 6
(4, 5] 1
(5, ∞) 0
The exponential survival function S(t) = exp(-λt) is used to model this process.
Determine the maximum likelihood estimate of λ.

11.17 (4, 11/02, Q.23 & 2009 Sample Q. 44) (2.5 points) You are given:
(i) Losses follow an exponential distribution with mean θ.
(ii) A random sample of 20 losses is distributed as follows:
Loss Range Frequency
[0, 1000] 7
(1000, 2000] 6
(2000, ∞) 7
Calculate the maximum likelihood estimate of θ.
(A) Less than 1950
(B) At least 1950, but less than 2100
(C) At least 2100, but less than 2250
(D) At least 2250, but less than 2400
(E) At least 2400

11.18 (4, 11/06, Q.33 & 2009 Sample Q.276) (2.9 points)
For a group of policies, you are given:
(i) Losses follow the distribution function
F(x) = 1 - θ/x, θ < x < ∞.
(ii) A sample of 20 losses resulted in the following:
Interval Number of Losses
x ≤ 10 9
10 < x ≤ 25 6
x > 25 5
Calculate the maximum likelihood estimate of θ.
(A) 5.00 (B) 5.50 (C) 5.75 (D) 6.00 (E) 6.25

11.19 (CAS 3L, 5/12, Q.17) (2.5 points) You are given the following information:
• Claim severity follows an Inverse Exponential distribution with parameter θ.
• One claim is observed, which is known to be between 50 and 500.
Calculate the maximum likelihood estimate of θ.
A. Less than 60
B. At least 60, but less than 90
C. At least 90, but less than 120
D. At least 120, but less than 150
E. At least 150

Solutions to Problems:

11.1. E. F(x) = 1 - exp [-(x/θ)τ]. Therefore for τ = 3, F(5) = 1 - exp (-125/θ3).

S(5) = exp[-125/θ3]. There are 50 claims less than 5 and 10 claims greater than 5.
(The likelihood function is the probability of being in each interval, to the power of the number of
claims in that interval. Here we have only two intervals.)
The loglikelihood function is: 50 ln(1- exp (-125/θ3)) +10(-125/θ3).
Setting the derivative with respect to θ equal to zero:

(50){(-3θ−4)125exp(-125/θ3)} / {1 - exp(-125/θ3)} - 1250(-3θ−4) = 0.

5 exp(-125/θ3) = 1 - exp(-125/θ3). ⇒ exp(-125/θ3) = 1/6.

Therefore, -125/θ3 = -ln(6). θ = (125/ln(6))1/3 = 4.12.


Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood
by setting the theoretical and empirical distribution functions equal at the boundary between the two
groups. 1 - exp (-125/θ3) = 50/60. ⇒ θ = (125/ln(6))1/3 = 4.12.

11.2. C. F(x) = 1 - {θ/(θ+x)}α, a Pareto Distribution. One takes the difference in distribution functions
at the top and bottom of each interval and then takes the product to the power of the number of
claims in that interval.
Bottom Top Number F(Bottom F(Top Contribution
of of of of of to
Interval Interval Claims Interval) Interval) Likelihood
0 1 40 0 1 - {θ/(θ+1)}α (1 - {θ/(θ+1)}α)40

1 2 20 1 - {θ/(θ+1)}α 1 - {θ/(θ+2)}α ({θ/(θ+1)}α - {θ/(θ+2)}α)20

2 5 25 1 - {θ/(θ+2)}α 1 - {θ/(θ+5)}α ({θ/(θ+2)}α - {θ/(θ+5)}α)25

5 ∞ 15 1 - {θ/(θ+5)}α 1 {θ/(θ+5)}15α
Likelihood is the product of the contributions of each of the intervals:
(1 - {θ/(θ+1)}α)40({θ/(θ+1)}α - {θ/(θ+2)}α)20({θ/(θ+2)}α - {θ/(θ+5)}α)25 {θ/(θ+5)}15α.

11.3. C. F(150,000) = 1 - (150,000/100,000)- q = 1 - (2/3)q .


F(200,000) = 1 - (200,000/100,000)- q = 1 - (1/2)q .
Take the difference in distribution functions at the top and bottom of each interval and then take the
product to the power of the number of claims in that interval:
Bottom of Top of Number F(Bottom F(Top of Difference Contribution
Interval Interval of Claims of Interval) Interval) of Distrib. to Likelihood
100 150 6 0 1 - (2/3)q 1 - (2/3)q {1 - (2/3)q }6
150 200 1 1 - (2/3)q 1 - (1/2)q (2/3)q - (1/2)q (2/3)q - (1/2)q

200 ∞ 3 1 - (1/2)q 1 (1/2)q (1/2)3q

The likelihood is: {1 - (2/3)q }6 {(2/3)q - (1/2)q } (1/2)3q.

The loglikelihood is: 6 ln{1 - (2/3)q } + ln{(2/3)q - (1/2)q } + 3q ln(1/2).


The derivative with respect to q of the loglikelihood is:
ln(2/3){- (2/3)q } 6 /{1 - (2/3)q } + { ln(2/3)(2/3)q - ln(1/2)(1/2)q }/{(2/3)q - (1/2)q } + 3 ln(1/2)
Setting the derivative of the loglikelihood equal to zero in order to find a maximum:
0 = {ln(1.5) (2/3)q } 6 /{1 - (2/3)q } + {ln(2)(1/2)q - ln(1.5)(2/3)q }/{(2/3)q - (1/2)q } - 3 ln(2)
Comment: The numerical solution of the above equation is q ≅ 1.904, which is the maximum
likelihood estimate of the parameter q.
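
That numerical solution can be reproduced by maximizing the loglikelihood over a grid; a minimal sketch (plain Python, grid search for illustration only):

from math import log

def loglik(q):
    p1 = 1 - (2 / 3) ** q              # Prob[100,000 < X <= 150,000]
    p2 = (2 / 3) ** q - (1 / 2) ** q   # Prob[150,000 < X <= 200,000]
    p3 = (1 / 2) ** q                  # Prob[X > 200,000]
    return 6 * log(p1) + log(p2) + 3 * log(p3)

grid = [0.5 + 0.001 * i for i in range(5000)]
print(round(max(grid, key=loglik), 3))   # about 1.904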

11.4. E. & 11.5. B. Take the difference in distribution functions at the top and bottom of each
interval and then take the product to the power of the number of claims in that interval:
Bottom of Top of Number of F(Bottom F(Top of Difference Contribution
Interval Interval Claims of Interval) Interval) of Distrib. to Likelihood
0 1 6300 0 1 - e-λ 1 - e-λ {1 - e-λ}6300

1 2 2350 1 - e-λ 1 - e-2λ e-λ - e-2λ {e-λ - e-2λ}2350

2 3 850 1 - e-2λ 1 - e-3λ e-2λ - e-3λ {e-2λ - e-3λ}850

3 4 320 1 - e-3λ 1 - e-4λ e-3λ - e-4λ {e-3λ - e-4λ}320

4 5 110 1 - e-4λ 1 - e-5λ e-4λ - e-5λ {e-4λ - e-5λ}110

5 ∞ 70 1 - e-5λ 1 e-5λ {e-5λ }70

Likelihood = {1 - e-λ}6300 {e-λ - e-2λ}2350 {e-2λ - e-3λ}850 {e-3λ - e-4λ}320 {e-4λ - e-5λ}110 {e-5λ }70.

Let y = e-λ. Likelihood = (1 - y)6300(y - y2 )2350(y2 - y3 )850(y3 - y4 )320(y4 - y5 )110(y5 )70 =


y 5800(1 - y)9930. Set the derivative equal to zero:
0 = 5800y5799(1 - y)9930 - 9930y5800(1 - y)9929. ⇒ 5800(1 - y) = 9930y.

⇒ y = 5800/15730 = .3687 ⇒ e-λ = .3687 ⇒ λ = 0.998.


Comment: Similar to 4, 11/02, Q.23. For grouped data, for an Exponential, method of moments
and maximum likelihood are not equal.

11.6. C. F(x) = 1 - {θ/(θ+x)}α = 1 - {20/(20+x)}α.


One takes the difference in distribution functions at the top and bottom of each interval, and then
takes the product to the power of the number of claims in that interval.
Likelihood: (1 - {20/(20+10)}α)45 ({20/(20+10)}α)65.

Let y = {20/(20+10)}α = (2/3)α. Then the likelihood is: (1 - y)45 y65.


In order to maximize the likelihood, set the derivative with respect to y equal to zero:
0 = -45(1 - y)44 y65 + 65(1 - y)45 y64. ⇒ y = 65/(65 + 45) = 65/110 = 0.591. ⇒

(2/3)α = .591. ⇒ α = ln(0.591)/ln(2/3) = 1.30.


Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood
by setting the theoretical and empirical distribution functions equal at the boundary between the two
groups. 1 - {20/(20+10)}α = 45/110. ⇒ (2/3)α = 0.591. ⇒ α = 1.30.

11.7. B. The likelihood is: (F(1) - F(0))6 (F(2) - F(1))4 (F(3) - F(2))2 (F(4) - F(3)).
Since there is one loss of size at least 3, b ≥ 3.
F(1) - F(0) = F(2) - F(1) = F(3) - F(2) = 1/b.
If b ≥ 4, then F(4) - F(3) = 1/b as well, so the likelihood is 1/b13; since this is decreasing in b, on b ≥ 4 it is greatest for b = 4.
In contrast, assume for example b = 3.6.
Then F(4) - F(3) = 0.6/3.6, since the uniform on [0, 3.6] has no probability beyond 3.6.
If 3 ≤ b ≤ 4, F(4) - F(3) = (b - 3)/b.
If 3 ≤ b ≤ 4, the likelihood is (b-3)/b13. Setting the derivative equal to zero, this is largest for
b = (3)(13)/12 = 3.25. The likelihood at 3.25 is: 0.25/3.2513 = 5.5 x 10-8.
The likelihood at 4 is: 1/413 = 1.5 x 10-8. Thus the maximum likelihood occurs for b = 3.25.

11.8. B. F(x) = 1 - {θ/(θ+x)}α = 1 - {θ/(θ+x)}3 .


One takes the difference in distribution functions at the top and bottom of each interval, and then
takes the product to the power of the number of claims in that interval.
Likelihood: (1 - {θ/(θ+1000)}3 )50 ({θ/(θ+1000)}3 )20.
Let y = {θ/(θ+1000)}3 . Then the likelihood = (1 - y)50 y20.
In order to maximize the likelihood, set the derivative with respect to y equal to zero:
0 = -50(1 - y)49 y20 + 20(1 - y)50 y19. ⇒ y = 20/(50 + 20) = 2/7 = 0.2857. ⇒

{θ/(θ+1000)}3 = 0.2857. ⇒ θ = 1929.


Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood
by setting the theoretical and empirical distribution functions equal at the boundary between the two
groups. 1 - {θ/(θ+1000)}3 = 50/70. ⇒ {θ/(θ+1000)}3 = 0.2857. ⇒ θ = 1929.

11.9. E. F(x) = e-θ/x.


The likelihood is: (e-θ/10000)140(e-θ/20000 - e-θ/10000)100(1 - e-θ/20000)160.

Let y = e-θ/20000. ⇒ y2 = e-θ/10000.


Then the likelihood is: y280(y - y2 )100(1 - y)160 = y380(1 - y)260.
Set the derivative with respect to y of the likelihood equal to zero:
380y379(1 - y)260 - 260y380(1 - y)259 = 0. ⇒

380(1 - y) = 260y. ⇒ y = 19/32.

⇒ e-θ/20000 = 19/32. ⇒ θ = -20000ln(19/32) = 10,426.



11.10. D. Since we are fitting two parameters with only three intervals:
Set F(10) = observed proportion from 0 to 10 = 0.248,
and S(100) = observed proportion from 100 to ∞ = 0.325.
F(x) = (x/θ)^γ / {1 + (x/θ)^γ} = 1 / {1 + (θ/x)^γ}.

1 / {1 + (θ/10)^γ} = 0.248. ⇒ (θ/10)^γ = 1/0.248 - 1 = 3.032.

1 / {1 + (100/θ)^γ} = 0.325. ⇒ (θ/100)^γ = 0.4815.

⇒ 10^γ = 3.032/0.4815 = 6.297. ⇒ γ = 0.799. ⇒ θ = 40.1.

S(500) = 1 / {1 + (500/40.1)^0.799} = 11.75%.
Comment: This is mathematically equivalent to percentile matching.
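
A short numerical sketch (standard library only) reproduces the fitted parameters and the estimate of S(500):

from math import log

r1 = 1 / 0.248 - 1           # (theta/10)^gamma, from F(10) = 0.248
r2 = 1 / 0.325 - 1           # (100/theta)^gamma, from S(100) = 0.325

gamma = log(r1 * r2) / log(100 / 10)   # since (theta/10)^gamma / (theta/100)^gamma = 10^gamma
theta = 10 * r1 ** (1 / gamma)
S500 = 1 / (1 + (500 / theta) ** gamma)
print(round(gamma, 3), round(theta, 1), round(S500, 4))   # about 0.799, 40.1, and 0.117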

11.11. D. This is grouped data with two intervals, one from 0 to 2 and the other from 2 to ∞.
Therefore one uses the Distribution function to compute the likelihood. The distribution function can
be obtained either by integrating the given density function or by recognizing that this is a Weibull
distribution with parameters θ = λ−1/2, τ = 2. F(x) = 1 - exp(-λx2 ).

F(2) = 1 - e-4λ. For each interval in order to compute its likelihood, one takes the difference of the
Distribution function at the top and bottom of the interval, and then takes the result to the power
equal to the number of claims observed in that interval. Finally the overall likelihood is the product of
the likelihoods computed for each interval.
Interval # Claims Likelihood
0 to 2 1 F(2) = 1 - e-4λ

2 to ∞ 3 S(2)3 = e-12λ

overall 4 e-12λ - e-16λ

Thus in this case the overall likelihood is equal to e-12λ - e-16λ . One maximizes the likelihood by
setting equal to zero its partial derivative with respect to the single parameter λ:

0 = -12e-12λ + 16 e-16λ. Solving λ = (1/4)ln(4/3) = 0.072.


Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood
by setting the theoretical and empirical distribution functions equal at the boundary between the two
groups. 1 - e-4λ = 1/4. ⇒ λ = (1/4)ln(4/3) = 0.072.
Comment: While one could maximize the loglikelihoods, in this case it seems easier to work with the
likelihoods themselves.

11.12. A. F(x) = 1 - e-λx, an exponential distribution. One takes the difference in distribution
functions at the top and bottom of each interval and then takes that to the power of the number of
claims in that interval. The likelihood is the product of all these terms.
Bottom of Top of Number of F(Bottom F(Top of Difference Contribution
Interval Interval Claims of Interval) Interval) of Distrib. to Likelihood
0 2 2 0 1 - e-2λ 1 - e-2λ {1 - e-2λ}2

2 5 4 1 - e-2λ 1 - e-5λ e-2λ - e-5λ {e-2λ - e-5λ}4

5 ∞ 0 1 - e-5λ 1 e-5λ {e-5λ}0

Likelihood = {1 - e-2λ}2 {e-2λ - e-5λ}4 {e-5λ}0 = {1 - e-2λ }2 {e-2λ - e-5λ }4 .

11.13. B. For grouped data, the likelihood is the product of terms, each of which is the difference of
the distribution function at the top and the bottom of an interval taken to the power of the number of
claims observed in that interval. The Uniform Distribution Function on [0,θ] has: F(0) = 0, F(k) = k/θ,

F(5) = 5/θ, F(θ) = 1. The Likelihood is: (k/θ - 0)1 (5/θ - k/θ)1 (1 - 5/θ)1 = (5k - k2 ) (θ−2 - 5θ−3).
In order to maximize the Likelihood as a function of θ, we set the partial derivative with respect to θ of

the Likelihood equal to zero. 0 = (5k - k2 ) (-2θ−3 + 15θ−4). Therefore, θ = 15/2 = 7.5.
Comment: If one assumed that one had one observation greater than 5, but did not know how big it
was (for example one had a maximum covered loss of 5, so the data was censored at 5), then it
would reduce to the mathematical situation presented. If [5,θ] had been changed to [5,∞); i.e., greater
than or equal to 5, the solution would have been the same. Note that with the uniform distribution, the
solution is independent of k.

11.14. E. After shifting we have 10 claims in the interval (0,1) and 30 claims in the interval [1,∞].
For grouped data, the likelihood is the product of terms, each of which is the difference of the
distribution function at the top and the bottom of an interval taken to the power of the number of
claims observed in that interval. The Pareto Distribution with α = 1 (on the shifted data) is: F(0) = 0,
F(1) = 1 - θ/(θ+1) = 1/(θ+1), F(∞) = 1. In this case the Likelihood is:

{F(1) - F(0)}10{F(∞) - F(1)}30 = {1/(θ+1)}10{θ/(θ+1)}30 = θ30/(θ+1)40.


The loglikelihood is: 30ln(θ) - 40ln(θ+1). In order to maximize the Likelihood, we set the partial
derivative with respect to θ of the loglikelihood equal to zero: 0 = 30/θ - 40/(θ+1).
Therefore, 30(θ+1) = 40θ, or θ = 3.

11.15. A. The likelihood is the product of terms each of which is the probability covered by an
interval taken to the power of the number of claims in that interval.
The likelihood = (1/θ) (1/θ)2 ((θ-2)/θ) = θ−3 - 2θ−4.

The partial derivative with respect to θ is: -3θ−4 + 8θ−5.


Setting that equal to zero and solving, θ = 8/3.

11.16. The contribution to the likelihood of interval (a,b] is: {F(b)-F(a)}# points in interval
Interval Contribution to Likelihood
(0, 1] (1 - e−λ)2

(1, 2] (e−λ - e−2λ)3

(2, 3] (e−2λ - e−3λ)8

(3, 4] (e−3λ - e−4λ)6

(4, 5] (e−4λ - e−5λ)

The likelihood is: (1 - e−λ)2 (e−λ - e−2λ)3 (e−2λ - e−3λ)8 (e−3λ - e−4λ)6 (e−4λ - e−5λ).

Let y = e−λ. Then the likelihood is (1 - y)2 y3 (1 - y)3 y16 (1 - y)8 y18 (1 - y)6 y4 (1 - y) = y41(1 - y)20.
Setting the derivative with respect to y equal to zero:
0 = 41y40(1 - y)20 - 20y41(1 - y)19. 41(1- y) = 20y. y = 41/61.
⇒ λ = - ln y = -ln(41/61) = 0.397.
Comment: y41(1 - y)20 is proportional to the density of a Beta Distribution with θ =1,
a = 42 and b = 21. This has mode of θ(a - 1)/(a + b - 2) = 41/61. Therefore this expression is
maximized for y = 41/61.

11.17. B. F(x) = 1 - e-x/θ. The likelihood is the product of the contributions from each interval.
Bottom of Top of Number of Difference Contribution
Interval Interval Claims of Distributions to Likelihood
0 1000 7 1 - e-1000/θ {1 - e-1000/θ}7

1000 2000 6 e-1000/θ - e-2000/θ {e-1000/θ - e-2000/θ}6

2000 ∞ 7 e-2000/θ {e-2000/θ}7

Likelihood = {1 - e-1000/θ}7 {e-1000/θ - e-2000/θ}6 {e-2000/θ}7 .

Let y = e-1000/θ. Then, likelihood = (1 - y)7 y6 (1 - y)6 y14 = y20(1 - y)13.


Maximize the likelihood by setting its derivative equal to zero:
0 = 20y19(1 - y)13 - 13y20(1 - y)12. ⇒ 20(1 - y) = 13y. ⇒ y = 20/33.

⇒ θ = -1000/ln(20/33) = 1997.
Alternately, the loglikelihood is: 7ln[1 - e-1000/θ] + 6ln[e-1000/θ - e-2000/θ] - 14000/θ.
Set the derivative of the loglikelihood with respect to theta equal to zero:
0 = -7e-1000/θ 1000/θ2 / (1 - e-1000/θ) +

6{e-1000/θ 1000/θ2 - e-2000/θ 2000/θ2}/(e-1000/θ - e-2000/θ) + 14000/θ2.

0 = -7e-1000/θ/(1 - e-1000/θ) + 6{e-1000/θ - 2e-2000/θ}/(e-1000/θ - e-2000/θ) + 14.

14(1 - e-1000/θ) = 7e-1000/θ - 6(1 - 2e-1000/θ). ⇒ e-1000/θ = 20/33. ⇒ θ = -1000/ln(20/33) = 1997.


Comment: Note that y20 (1 - y)13 is proportional to a Beta Distribution with
a = 21, b = 14, and θ = 1.
The mode of this Beta Distribution is: (a - 1)/(a + b - 2) = (21 - 1)/(21 + 14 - 2) = 20/33.
Thus the likelihood is maximized for y = 20/33.

11.18. B. For θ < 10, likelihood is: F(10)9 {F(25) - F(10)}6 S(25)5
= (1 - θ/10)9 (θ/10 - θ/25)6 (θ/25)5 = (1 - θ/10)9 θ 6 (1/10 - 1/25)6 θ 5 (1/25)5 .
loglikelihood is: 9 ln(1 - θ/10) + 11 ln(θ) + constants.
Set the derivative with respect to θ equal to zero: 0.9/(1 - θ/10) = 11/θ. ⇒ θ = 5.5.

11.19. D. This is grouped data; the likelihood is the product of the probability in each interval to the
power the number of items in each interval. Here there is only one observation, and the probability
covered by the interval from 50 to 500 is: F(500) - F(50).
Thus the likelihood is: F(500) - F(50) = e-θ/500 - e-θ/50.
Take the partial derivative with respect to θ of the likelihood and set it equal to zero:

0 = -e-θ/500 / 500 + e-θ/50 / 50. ⇒

10 = exp[-θ/500] / exp[-θ/50] = exp[θ/50 - θ/500] = exp[0.018θ]. ⇒ θ = 128.



Section 12, Chi-Square Test125

The Chi-Square Statistic provides one way to examine the fit of distributions.
One sums up the contributions from each interval: (observed number - expected number)² / expected number.

The Chi-Square Statistic is: χ² = Σ (Oj - Ej)² / Ej, where the sum runs over the k intervals, j = 1 to k.

The expected number of claims for the interval [a,b] is N{F(b) - F(a)}, where F(x) is the fitted or
assumed distribution function and N is total number of claims. The better the match, the closer the
expected and observed values of the Distribution function will be. If there is a reasonable match
over all intervals, the sum over all intervals is small. Thus the better the match, the smaller the
Chi-Square Statistic.

For example for the Burr distribution, F(x) = 1 - {1/(1 + (x/θ)γ)}α, with parameters
α = 3.9913, θ = 40,467, γ = 1.3124 fit to the grouped data in Section 3
by the Method of Maximum Likelihood:126

Bottom ($000)   Top ($000)   # Claims   F(lower)   F(upper)   F(upper) - F(lower)   Fitted # Claims   (Observed - Fitted)²/Fitted
0 5 2208 0.000000 0.220189 0.220189 2201.89 0.017
5 10 2247 0.220189 0.446379 0.226190 2261.90 0.098
10 15 1701 0.446379 0.617040 0.170661 1706.61 0.018
15 20 1220 0.617040 0.736355 0.119315 1193.15 0.604
20 25 799 0.736355 0.817547 0.081192 811.92 0.206
25 50 1481 0.817547 0.965227 0.147680 1476.80 0.012
50 75 254 0.965227 0.990915 0.025688 256.88 0.032
75 100 57 0.990915 0.996977 0.006062 60.62 0.216
100 Infinity 33 0.996977 1.000000 0.003023 30.23 0.254
10000 1.000000 10000 1.458

For example, F(25000) = 1 - {1/(1 + (x/θ)γ)}α = 1 - {1/(1 + (25000/40467)1.3124)}3.9913 = 0.81755.
125
See Section 16.4.3 of Loss Models.
The Chi-Square Statistic is also referred to as “Pearsonʼs goodness-of-fit statistic.”
126
When computing the Chi-Square Statistic, since the result can be sensitive to rounding, avoid intermediate
rounding. I would compute the fitted/assumed values to at least one decimal place; two decimal places would be
better when the fitted/assumed values are small.

The fitted number of claims in the interval 25000 to 50000 is:


(10,000){F(50000) - F(25000)} = (10000)(0.96523 - 0.81755) = 1476.8,
where 10,000 is the total observed claims.
The contribution to the Chi-Square Statistic from the interval 25000 to 50000 is:
(1481 - 1476.8)2 / 1476.8 = 0.012.

The Chi-Square Statistic is the sum of the contributions from each of the intervals, which is 1.458 in
this case. This value of 1.458 is small because of the close match of the fitted Burr Distribution to the
observed data.
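
The entire table above can be reproduced in a few lines; here is a rough Python sketch (the parameter values are the maximum likelihood Burr parameters quoted above, and the boundaries and counts are those of the grouped data):

observed = [2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33]
boundaries = [0, 5000, 10000, 15000, 20000, 25000, 50000, 75000, 100000, float("inf")]
n = sum(observed)                                   # 10,000 claims in total

alpha, theta, gamma = 3.9913, 40467.0, 1.3124

def F(x):                                           # Burr distribution function
    if x == float("inf"):
        return 1.0
    return 1 - (1 / (1 + (x / theta) ** gamma)) ** alpha

chi_square = 0.0
for i, o in enumerate(observed):
    e = n * (F(boundaries[i + 1]) - F(boundaries[i]))   # expected claims in interval i
    chi_square += (o - e) ** 2 / e
print(round(chi_square, 2))                         # approximately 1.46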

Degrees of Freedom:

For example, when the Chi-Square test is used to test the fairness of dice, one assumes the
probability of each face coming up is the same, prior to seeing any data. Another example is
when one tests whether the distribution fit to last yearʼs data also fits this yearʼs new data.

For an assumed rather than fitted distribution, the computed sum follows a
Chi-Square distribution with degrees of freedom equal to the number of intervals minus
one.127 128

The number of degrees of freedom tells you what row of the Chi-Square table to use to
test the significance of the match.

By definition, the Chi-Square Statistic is always greater than or equal to zero.129 The smaller the
Chi-Square Statistic, the better the fit between the curve and the observed data.

When you fit a distribution to this data, you lose one degree of freedom per fitted
parameter; d.f. = # intervals - 1 - # fitted parameters.130

127
We lose one degree of freedom, because the sum of the expected column always equals the sum of the
observed column.
128
If one has 9 intervals as above, then one has taken the sum of 9 Standard Unit Normals squared. This sum would
ordinarily have a Chi-Square distribution with 9 degrees of freedom. However, we lose one degree of freedom,
since the total number of fitted claims over all intervals is set equal to the total number of observed claims,10,000.
Thus in this case we have 8 degrees of freedom, if there were no fitted parameters. A derivation is given
subsequently.
129
Therefore, one performs a one-sided test, rather than a two-sided test as with the use of the Normal distribution.
130
In the case of maximum likelihood applied to grouped data one loses one degree of freedom per fitted
parameter. If one applies the method of maximum likelihood to the individual claim values of ungrouped data, then
one loses degrees of freedom, but somewhat less than one per fitted parameter. In this case, to be conservative
one can assume one loses one degree of freedom per fitted parameter, although the actual loss of degrees of
freedom may be less. See Kendallʼs Advanced Theory of Statistics, by Stuart and Ord, 5th edition, Vol. 2 ,
pp. 1166- 1172.

We only subtract the number of parameters when the distribution has been fit to the data set we are
using to compute the Chi-Square. We do not decrease the number of degrees of freedom if this
distribution has been fit to some similar but different data set.

Note that with fewer degrees of freedom, the mean of the Chi-Square distribution is less, and
therefore it is easier to reject a fitted curve.131 The point being that the parameters of the fitted
distribution have been selected precisely to fit the particular observed data, so one should be
less impressed by a low Chi-Square value than one would be in a situation where the curves
parameters were selected prior to seeing the particular data.

Reduce the number of degrees of freedom for the number of fitted parameters when:
1. You are given grouped data or group data into intervals.
2. You fit a distribution to the data in #1.
3. Compare the fitted distribution in #2 to the data in #1.
4. Number of degrees of freedom is: # of interval - 1 - number of parameters fit in #2.

For example, Kermit is computing a Chi-Square for a LogNormal Distribution (with two parameters µ
and σ) compared to the Muppet Insurance Companyʼs individual health insurance size of loss data
for the year 2000 from the state of Nebraska.

If Kermit fit a LogNormal Distribution to this data, then the number of degrees of freedom are
reduced by the number of fitted parameters, two.

However, here are some examples where in comparing a LogNormal Distribution to Muppet
Insurance Companyʼs individual health insurance data for the year 2000 from the state of Nebraska,
Kermit should not subtract the number of parameters, two, from the number of degrees of freedom:
1. Kermit is comparing this data to a LogNormal fit to the data from the year 1999, a different year.
2. Kermit is comparing this data to a LogNormal fit to data from the state of Kansas, a different state.
3. Kermit is comparing this data to a LogNormal fit to data from some other insurer.
4. Kermit is comparing this data to a LogNormal fit to data for group health rather than
individual health insurance, a different line of insurance.
5. Kermit is comparing this data to a LogNormal with parameters picked by Miss Piggy,
who has not seen the data Kermit is examining. This is an assumed distribution.

In computing the degrees of freedom, one takes the number of intervals actually used to compute
the Chi-Square Statistic.

131
The mean of a Chi-Square Distribution with v degrees of freedom is v.

To compute the number of Degrees of Freedom:

1. Determine the groups to use in computing the Chi-Square statistic. Unless the exam question
has specifically told you which groups to use, use the groups for the data given in the question.

2. Determine whether any parameters have been fit to this data, and if so how many.

3. Degrees of freedom =
(# intervals from step 1) - 1 - (# of fitted parameters, if any, from step #2).

Some Intuition Behind Reducing Degrees of Freedom:

By using a lot of parameters, one can get a good match between a model and the data.
For example, one can fit a 10th degree polynomial to pass through any 11 data points.

When we fit a distribution to data, we have determined the parameters so that the distribution will
look like a good match to the data. Thus the Chi-Square Statistic will be smaller when comparing the
distribution to the data to which we have fit it.

Degrees of Freedom      Value of P
                        0.900    0.950    0.975    0.990    0.995
1 2.706 3.841 5.024 6.635 7.879
2 4.605 5.991 7.378 9.210 10.597
3 6.251 7.815 9.348 11.345 12.838
4 7.779 9.488 11.143 13.277 14.860

The critical values increase as we go down each column of the Chi-Square Table.

If we have fit 2 parameters, and compare the distribution to the data to which it has been fit, then we
have done (approximately) 2 degrees of freedom worth of “cheating,” thereby making the
Chi-Square Statistic smaller than it would have otherwise been. Moving up 2 rows in the table, we
compare the statistic to smaller values, compensating for our “cheating.”

The more parameters we have fit, the larger the needed adjustment. If we had instead fit 3
parameters, we would reduce the degrees of freedom by 3, and move up 3 rows rather than 2.

Which Groups to Use in Computing the Chi-Square Statistic:132

As stated above, unless an exam question has specifically told you how to determine which groups
to use, use the groups for the data given in the question. If the exam question gives you some rule
that should be followed, you may have to combine some of the given groups.133

According to Loss Models, the Chi-Square Goodness of Fit Test works best when the expected
number of items is approximately equal for the different intervals.

Loss Models mentions various additional rules of thumb used by different authors:134
• One should have an expected number of items in each interval of at least 5.
• One should have an expected number of items in each interval of at least 1.
• One should have an expected number of items in each interval of at least 1, and an
expected number of items of at least 5 in at least 80% of the intervals.
• When testing at a 1% significance level, one should have an average expected number of
items in each interval of at least 4.
• When testing at a 5% significance level, one should have an average expected number of
items in each interval of at least 2.
• One should have a sample size of at least 10, at least 3 intervals, and
(sample size)2 /(number of intervals) ≥ 10.

132
See page 334 of Loss Models.
133
See 4, 11/04, Q.10. In “Mahlerʼs Guide to Fitting Frequency Distributions”, 4, 5/00, Q.29 and 4, 5/01, Q.19.
134
In practical applications, there are a number of different rules of thumb one can use for determining the groups
to use. I use one of the rules mentioned in Loss Models: One should have an expected number of claims in each
interval of 5 or more, so that the normal approximation that underlies the theory, is reasonably close; therefore,
some of the given intervals for grouped data may be combined for purposes of applying the Chi-Square test.

A Weibull Distribution Example:

Here is another example of the computation of a Chi-Square Statistic.


For the Weibull distribution fit to the grouped data in Section 3 by the Method of Maximum
Likelihood, θ = 16,184 and τ = 1.0997, F(x) = 1 - exp[-(x/16184)1.0997]:

Bottom ($000)   Top ($000)   Oi   F(lower)   F(upper)   F(upper) - F(lower)   Ei   (Oi - Ei)²/Ei
0 5 2208 0.00000 0.24028 0.24028 2402.8 15.8
5 10 2247 0.24028 0.44508 0.20480 2048.0 19.3
10 15 1701 0.44508 0.60142 0.15634 1563.4 12.1
15 20 1220 0.60142 0.71696 0.11553 1155.3 3.6
20 25 799 0.71696 0.80075 0.08379 837.9 1.8
25 50 1481 0.80075 0.96848 0.16774 1677.4 23.0
50 75 254 0.96848 0.99548 0.02700 270.0 0.9
75 100 57 0.99548 0.99939 0.00391 39.1 8.2
100 Infinity 33 0.99939 1.00000 0.00061 6.1 119.9
10000 1.00000 10000 204.6

With a very large Chi-Square of 204.6, the Weibull Distribution is a very poor fit to this data. By
examining the final column, one can see which intervals contributed significantly to the Chi-Square. In
this case, the interval from 100 to infinity contributed an extremely large amount; the righthand tail of
the Weibull is too light to fit this data. In this case, the fit is so poor there are also very large
contributions from many other intervals.135

Testing a Fit:

The p-value (probability value) is the value of the Survival Function of the Chi-Square Distribution
(for the appropriate number of degrees of freedom) at the value of the Chi-Square Statistic. If the
data came from the fitted/assumed distribution, then the p-value is the probability that the Chi-
Square statistic would be greater than its observed value. A large p-value indicates a good fit.
The p-value determines the significance level at which one rejects the fit.136 137

135
A contribution from an interval of 2 or more is significant. A contribution of 5 or more is large.
136
When using the Chi-Square Table, one rejects at the significance value in the table that first exceeds the
p-value. For example, with a p-value of 0.6% one rejects at 1%. Using a computer, one can get more accurate
p-values, than by using a table.
137
See the subsequent section on hypothesis testing for a general discussion of p-values.
2016-C-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/22/15, Page 358

For some Distributions fit to the grouped data in Section 3 by the Method of Maximum Likelihood,
the values are:138
# Parameters Chi-Square Statistic p-value139
Burr 3 1.46 91.8%
Generalized Pareto 3 5.70 33.7%
Transformed Gamma 3 16.45 reject at 1% 0.57%
Gamma 2 142.7 reject at 1/2% 0140
LogNormal 2 178.3 reject at 1/2% 0
Weibull 2 207.5 reject at 1/2% 0
Inverse Gaussian 2 238.5 reject at 1/2% 0
Exponential 1 255.4 reject at 1/2% 0

With 9 intervals, the number of degrees of freedom is 8 minus the number of fitted parameters.
Thus for the Transformed Gamma with three fitted parameters, we are interested in the
Chi-Square distribution with 5 degrees of freedom.141

Degrees of Freedom      Significance Levels (1 - P)
                        0.100    0.050    0.025    0.010    0.005
5 9.236 11.070 12.832 15.086 16.750
6 10.645 12.592 14.449 16.812 18.548
7 12.017 14.067 16.013 18.475 20.278

From the Chi-Square table, for five degrees of freedom the critical value for 1% is 15.086,
while the critical value for 1/2% is 16.750.

In the test of the fit of the Transformed Gamma Distribution, the p-value is the survival function at
16.45. From the Chi-Square for 5 degrees of freedom, S(15.086) = 1% and S(16.750) = 0.5%,
and therefore, 1% > S(16.45) > 0.5%. Thus the p-value for the fitted Transformed Gamma is
between 1% and 0.5%. Using a computer, one can determine that S(16.45) = 0.57%. Thus the
p-value for the Transformed Gamma fit is 0.57%.

Since 16.45 > 15.086, the critical value for 1%, we can reject the Transformed Gamma fit by
maximum likelihood at a 1% level. Since 16.45 < 16.750, the critical value for 1/2%, we can not
reject the Transformed Gamma fit by maximum likelihood at a 1/2% level. In other words, we reject
the fit of the Transformed Gamma at 1% and do not reject at 1/2%.
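
With a computer, the p-value can be read off directly rather than bracketed from the table; for example, if the scipy package is available:

from scipy.stats import chi2

# Transformed Gamma fit: statistic 16.45, with 9 - 1 - 3 = 5 degrees of freedom
print(chi2.sf(16.45, df=5))   # approximately 0.0057, i.e. reject at 1% but not at 1/2%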

138
See the section on maximum likelihood fitting to grouped data for the parameters.
139
Obtained using a computer. Using the table, one can determine that for the Transformed Gamma the p-value is
between 1% and 1/2 %.
140
Less than 3 x 10-28.
141
The Chi-Square distribution with 5 degrees of freedom is a Gamma Distribution with α = 5/2 and θ = 2.

There is a “mechanical” method of using the Chi-Square Table, that may help avoid mistakes on the
exam. Once one had the calculated value of Chi-Square for the fitted Transformed Gamma, one
would proceed as follows. Looking at the row of the Chi-Square Table142 for:
8 - 3 = 5 degrees of freedom, we see which entries bracket the calculated Chi-Square value of
16.45 for the fitted Transformed Gamma. In this case 15.086 < 16.45 < 16.750.

We reject at the significance level of the left hand of the two columns, which in this case is 1%.
We do not reject at the significance level of the right hand of the two columns, which in this case
is 1/2%.

In general, look on the correct row of the table, determine which entries bracket the
statistic, and reject to the left and do not reject to the right.

Exercise: For 7 degrees of freedom, the Chi-Square Statistic is 15.52.


At what significance level do you reject the null hypothesis?
[Solution: 14.067 < 15.52 < 16.013 ⇒ reject at 5%, and do not reject at 2.5%.]

The very poor fits of the two parameter distributions can all be easily rejected at 1/2%, since the
critical value for 6 degrees of freedom is 18.548. For example, the Chi-Square value for the fitted
Gamma is 142.7, and since 142.7 > 18.548, we reject at the 1/2% significance level.

Low p-values indicate a poor fit. For example, the fits of the Transformed Gamma, Gamma,
Weibull, Inverse Gaussian, and Exponential, each have p-values less than 1%. A p-value of less
than 1% gives a strong indication that the data did not come from the fitted distribution.

Another Formula for the Chi-Square Statistic:

χ² = Σ (Ei - Oi)²/Ei = Σ (Ei² - 2OiEi + Oi²)/Ei = Σ (Oi²/Ei - 2Oi + Ei) = Σ Oi²/Ei - 2ΣOi + ΣEi =
Σ (Oi²/Ei) - 2n + n = Σ (Oi²/Ei) - n.143

χ² = Σ (Oi²/Ei) - n, where the sum runs over all of the intervals.

Some people may find this mathematically equivalent alternate formula to be useful.144
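
As a quick illustration that the two forms agree, one can recompute the Burr statistic from the observed and fitted counts in the earlier table (a rough sketch; the fitted counts are copied from that table):

observed = [2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33]
fitted = [2201.89, 2261.90, 1706.61, 1193.15, 811.92, 1476.80, 256.88, 60.62, 30.23]
n = sum(observed)                                            # 10,000, which also equals sum(fitted)

form1 = sum((o - e) ** 2 / e for o, e in zip(observed, fitted))
form2 = sum(o ** 2 / e for o, e in zip(observed, fitted)) - n
print(round(form1, 3), round(form2, 3))                      # both approximately 1.458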

142
See the Chi-Square Table given below, prior to the problems.
143
Where I have used the fact that Σ Ei = Σ Oi = n.
144
See for example, 4, 5/07, Q.5.
2016-C-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/22/15, Page 360

Chi-Square Statistic, Ungrouped Data:

When dealing with ungrouped data in order to compute the Chi-Square Statistic you must choose
intervals in which to group the data. I have chosen 10,000, 25,000, 50,000, 100,000, 250,000,
500,000, and 1,000,000 as the endpoints to use with the ungrouped data in Section 2.145
For example, here is the computation of the Chi-Square Statistic for a Weibull Distribution with
parameters θ = 230,000, τ = 0.6, compared to the ungrouped data in Section 2.

Lower Endpoint   Upper Endpoint   Observed # Claims   F(lower)   F(upper)   Assumed # Claims   Chi-Square
0 10,000 8 0.00000 0.14135 18.38 5.858
10,000 25,000 13 0.14135 0.23208 11.80 0.123
25,000 50,000 12 0.23208 0.32986 12.71 0.040
50,000 100,000 24 0.32986 0.45484 16.25 3.698
100,000 250,000 35 0.45484 0.65052 25.44 3.595
250,000 500,000 19 0.65052 0.79678 19.01 0.000
500,000 1,000,000 12 0.79678 0.91066 14.80 0.531
1,000,000 infinity 7 0.91066 1.00000 11.61 1.834
130 130 15.678

The computed Chi-Square for the Weibull Distribution is 15.678. In this case, we have not fit a
Weibull Distribution to this data; we assume the parameters were selected without looking at this
data set.146 Thus since we have (used) 8 intervals, we consult the Chi-Square Table for
8 - 1 = 7 degrees of freedom. Since 15.678 > 14.067, we reject the Weibull at 5%.
Since 15.678 ≤ 16.013, we do not reject the Weibull at 2.5%.

One can compute the Chi-Square Statistic for various distributions fit by maximum likelihood to the
ungrouped data in Section 2. The values are:147
# Parameters Chi-Square Statistic p-value148
Pareto 2 1.50 82.7%
Burr 3 1.60 80.9%
Generalized Pareto 3 1.63 80.3%
Transformed Gamma 3 2.78 59.5%
LogNormal 2 5.3 38.0%
Weibull 2 7.9 16.2%
Gamma 2 13.0 reject at 2.5% 2.3%
Exponential 1 26.4 reject at 1/2% 0.02%
145
Unfortunately, the computed Chi-Square values depend somewhat on this choice of intervals.
146
Perhaps this Weibull was fit to a similar set of data collected a year before the data in Section 2.
147
Using intervals with 0, 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, 1,000,000, and infinity as
endpoints. See the section on Maximum Likelihood Ungrouped Data, for the parameters of the fitted distributions.
148
Via computer. Using the table, one can determine that for the Gamma the p-value is between 2.5% and 1 %.

The p-value of the fitted Exponential is less than 1%, strongly indicating this data did not come from
this Exponential. The p-value of the fitted Gamma is between 10% and 1%, providing some
indication that this data did not come from this Gamma, but some uncertainty as well. The p-values of
the other fits are all above 10%, providing no evidence for H1 , the alternative hypothesis that the
data did not come from the fitted distribution.149

Here is the computation of the Chi-Square statistic for the Pareto Distribution fit to the ungrouped
data in Section 2 via maximum likelihood, α = 1.702, θ = 240151:

Lower Endpoint   Upper Endpoint   Observed # Claims   F(lower)   F(upper)   Fitted # Claims   Chi-Square
0 10,000 8 0.00000 0.06708 8.72 0.06
10,000 25,000 13 0.06708 0.15511 11.44 0.21
25,000 50,000 12 0.15511 0.27523 15.62 0.84
50,000 100,000 24 0.27523 0.44706 22.34 0.12
100,000 250,000 35 0.44706 0.70308 33.28 0.09
250,000 500,000 19 0.70308 0.85277 19.46 0.01
500,000 1,000,000 12 0.85277 0.93884 11.19 0.06
1,000,000 infinity 7 0.93884 1.00000 7.95 0.11
130 130 1.50

In contrast for the Exponential the Chi-Square statistic is very large at 26.4. With one parameter, we
compare to a Chi-Square Distribution with 8 - 1 - 1 = 6 degrees of freedom.150 The Exponential is
such a poor fit that we can reject it at a 1/2% significance level.151

For the Gamma distribution with two parameters, we would compare to the Chi-Square Distribution
with 8 - 1 - 2 = 5 degrees of freedom.152 Since the Chi-Square for the Gamma is 13.0, which is
greater than the 12.832 critical value for 2.5%, we can reject the Gamma at a 2.5% significance level.
On the other hand, the Gamma has a Chi-Square 13.0 < 15.086, the value at which the Chi-Square
distribution with 5 d.f. is 99%, so we can not reject the Gamma at a 1% significance level.

149
This general statement about p-values could be applied equally well to other hypothesis tests, for example the
Kolmogorov-Smirnov Statistic.
150
Due to the one fitted parameter, Iʼve subtracted 1 from the number of intervals minus one.
151
For 6 degrees of freedom the Chi-Square distribution is 99.5% at 18.548. 26.4 > 18.548 so we can reject the
Exponential Distribution at 1/2%.
152
Due to the two fitted parameters, Iʼve subtracted 2 from the number of intervals minus one. When as here one
applies the method of maximum likelihood to the individual claim values of ungrouped data, then one loses degrees
of freedom, but somewhat less than one per fitted parameter. In this case, to be conservative I have assumed one
loses the full 2 degrees of freedom, although the actual loss of degrees of freedom may be less. See Kendallʼs
Advanced Theory of Statistics, by Stuart and Ord, 5th edition, Vol.2 , p. 1172.

The null hypothesis is that the ungrouped data in Section 2 was drawn from the maximum likelihood
Gamma Distribution, while the alternative hypothesis is that the data was not drawn from this
Gamma Distribution. Hypothesis tests are set up to disprove something.153

At the 10% significance level, we reject the hypothesis that the ungrouped data in Section 2 was
drawn from the maximum likelihood Gamma Distribution. In other words, if this data was drawn from
the maximum likelihood Gamma Distribution, then there is less than a 10% probability that the Chi-
Square Statistic would be at least 13.0, the observed value.

At the 10% significance level, we do not reject the hypothesis that the ungrouped data in Section 2
was drawn from the maximum likelihood Weibull Distribution.
At the 10% significance level, we do not reject the hypothesis that the ungrouped data in Section 2
was drawn from the maximum likelihood LogNormal Distribution.

Clearly, the data can not be drawn from both the LogNormal and the Weibull. It might be drawn
from one of them or neither of them, but not both. At most one of these hypotheses is true.
However, given the amount of data we have, only 130 data points, we do not have enough
information to disprove either of these hypotheses at the 10% significance level.
With more data similar to the 130 data points we have, we should be able to get more information
on which distribution generated the data.

For the LogNormal Distribution, the p-value is 38.0%. In other words, if this data was drawn from the
maximum likelihood LogNormal, then there is 38.0% probability that the Chi-Square Statistic would
be at least 5.3, the observed value.154 This does not demonstrate that the data came from this
LogNormal Distribution. Rather it provides insufficient information to disprove, at a 10% significance
level, that the data came from this LogNormal Distribution.

Some Intuition for the Chi-Square Test:

Assume that the data was drawn from the distribution F; in other words H0 is true.
For the interval ai to bi, the probability that each loss will be in that interval is: F(bi) - F(ai).
Therefore, for m losses, the number of losses in this interval is Binomial with parameters m and
q = F(bi) - F(ai). For small q, this Binomial is approximately a Poisson with mean
m{F(bi) - F(ai)}. For large m, this Poisson is approximately Normal with mean m{F(bi) - F(ai)} and
variance m{F(bi) - F(ai)}.

Number of losses expected in interval i: Ei = m {F(bi) - F(ai)}.

153
See the subsequent section on Hypothesis Testing.
154
For the Chi-Square Distribution with 8 - 1 - 2 = 5 degrees of freedom, S(5.3) = 38.0%.

Number of losses observed in interval i: Oi ≅ Normal with µ = Ei and σ2 = Ei.

(Oi - Ei) / √Ei ≅ Normal with µ = 0 and σ = 1.

(Oi - Ei)2 / Ei ≅ Square of Standard Normal = Chi-Square with one degree of freedom.

The sum of ν independent Chi-Square Distributions each with one degree of freedom is a
Chi-Square Distribution with ν degrees of freedom. If we have ν intervals, we are summing ν
approximately Chi-Square Distributions with one degree of freedom. However, when we add up
the contributions from each interval, they are not independent, because ΣOi = ΣEi.
Therefore, one loses a degree of freedom.

While the above is not a derivation, hopefully it gave you some idea of where the test comes from.
Below is a derivation of the fact that as the number of items in the data set gets large,
the Chi-Square Statistic approaches a Chi-Square Distribution.

Derivation of the Chi-Square Test:155

For a claims process that followed the assumed distribution, the mean number of claims for the
interval [a,b] is N{F(b) - F(a)}, where F(x) is the assumed distribution function and N is total number
of claims. Let pi = F(bi) - F(ai) = the probability covered by interval i. Then the assumed mean
number of claims for interval i = µi = piN. The observed number of claims for interval i = xi. With k
intervals, the observed data has a multinomial distribution with probabilities: p1 , p2 ,.., pk.156

The variance of the number of observations in the first interval is: Np1 (1-p1 ).
The covariance of the number of observations in the first interval and the number of observations in
the second interval is: -Np1 p 2 .

This multinomial distribution has variance-covariance matrix C, with Cii = Npi(1-pi) and
C ij = -Npip j. Due to the linear constraint, Σpi = 1, C has rank k -1. If we eliminate the last row and last
column of C, we have a nonsingular matrix C*. Let D = the matrix inverse of C*.157 Then it turns out
that Dii = (1/pi + 1/pk)/N and Dij = (1/pk)/N.

As N gets large, this multinomial distribution approaches a multivariate Normal Distribution.


155
See for example, Volume I of Kendallʼs Advanced Theory of Statistics.
156
This is the multivariate analog of the Binomial Distribution, which involves a single variable.
157
Taking the inverse of the variance-covariance matrix in the multivariate case is analogous to taking the inverse of
the variance in the case of a Normal Distribution.

Therefore the quadratic form (x - µ)D(x - µ) has approximately a Chi-Square Distribution.158


It has number of degrees of freedom equal to the rank of D, which is: k -1 = # of intervals - 1.159

(x - µ)D(x - µ) = { Σ_{i=1}^{k-1} (1/pi + 1/pk) (xi - µi)² + Σ_{i=1}^{k-1} Σ_{j=1, j≠i}^{k-1} (xi - µi)(xj - µj)/pk } / N

= Σ_{i=1}^{k-1} (xi - µi)²/(N pi) + {Σ_{i=1}^{k-1} (xi - µi)}²/(N pk) = Σ_{i=1}^{k-1} (xi - µi)²/µi + {Σ_{i=1}^{k-1} (xi - µi)}²/µk.

However, Σ_{i=1}^{k} xi = N = Σ_{i=1}^{k} µi. Therefore, Σ_{i=1}^{k-1} (xi - µi) = µk - xk. Therefore,

(x - µ)D(x - µ) = Σ_{i=1}^{k} (xi - µi)²/µi = the usual Chi-Square test statistic.

As noted above this quadratic form (x - µ)D(x - µ) has approximately a Chi-Square Distribution with
degrees of freedom = number of intervals - 1, as was to be proven.

One Rule of Thumb:

Note that this required that the multinomial distribution be approximated by a multivariate Normal
Distribution. This approximation is poor unless each µi is large enough.160 Thus as discussed

previously, one common rule of thumb is that in order to apply the Chi-Square test, each µi, the
assumed/fitted number of claims in each interval, should be at least 5.161

158 Subtracting the mean, squaring, and dividing by the variance is how one would convert a Normal Distribution to the square of a Standard Normal Distribution, which is a Chi-Square Distribution with one degree of freedom. We have done the analog in the multivariate case.
159 We lost one degree of freedom due to the linear constraint Σpi = 1. In the Chi-Square test, the total number of expected claims over all intervals is set equal to the total number of observed claims.
160 This is similar to the Normal Approximation of a Binomial Distribution. This approximation is poor unless the mean of the Binomial is large enough.
161 As discussed previously, statisticians use various different rules of thumb. See page 334 of Loss Models.
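One practical way to apply this rule of thumb is to combine consecutive intervals until every group has an expected count of at least 5. The sketch below is my own illustration (the counts are made up, and the left-to-right merging strategy is just one reasonable choice, not a prescription from the text).

```python
# A sketch of the rule of thumb: combine consecutive intervals until each
# combined group has an expected count of at least 5 (illustrative data).
expected = [40.0, 30.0, 12.0, 6.0, 3.0, 2.5, 1.5]
observed = [38,   33,   10,   7,   4,   5,   3]

grouped_E, grouped_O = [], []
cum_E, cum_O = 0.0, 0
for E, O in zip(expected, observed):
    cum_E += E
    cum_O += O
    if cum_E >= 5:                 # close the current group
        grouped_E.append(cum_E)
        grouped_O.append(cum_O)
        cum_E, cum_O = 0.0, 0
if cum_E > 0:                      # fold any small leftover tail into the last group
    grouped_E[-1] += cum_E
    grouped_O[-1] += cum_O

chi_square = sum((O - E) ** 2 / E for E, O in zip(grouped_E, grouped_O))
print(grouped_E, grouped_O, chi_square)
```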

Chi-Square Distribution:

Let Z1 , Z2 , ..., Zν be independent Unit Normal variables, then Z1 2 + Z2 2 + ... + Zν2 is said to have

a Chi-Square Distribution with ν degrees of freedom.


A Chi-Square Distribution with ν degrees of freedom is the sum of ν independent Chi-Square
Distributions with 1 degree of freedom.

A Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν/2 and

θ = 2, with mean αθ = ν, and variance αθ2 = 2ν.


Therefore a Chi-Square Distribution with 2 degrees of freedom is a Gamma Distribution with
α = 2/2 = 1 and θ = 2, an Exponential Distribution with mean 2.
Therefore F(0.10) = 1 - e-0.1/2 = 5%, F(5.99) = 1 - e-5.99/2 = 95%, and F(7.38) = 1 - e-7.38/2 =
97.5%, matching the values shown in the Chi-Square Table.
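This identity is easy to check numerically; the short sketch below (assuming scipy is available) compares the Chi-Square CDF with 2 degrees of freedom to the Exponential formula at the values quoted above.

```python
# A Chi-Square Distribution with 2 degrees of freedom is an Exponential with mean 2.
import math
from scipy.stats import chi2

for x in [0.10, 5.99, 7.38]:
    print(x, chi2.cdf(x, df=2), 1 - math.exp(-x / 2))
# both columns agree: about 0.049, 0.950, and 0.975 respectively
```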

Chi-Square Table:

On a subsequent page is the Chi-Square Table to be attached to your exam.


The different rows correspond to different degrees of freedom; in this case the degrees of freedom
extend from 1 to 20.162

For most exam questions, one first determines how many degrees of freedom one has, and
therefore which row of the table to use, and then ignores all of the other rows of the table.

The values shown in each row are the places where the Chi-Square Distribution Function for that
number of degrees of freedom has the stated P values.
The value of the distribution function is denoted by P (capital P).
So for example, for 4 degrees of freedom, F(0.484) = 0.025 and F(13.277) = 0.990.

Unity minus the distribution function is the Survival Function; the value of the Survival Function is the
p-value (small p). For example, for 4 degrees of freedom, 13.277 is the critical value corresponding
to a significance level of 1 - 0.990 = 1%. The critical values corresponding to a 1% significance level
are in the column labeled P = 0.990. Similarly, the critical values corresponding to a 5% significance
level are in the column labeled P = 0.950.

162 One can approximate values beyond those shown in the table. For ν ≥ 40, √(2χ2) is approximately Normal with mean √(2ν - 1) and variance 1. A better approximation is that (χ2/ν)1/3 is approximately Normal with mean 1 - 2/(9ν) and variance 2/(9ν). See Kendallʼs Advanced Theory of Statistics.

For example, for 4 degrees of freedom, as shown in the table, F(0.711) = 0.05.

[Graph: Chi-Square density with 4 degrees of freedom; 5% of the probability lies to the left of 0.711.]

In other words, 0.711 is the 5th percentile of the Chi-Square Distribution with 4 degrees of freedom.
Similarly, as shown in the table for 4 degrees of freedom, F(11.143) = 0.975.

[Graph: Chi-Square density with 4 degrees of freedom; 97.5% of the probability lies to the left of 11.143.]

Unity minus the Distribution Function is the Survival Function.


The value of the Survival Function is the p-value (small p) in the Chi-Square Goodness of
Fit Test.

For example, for 4 degrees of freedom, S(11.143) = 2.5%.

[Graph: Chi-Square density with 4 degrees of freedom; 2.5% of the probability lies to the right of 11.143.]

11.143 is the critical value corresponding to a significance level of 1 - 0.975 = 2.5% in the
Chi-Square Goodness of Fit Test.
The critical values corresponding to a 1% significance level are in the column labeled
P = 0.990. For 4 degrees of freedom, 13.277 is the critical value corresponding to a significance
level of 1 - 0.99 = 1% in the Chi-Square Goodness of Fit Test.

[Graph: Chi-Square density with 4 degrees of freedom; 1% of the probability lies to the right of 13.277.]

Similarly, the critical values corresponding to a 5% significance level for the Chi-Square Goodness of
Fit Test are in the column labeled P = 0.950.
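On the exam you read these values off the attached table, but while studying it can help to reproduce them with software; a short sketch (assuming scipy is available) follows.

```python
# Reproducing table entries and p-values for 4 degrees of freedom.
from scipy.stats import chi2

print(chi2.ppf(0.990, df=4))   # about 13.277: the critical value for a 1% significance level
print(chi2.ppf(0.950, df=4))   # about 9.488:  the critical value for a 5% significance level
print(chi2.sf(13.277, df=4))   # about 0.010:  the p-value of a statistic of 13.277
```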

For the following questions, use the following Chi-Square table:

[Graph: χ2 density; the area to the left of χ02 is P and the area to the right is 1 - P.]

The table below gives the value of χ02 for which Prob[χ2 < χ02] = P, for a given number of degrees
of freedom and a given value of P.

Degrees of Value of P
Freedom 0.005 0.010 0.025 0.050 0.900 0.950 0.975 0.990 0.995
1 0.000 0.000 0.001 0.004 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 9.236 11.070 12.832 15.086 16.750
6 0.676 0.872 1.237 1.635 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 22.307 24.996 27.448 30.578 32.801
16 5.142 5.812 6.908 7.962 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 28.412 31.410 34.170 37.566 39.997

Problems:

12.1 (2 points) You observe the following 600 outcomes of rolling a six-sided die.
Result Observed Number
1 121
2 94
3 116
4 97
5 88
6 84
Based on the Chi-Square statistic, one tests the hypothesis H0 that the die is fair.
Which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.

12.2 (3 points) A baseball team plays 150 games per year and has lost the following number of
games (over these 12 periods of 5 years each):
Period Number of Games Lost Period Number of Games Lost
1901-05 402 1931-35 391
1906-10 451 1936-40 386
1911-15 412 1941-45 326
1916-20 357 1946-50 344
1921-25 370 1951-55 354
1926-30 389 1956-60 310
Total 4492
Let H0 be the hypothesis that the teamʼs results were drawn from the same distribution over time.
Using the Chi-Square statistic, which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.



12.3 (3 points) The distribution F(x) = 1 - (5/x)α, x > 5, where x is in units of thousands of dollars,
has been fit to the following grouped data, with the resulting estimate of the single parameter
α = 2.0.
Bottom of Interval Top of Interval # claims
$ Thous. $ Thous. in the Interval
5 10 7400
10 15 1450
15 20 475
20 25 230
25 50 350
50 75 50
75 100 20
100 Infinity 25

10000
Using the Chi-Square statistic, one tests the hypothesis H0 that the data was drawn from this fitted
distribution. Which of the following is true?
A. Reject H0 at a significance level of 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

12.4 (2 points) You observe 5000 children, each of whom has a father with type A blood and a
mother with type B blood. 1183 children have Type A blood, 2612 have type AB blood and
1205 have type B blood. Use the Chi-Square statistic to test the hypothesis that children of a father
with type A blood and a mother with type B blood are expected to have types:
A, AB and B with probabilities 1/4, 1/2, 1/4.
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.



12.5 (3 points) Given the grouped data below, what is the Chi-Square statistic for the density
function f(x) = 12x2 (1-x) on the interval [0,1]?
Range # of claims
0 to 0.2 35
0.2 to 0.3 65
0.3 to 0.4 95
0.4 to 0.5 120
0.5 to 0.6 135
0.6 to 0.7 160
0.7 to 0.8 185
0.8 to 0.9 155
0.9 to 1.0 50

1000
A. less than 15
B. at least 15 but less than 17
C. at least 17 but less than 19
D. at least 19 but less than 21
E. at least 21

12.6 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the
hypothesis H0 that the data is drawn from the density function f(x) = 12x2 (1-x) on the interval [0,1].
Which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.



12.7 (3 points) 1000 claims have been grouped into intervals:


Bottom of Top of # claims in the
Interval Interval Interval
0 1 350
1 2 250
2 3 150
3 4 100
4 5 50
5 Infinity 100
An Exponential Distribution F(x) = 1 - e-x/θ has been fit to this grouped data.
The resulting estimate of θ is 2.18. What is the p-value of the Chi-Square statistic?
A. Less than 0.010
B. At least 0.010, but less than 0.025
C. At least 0.025, but less than 0.050
D. At least 0.050, but less than 0.100
E. At least 0.100

12.8 (4 points) The Generalized Pareto Distribution, F(x) = β[τ, α; x/(θ+x)], with parameters
α = 2, θ = 400, τ = 5, is being compared to the following grouped data:
Range # of claims
0-1000 450
1000-2000 290
2000-3000 110
3000-4000 50
4000-5000 30
5000-10,000 50
over 10,000 20

1000
Use the following values of the F-distribution with 10 and 4 degrees of freedom:
y 0 0.5 1 2 3 4 5 10 20
F10,4[y] 0 0.171 0.452 0.737 0.849 0.903 0.933 0.980 0.995
What is the value of the Chi-Square statistic?
Hint: β(a, b; x) = F[bx / {a(1-x)}], where F is the F-Distribution with 2a and 2b degrees of freedom.
A. less than 0.5
B. at least 0.5 but less than 1.0
C. at least 1.0 but less than 1.5
D. at least 1.5 but less than 2.0
E. at least 2.0

Use the following grouped data for the next three questions:
Bottom of Top of # claims in the
Interval Interval Interval
0 2.5 2625
2.5 5 975
5 10 775
10 25 500
25 Infinity 125

Total 5000

12.9 (3 points) A Pareto Distribution has been fit to the above grouped data. The resulting fitted

parameters are α = 2.3 and θ = 6.8. Using the Chi-Square statistic, one tests the hypothesis H0
that the data was drawn from this fitted distribution. Which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.

12.10 (3 points) A LogNormal Distribution has been fit to the above grouped data.
The resulting fitted parameters are µ = 0.8489 and σ = 1.251. What is the Chi-Square statistic?
A. less than 6
B. at least 6 but less than 7
C. at least 7 but less than 8
D. at least 8 but less than 9
E. at least 9

12.11 (1 point) Using the Chi-Square statistic computed in the previous question, one tests the
hypothesis H0 that the data was drawn from this fitted distribution. Which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.



Use the following information for the next 3 questions:


• Inverse Gaussian, LogNormal, Transformed Beta, Burr, Inverse Burr, ParaLogistic,
Transformed Gamma, Gamma, Weibull, and Exponential Distributions have each
been fit via Maximum Likelihood to the same set of data grouped into 10 intervals.
• Each interval has many more than 5 claims expected for each of the fitted Distributions.
• The values of the Chi-Square Statistic are as follows:
Distribution Chi-Square Number of Parameters
Transformed Beta 11.4 4
Transformed Gamma 11.9 3
Burr 12.7 3
Inverse Gaussian 13.0 2
Weibull 13.8 2

Gamma 14.3 2
Inverse Burr 15.0 3
ParaLogistic 16.2 2
Exponential 17.0 1
LogNormal 18.7 2

12.12 (2 points) Based on the p-values of the Chi-Square Statistic, rank the following three models
from best to worst: Inverse Burr, Weibull, Exponential.
A. Inverse Burr, Weibull, Exponential B. Weibull, Inverse Burr, Exponential
C. Inverse Burr, Exponential, Weibull D. Exponential, Weibull, Inverse Burr
E. None of the above.

12.13 (2 points) Based on the p-values of the Chi-Square Statistic, rank the following three models
from best to worst: Transformed Beta, Transformed Gamma, ParaLogistic.
A. Transformed Beta, Transformed Gamma, ParaLogistic
B. Transformed Gamma, Transformed Beta, ParaLogistic
C. Transformed Beta, ParaLogistic, Transformed Gamma
D. ParaLogistic, Transformed Gamma, Transformed Beta
E. None of the above.

12.14 (2 points) Based on the p-values of the Chi-Square Statistic, rank the following three models
from best to worst: Inverse Gaussian, Gamma, LogNormal.
A. Inverse Gaussian, Gamma, LogNormal
B. Gamma, Inverse Gaussian, LogNormal
C. Inverse Gaussian, LogNormal, Gamma
D. LogNormal, Gamma, Inverse Gaussian
E. None of the above.

12.15 (3 points) Calculate the Chi-Square statistic for the hypothesis that the underlying distribution
is Pareto with parameters alpha = 1 and theta = 2, given the 150 observations grouped below.
Class Range Frequency
1 0 to 3 75
2 3 to 7 30
3 7 to 10 10
4 10 and above 35
A. Less than 5.0
B. At least 5.0, but less than 6.0
C. At least 6.0, but less than 7.0
D. At least 7.0, but less than 8.0
E. 8.0 or more

12.16 (3 points) One has a large amount of data split into only two intervals. The data is a random
sample from the assumed distribution, in other words the null hypothesis is true. Show that as the
number of data points goes to infinity, the distribution of the Chi-Square Statistic approaches that of
a Chi-Square Distribution with one degree of freedom.
Note: If X follows a Normal Distribution with mean 0 and variance 1, then X2 follows a
Chi-Square Distribution with one degree of freedom.

12.17 (3 points) One thousand policyholders were observed from the time they arranged a viatical
settlement until their death.
210 die during the first year, 200 die during the second year, 190 die during the third year, 200 die
during the fourth year, and 200 die during the fifth year.
Use the Chi-Square statistic to test the hypothesis, H0, that F(t) = √[t(t+1)/30], 0 ≤ t ≤ 5, provides an
acceptable fit. Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005; reject H0 at 0.010.
C. Do not reject H0 at 0.010; reject H0 at 0.025.
D. Do not reject H0 at 0.025; reject H0 at 0.050.
E. Do not reject H0 at 0.050.

12.18 (1 point) According to Loss Models, which of the following statements are true?
1. For the chi-square goodness-of-fit test, if the sample size were to double, with each number
showing up twice instead of once, the test statistic would double and the critical values
would remain unchanged.
2. All models are wrong, but some may be useful.
3. If one fits a distribution by maximum likelihood to some grouped data, and one then computes
the chi-square statistic by comparing this fitted distribution to this data,
then the number of degrees of freedom in the approximating chi-square distribution
would be equal to the number of intervals minus the number of fitted parameters.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

12.19 (2 points) A LogNormal distribution has been fit via maximum likelihood to the following
grouped data:
Range($) # of claims loss($000)
1-10,000 1496 4,500
10,001-25,000 365 6,437
25,001-100,000 267 13,933
100,001-300,000 99 16,488
300,001-1,000,000 15 7,207
Over 1,000,000 1 2,050

2243 50,615
The fitted LogNormal distribution has the following values of the distribution function:
F(10,000) = 0.6665, F(25,000) = 0.8261, F(100,000) = 0.9562, F(300,000) = 0.9898,
F(1,000,000) = 0.9986.
Using the Chi-Square Goodness-of-fit test, which of the following is true?
The minimum expected number of observations in any group should be 5.
The maximum possible number of groups should be used.
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005; reject H0 at 0.010.
C. Do not reject H0 at 0.010; reject H0 at 0.025.
D. Do not reject H0 at 0.025; reject H0 at 0.050.
E. Do not reject H0 at 0.050.

12.20 (3 points) For people with a certain type of cancer, who are age 70 at the time of diagnosis,
you have hypothesized that q70 = 0.3, q71 = 0.5, and q72 = 0.4.
1000 patients have been diagnosed with this type of cancer at age 70.
270 die within the first year, 376 die during the second year, 161 die during the third year,
and the remaining 193 patients survive more than 3 years.
Use the Chi-Square statistic to test this hypothesis.
A. Reject H0 at 0.010.
B. Do not reject H0 at 0.010; reject H0 at 0.025.
C. Do not reject H0 at 0.025; reject H0 at 0.050.
D. Do not reject H0 at 0.050; reject H0 at 0.100.
E. Do not reject H0 at 0.100.

12.21 (4 points) A LogNormal Distribution with µ = 8.3 and σ = 1.7 is compared to the following
data:
Range($) # of claims loss($000)
1-10,000 1124 3,082
10,001-50,000 372 7,851
50,001-100,000 83 5,422
100,001-300,000 51 7,607
300,001-1,000,000 5 2,050
Over 1,000,000 2 3,000

1637 29,012
Using the Chi-Square Goodness-of-fit test, which of the following is true?
The minimum expected number of observations in any group should be 5.
The maximum possible number of groups should be used.
A. Reject H0 at 0.010.
B. Do not reject H0 at 0.010; reject H0 at 0.025.
C. Do not reject H0 at 0.025; reject H0 at 0.050.
D. Do not reject H0 at 0.050; reject H0 at 0.100.
E. Do not reject H0 at 0.100.

12.22 (4 points) You are given the following random sample of 50 claims:
988, 1420, 1630, 1702, 1891, 2017, 2037, 2824, 3300, 3601, 3603, 3690, 3734,
4400, 5175, 5200, 5250, 5381, 5550, 6177, 6238, 6350, 6620, 7837, 7941, 8104,
9850, 10180, 10300, 10487, 10554, 11370, 11800, 12350, 12474, 13087, 13682,
13760, 13783, 14800, 16352, 17298, 19193, 19292, 21290, 24422, 28170, 36893,
62841, 85750.
You test the hypothesis that these claims follow a continuous distribution F(x) with the
following selected values:
x 1000 2500 5000 10,000 15,000 20,000 25,000 50,000
F(x) 0.06 0.21 0.41 0.63 0.75 0.82 0.87 0.95
You group the data using the largest number of groups such that the expected number of
claims in each group is at least 5.
Calculate the chi-square goodness-of-fit statistic.
(A) Less than 7
(B) At least 7, but less than 10
(C) At least 10, but less than 13
(D) At least 13, but less than 16
(E) At least 16

12.23 (3 points) You are given:


(i) 100 values in the interval from 0 to 1.
(ii) This data is then grouped into 5 ranges of equal length covering the interval from 0 to 1.
(iii) Σ_{j=1}^{5} Oj2 = 2169.

(iv) The Chi-square goodness-of-fit test is performed, with H0 that the data was drawn from the
uniform distribution on the interval from 0 to 1.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 significance level.
(B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level.
(C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level.
(D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level.
(E) Reject H0 at the 0.01 significance level.

12.24 (1/2 point) Is the following statement true?


If a Pareto distribution has been fit via maximum likelihood to some data grouped into intervals,
then no other Pareto Distribution can have a smaller Chi-Square Statistic than this Pareto Distribution.

12.25 (2 points) The following data is from Germany, on the percent of the population born by
quarters of the year, where the quarters are measured starting with the cutoff date for youth soccer
leagues.
Quarter 1 Quarter 2 Quarter 3 Quarter 4
Professional Soccer Players: 30% 25% 22% 23%
General Population: 24% 25% 26% 25%
Let H0 be the hypothesis that the distribution for professional soccer players is the same as that of
the general population.
If the p-value of a Chi-square Goodness of Fit test is 1%, how big is the sample of professional
soccer players?
A. 300 B. 350 C. 400 D. 450 E. 500

12.26 (3 points) You are given the following:


• Policies are written with a deductible of 500 and a maximum covered loss of 25,000.
• One thousand payments have been recorded as follows:
Interval Number of Claims
(0, 1000] 165
(1,000, 5,000] 292
(5,000, 10,000] 157
(10,000, 24,500) 200
24,500 186
• The null hypothesis, H0 , is that losses prior to the effects of the deductible and maximum
covered loss follow a Weibull Distribution, with parameters τ = 1/2 and θ = 7000.
Determine the Chi-Square Goodness of Fit Statistic.
A. Less than 1.5
B. At least 1.5, but less than 2.0
C. At least 2.0, but less than 2.5
D. At least 2.5, but less than 3.0
E. 3.0 or more

12.27 (2 points) A sample of 2000 losses resulted in the following:


Interval Number of Losses
x ≤ 10 696
10 < x ≤ 25 1028
x > 25 276
Assume the data follows a Gamma Distribution with α = 3 and θ = 5.
Calculate the chi-square goodness-of-fit statistic.
(A) 11 (B) 12 (C) 13 (D) 14 (E) 15

12.28 (3 points) You observe 400 losses grouped in intervals.


Interval Number of Losses
(0, 50] 30
(50, 100] 50
(100, 250] 130
(250, 500] 190
Fit a uniform distribution on (0, ω) to this data via maximum likelihood.
Let the null hypothesis be that the data was drawn from this fitted distribution.
You test this hypothesis using the chi-square goodness-of-fit test.
Determine the result of the test.
(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level,
but is not rejected at the 0.05 significance level.
(C) The hypothesis is rejected at the 0.05 significance level,
but is not rejected at the 0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level,
but is not rejected at the 0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.

12.29 (2 points) You are given a sample of losses:


Loss Range Frequency
[0, 1000] 70
(1000, 2000] 60
(2000, ∞) 70
An Exponential Distribution has been fit via maximum likelihood to the above data resulting in an
estimate of θ of 1997.
Using the Chi-Square statistic, one tests the hypothesis H0 that the data was drawn from this fitted
distribution. Which of the following is true?
A. Reject H0 at a significance level of 1%.
B. Do not reject H0 at 1%. Reject H0 at 2.5%.
C. Do not reject H0 at 2.5%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

12.30 (1 point) Data on 300 sizes of loss have been grouped into 12 intervals.
A four parameter distribution has been fit to this data via maximum Iikelihood.
The Chi-Square goodness-of-fit statistic is then computed as 15.038.
Which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.

12.31 (4, 5/87, Q.54) (3 points) You want to test the hypothesis H0 : F(x) = F0 (x), where F0 (x) is a
Pareto distribution with parameters α = 1.5, θ = 1500.
You have 100 observations in the following ranges:
0 - 1000 55
1000 - 2500 15
2500 - 10000 22
10000 and over 8
In which of the following ranges does the Chi-Square statistic fall?
The minimum expected number of observations in any group should be 5.
The maximum possible number of groups should be used.
A. Less than 3
B. At least 3, but less than 4.5
C. At least 4.5, but less than 6
D. At least 6, but less than 7
E. 7 or more.

12.32 (160, 11/87, Q.16) (2.1 points) In the following table, the values of t|qx are calculated from a
fully specified survival model, and the values of dx+t are observed deaths
from the complete mortality experience of 100 cancer patients age x at entry to the study.
t t|qx dx+t
0 0.10 15
1 0.25 30
2 0.25 20
3 0.20 15
4 0.15 10
5 0.05 10
You hypothesize that the mortality of cancer patients is governed by the specified model.
Let χ2 be the value of the Chi-Square statistic used to test the validity of this model, and ν be the

degrees of freedom. Determine χ2 - ν.


(A) 6.4 (B) 7.4 (C) 8.4 (D) 45.3 (E) 46.3

12.33 (4, 5/90, Q.55) (3 points) Two states provide workers' compensation indemnity claim
benefits. One state caps payments at $50,000 and the other state caps payments at $100,000.
Size of indemnity claim data is provided in the following table.
Claim Number of Claims
Size State 1 State 2
0 - 24,999 2,900 1,000
25,000 - 49,999 1,200 300
50,000 - ∞ 900 -
50,000 - 59,999 - 200
60,000 - 99,999 - 300
100,000 - ∞ - 200
We are fitting the combined data for both states to a Pareto distribution, using minimum chi-square
estimation. The chi-square statistic is being calculated by combining the number of claims from the
two claim size intervals common to both states, which results in six terms for the chi-square statistic.
What is the chi-square statistic, Q, corresponding to the parameter estimates α = 3 and
θ = 75,000?
A. Q < 200
B. 200 < Q < 400
C. 400 < Q < 600
D. 600 ≤ Q < 800
E. 800 ≤ Q

12.34 (4, 5/91, Q.40) (2 points) Calculate the Chi-Square statistic, χ2, to test the hypothesis
that the underlying distribution is Loglogistic with parameters θ = 2 and γ = 1,
given the 25 grouped observations.
Range Number of Observations

0≤x<2 8
2≤x<6 5
6 ≤ x < 10 4
10 ≤ x < 14 3
14 ≤ x 5
The minimum expected number of observations in any group should be 5.
The maximum possible number of groups should be used.
A. χ2 ≤ 3.0 B. 3.0 < χ2 ≤ 5.0 C. 5.0 < χ2 ≤ 7.0 D. 7.0 < χ2 ≤ 9.0 E. 9.0 < χ2

12.35 (4B, 11/92, Q.14) (3 points) You are given the following:
• Basic limits data from two states have been collected.
Indiana's data is capped at $25,000 while New Jersey's data is capped at $50,000.
Number of claims by size of loss are:
Number of Claims
Claim Size Indiana New Jersey
0 - 9,999 4,925 2,405
10,000 - 24,999 2,645 1,325
25,000 - ∞ 2,430 -
25,000 - 34,999 - 405
35,000 - 49,999 - 325
50,000 - ∞ - 540
10,000 5,000
• The underlying loss distribution is the same in both states.
• A Pareto distribution, with parameters α = 2 and θ = 25,000, has been fit to
the combined data from the two states using minimum chi-square estimation.
Calculate the Chi-Square statistic using the six distinct size classes.
A. Less than 5.00
B. At least 5.00 but less than 6.00
C. At least 6.00 but less than 7.00
D. At least 7.00 but less than 8.00
E. At least 8.00

12.36 (4B, 11/93, Q.23) (2 points) You are given the following:
• A random sample, x1 , ..., x20 is taken from a probability distribution function F(x)
1.07, 1.07, 1.12, 1.35, 1.48, 1.59, 1.60, 1.74, 1.90, 2.02,
2.05, 2.07, 2.15, 2.16, 2.21, 2.34, 2.75, 2.80, 3.64, 10.42.
• The probability distribution function F(x) underlying the random sample is assumed to have
the form F(x) = 1 - (1/x), x ≥ 1.
• The observations from the random sample were segregated into the following intervals:
[1, 4/3), [4/3, 2), [2, 4), [4, ∞)
You are to use the Chi-Square statistic to test the hypothesis, H0 , that F(x) provides an acceptable
fit. Which of the following is true?
A. Reject H0 at α = 0.005.

B. Do not reject H0 at α = 0.005; reject H0 at α = 0.010.

C. Do not reject H0 at α = 0.010; reject H0 at α = 0.025.

D. Do not reject H0 at α = 0.025; reject H0 at α = 0.050.

E. Do not reject H0 at α = 0.050.

12.37 (4B, 5/94, Q.14) (2 points) You are given the following:
• X is a random variable assumed to have a Pareto distribution
with parameters α = 2 and θ = 1000.
• A random sample of 10 observations of X yields the values
100, 200, 225, 275, 400, 700, 800, 900, 1500, 3000.
Use the intervals [0,250), [250,500), [500,1000), and [1000,∞) to calculate the
Chi-Square statistic, Q, to test the Pareto assumption.
A. Q < 0.80
B. 0.80 ≤ Q < 1.00
C. 1.00 ≤ Q < 1.20
D. 1.20 ≤ Q < 1.40
E. 1.40 ≤ Q

12.38 (4B, 11/94, Q.4) (3 points) You are given the following:
A random sample of 1,000 observations from a loss distribution has been grouped into five
intervals as follows:
Interval Number of Observations
[ 0, 3.0) 180
[3.0, 7.5) 180
[7.5, 15.0) 235
[15.0, 40.0) 255
[40.0, ∞) 150

1000
The loss distribution is believed to be a Pareto distribution and the minimum
chi-square technique has been used with the grouped data to estimate the parameters,
α = 3.5 and θ = 50. Using the Chi-Square statistic, what is the highest significance level at which you
would not reject this fitted distribution?
A. Less than 0.005
B. 0.005
C. 0.010
D. 0.025
E. 0.050

12.39 (4B, 11/95, Q.3) (2 points) You are given the following:
• 100 observed losses have been recorded in thousands of dollars and are grouped as follows:
Interval Number of Losses
(0, 1) 15
[1, 5) 40
[5,10) 20
[10,15) 15
[15, ∞) 10
• The random variable X underlying the observed losses, in thousands, is believed to have
the density function:
f(x) = (1/5) e-x/5, x > 0.
Determine the value of the Chi-Square statistic.
A. Less than 2
B. At least 2, but less than 5
C. At least 5, but less than 8
D. At least 8, but less than 11
E. At least 11

12.40 (4B, 5/96, Q.23) (2 points) Forty (40) observed losses have been recorded in thousands
of dollars and are grouped as follows:
Interval Number of Total Losses
($000) Losses ($000)
(1, 4/3) 16 20
[4/3, 2) 10 15
[2, 4) 10 35
[4, ∞) 4 20
The null hypothesis, H0 , is that the random variable X underlying the observed losses, in thousands,
has the density function
f(x) = 1/x2 , x > 1.
Using the Chi-Square statistic, determine which of the following statements is true.
A. Reject H0 at α = 0.010.

B. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

C. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

D. Do not reject H0 at α = 0.050. Reject H0 at α = 0.100.

E. Do not reject H0 at α = 0.100.

12.41 (Course 160 Sample Exam #3, 1997, Q.12) (1.9 points)
For a complete study of 150 patients diagnosed with a fatal disease, you are given:
(i) For each patient, t = 0 at time of diagnosis.
(ii) The number of deaths in each interval is
Interval Number of Deaths
(0,1] 21
(1,2] 27
(2,3] 39
(3,4] 63
(iii) The χ2 statistic is used to test the fit of the survival model S(t) = 1 - t(t + 1)/20, 0 ≤ t ≤ 4.
(iv) The appropriate number of degrees of freedom is denoted by ν.
Calculate χ2/ν.
(A) 0.5 (B) 0.6 (C) 0.7 (D) 0.9 (E) 1.2

12.42 (4B, 5/99, Q.11) (3 points) You are given the following:
• One hundred claims greater than 3,000 have been recorded as follows:
Interval Number of Claims
(3,000, 5,000] 6
(5,000, 10,000] 29
(10,000, 25,000] 39
(25,000, ∞) 26
• Claims of 3,000 or less have not been recorded.
• The null hypothesis, H0 , is that claim sizes follow a Pareto distribution,
with parameters α = 2 and θ = 25,000.
A chi-square test is performed using the Chi-Square statistic with four classes.
Determine which of the following statements is true.
A. Reject H0 at α = 0.010.

B. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025.

C. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050.

D. Do not reject H0 at α = 0.050. Reject H0 at α = 0.100.

E. Do not reject H0 at α = 0.100.

12.43 (Course 4 Sample Exam 2000, Q.9) Summary statistics of 100 losses are:
Interval Number Sum Sum of
of Losses Squares
(0,2000] 39 38,065 52,170,078
(2000,4000] 22 63,816 194,241,387
(4000,8000] 17 96,447 572,753,313
(8000, 15000] 12 137,595 1,628,670,023
(15,000 ∞) 10 331,831 17,906,839,238
Total 100 667,754 20,354,674,039
When a Pareto Distribution was fit via the method of moments to a different data set, the estimated
parameters were α = 2.5 and θ = 10,000. Determine the chi-square statistic and number of degrees
of freedom for a test (with five groups) to assess the acceptability of fit of the data above to these
parameters.

12.44 (4, 11/04, Q.10 & 2009 Sample Q.140) (2.5 points)
You are given the following random sample of 30 auto claims:
54 140 230 560 600 1,100 1,500 1,800 1,920 2,000
2,450 2,500 2,580 2,910 3,800 3,800 3,810 3,870 4,000 4,800
7,200 7,390 11,750 12,000 15,000 25,000 30,000 32,300 35,000 55,000
You test the hypothesis that auto claims follow a continuous distribution F(x) with the
following percentiles:
x 310 500 2,498 4,876 7,498 12,930
F(x) 0.16 0.27 0.55 0.81 0.90 0.95
You group the data using the largest number of groups such that the expected number of
claims in each group is at least 5.
Calculate the chi-square goodness-of-fit statistic.
(A) Less than 7
(B) At least 7, but less than 10
(C) At least 10, but less than 13
(D) At least 13, but less than 16
(E) At least 16

12.45 (4, 5/05, Q.33 & 2009 Sample Q.201) (2.9 points) You test the hypothesis that a given
set of data comes from a known distribution with distribution function F(x).
The following data were collected:
Interval F(xi) Number of Observations
x<2 0.035 5
2≤x<5 0.130 42
5≤x<7 0.630 137
7≤x<8 0.830 66
8≤x 1.000 50
Total 300
where xi is the upper endpoint of each interval.
You test the hypothesis using the chi-square goodness-of-fit test.
Determine the result of the test.
(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level,
but is not rejected at the 0.05 significance level.
(C) The hypothesis is rejected at the 0.05 significance level,
but is not rejected at the 0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level,
but is not rejected at the 0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.

12.46 (4, 5/07, Q.5) (2.5 points) You are given:


(i) A computer program simulates n = 1000 pseudo-U(0, 1) variates.
(ii) The variates are grouped into k = 20 ranges of equal length.
(iii) Σ_{j=1}^{20} Oj2 = 51,850

(iv) The Chi-square goodness-of-fit test for U(0, 1) is performed.


Determine the result of the test.
(A) Do not reject H0 at the 0.10 significance level.
(B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level.
(C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level.
(D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level.
(E) Reject H0 at the 0.01 significance level.

12.47 (CAS3L, 11/12, Q.23) (2.5 points)


A six-sided die is rolled 120 times with the following distribution of outcomes:
Outcome Frequency
1 15
2 13
3 28
4 25
5 12
6 27
The following hypothesis test has been set up:
H0 : The die is fair (outcomes are equally likely).
H1 : The die is not fair.
Determine the significance level at which one would reject the null hypothesis given the outcomes in
the table above.
A. Less than 0.5%
B. At least 0.5%, but less than 1.0%
C. At least 1.0%, but less than 2.5%
D. At least 2.5%, but less than 5.0%
E. At least 5.0%

Solutions to Problems:

12.1. D. For 6 intervals so have 6 - 1 = 5 degrees of freedom. Chi-Square Statistic is 11.42,


which is greater than 11.070 but less than 12.832.
Since 11.42 > 11.070, we reject at 5%; since 11.42 < 12.832, we do not reject at 2.5%.
Result Observed Number Expected Number ((Observed - Expected)^2)/Expected
1 121 100 4.41
2 94 100 0.36
3 116 100 2.56
4 97 100 0.09
5 88 100 1.44
6 84 100 2.56
Sum 600 600 11.42
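As an independent check (not part of the original solution), scipy's chisquare function gives the same statistic; by default it takes the expected counts to be equal across categories, which matches a fair die.

```python
# Checking the statistic and p-value for the die data in 12.1.
from scipy.stats import chisquare

observed = [121, 94, 116, 97, 88, 84]
result = chisquare(observed)             # expected defaults to 100 in each category
print(result.statistic, result.pvalue)   # about 11.42, with a p-value between 0.025 and 0.05
```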

12.2. A. With 12 intervals there are 12 - 1 = 11 degrees of freedom.


The Chi-Square Statistic is 45 as computed below. Since 45 > 26.757, one rejects at 1/2%.

Period Observed Number Assumed Prob. Assumed Number ((Observed - Assumed)^2)/Assumed


1 402 1/12 374.33 2.04
2 451 1/12 374.33 15.70
3 412 1/12 374.33 3.79
4 357 1/12 374.33 0.80
5 370 1/12 374.33 0.05
6 389 1/12 374.33 0.57
7 391 1/12 374.33 0.74
8 386 1/12 374.33 0.36
9 326 1/12 374.33 6.24
10 344 1/12 374.33 2.46
11 354 1/12 374.33 1.10
12 310 1/12 374.33 11.06
Sum 4492 1 4492 44.93
Comment: We assume a uniform distribution, 1/12 in each period. No parameters have been fit.
As always, we lose one degree of freedom, 12 - 1 = 11, since the total of the assumed and
observed columns are set equal.

12.3. D. One has 8 intervals and has fit one parameter, therefore one has 8 - 1 - 1 = 6 degrees of
freedom. One computes the Chi-Square Statistic as 13.29 as shown below.
Since 12.592 < 13.29 < 14.449, one rejects at 5% and does not reject at 2.5%.
Bottom of Top of # claims Fitted Chi
Interval Interval in the F(lower) F(upper) # claims Square
$ Thous. $ Thous. Interval
5 10 7400 0.00000 0.75000 7500.0 1.33
10 15 1450 0.75000 0.88889 1388.9 2.69
15 20 475 0.88889 0.93750 486.1 0.25
20 25 230 0.93750 0.96000 225.0 0.11
25 50 350 0.96000 0.99000 300.0 8.33
50 75 50 0.99000 0.99556 55.6 0.56
75 100 20 0.99556 0.99750 19.4 0.02
100 Infinity 25 0.99750 1.00000 25.0 0.00
SUM 10000 10000 13.29
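A short Python sketch (my own check, assuming scipy is available) reproduces the expected counts, the statistic, and the corresponding p-value for this solution.

```python
# Checking solution 12.3: fitted F(x) = 1 - (5/x)^2, x in thousands of dollars.
from scipy.stats import chi2

def F(x):
    return 1.0 if x == float("inf") else 1 - (5 / x) ** 2

bounds   = [5, 10, 15, 20, 25, 50, 75, 100, float("inf")]
observed = [7400, 1450, 475, 230, 350, 50, 20, 25]
n = sum(observed)

stat = 0.0
for a, b, O in zip(bounds[:-1], bounds[1:], observed):
    E = n * (F(b) - F(a))                  # fitted number of claims in the interval
    stat += (O - E) ** 2 / E
print(stat, chi2.sf(stat, df=8 - 1 - 1))   # about 13.3; p-value between 2.5% and 5%
```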

12.4. B. For 3 types we have 3 - 1 = 2 degrees of freedom.


Chi-Square Statistic is 10.23, which is greater than 9.21 but less than 10.60.
Since 10.23 > 9.210, we reject at 1%; since 10.23 < 10.597, we do not reject at 1/2%.
Type Observed Number Assumed Number ((Observed - Assumed)^2)/Assumed
A 1183 1250 3.59
AB 2612 2500 5.02
B 1205 1250 1.62
Sum 5000 5000 10.23

12.5. C. F(x) = ∫_0^x f(t) dt = 4x3 - 3x4, 0 ≤ x ≤ 1.
The Chi-square statistic is computed by taking the sum of:
(fitted - observed)2 / fitted numbers of claims. For each interval, the fitted number of claims =
{total number of claims}{F(upper) - F(lower)}
Bottom of Top of # claims Fitted Chi
Interval Interval in the Interval F(lower) F(upper) # claims Square
0 0.2 35 0.0000 0.0272 27.2 2.24
0.2 0.3 65 0.0272 0.0837 56.5 1.28
0.3 0.4 95 0.0837 0.1792 95.5 0.00
0.4 0.5 120 0.1792 0.3125 133.3 1.33
0.5 0.6 135 0.3125 0.4752 162.7 4.72
0.6 0.7 160 0.4752 0.6517 176.5 1.54
0.7 0.8 185 0.6517 0.8192 167.5 1.83
0.8 0.9 155 0.8192 0.9477 128.5 5.46
0.9 1 50 0.9477 1.0000 52.3 0.10
1000 1000 18.50

12.6. C. Chi-Square is 18.50. There are 9 intervals, so we have 9 - 1 = 8 degrees of freedom.


Since 18.50 >17.535, we reject at 2.5%; since 18.50 < 20.090, we do not reject at 1%.

12.7. E. One has 6 intervals and has fit one parameter, therefore one has 6 - 1 - 1 = 4 degrees of
freedom. One computes the Chi-Square Statistic as 4.09 as shown below. Since 4.09 < 7.779,
the p-value is greater than 10%. This is equivalent to saying we do not reject at a 10%
significance level the hypothesis that the data was drawn from the fitted Exponential Distribution.
Bottom of Top of # claims F(Upper) Fitted (Observed-
Interval Interval in the F(lower) F(upper) minus # claims Fitted)^2
Interval F(Lower) /Fitted
0 1 350 0.00000 0.36791 0.36791 367.9 0.87
1 2 250 0.36791 0.60046 0.23255 232.6 1.31
2 3 150 0.60046 0.74745 0.14699 147.0 0.06
3 4 100 0.74745 0.84036 0.09291 92.9 0.54
4 5 50 0.84036 0.89910 0.05873 58.7 1.30
5 Infinity 100 0.89910 1.00000 0.10090 100.9 0.01
SUM 1000 1000 4.09
For example, F(3) = 1 - exp(-3/2.18) = 0.74745. The fitted number of claims in the interval 2 to 3 is:
(1000) {F(3) - F(2)} = (1000)(0.74745 - 0.60046) = 147.0; 1000 is the total observed claims.
The contribution to the Chi-Square Statistic from the interval 2 to 3 is: (150-147)2 / 147 = 0.06.
The Chi-Square Statistic is the sum of the contributions: 4.09.
Comment: For the Chi-Square Distribution with 4 degrees of freedom S(7.779) = 10%; therefore
S(4.09) > S(7.779) = 10%. Using a computer, S(4.09) = 39%.
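The statistic and the p-value quoted in the comment can be reproduced with a short sketch (assuming scipy is available).

```python
# Checking solution 12.7: Exponential with theta = 2.18 versus the grouped data.
import math
from scipy.stats import chi2

def F(x):
    return 1 - math.exp(-x / 2.18)         # math.exp(-inf) = 0, so F(infinity) = 1

bounds   = [0, 1, 2, 3, 4, 5, float("inf")]
observed = [350, 250, 150, 100, 50, 100]
n = sum(observed)

stat = 0.0
for a, b, O in zip(bounds[:-1], bounds[1:], observed):
    E = n * (F(b) - F(a))
    stat += (O - E) ** 2 / E
print(stat, chi2.sf(stat, df=6 - 1 - 1))   # about 4.09 and a p-value of about 0.39
```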

12.8. B. Use the relationship between the F-Distribution and the Incomplete Beta function.
β(a, b; y) = F[by/(a(1-y))], where F is the F-distribution with 2a and 2b degrees of freedom.

Let a = 5, b = 2, y = x/(x + 400). ⇒ 1 - y = 400/(x + 400). Therefore, the Generalized Pareto


Distribution = β(5, 2; x/(x + 400)) = F10,4[2{x/(x+400)} / {5(400/(x+400))}] = F10,4[2x/2000] =
F10,4[x/1000]. So for example, the Distribution function at x =1000 is F10,4[1] = .452.
Thus the predicted portion of claims between $1000 and $2000 is F10,4[2] - F10,4[1] =
0.737 - 0.452 = 0.285. With 1000 total claims, the predicted number of claims for this interval is
285. Summing the contributions from each interval, χ2 = 0.620.
Interval ($000) Predicted Observed {(Pred-Obs)^2}/Pred
0-1 452 450 0.009
1-2 285 290 0.088
2-3 112 110 0.036
3-4 54 50 0.296
4-5 30 30 0.000
5-10 47 50 0.191
over 10 20 20 0.000
Total 1000 1000 0.620
Comment: This is a difficult question!

12.9. D. One has 5 intervals and has fit two parameters, therefore one has 5 - 1 - 2 = 2 degrees of
freedom. One computes the Chi-Square Statistic as 7.23 as shown below. Since 5.991 < 7.23 <
7.378, one rejects at 5% and does not reject at 2.5%. (Using the row for 2 degrees of freedom,
find the columns that bracket the Chi-Square Statistic. Reject at the significance level of the column to
the left and do not reject at the significance level of the column to the right.)
Bottom of Top of # claims F(Upper) Fitted (Observed-
Interval Interval in the F(lower) F(upper) minus # claims Fitted)^2
Interval F(Lower) /Fitted
0 2.5 2625 0.00000 0.51330 0.51330 2566.5 1.33
2.5 5 975 0.51330 0.71852 0.20522 1026.1 2.55
5 10 775 0.71852 0.87510 0.15658 782.9 0.08
10 25 500 0.87510 0.97121 0.09611 480.6 0.79
25 Infinity 125 0.97121 1.00000 0.02879 143.9 2.49
SUM 5000 5000 7.23
For example, F(25) = 1 - {θ/(θ + x)}α = 1 - {6.8/(6.8 + 25)}2.3 = 0.97121.
The fitted number of claims in the interval 10 to 25 is: (5000){ F(25) - F(10) } =
(5000)(0.97121 - 0.87510) = 480.6, where 5000 is the total observed claims.
The contribution from the interval 10 to 25 is: (500 - 480.6)2 / 480.6 = 0.79.

12.10. E. F(2.5) = Φ[{ln(x) − µ} / σ] = Φ[{ln(2.5) - .8489} / 1.251] = Φ[0.05] = 0.5199.


F(5) = Φ[{ln(5) - .8489} / 1.251] = Φ[0.61] = 0.7291.
F(10) = Φ[{ln(10) - .8489} / 1.251] = Φ[1.16] = 0.8770.
F(25) = Φ[{ln(25) - .8489} / 1.251] = Φ[1.89] = 0.9706.
The expected number of claims in the interval 10 to 25 is: (5000){F(25) - F(10)} =
(5000)(0.9706 - 0.8770) = 468.0. The contribution to the Chi-Square Statistic from the interval 10
to 25 is: (500 - 468.0)2 / 468.0 = 2.19. The sum of the contributions is 12.25.
Bottom of Top of # claims F(Upper) Expected (Observed-
Interval Interval in the F(lower) F(upper) minus # claims Expected)^2
Interval F(Lower) /Expected
0 2.5 2625 0.0000 0.5199 0.5199 2599.5 0.25
2.5 5 975 0.5199 0.7291 0.2092 1046.0 4.82
5 10 775 0.7291 0.8770 0.1479 739.5 1.70
10 25 500 0.8770 0.9706 0.0936 468.0 2.19
25 Infinity 125 0.9706 1.0000 0.0294 147.0 3.29
SUM 5000 5000 12.25
Comment: Without rounding prior to looking in the Normal Table, the statistic would be 9.9.
The Pareto Distribution in the prior question was a better fit to this data than the LogNormal
Distribution, since the Pareto had a smaller Chi-Square Statistic. Note that the Pareto and the
LogNormal have the same number of parameters, so it is appropriate to compare them in this way.

12.11. A. One has 5 intervals and has fit two parameters, therefore one has 5 - 1 - 2 = 2 degrees
of freedom. One computes the Chi-Square Statistic as 12.25 as shown previously.
Since 10.597 < 12.25, one rejects at 1/2%.

12.12. E. The number of degrees of freedom equals the number of intervals minus one minus the
number of fitted parameters = 9 - number of fitted parameters.
The Inverse Burr has 3 parameters, so 6 d.f.
Since 14.449 < 15 < 16.812, the p-value for the Inverse Burr is between 1% and 2.5%.
The Weibull has 2 parameters, so 7 d.f.
Since 12.017< 13.8 < 14.067, the p-value for the Weibull is between 5% and 10%.
The Exponential has 1 parameter, so 8 d.f.
Since 15.507 < 17 < 17.535, the Exponential has a p-value between 2.5% and 5%.
Thus the Weibull has the biggest p-value, while the Inverse Burr has the smallest.
Thus the fits from best to worst are: Weibull, Exponential, Inverse Burr.
Comment: Since the Inverse Burr has a larger Chi-square and more parameters than the Weibull,
the Inverse Burr is a worse fit than the Weibull and has a smaller p-value. It is only necessary to
compute the p-values and compare them if the Distribution with more parameters has a smaller Chi-
Square.

12.13. B. The number of degrees of freedom equals the number of intervals minus one minus the
number of fitted parameters = 9 - number of fitted parameters. The Transformed Beta has 4
parameters, so 5 d.f. Since 11.070 < 11.4 < 12.832, the Transformed Beta has a
p-value between 2.5% and 5%. The Transformed Gamma has 3 parameters, so 6 d.f. Since
10.645 < 11.9 < 12.592, the p-value for the Transformed Gamma is between 5% and 10%. The
ParaLogistic has 2 parameters, so 7 d.f. Since 16.013 < 16.2 < 18.475, the p-value for the
ParaLogistic is between 1% and 2.5%. Thus the Transformed Gamma has the biggest p-value,
while the ParaLogistic has the smallest. Thus the fits from best to worst are: Transformed Gamma,
Transformed Beta, ParaLogistic.

12.14. A. Since they all have the same numbers of degrees of freedom, the one with the smallest
Chi-square has the biggest p-value and the best fit. Thus the fits from best to worst are:
Inverse Gaussian, Gamma, LogNormal.
Comment: When the Distributions have the same number of parameters (and the same number of
intervals) one can just rank the Chi-Square values without computing p-values. Smallest Chi-Square
is the best fit.
For this set of questions, using a computer, the p-values are as follows:
Distribution χ2 d.f. p-value
Transformed Beta 11.4 5 4.4%
Transformed Gamma 11.9 6 6.4%
Burr 12.7 6 4.8%
Inverse Gaussian 13.0 7 7.2%
Weibull 13.8 7 5.5%
Gamma 14.3 7 4.6%
Inverse Burr 15.0 6 2.0%
ParaLogistic 16.2 7 2.3%
Exponential 17.0 8 3.0%
LogNormal 18.7 7 0.9%
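The p-values in this table come directly from the survival function of the Chi-Square Distribution; a sketch reproducing them (assuming scipy is available) follows.

```python
# Reproducing the p-values in the table above.
from scipy.stats import chi2

fits = [("Transformed Beta", 11.4, 5), ("Transformed Gamma", 11.9, 6),
        ("Burr", 12.7, 6), ("Inverse Gaussian", 13.0, 7), ("Weibull", 13.8, 7),
        ("Gamma", 14.3, 7), ("Inverse Burr", 15.0, 6), ("ParaLogistic", 16.2, 7),
        ("Exponential", 17.0, 8), ("LogNormal", 18.7, 7)]

for name, stat, df in fits:
    print(f"{name:18s} {chi2.sf(stat, df):.1%}")
```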

12.15. D. For a Pareto with α = 1 and θ = 2: F(x) = 1 - 2/(x+2) = x / (x+2).


The fitted number of claims for each interval is:
150{F(top of interval) - F(bottom of interval)}
Bottom Top F(Bottom F(Top Fitted Observed (Observed-Fitted)^2
of Interval of Interval of Interval) of Interval) Number Number /Fitted
0 3 0.000 0.600 90.000 75 2.50
3 7 0.600 0.778 26.667 30 0.42
7 10 0.778 0.833 8.333 10 0.33
10 ∞ 0.833 1.000 25.000 35 4.00
Sum 150 150 7.25

12.16. Let p1 = expected probability for the first interval.


Let p2 = 1 - p1 = expected probability for the second interval.
Let O1 = number observed in the first interval.
Let O2 = number observed in the second interval.
Let n = O1 + O2 = number of data points.
Let E1 = p1 n = number expected in the first interval.
Let E2 = p2 n = number expected in the second interval.

Note that O1 + O2 = E1 + E2 . ⇒ O1 - E1 = E2 - O2 . ⇒ (O1 - E1 )2 = (O2 - E2 )2 .

χ 2 = (O1 - E1 )2 /E1 + (O2 - E2 )2 /E2 = (O1 - E1 )2 {1/E1 + 1/E2 } = (O1 - E1 )2 (E1 + E2 )/(E1 E2 )

= (O1 - E1)2 n / {n p1 n (1-p1)} = (O1 - n p1)2 / {n p1 (1-p1)} = {(O1 - n p1) / √[n p1 (1-p1)]}2.


Since H0 is true, O1 is Binomially Distributed, with parameters n and p1 .
O 1 has mean n p1 and variance n p1 (1-p1 ).
Therefore, (O1 - n p1) / √[n p1 (1-p1)] is approximately Standard Normally Distributed.

As n approaches infinity, (O1 - n p1) / √[n p1 (1-p1)] approaches a Standard Normal Distribution. ⇒

χ2 = {(O1 - n p1) / √[n p1 (1-p1)]}2 approaches a Chi-Square Distribution with 1 degree of freedom.


Comment: When computing the Chi-square goodness-of-fit test statistic, for a given interval, the
Expected does approach the Observed as a percent. In other words,
|Expected - Observed|/ Expected goes to zero for each interval. However, the contribution from
each interval, (Expected - Observed)2 / Expected, does not go to zero due to the square in the
numerator. In the above example, if H0 is true, the sum of the contributions, the Chi-Square Statistic,
approaches the square of a Standard Normal Distribution. Even if the assumed distribution is not
correct, as long as the assumed distribution function matches the true distribution function at the
breakpoint between the intervals, the result still holds. With only two intervals, we are
really only testing whether the assumed distribution function value at the breakpoint is correct.
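A small simulation (my own illustration, assuming NumPy and scipy are available) makes the same point: with two intervals and H0 true, the simulated statistics behave like a Chi-Square variable with one degree of freedom.

```python
# Simulating solution 12.16: two intervals, H0 true, many samples of size n.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p1, trials = 10_000, 0.3, 5000

O1 = rng.binomial(n, p1, size=trials)          # observed counts in the first interval
E1, E2 = n * p1, n * (1 - p1)
stats = (O1 - E1) ** 2 / E1 + ((n - O1) - E2) ** 2 / E2

# The simulated 95th percentile should be close to 3.841, the 1 d.f. table value.
print(np.quantile(stats, 0.95), chi2.ppf(0.95, df=1))
```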

12.17. C. There are 5 - 1 = 4 degrees of freedom.


Lower Upper obser- Assumed Chi-
Limit Limit vations F(lower) F(upper) # claims Square
0 1 210 0.0000 0.2582 258.2 9.00
1 2 200 0.2582 0.4472 189.0 0.64
2 3 190 0.4472 0.6325 185.2 0.12
3 4 200 0.6325 0.8165 184.0 1.38
4 5 200 0.8165 1.0000 183.5 1.48
1000 12.62
Since 11.143 < 12.62 ≤ 13.277, reject at 2.5% and do not reject at 1%.
Comment: Similar to Exercise 16.9 in Loss Models.

12.18. D. 1. True. Both the expected, Ei, and the observed, Oi, for each interval would double.

Therefore, Σ (Ei - Oi)2/Ei would also double.

The same number of intervals. ⇒ The same number of degrees of freedom.

⇒ Look in the same row of the Chi-Square Table. ⇒ The same critical values.
2. True. What Professor Klugman means, quoting George Box, is that any statistical model is only
an approximation to reality.
3. False. Number of intervals minus one minus the number of fitted parameters.
Comment: Statement 2 is a quote from the textbook taken out of context. Exam questions should
not quote from the textbook out of context, but once in a while they do.
“For the chi-square goodness-of-fit test, if the sample size were to double, with each number
showing up twice instead of once, the test statistic would double and the critical values would remain
unchanged.”
Assuming the null hypothesis were true, then if we took a second sample of size equal to the first,
we would not expect to get the same proportion of values in each interval as we did in the first
sample. For example, let us assume a case where the null hypothesis was true, and the first sample
had a very unusually large number of items in the first interval. This can happen due to random
fluctuation. However, we have no reason to assume that this would again occur for a second sample.
In fact, if the null hypothesis was true, if we were to take enough samples, we would expect the total
proportion of items observed to approach the expected in each interval. The Chi-Square Statistic
would approach a Chi-Square Distribution with the appropriate number of degrees of freedom.
If instead the null hypothesis is false, then for most or all intervals the probability covered by an
interval would not match that which we calculated from the null hypothesis. Therefore, as we took
more samples, the observed total proportions should not approach the expected proportions
derived from the null hypothesis. The Chi-Square Statistic would not approach a Chi-Square
Distribution.
The power of a test is the probability of rejecting H0 when it is false.
As the total sample size increases, the power of the test increases.
For a relatively small sample, even if H0 is not true, there may not be enough statistical evidence to
reject H0 . It is easy to get a small sample which is not too bad a match to H0 , even though the data
was not drawn from the assumed distribution.
For a very large sample, if H0 is not true, there is likely to be enough statistical evidence to reject H0 .
When the data was not drawn from the assumed distribution, there is only a very small probability of
getting a large sample which is a good match to H0.

12.19. A. In order to have 5 expected claims, we need to group the last 2 intervals together.
Bottom of Top of # claims F(Upper) Fitted (Observed-
Interval Interval in the F(lower) F(upper) minus # claims Fitted)^2
Interval F(Lower) /Fitted
1 10,000 1496 0.0000 0.6665 0.66650 1495.0 0.00
10,000 25,000 365 0.6665 0.8261 0.15960 358.0 0.14
25,000 100,000 267 0.8261 0.9562 0.13010 291.8 2.11
100,000 300,000 99 0.9562 0.9898 0.03360 75.4 7.41
300,000 Infinity 16 0.9898 1.0000 0.01020 22.9 2.07
SUM 2243 2243 11.73
There are two fitted parameters, and thus 5 - 1 - 2 = 2 degrees of freedom.
For 2 degrees of freedom, the 1/2% critical value is 10.597.
10.597 < 11.73. ⇒ Reject at 0.5%.
Comment: We make no use of the given dollars of loss to answer the question.
We can only try to group consecutive intervals.
The over 1 million interval would have an expected number of:
2243 S(1 million) = (2243)(0.0014) = 3.14 < 5.
Thus we combine the 300,001-1,000,000 and over 1,000,000 intervals.
Having done this, we confirm that the expected number in each interval is at least 5.
This same rule of thumb was used in 4, 11/04, Q.10.
The Maximum Likelihood LogNormal Distribution has µ = 8.435 and σ = 1.802; fitting done on a
computer. The data was taken from AIA Closed Claim Study (1974) in Table IV of “Estimating Pure
Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976.

12.20. B. The number expected to die during the first year is: (0.3)(1000) = 300.
The number expected to die during the second year is: (0.5)(1-.3)(1000) = 350.
The number expected to die during the third year is: (0.4)(1-.5)(1-.3)(1000) = 140.
The number expected to survive more than 3 years is: 1000 - 300 - 350 - 140 = 210.
time Observed Number Expected Number ((Observed - Expected)^2)/Expected
0 to 1 270 300 3.000
1 to 2 376 350 1.931
2 to 3 161 140 3.150
more than 3 193 210 1.376
Sum 1000 1000 9.458
The number of degrees of freedom is: 4 - 1 - 0 = 3.
9.348 < 9.458 < 11.345. ⇒ Do not reject H0 at 0.010; reject H0 at 0.025.
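A quick check of the expected deaths and of the statistic (a sketch, not part of the original solution, assuming scipy is available):

```python
# Checking solution 12.20: expected deaths from the hypothesized mortality rates.
from scipy.stats import chi2

q = [0.3, 0.5, 0.4]
n = 1000
observed = [270, 376, 161, 193]

expected, alive = [], n
for qi in q:
    expected.append(alive * qi)
    alive *= (1 - qi)
expected.append(alive)                          # survivors beyond 3 years

stat = sum((O - E) ** 2 / E for O, E in zip(observed, expected))
print(expected, stat, chi2.sf(stat, df=4 - 1))  # [300, 350, 140, 210], about 9.46
```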

12.21. C. In order to have 5 expected claims, we need to group the last two intervals together.
F(10,000) = Φ[{ln(x) − µ} / σ] = Φ[{ln(10,000) - 8.3} / 1.7] = Φ[0.54] = 0.7054.
F(50,000) = Φ[{ln(50,000) - 8.3} / 1.7] = Φ[1.48] = 0.9306.
F(100,000) = Φ[{ln(100,000) - 8.3} / 1.7] = Φ[1.89] = 0.9706.
F(300,000) = Φ[{ln(300,000) - 8.3} / 1.7] = Φ[2.54] = 0.9945.
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   F(upper) - F(lower)   Expected # claims   (Observed - Expected)²/Expected
1 10,000 1124 0.0000 0.7054 0.70540 1154.7 0.82
10,000 50,000 372 0.7054 0.9306 0.22520 368.7 0.03
50,000 100,000 83 0.9306 0.9706 0.04000 65.5 4.69
100,000 300,000 51 0.9706 0.9945 0.02390 39.1 3.60
300,000 Infinity 7 0.9945 1.0000 0.00550 9.0 0.45
SUM 1637 1637 9.59
For 5 - 1 = 4 d.f., the 5% and 2.5% critical values are: 9.488 and 11.143.
9.488 < 9.59 < 11.143. ⇒ Reject at 5% and do not reject at 2.5%.
Comment: We make no use of the given dollars of loss to answer the question.
Data taken from NAIC Closed Claim Study (1975) in Table VII of “Estimating Pure Premiums by
Layer - An Approach” by Robert J. Finger, PCAS 1976.
Without rounding prior to looking in the Normal Table, the statistic would be 9.9.

12.22. C. There are 50 observed, so for an interval to have at least 5 expected claims, it must
cover at least: 5/50 = 10%. The largest number of groups that accomplishes this is: 0 to 2500, 2500
to 5000, 5000 to 10,000, 10,000 to 15,000, 15,000 to 25,000, and 25,000 to ∞.
Each contribution from an interval is: (assumed - observed)²/assumed.
Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Prob. in Interval   Assumed # claims   Chi-Square
0 2,500 7 0.00 0.21 0.21 10.5 1.17
2,500 5,000 7 0.21 0.41 0.20 10.0 0.90
5,000 10,000 13 0.41 0.63 0.22 11.0 0.36
10,000 15,000 13 0.63 0.75 0.12 6.0 8.17
15,000 25,000 6 0.75 0.87 0.12 6.0 0.00
25,000 infinity 4 0.87 1.00 0.13 6.5 0.96
50 50.0 11.56
Comment: Similar to 4, 11/04, Q.10.

12.23. B. ΣOj = ΣEj = 100, where the sums run over j = 1 to 5.
Since the intervals are of the same length and we are assuming a uniform distribution, the expected
numbers in each interval are the same: Ej = 100/5 = 20 for all j.
Chi-Square Statistic: Σ(Oj - Ej)²/Ej = Σ(Oj - 20)²/20 = ΣOj²/20 - 2ΣOj + 100 = ΣOj²/20 - 100.
Therefore, the Chi-Square Statistic is: 2169/20 - 100 = 8.45.


The number of degrees of freedom is: 5 - 1 = 4. 7.779 < 8.45 < 9.448.
⇒ Reject H0 at the 0.10 significance level, but not at the 0.05 significance level.
Comment: Similar to 4, 5/07, Q.5.
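A minimal sketch (Python) of the shortcut used above; the only inputs assumed are the ones given in the problem (five equal intervals, 100 observations, ΣOj² = 2169):

```python
# Minimal sketch (Python) of the shortcut for a uniform fit with equal expected counts.
n_obs = 100        # total number of observations
k = 5              # number of equal-width intervals
sum_O_sq = 2169    # sum of the squared observed counts, as given in the problem
E = n_obs / k      # expected count per interval = 20

# Since the Oj sum to n_obs, sum((Oj - E)^2 / E) simplifies to sum(Oj^2)/E - n_obs.
chi_sq = sum_O_sq / E - n_obs
print(round(chi_sq, 2))   # 8.45
```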

12.24. False. The maximum likelihood fit will usually have a relatively small Chi-Square statistic,
but it will usually not be the same as the distribution which has the smallest Chi-Square statistic.
Comment: As discussed in a subsequent section, one can fit distributions by minimizing the
Chi-Square Statistic, although this is not discussed in Loss Models.

12.25. E. Let N be the sample size.


Then the Chi-Square Statistic is the sum of (assumed - observed)²/assumed:
(0.06N)²/0.24N + 0 + (0.04N)²/0.26N + (0.02N)²/0.25N = 0.02275N.
There are 4 - 1 = 3 degrees of freedom. The 1% critical value is 11.345.
p-value is 1%. ⇒ 0.02275N = 11.345. ⇒ N = 499.
Comment: See “The Expert Mind,” by Philip E. Ross, Scientific American, August 2006.

12.26. D. S(x) = exp[-(x/7000)^0.5], where x is the size of loss.


S(500) = 0.7655. S(1500) = 0.6294. S(5500) = 0.4121.
S(10,500) = 0.2938. S(25,000) = 0.1511.
Payments Losses Probability
(0, 1000] 500 to 1500 {S(500) - S(1500)} / S(500) = 0.1778.
(1,000, 5,000] 1500 to 5500 {S(1500) - S(5500)} / S(500) = 0.2839.
(5,000, 10,000] 5500 to 10,500 {S(5500) - S(10,500)} / S(500) = 0.1545.
(10,000, 24,500) 10,500 to 25,000 {S(10,500) - S(25,000)} / S(500) = 0.1864.
24,500 25,000 or more S(25,000) / S(500) = 0.1974.
Lower Limit   Upper Limit   Observed Number   Probability   Expected Number   Chi-Square
0 1,000 165 0.1777 177.7 0.908
1,000 5,000 292 0.2839 283.9 0.231
5,000 10,000 157 0.1545 154.5 0.039
10,000 24,500 200 0.1865 186.5 0.982
24,500 186 0.1974 197.4 0.658
1000 1000 2.818
(165 - 177.7)²/177.7 + (292 - 283.9)²/283.9 + (157 - 154.5)²/154.5 + (200 - 186.5)²/186.5 + (186 - 197.4)²/197.4 = 2.818.
Comment: Note that the probabilities for the intervals add to unity, as they always should; the sums
of the observed and expected columns are equal.
The data has been truncated and shifted from below at 500 and censored from above at 25,000.
If we want to compare to the observed data, then we need to compare it to what the distribution
would be after truncation and shifting from below at 500. We need to compare apples to apples.
The new survival function after truncation is: S(x+d) / S(d) = S(x+500) / S(500), where x is the size
of payment after the deductible and x + 500 is the original size of loss.
In this case, S(x) is the survival function for the Weibull prior to the effect of the deductible.
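The truncation-and-censoring adjustment above can be checked with a short sketch (Python, not from the original); the Weibull survival function and the observed counts are those of the solution, and the small difference from 2.818 comes from rounding in the hand calculation:

```python
import math

# Minimal sketch (Python); the Weibull and the observed counts are those of 12.26.
def S(x):
    # ground-up Weibull survival function, S(x) = exp[-(x/7000)^0.5]
    return math.exp(-(x / 7000) ** 0.5)

# Payments of (0, 1000], (1000, 5000], (5000, 10000], (10000, 24500), and 24500 (censored)
# correspond to losses of 500-1500, 1500-5500, 5500-10500, 10500-25000, and 25000 or more;
# all probabilities are conditional on the loss exceeding the 500 deductible.
loss_bounds = [500, 1500, 5500, 10500, 25000]
probs = [(S(loss_bounds[i]) - S(loss_bounds[i + 1])) / S(500) for i in range(4)]
probs.append(S(25000) / S(500))

observed = [165, 292, 157, 200, 186]
n = sum(observed)                       # 1000
expected = [n * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))   # about 2.82; the 2.818 above uses rounded probabilities
```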

12.27. B. As per Theorem A.1 in Loss Models, for integer alpha: Γ(α ; x) = 1 - Σ x^i e^-x / i!,
where the sum runs from i = 0 to α - 1.
For the Gamma Distribution: F(x) = Γ(α; x/θ).
Thus for integer α, F(x) = 1 - Σ (x/θ)^i e^-x/θ / i!, the sum again running from i = 0 to α - 1.

Here we have a Gamma Distribution with α = 3 and θ = 5.


F(10) = 1 - e^-2 (1 + 2 + 2²/2) = 0.3233.
F(25) = 1 - e^-5 (1 + 5 + 5²/2) = 0.8753.
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   F(upper) - F(lower)   Expected # claims   (Observed - Expected)²/Expected
0 10 696 0.0000 0.3233 0.3233 646.6 3.77
10 25 1028 0.3233 0.8753 0.5520 1104.0 5.23
25 Infinity 276 0.8753 1.0000 0.1247 249.4 2.84
SUM 2000 2000 11.84
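A minimal sketch (Python, not from the original) of the closed-form Gamma CDF for integer α used above, applied to the grouped counts of this problem; it reproduces the statistic up to the rounding of the F values:

```python
import math

# Minimal sketch (Python): Gamma CDF for integer alpha via the finite sum, applied to 12.27.
def gamma_cdf_integer_alpha(x, alpha, theta):
    y = x / theta
    return 1 - sum(y ** i * math.exp(-y) / math.factorial(i) for i in range(alpha))

alpha, theta = 3, 5
F10 = gamma_cdf_integer_alpha(10, alpha, theta)   # 0.3233
F25 = gamma_cdf_integer_alpha(25, alpha, theta)   # 0.8753

observed = [696, 1028, 276]
probs = [F10, F25 - F10, 1 - F25]
expected = [2000 * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))   # about 11.86; the 11.84 above uses rounded F values
```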

12.28. C. Since we observe losses that may be as big as 500, we have ω ≥ 500.
Then the likelihood is: (50/ω)^30 {(100 - 50)/ω}^50 {(250 - 100)/ω}^130 {(500 - 250)/ω}^190,
which is proportional to ω^-400.


This is a decreasing function of ω, so in order to maximize the likelihood we want the smallest
possible ω, which is 500.
For a uniform distribution from 0 to 500, the expected number of losses for the four intervals are:
40, 40, 120, and 200.
For example, we expect 1/10 of the losses to be in the interval (0, 50); (1/10)(400) = 40.
Therefore, the chi-square statistic is:
(30 - 40)²/40 + (50 - 40)²/40 + (130 - 120)²/120 + (190 - 200)²/200 = 6.333.
There were four intervals, and one fitted parameter, so the number of degrees of freedom is:
4 - 1 - 1 = 2.
For 2 degrees of freedom, the 5% critical value is 5.991 and the 2.5% critical value is 7.378.
Since 5.991 < 6.333 < 7.378, we reject H0 at 5% but not at 2.5%.
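A short sketch (Python, not from the original) of the two steps above, the maximum likelihood choice of ω and the resulting statistic; all inputs are the grouped counts and interval endpoints from the solution:

```python
# Minimal sketch (Python) for 12.28: uniform(0, omega) fit to grouped data.
# The likelihood is proportional to omega^(-400), a decreasing function of omega,
# so the maximum likelihood estimate is the smallest permissible value, omega = 500.
omega = 500.0
bounds = [0, 50, 100, 250, 500]
observed = [30, 50, 130, 190]
n = sum(observed)   # 400

expected = [n * (bounds[i + 1] - bounds[i]) / omega for i in range(4)]   # [40, 40, 120, 200]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 3))   # 6.333
```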

12.29. C. F(1000) = 1 - Exp[-1000/1997] = 0.3939. F(2000) = 1 - Exp[-2000/1997] = 0.6327.


Lower Limit   Upper Limit   # obs.   Prob. in Interval   Expected # claims   Chi-Square
0 1000 70 0.3939 78.78 0.980
1000 2000 60 0.2387 47.75 3.143
2000 infinity 70 0.3673 73.47 0.163
200 1.0000 200.00 4.286
(78.78 - 70)²/78.78 + (47.75 - 60)²/47.75 + (73.47 - 70)²/73.47 = 4.286.

One fitted parameter. ⇒ Degrees of freedom = 3 - 1 - 1 = 1.


3.841 < 4.286 < 5.024. ⇒ Reject at 5% and not at 2.5%.

12.30. D. The number of degrees of freedom is: 12 - 4 - 1 = 7.


For 7 degrees of freedom, the 5% critical value is 14.067, and the 2.5% critical value is 16.013.
14.067 < 15.038 < 16.013. Thus reject H0 at 5% but not at 2.5%.

12.31. C. The total number of claims is 100.


For each interval the expected number of claims is: (100)(F(upper) - F(lower)).
For each interval compute: (expected - observed)²/expected.
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   Expected # claims   Chi-Square
0 1000 55 0.000 0.535 53.5 0.04
1000 2500 15 0.535 0.770 23.5 3.08
2500 10000 22 0.770 0.953 18.3 0.77
10000 Infinity 8 0.953 1.000 4.7 2.30
100 100 6.19
For example, F(2500) = 1 - {1500/(1500+2500)}^1.5 = 1 - 0.375^1.5 = 1 - 0.230 = 0.770.
However, the final interval has fewer than 5 expected claims, so group it with the next to last interval:
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   Expected # claims   Chi-Square
0 1000 55 0.000 0.535 53.5 0.04
1000 2500 15 0.535 0.770 23.5 3.08
2500 Infinity 30 0.770 1.000 23.0 2.16
100 100 5.28
Comment: Since the question told us that the minimum expected number of observations in
any group should be 5, we grouped accordingly. This is one common rule of thumb.

12.32. B. ν = the number of degrees of freedom = # of intervals - 1 - # fitted parameters = 6 - 1 - 0 = 5.
t Observed Number Expected Number ((Observed - Expected)^2)/Expected
0 15 10 2.500
1 30 25 1.000
2 20 25 1.000
3 15 20 1.250
4 10 15 1.667
5 10 5 5.000
Sum 100 100 12.417
χ² - ν = 12.417 - 5 = 7.417.
Comment: There is no mention of having fit the survival model to this data.

12.33. B. For each of the six distinct size classes one computes:
(observed number of claims - fitted number of claims)²/fitted number of claims
Lower Endpoint   Upper Endpoint   Observed # claims   F(lower)   F(upper)   Assumed # claims   (Observed - Fitted)²/Fitted = Chi-Square
0 25000 3900 0.0000 0.5781 4047 5.3
25000 50000 1500 0.5781 0.7840 1441 2.4
50000 infinity 900 0.7840 1.0000 1080 30.0
50000 60000 200 0.7840 0.8285 89 138.2
60000 100000 300 0.8285 0.9213 186 70.7
100000 infinity 200 0.9213 1.0000 157 11.5
7000 7000 258.1
For example, F(25000) = 1 - {75000/(25000+75000)}^3 = 0.5781.
The expected number of claims in the interval from 25,000 to 50,000 is:
(7000){F(50000) - F(25000)} = (7000)(0.7840 - 0.5781) = 1441.
The total number of claims for State 2 is 2000. The number of claims in State 2 expected from
50,000 to 60,000 is: (2000){F(60000)-F(50000)} = 2000(0.8285 - 0.7840) = 89.
The total number of claims for State 1 is 5000. The number of claims in State 1 expected from
50,000 to ∞ is: (5000){F(∞)-F(50000)} = 5000(1 - 0.7840) = 1080.
The contribution to the Chi-Square from this interval is (200 - 89)²/89 = 138.

12.34. D. The Chi-square statistic is computed by taking the sum of:


(fitted - observed)²/fitted numbers of claims. For each interval,
the fitted number of claims = {total number of claims}{F(upper) - F(lower)}.
For the Loglogistic, F(x) = (x/θ)^γ / {1 + (x/θ)^γ} = x/(2+x).
Group the last three given intervals together to form an interval from 6 to ∞, in order to get an
expected number of claims of at least 5.
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   Fitted # claims   (Observed - Fitted)²/Fitted = Chi-Square
0 2 8 0.000 0.500 12.50 1.62
2 6 5 0.500 0.750 6.25 0.25
6 Infinity 12 0.750 1.000 6.25 5.29
SUM 25.00 25.00 7.16
Comment: Since the question told us that the minimum expected number of observations in
any group should be 5, we grouped accordingly. This is one common rule of thumb.

12.35. B. For each of the six distinct size classes one computes:
(observed number of claims - fitted number of claims)²/fitted number of claims.
Lower Endpoint   Upper Endpoint   Observed # claims   F(lower)   F(upper)   Fitted # claims   Chi-Square
0 10000 7330 0.0000 0.4898 7346.94 0.039
10000 25000 3970 0.4898 0.7500 3903.06 1.148
25000 infinity 2430 0.7500 1.0000 2500.00 1.960
25000 35000 405 0.7500 0.8264 381.94 1.392
35000 50000 325 0.8264 0.8889 312.50 0.500
50000 infinity 540 0.8889 1.0000 555.56 0.436
15000 15000 5.47
For example, F(10000) = 1 - {25000/(10000+25000)}^2 = 0.4898. The fitted number of claims in
the interval from 10,000 to 25,000 is: (15000){F(25000) - F(10000)} = (15000)(0.7500 - 0.4898) =
3903.1. The fitted number of claims in New Jersey from 25,000 to 35,000 is:
(5000){F(35000) - F(25000)} = 5000(0.8264 - 0.7500) = 381.94.
The contribution to the Chi-Square from this interval is: (405 - 381.94)²/381.94 = 1.392.

12.36. D. For each interval we compute (observed - fitted)²/fitted.


There are four intervals, so that the degrees of freedom are 4 - 1 = 3.
Since 9.348 > 9.2 we canʼt reject at the 2.5% significance level.
Since 7.815 < 9.2 we can reject at the 5% significance level.
Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Fitted # claims   Chi-Square
1 1.333 3 0.000 0.250 5.0 0.80
1.333 2 6 0.250 0.500 5.0 0.20
2 4 10 0.500 0.750 5.0 5.00
4 infinity 1 0.750 1.000 5.0 3.20
20 20 9.20
Comment: Go to the row for 3 degrees of freedom, and find where 9.2 is bracketed:
7.815 < 9.2 < 9.348. Reject at the significance level of the column to the left, 5%, and do not reject
to the right, 2.5%. The distribution is a Single Parameter Pareto distribution.

12.37. A. F(x) = 1 - {1000/(1000+x)}^2.


Chi-Square goodness-of-fit statistic is the sum of terms for each interval:
(observed number of claims - fitted number of claims)²/fitted number of claims.
Bottom of Interval   Top of Interval   F[Top of Interval]   Fitted # of Claims   Observed # of Claims   Chi-Square
0 250 0.360 3.600 3 0.1000
250 500 0.556 1.960 2 0.0008
500 1000 0.750 1.940 3 0.5792
1000 ∞ 1.000 2.500 2 0.1000
Sum 10.000 10 0.7800
The fitted number of claims in an interval is 10 times the difference of the Distribution Function at the
top and bottom of the interval.
For example, the fitted number of claims for the third interval is 10(0.750 - 0.556) = 1.940.
The Chi-Square contribution for the third interval is: (3 - 1.94)²/1.94 = 0.5792.
Comment: We use these intervals to compute the Chi-Square statistic, as we were told to do.

12.38. B. For each interval we compute (observed - fitted)²/fitted.


As computed below, the sum of the contributions of the intervals is 9.50.
There are five intervals, and we have fit two parameters using minimum Chi-Square, so that the
degrees of freedom are 5 - 1 - 2 = 2. (Number of degrees of freedom = the number of intervals
minus one, minus one degree of freedom for every parameter fit, when we fit via minimum Chi-
square.)
Since 10.597 > 9.50, we do not reject at the 0.5% significance level.
Since 9.210 < 9.50, we can reject at the 1% significance level.
Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Fitted # claims   Chi-Square
0 3 180 0.000 0.184 184.5 0.11
3 7.5 180 0.184 0.387 202.4 2.47
7.5 15 235 0.387 0.601 213.9 2.08
15 40 255 0.601 0.872 271.4 0.99
40 infinity 150 0.872 1.000 127.8 3.85
1000 9.50

12.39. E. This is an Exponential Distribution, so F(x) = 1 - e^-x/5. The Chi-square statistic is computed
by taking the sum over the intervals of the squared differences of the fitted and observed numbers
of claims divided by the expected number of claims:
(Fitted - Observed)²/Fitted. For each interval the fitted number of claims is:
{total number of claims}{F(upper) - F(lower)} .
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   Fitted # claims   Chi-Square
0 1 15 0.000 0.181 18.1 0.54
1 5 40 0.181 0.632 45.1 0.57
5 10 20 0.632 0.865 23.3 0.46
10 15 15 0.865 0.950 8.6 4.86
15 ∞ 10 0.950 1.000 5.0 5.06
100 11.49

12.40. D. f(x) = 1/x², x > 1; therefore integrating gives F(x) = 1 - 1/x, x > 1.
The Chi-square statistic is computed by taking the sum of the contributions from each interval:
(fitted - observed)²/fitted numbers of claims.
For each interval the fitted number of claims = {total number of claims}{F(upper) - F(lower)}
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   Fitted # claims   Chi-Square
1 1.333 16 0.000 0.250 10.0 3.60
1.333 2 10 0.250 0.500 10.0 0.00
2 4 10 0.500 0.750 10.0 0.00
4 ∞ 4 0.750 1.000 10.0 3.60
40 40 7.20
We have 4 intervals, so we have 4 - 1 = 3 degrees of freedom.
Since 7.20 < 7.815, we do not reject at 5%. Since 6.251 < 7.20, we reject at 10%.
Comment: f(x) is the probability density function for a Single Parameter Pareto Distribution.
Since there is no mention of having fit the curve to this data, we do not decrease the degrees of
freedom by the number of fitted parameters.
Whenever we are doing a Chi-Square goodness of fit test, we compare the expected number of
items in each interval to the observed number of items in each interval.
In that case, if the null hypothesis is true, the statistic has (approximately) a Chi-Square Distribution.
If instead we were to compare the expected dollars in each interval to the observed number of
dollars of each interval, that would be a different test.
I do not know what the distribution of this different test statistic would be.
(If the assumed size of loss distribution had a finite second moment, perhaps someone could figure
out what the limit of the test statistic would be as the sample size went to infinity.)

12.41. E. There is no mention of having fit any parameters, so ν = the number of degrees of freedom = # of intervals - 1 - # fitted parameters = 4 - 1 - 0 = 3.
Time   Observed Number   S(upper end)   Probability Covered by Interval   Expected Number   Chi-Square
0 to 1 21 0.9 0.1 15 2.400
1 to 2 27 0.7 0.2 30 0.300
2 to 3 39 0.4 0.3 45 0.800
3 to 4 63 0 0.4 60 0.150
Sum 150 1 150 3.650
χ²/ν = 3.65/3 = 1.22.

12.42. D. Since the data is truncated from below, we need to adjust the distribution function.
G(x) = {F(x) - F(3000)}/S(3000), where F is for a Pareto with α = 2 and θ = 25,000.
The Chi-square statistic is computed as usual by taking the sum of:
(fitted - observed)²/fitted numbers of claims.
For each interval the fitted number of claims = {total number of claims}{G(upper) - G(lower)}.
Bottom of Interval   Top of Interval   # claims in Interval   F(lower)   F(upper)   G(lower)   G(upper)   Fitted # claims   Chi-Square
3000 5000 6 0.2028 0.3056 0.0000 0.1289 12.9 3.68
5000 10000 29 0.3056 0.4898 0.1289 0.3600 23.1 1.50
10000 25000 39 0.4898 0.7500 0.3600 0.6864 32.6 1.24
25000 ∞ 26 0.7500 1.0000 0.6864 1.0000 31.4 0.92
100 100 7.34
There are 4 intervals, so we have 4 - 1 = 3 degrees of freedom.
Since 6.251< 7.34 < 7.815, we do not reject at 5% and reject at 10%.
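A minimal sketch (Python, not from the original) of the truncation adjustment G(x) = {F(x) - F(3000)}/S(3000); the Pareto parameters and observed counts are those of the solution:

```python
# Minimal sketch (Python) for 12.42; the Pareto parameters and counts are from the solution.
def pareto_cdf(x, alpha=2.0, theta=25000.0):
    return 1 - (theta / (theta + x)) ** alpha

d = 3000   # truncation point

def G(x):
    # distribution of losses, conditional on the loss exceeding the truncation point d
    return (pareto_cdf(x) - pareto_cdf(d)) / (1 - pareto_cdf(d))

bounds = [3000, 5000, 10000, 25000]
observed = [6, 29, 39, 26]
n = sum(observed)   # 100

probs = [G(bounds[i + 1]) - G(bounds[i]) for i in range(3)] + [1 - G(25000)]
expected = [n * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))   # 7.34
```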

12.43. Since the Pareto was not fit to the data set to which we are comparing, subtract no fitted
parameters. ⇒ The number of degrees of freedom is # of intervals - 1 = 5 - 1 = 4.
For each interval compute: (observed # of claims - assumed # of claims)²/assumed #.
Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Assumed # claims   Chi-Square
0 2000 39 0.0000 0.3661 36.61 0.16
2000 4000 22 0.3661 0.5688 20.27 0.15
4000 8000 17 0.5688 0.7700 20.12 0.48
8000 15000 12 0.7700 0.8988 12.89 0.06
15000 infinity 10 0.8988 1.0000 10.12 0.00
100 0.85
F(x) = 1 - (1 + x/10000)^-2.5. For example, F(2000) = 1 - (1 + 2000/10000)^-2.5 = 0.3661.
(100)(0.5688 - 0.3661) = 20.27. (22 - 20.27)²/20.27 = 0.15.
Comment: Since 0.85 < 7.779, do not reject the Pareto at 10%. The method of moments Pareto fit
to this data is α = 2.780 and θ = 11,884, as determined in the solution to Course 4 Sample Exam,
Q.8. This would have 5 - 1 - 2 = 2 degrees of freedom and a chi-square statistic of 1.45.
Since 1.45 < 4.605 we would not reject this Pareto at 10%.
Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Fitted # claims   Chi-Square
0 2000 39 0.0000 0.3511 35.11 0.43
2000 4000 22 0.3511 0.5536 20.25 0.15
4000 8000 17 0.5536 0.7609 20.73 0.67
8000 15000 12 0.7609 0.8966 13.57 0.18
15000 infinity 10 0.8966 1.0000 10.34 0.01
100 1.45

12.44. A. There are 30 observed, so for an interval to have at least 5 expected claims, it must
cover at least: 5/30 = 1/6 = 16.67%. The largest number of groups that accomplishes this is: 0 to
500, 500 to 2498, 2498 to 4876, and 4876 to ∞.
Each contribution from an interval is: (assumed - observed)²/assumed.
Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Prob. in Interval   Assumed # claims   Chi-Square
0 500 3 0.00 0.27 0.27 8.1 3.21
500 2498 8 0.27 0.55 0.28 8.4 0.02
2498 4876 9 0.55 0.81 0.26 7.8 0.18
4876 infinity 10 0.81 1.00 0.19 5.7 3.24
30 30.0 6.66
(8.1 - 3)²/8.1 + (8.4 - 8)²/8.4 + (7.8 - 9)²/7.8 + (5.7 - 10)²/5.7 = 6.66.
Alternately, χ² = Σ(Oi²/Ei) - n = 3²/8.1 + 8²/8.4 + 9²/7.8 + 10²/5.7 - 30 = 6.66.
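A short sketch (Python, not from the original) confirming that the direct form and the shortcut χ² = ΣOi²/Ei - n give the same value here; the probabilities and counts are those of the solution:

```python
# Minimal sketch (Python) for 12.44: the direct form and the shortcut agree.
observed = [3, 8, 9, 10]
probs = [0.27, 0.28, 0.26, 0.19]   # assumed probability in each (grouped) interval
n = sum(observed)                  # 30
expected = [n * p for p in probs]

direct = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
shortcut = sum(o ** 2 / e for o, e in zip(observed, expected)) - n
print(round(direct, 2), round(shortcut, 2))   # 6.66 6.66
```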

12.45. C. There are 5 - 1 = 4 degrees of freedom.


Lower Limit   Upper Limit   Observations   F(lower)   F(upper)   Expected Number   Chi-Square
0 2 5 0.0000 0.0350 10.5 2.881
2 5 42 0.0350 0.1300 28.5 6.395
5 7 137 0.1300 0.6300 150.0 1.127
7 8 66 0.6300 0.8300 60.0 0.600
8 ∞ 50 0.8300 1.0000 51.0 0.020
300 300 11.022
(5 - 10.5)²/10.5 + (42 - 28.5)²/28.5 + (137 - 150)²/150 + (66 - 60)²/60 + (50 - 51)²/51 = 11.022.
Since 9.488 < 11.022 ≤ 11.143, reject at 5% and do not reject at 2.5%.

12.46. E. ΣOj = ΣEj = 1000, where the sums run over j = 1 to 20.
Since the intervals are of the same length and we are assuming a uniform distribution, the expected
numbers in each interval are the same: Ej = 1000/20 = 50 for all j.
Chi-Square Statistic: Σ(Oj - Ej)²/Ej = Σ(Oj - 50)²/50 = ΣOj²/50 - 2ΣOj + 1000 = ΣOj²/50 - 1000.
Therefore, the Chi-Square Statistic is: 51,850/50 - 1000 = 37.


The number of degrees of freedom is: 20 - 1 = 19.
36.191 < 37 < 38.582.
⇒ Reject H0 at the 0.01 significance level, but not at the 0.005 significance level.
Alternately, χ² = Σ(Oi²/Ei) - n = ΣOi²/50 - 1000 = 51,850/50 - 1000 = 37. Proceed as before.
Comment: There are no fitted parameters.
This is a case where using a 1% significance level we would make a Type I error; the simulated
sample was sufficiently unusual that we rejected the null hypothesis even though it was true.

12.47. C. The expected values are each equal to: 120/6 = 20.
Chi-Square statistic is:
(15 - 20)²/20 + (13 - 20)²/20 + (28 - 20)²/20 + (25 - 20)²/20 + (12 - 20)²/20 + (27 - 20)²/20 = 13.80.
For 5 degrees of freedom, the 2.5% critical value is 12.83, while the 1% critical value is 15.09.
Since 12.83 < 13.80 < 15.09, reject at 2.5% but not 1%.
Comment: Using a computer the p-value is 1.69%.

Section 13, The Likelihood Ratio Test163

As discussed previously, the larger the likelihood or loglikelihood the better the fit. One can use the
Likelihood Ratio Test to test whether a fit is significantly better.

For example we previously fit via Maximum Likelihood both a Transformed Gamma and a Weibull
to the ungrouped data in Section 2. Since the Weibull is a special case of a Transformed Gamma,
the Transformed Gamma has a larger maximum likelihood. The loglikelihoods were -1748.98 for the
Transformed Gamma and -1753.04 for the Weibull.

The test statistic is twice the difference of loglikelihoods: (2){-1748.98 - (-1753.04)} = 8.12.
One compares the test statistic to the Chi-Square Distribution with one degree of freedom:

Significance Level:   0.100   0.050   0.025   0.010   0.005
Critical Value:       2.706   3.841   5.024   6.635   7.879

Since 8.12 > 7.879, at the 0.5% significance level we reject the null hypothesis that the data came
from the fitted Weibull (2 parameters), as opposed to the alternative hypothesis that the data came
from the fitted Transformed Gamma Distribution (3 parameters).

The Likelihood Ratio Test (or Loglikelihood Difference Test) proceeds as follows:164
1. One has two distributions, one with more parameters than the other, both fit to the same data
via Maximum Likelihood.
2. One of the distributions is a special case of the other.165
3. Compute twice the difference in the loglikelihoods.166 167
4. Compare the result of step 3 to the Chi-Square Distribution, with a number of degrees of
freedom equal to the difference in the number of fitted parameters of the two distributions.
5. Draw a conclusion as to whether the more general distribution fits significantly better than
its special case.

H0 is the hypothesis that the distribution with fewer parameters is appropriate.


The alternative hypothesis H1 is that the distribution with more parameters is appropriate.

163 See Section 16.4.4 in Loss Models.
164 Note that twice the difference of the loglikelihoods approaches a Chi-Square Distribution as the sample size
gets larger. Thus one should be cautious about drawing any conclusion concerning fits to small data sets.
165 This test is often applied when one distribution is the limit of the other. Loss Models at page 339 states that in
this case the test statistic has a mixture of Chi-Square Distributions.
166 Equivalently, one computes twice the log of the ratio of the likelihoods.
167 The factor of two is a constant, which as discussed subsequently, comes from the 1/2 in the exponent of the
density of the Standard Normal Distribution.
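The five steps above reduce to very little arithmetic. A minimal sketch (Python, not from the original), using the Transformed Gamma and Weibull loglikelihoods quoted above and the one-degree-of-freedom critical values from the table:

```python
# Minimal sketch (Python) of the likelihood ratio test for nested models.
def likelihood_ratio_statistic(loglik_general, loglik_special):
    return 2.0 * (loglik_general - loglik_special)

# Transformed Gamma (3 parameters) versus its special case, the Weibull (2 parameters).
stat = likelihood_ratio_statistic(-1748.98, -1753.04)   # 8.12
df = 3 - 2   # degrees of freedom = difference in the number of fitted parameters

# Chi-square critical values for 1 degree of freedom, from the table above.
critical = {0.100: 2.706, 0.050: 3.841, 0.025: 5.024, 0.010: 6.635, 0.005: 7.879}
for level, cv in sorted(critical.items()):
    print(level, "reject H0" if stat > cv else "do not reject H0")
# 8.12 > 7.879, so H0 (the Weibull) is rejected even at the 0.5% significance level.
```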

Exercise: Both a Weibull (2 parameters) and an Exponential Distribution (1 parameter) have been
fit to the same data via the Method of Maximum Likelihood. The loglikelihoods are -737 for the
Weibull and -740 for the Exponential. Use the Likelihood Ratio Test to determine whether the
Weibull fits this data significantly better than the Exponential.
[Solution: The Exponential is a special case of the Weibull, with one less parameter.
Therefore we compare to the Chi-Square Distribution with one degree of freedom. Twice the
difference of the loglikelihoods is 6. 5.024 < 6 < 6.635. Thus at 2.5% we reject the simpler
Exponential model, in favor of H1 that the more complex Weibull model is appropriate.
At the 1% level we do not reject H0 that the simpler Exponential model is appropriate.]

Exercise: A Weibull Distribution has been fit to some data via the Method of Maximum Likelihood.
The fitted parameters are θ^ = 247 and τ^ = 1.54, with loglikelihood of -737.

For θ = 300 and τ = 2, the loglikelihood is -741.


Use the likelihood ratio test in order to test the hypothesis that θ = 300 and τ = 2.
[Solution: The Weibull θ = 300 and τ = 2 is a special case of the general Weibull, with two less
parameters. Therefore we compare to the Chi-Square Distribution with two degrees of freedom.
Twice the difference of the loglikelihoods is 8. 7.378 < 8 < 9.210. Thus at 2.5% we reject the
hypothesis that θ = 300 and τ = 2, while at the 1% level we do not reject.
Comment: See Example 16.9 in Loss Models.]

Testing Other Hypotheses:168

One can use the likelihood ratio in order to test other hypotheses. For example, one can test
hypotheses involving restrictions on the relationships of the parameters of the distributions of two
related data sets, such as in the following example.

Phil and Sylvia are competitors in the light bulb business.169


You were able to test 20 of Philʼs bulbs and 10 of Sylviaʼs bulbs:
Number of Bulbs Total Lifetime Average Lifetime
Phil 20 20,000 1000
Sylvia 10 15,000 1500
You assume that the distribution of the lifetime (in hours) of a light bulb is Exponential.

168 The general subject of hypothesis testing is discussed in a subsequent section.
169 See 4, 11/00, Q.34, in the section on Maximum Likelihood.

Exercise: Using maximum likelihood, separately estimate θ for Philʼs Exponential Distribution.
What is the corresponding maximum loglikelihood?
[Solution: For an Exponential Distribution, ln f(x) = -x/θ - ln(θ).

The loglikelihood for Philʼs bulbs is: Σ{-xi /θ - ln(θ)} = (-1/θ)Σxi - nln(θ) = -20000/θ - 20ln(θ).

Setting the partial derivative of the loglikelihood with respect to θ equal to zero:

0 = 20000/θ² - 20/θ. θ = 20000/20 = 1000. (The same result as the method of moments.)
The maximum loglikelihood is: -20000/1000 - 20ln(1000) = -158.155.]

Similarly, if we separately estimate θ for Sylviaʼs Exponential Distribution, θ = 1500.


The corresponding maximum loglikelihood is: -15000/1500 - 10ln(1500) = -83.132.170

Sylvia advertises that her light bulbs burn twice as long as Philʼs.
Using maximum likelihood applied to all the data, estimate θ for Philʼs Exponential Distribution
restricted by Sylviaʼs claim. What is the corresponding maximum loglikelihood?
[Solution: For the Exponential Distribution, f(x) = e^(-x/θ)/θ. ln f(x) = -x/θ - ln(θ).

Assuming θS = 2θP, the loglikelihood is: Σ_Phil {-xi/θP - ln(θP)} + Σ_Sylvia {-xi/(2θP) - ln(2θP)}

= -20,000/θP - 20ln(θP) - 15,000/ (2θP) - 10ln(2θP).

Setting the partial derivative of the loglikelihood with respect to θ equal to zero:

0 = 20,000/θP² - 20/θP + 15,000/(2θP²) - 10/θP.

θP = (20,000 + 15,000/2)/(20 + 10) = 917. θS = 2θP = 1834.


The maximum loglikelihood is: -20000/917 - 20ln(917) - 15000/1834 - 10ln(1834) =
-21.810 - 136.422 - 8.179 - 75.143 = -241.554.]

The unrestricted maximum loglikelihood is: -158.155 - 83.132 = -241.287, somewhat better than
the restricted maximum likelihood of -241.554.

It is not surprising that without the restriction we can do a somewhat better job of fitting the data. The
unrestricted model involves two Exponentials, while the restricted model is a special case in which
one of the Exponentials has twice the mean of the other.

Let the null hypothesis H0 be that Sylviaʼs light bulbs burn twice as long as Philʼs. Let the alternative
H1 be that H0 is not true. Then we can use the likelihood ratio test as follows.
170 In general, the maximum loglikelihood for an Exponential and n ungrouped data points is: -n(1 + ln( x )), where x is the sample mean.

We use the loglikelihood for the restricted model of -241.554 and the loglikelihood for the
unrestricted model of -241.287.

The test statistic is as usual twice the difference in the loglikelihoods:


(2) {-241.287 - (-241.554)} = 0.534.

We compare to the Chi-Square with one degree of freedom, since the restriction is one
dimensional. Since 0.534 < 2.706, we do not reject H0 at 10%.

One reason we did not reject Sylviaʼs claim was due to the small sample size.

Exercise: Redo the above example with the following different data:
Number of Bulbs Total Lifetime Average Lifetime
Phil 200 200,000 1000
Sylvia 100 150,000 1500
[Solution: Separate estimate of θ for Philʼs Exponential Distribution, θ = 1000.
The corresponding maximum loglikelihood is: -200000/1000 - 200ln(1000) = -1581.55.
Separate estimate of θ for Sylviaʼs Exponential Distribution, θ = 1500.
The corresponding maximum loglikelihood is: -150000/1500 - 100ln(1500) = -831.32.
Restricted by Sylviaʼs claim, θP = (200000 + 150000/2)/(200 + 100) = 917. θS = 2θP = 1834.
The maximum loglikelihood is: -200000/917 - 200ln(917) - 150000/1834 - 100ln(1834) =
-2415.54. Unrestricted loglikelihood is: -1581.55 - 831.32 = -2412.87.
Twice the difference in the loglikelihoods: (2) {-2412.87 - (-2415.54)} = 5.34.
Compare to the Chi-Square with one degree of freedom.
Since 5.024 < 5.34 < 6.635, we reject H0 at 2.5% and do not reject H0 at 1%.
Comment: The sample size was 10 times as large and so were all the loglikelihoods.]
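A minimal sketch (Python, not from the original) of the restricted-versus-unrestricted calculation for the Exponential model; the bulb counts and totals are those of the exercise, and the closed-form maximum likelihood estimates are the ones derived above:

```python
import math

# Minimal sketch (Python): likelihood ratio test of Sylvia's claim (theta_S = 2 theta_P),
# using the larger data set of the exercise.
n_P, tot_P = 200, 200000   # Phil: number of bulbs, total lifetime
n_S, tot_S = 100, 150000   # Sylvia

def exp_loglik(total, n, theta):
    # loglikelihood of an Exponential fit to n ungrouped observations with the given total
    return -total / theta - n * math.log(theta)

# Unrestricted: fit each Exponential separately (the MLE is the sample mean).
ll_unrestricted = exp_loglik(tot_P, n_P, tot_P / n_P) + exp_loglik(tot_S, n_S, tot_S / n_S)

# Restricted (H0): theta_S = 2 theta_P, with MLE theta_P = (tot_P + tot_S/2) / (n_P + n_S).
theta_P = (tot_P + tot_S / 2) / (n_P + n_S)
ll_restricted = exp_loglik(tot_P, n_P, theta_P) + exp_loglik(tot_S, n_S, 2 * theta_P)

stat = 2 * (ll_unrestricted - ll_restricted)
print(round(stat, 2))   # about 5.33; compare to the chi-square with 1 degree of freedom
```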

As another example, recall that the Pareto fit by Maximum Likelihood to the Ungrouped Data in
Section 2 had parameters α = 1.702 and θ = 240,151 and loglikelihood of -1747.87.
The Ungrouped Data in Section 2 had a mean of 312,675. If we were to restrict our review to
Pareto Distributions with this mean, then θ = 312,675(α - 1).171 With this restriction, the maximum
likelihood Pareto has α = 2.018 and θ = 318,303 with loglikelihood of -1748.12.

We test the hypothesis that the mean is 312,675 versus the alternative that it is not, by the
likelihood ratio test.172 The test statistic is: (2) {-1747.87 - (-1748.12)} = 0.50. Since the restriction is
one dimensional, we compare to a Chi-Square with 1 degree of freedom.
Since 0.50 < 2.706, at 10% we do not reject the hypothesis that the mean is 312,675.
171 Such that the mean θ/(α−1) = 312675. This is a special case of the Pareto, with one fewer parameter.
172 See Example 16.9 in Loss Models.

A Practical Technique of Deciding Which of Many Distributions to Use:173

Assume different distributions with various numbers of parameters have been fit via maximum
likelihood to the same data. For each number of parameters, the best fit is that with the largest
loglikelihood. Choose a significance level.

1. Compare the loglikelihood of the best one parameter distribution with
the best two parameter distribution, using the likelihood ratio test.174
2. Compare the best 3 parameter distribution to the result of step 1, using the likelihood ratio test.
3. Compare the best four parameter distribution (if any) to the result of step 2,
using the likelihood ratio test.
4. etc.

Exercise: Distributions have been fit to the same data via maximum likelihood.
Distribution Negative Loglikelihood Number of Parameters
Burr 4508.32 3
Exponential 4511.07 1
Gamma 4509.82 2
Generalized Pareto 4508.56 3
Inverse Burr 4507.93 3
Inverse Exponential 4513.12 1
Inverse Gamma 4509.41 2
Inverse Gaussian 4511.27 2
Inverse Transformed Gamma 4507.59 3
LogLogistic 4510.03 2
LogNormal 4508.95 2
Mixture of 2 Exponentials 4509.02 3
Mixture of 3 Exponentials 4505.91 5
Mixed Exponential-Pareto 4506.10 4
Mixed Weibull-Pareto 4504.82 5
Pareto 4510.38 2
Pareto with α = 2 4512.60 1
Transformed Gamma 4509.23 3
Transformed Beta 4507.04 4
Weibull 4510.18 2
Weibull with τ = 0.5 4510.74 1
Based on this information, at a significance level of 5%, using the above technique, which distribution
provides the best fit?
173 See Section 16.5.3 of Loss Models.
174 Even though there is no theorem to justify it, actuaries often use the Likelihood Ratio Test to compare fits of
distributions with different numbers of parameters, even when one is not a special case of the other.

[Solution: The best one parameter distribution is the Weibull with τ = 0.5, with a loglikelihood of
-4510.74. The best two parameter distribution is the LogNormal, with a loglikelihood of -4508.95.
Twice the difference in the loglikelihoods is: (2)(1.79) = 3.58. The critical value at 5% for one degree
of freedom is 3.841. 3.58 ≤ 3.841. ⇒ Use the simpler Weibull with τ = 0.5.
The best three parameter distribution is the Inverse Transformed Gamma, with a loglikelihood of
-4507.59. Twice the difference in the loglikelihoods between the Inverse Transformed Gamma and
the Weibull with τ = 0.5 is: (2)(3.15) = 6.30. The critical value at 5% for two degrees of freedom is

5.991. 6.30 > 5.991. ⇒ Use the Inverse Transformed Gamma.


The best four parameter distribution is the Mixed Exponential-Pareto, with a loglikelihood of
-4506.10. Twice the difference in the loglikelihoods between the Mixed Exponential-Pareto and the
Inverse Transformed Gamma is: (2)(1.49) = 2.98. 2.98 ≤ 3.841.
⇒ Use the simpler Inverse Transformed Gamma.
The best five parameter distribution is the Mixed Weibull-Pareto, with a loglikelihood of -4504.82.
Twice the difference in the loglikelihoods between the Mixed Weibull-Pareto and
the Inverse Transformed Gamma is: (2)(2.77) = 5.54.
5.54 ≤ 5.991. ⇒ Use the simpler Inverse Transformed Gamma.
Thus the Inverse Transformed Gamma is the best fit.
Comment: In this practical technique, at each stage we apply the likelihood ratio test, even when we
do not have a special case or a limit.]
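A short sketch (Python, not from the original) of this stepwise procedure, using the best loglikelihood at each number of parameters from the exercise and the 5% chi-square critical values; it reproduces the choice of the Inverse Transformed Gamma:

```python
# Minimal sketch (Python) of the stepwise selection: move to the model with more parameters
# only when the likelihood ratio test rejects the simpler one at the chosen significance level.
best_by_params = {1: -4510.74, 2: -4508.95, 3: -4507.59, 4: -4506.10, 5: -4504.82}
critical_5pct = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}   # 5% chi-square critical values

current_params, current_ll = 1, best_by_params[1]
for p in sorted(best_by_params):
    if p == current_params:
        continue
    stat = 2 * (best_by_params[p] - current_ll)
    df = p - current_params
    if stat > critical_5pct[df]:
        current_params, current_ll = p, best_by_params[p]   # the richer model wins this step

print(current_params)   # 3: the Inverse Transformed Gamma is selected
```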

Derivation of the Likelihood Ratio Test:175

Under regularity conditions, maximum likelihood estimates asymptotically have a Multivariate Normal
distribution with Covariance Matrix V, and the likelihood function asymptotically has a Multivariate
Normal distribution.176

The likelihood is proportional to: exp[-(1/2)(t - θ)ʼ V^-1 (t - θ)],


where t is the vector of maximum likelihood parameters,
θ is a vector of values of these parameters, probably different than those that produce the maximum
likelihood,
and V is the covariance matrix.177

175 The derivation is not on the syllabus. See for example, Kendallʼs Advanced Theory of Statistics, Volume 2.
176 Asymptotically means as the sample size approaches infinity.
177 See Equation 23.25 in Kendallʼs Advanced Theory of Statistics, Volume 2.

For H1 , the unrestricted case, the maximum likelihood set of parameters is t.178
The likelihood for θ = t is proportional to: exp[-(1/2)(t - t)ʼ V^-1 (t - t)] = e^0 = 1.

H0 is the restricted case, where for example some of the parameters are fixed.179
Call the maximum likelihood vector of parameters for the restricted case r.
The likelihood for θ = r is proportional to: exp[-(1/2)(t - r)ʼ V^-1 (t - r)].

Therefore, the ratio of the maximum likelihood in the unrestricted case to the maximum likelihood in
the restricted case is:
1/exp[-(1/2)(t - r)ʼ V^-1 (t - r)] = exp[(1/2)(t - r)ʼ V^-1 (t - r)].

The likelihood ratio test statistic is twice the log of this ratio, which is: (t - r)ʼ V^-1 (t - r).

Since for large samples, the vector of maximum likelihood parameters, t, has a Multivariate
Normal distribution, the quadratic form (t - r)ʼ V^-1 (t - r) has what is called a NonCentral
Chi-Square Distribution.180

If H0 is true, then since maximum likelihood is asymptotically unbiased, the expected values of the
vectors t and r are equal, and the likelihood ratio test statistic, (t - r)ʼ V^-1 (t - r), is a (central) Chi-Square
Distribution. The number of degrees of freedom is equal to the difference in the dimensionality of the
unrestricted and restricted cases, or the number of parameters fixed in the restricted case.

Note that this result was derived assuming a large sample. As the sample size approaches infinity,
the likelihood ratio test statistic approaches a Chi-Square Distribution.181

178 H1 might be for example that the distribution is a Transformed Gamma, with 3 parameters.
179 H0 might be for example that the distribution is an Exponential, which is a special case of a Transformed Gamma
with α = 1 and τ = 1.
180 See Section 23.6 of Kendallʼs Advanced Theory of Statistics, Volume 2.
181 See “Mahlerʼs Guide to Simulation” for an example of estimating the p-value via simulation for the Likelihood Ratio
Test for a sample of size 100.

Problems:

Use the following information for the next 7 questions:


• Define the “Holmes Distribution” to be: F(x) = 1 - (1 + x/θ)^-2, x > 0.
• Various distributions have each been fit to the same data via Maximum Likelihood.
• The loglikelihoods are as follows:
Distribution Number of Parameters Loglikelihood
Transformed Beta 4 -2582
Transformed Gamma 3 -2583
Generalized Pareto 3 -2585
Weibull 2 -2586
Pareto 2 -2587
Holmes 1 -2588
Exponential 1 -2592

13.1 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Pareto
Distribution is an appropriate model versus the alternative hypothesis H1 that the Generalized
Pareto Distribution is an appropriate model for this data.
Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. None of the above.

13.2 (2 points) Based on the likelihood ratio test, one tests at the 1% significance level.
Which of the following statements are true?
1. The Generalized Pareto Distribution is a more appropriate model for this data than the
Transformed Beta Distribution.
2. The Pareto Distribution is a more appropriate model for this data than the
Transformed Beta Distribution.
3. The Holmes Distribution is a more appropriate model for this data than the
Transformed Beta Distribution.
A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A,B,C, or D.

13.3 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Holmes
Distribution is an appropriate model versus the alternative hypothesis H1 that the Pareto Distribution
is an appropriate model for this data. Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. None of the above.

13.4 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Generalized
Pareto Distribution is not a better model for this data than the Holmes Distribution.
Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. None of the above.

13.5 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Weibull
Distribution is an appropriate model versus the alternative hypothesis H1 that the Transformed
Gamma Distribution is an appropriate model for this data. Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. None of the above.

13.6 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Exponential
Distribution is an appropriate model versus the alternative hypothesis H1 that the Transformed
Gamma Distribution is an appropriate model for this data.
Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. None of the above.

13.7 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Weibull
Distribution is not a better model for this data than the Exponential Distribution.
Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. None of the above.

Use the following information for the next four questions:


You are given the following 4 claims: 100, 500, 2000, 10,000.
You fit a LogNormal Distribution to these claims via maximum likelihood.

13.8 (2 points) What is the fitted value of µ?

13.9 (1 point) What is the fitted value of σ?

13.10 (2 points) What is the maximum loglikelihood?

13.11 (3 points) Determine the result of using the likelihood ratio test in order to test the hypothesis
that µ = 5 and σ = 1.5.
(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

13.12 (5 points) You have the following data from three states:
State Number of Claims Aggregate Losses Average Size of Loss
Bay 3000 3,000,000 1000
Empire 6000 7,500,000 1250
Granite 1500 1,125,000 750
The size of claim distribution for each state is Exponential.
Let H0 be the hypothesis that the mean claim size for Empire State is 1.3 times that for Bay State
and 1.6 times that for Granite State.
Based on the likelihood ratio test, one tests the hypothesis H0 . Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. Do not reject H0 at 5%.

Use the following information for the next two questions:


You are given the following 5 claims: 40, 150, 230, 400, 770.
You assume the size of loss distribution is Gamma.

13.13 (3 points) You fit a Gamma Distribution with α = 3 to these claims via maximum likelihood.
What is the maximum loglikelihood?
A. less than -34.8
B. at least -34.8 but less than -34.7
C. at least -34.7 but less than -34.6
D. at least -34.6 but less than -34.5
E. at least -34.5

13.14 (3 points) Determine the result of using the likelihood ratio test in order to test the hypothesis
that α = 3 and θ = 200 versus the alternative α = 3.
(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

13.15 (1 point) A LogNormal Distribution has been fit via maximum likelihood to some data.
The corresponding loglikelihood is -2067.83.
Then restricting the mean to be 1000, another LogNormal Distribution has been fit via maximum
likelihood to the same data. The corresponding loglikelihood is -2072.02.
Let H0 be that the mean of the population that produced this data is 1000.
Let H1 be that the mean of the population that produced this data is not 1000.
Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. Do not reject H0 at 5%.

13.16 (3 points) You have the following data from the state of West Carolina:
Region Number of Claims Aggregate Losses Average Size of Claim
Rural 5000 500,000 100
Urban 10,000 1,250,000 125
You assume that the distribution of sizes of claims is exponential.
Based on data from other states, you assume that the mean claim size for Urban insureds is 1.2
times that for Rural insureds.
Let H0 be the hypothesis that the mean claim size in West Carolina for Urban is 1.2 times that for
Rural.
Let H1 be that H0 is not true.
Using the likelihood ratio test, one tests the hypothesis H0 . Which of the following is true?
A. Reject H0 at 1/2%.
B. Reject H0 at 1%. Do not reject H0 at 1/2%.
C. Reject H0 at 2.5%. Do not reject H0 at 1%.
D. Reject H0 at 5%. Do not reject H0 at 2.5%.
E. Do not reject H0 at 5%.

13.17 (3 points) Let X1 , X2 , ..., Xn be a sample from a Normal Distribution.


Let H0 : Normal with µ = 0, and σ = 1.

Let H1 : Normal with mean µ, and σ = 1.


Demonstrate that in this case the likelihood ratio test statistic has a Chi-Square Distribution with one
degree of freedom, if H0 is true.

13.18 (4 points) You are given:


(i) 500 claim amounts are randomly selected from a Pareto distribution with θ = 10 and unknown α.

(ii) Σ ln(xi + 10) = 1485.

You use the likelihood ratio test to test the hypothesis that θ = 10 and α = 1.7.
Determine the result of the test.
(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

13.19 (3 points) Distributions have been fit to the same data via maximum likelihood.
Distribution Negative Loglikelihood Number of Parameters
Burr 3020.65 3
Exponential 3023.88 1
Gamma 3021.05 2
Generalized Pareto 3021.82 3
Inverse Burr 3019.25 3
Inverse Exponential 3031.71 1
Inverse Gaussian 3023.02 2
LogNormal 3024.20 2
Pareto 3022.87 2
Transformed Gamma 3016.98 3
Weibull 3022.04 2
At a significance level of 1%, which distribution provides the best fit?

13.20 (3 points) You observe the following 10 losses from state A:


10, 79, 87, 22, 18, 34, 73, 70, 58, 69. The sum is 520.
You observe the following 10 losses from state B:
48, 125, 100, 170, 133, 56, 131, 87, 205, 105. The sum is 1160.
Let H0 : The losses from both states follow the same Exponential Distribution.
Let H1 : The losses from each state follow an Exponential Distribution.
Using the likelihood ratio test, which of the following are true?
(A) Do not reject H0 at the 0.10 level of significance.
(B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance.
(C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance.
(D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance.
(E) Reject H0 at the 0.01 level of significance.

13.21 (1 point) You fit via maximum likelihood an Exponential and Weibull Distribution to the same
data set. The loglikelihood for the fitted Exponential is -1034.71. The loglikelihood for the fitted
Weibull is -1031.30. Based on the likelihood ratio test, test the hypothesis H0 that the Exponential
Distribution is an appropriate model versus the alternative hypothesis H1 that the Weibull
Distribution is an appropriate model for this data.
(A) Reject H0 at the 0.005 significance level.
(B) Reject H0 at the 0.010 significance level, but not at the 0.005 level.
(C) Reject H0 at the 0.025 significance level, but not at the 0.010 level.
(D) Reject H0 at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject H0 at the 0.050 significance level.

13.22 (3 points) You are given:


(i) A random sample of 100 losses from a Inverse Gamma distribution.

(ii) The maximum likelihood estimates are α^ = 3.075 and θ^ = 994.


(ii) The natural logarithm of the likelihood function evaluated at the maximum likelihood estimates is
-686.084.
(iii) When α = 4, the maximum likelihood estimate of θ is 1293.

(iv) Σln(xi) = 594.968.

(v) Σxi = 50,383.


(vi) Σ 1/xi = 0.309383.

(vii) You use the likelihood ratio test to test the hypothesis
H0 : α = 4.

H1 : α ≠ 4.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 level of significance.
(B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance.
(C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance.
(D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance.
(E) Reject H0 at the 0.01 level of significance.

13.23 (2 points) A sample of size 2500 is drawn from an Exponential Distribution with mean θ.
The sum of the sample is 120,000.
Use the Likelihood Ratio Test in order to test whether θ = 50.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 level of significance.
(B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance.
(C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance.
(D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance.
(E) Reject H0 at the 0.01 level of significance.

13.24 (4 points) You assume that a data set of size n came from an Inverse Gaussian with θ = 13.
You use the likelihood ratio test in order to test H0 : µ = µ0 versus H1 : µ ≠ µ0 .

Let X = the sample mean.


Show that the likelihood ratio test statistic has the form: {13n / (X µ0²)} (X - µ0)².

Use the following information for the next two questions:


• A random sample of losses from an Inverse Weibull distribution is:
0.1 0.3 0.6 0.8

13.25 (3 points) For τ = 2, determine the maximum loglikelihood.

13.26 (3 points) The maximum likelihood estimates are θ^ = 0.2277 and τ^ = 1.2433.
You use the likelihood ratio test to test the hypothesis
H0 : τ = 2.

H1 : τ ≠ 2.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 level of significance.
(B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance.
(C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance.
(D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance.
(E) Reject H0 at the 0.01 level of significance.

13.27 (3 points) The following random sample is from an Exponential Distribution:


100 150 200 300 500
The likelihood ratio test is used to test the null hypothesis that the mean of the distribution is θ0 .

For which values of θ0 would one reject the null hypothesis at a 5% significance level?
Use a computer to help you.

13.28 (4 points) For a data set of size 100:


Σ xj = 658 and Σ xj² = 4927, where the sums run over j = 1 to 100.

H0 : This data was drawn from a Normal Distribution with µ = 6.5 and σ = 2.
H1 : This data was drawn from the maximum likelihood Normal Distribution.
Determine the result of applying the likelihood ratio test.
(A) Reject H0 at the 0.005 significance level.
(B) Reject H0 at the 0.010 significance level, but not at the 0.005 level.
(C) Reject H0 at the 0.025 significance level, but not at the 0.010 level.
(D) Reject H0 at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject H0 at the 0.050 significance level.

13.29 (1 point) A three-parameter distribution is fit via maximum likelihood.


The corresponding loglikelihood is -1042.920.
For α = 4, τ = 5 and θ = 100, the loglikelihood is -1050.031.
H0 : This data was drawn from the distribution with α = 4, τ = 5 and θ = 100.
H1 : This data was drawn from the maximum likelihood fit.
Determine the result of applying the likelihood ratio test.
(A) Reject H0 at the 0.005 significance level.
(B) Reject H0 at the 0.010 significance level, but not at the 0.005 level.
(C) Reject H0 at the 0.025 significance level, but not at the 0.010 level.
(D) Reject H0 at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject H0 at the 0.050 significance level.

Use the following information for the next three questions:


• For a random sample of 100 losses from a Weibull distribution:
Σ ln[xi] = 856.948.   Σ xi^(2/3) = 46,711.7.   Σ xi^0.7805 = 142,329.
Σ xi^0.9 = 465,336.   Σ xi = 1,267,671.   Σ xi^1.5 = 215,927,358.
(All sums run over i = 1 to 100.)

• The maximum likelihood estimates are θ^ = 10,967 and τ^ = 0.7805.

13.30 (3 points) Determine the result of a likelihood ratio test:


H0 : τ = 1. H1 : τ ≠ 1.
(A) Do not reject H0 at the 5% level of significance.
(B) Reject H0 at the 5% level of significance, but not at the 2.5% level of significance.
(C) Reject H0 at the 2.5% level of significance, but not at the 1% level of significance.
(D) Reject H0 at the 1% level of significance, but not at the 0.5% level of significance.
(E) Reject H0 at the 0.5% level of significance.

13.31 (4 points) Determine the result of a likelihood ratio test:


H0 : τ = 2/3. H1 : τ ≠ 2/3.
(A) Do not reject H0 at the 5% level of significance.
(B) Reject H0 at the 5% level of significance, but not at the 2.5% level of significance.
(C) Reject H0 at the 2.5% level of significance, but not at the 1% level of significance.
(D) Reject H0 at the 1% level of significance, but not at the 0.5% level of significance.
(E) Reject H0 at the 0.5% level of significance.

13.32 (4 points) Determine the result of a likelihood ratio test:


H0 : θ = 10,000 and τ = 0.9. H1 : Not H0 .
(A) Do not reject H0 at the 5% level of significance.
(B) Reject H0 at the 5% level of significance, but not at the 2.5% level of significance.
(C) Reject H0 at the 2.5% level of significance, but not at the 1% level of significance.
(D) Reject H0 at the 1% level of significance, but not at the 0.5% level of significance.
(E) Reject H0 at the 0.5% level of significance.

13.33 (4, 11/03, Q.28 & 2009 Sample Q.22) (2.5 points)
You fit a Pareto distribution to a sample of 200 claim amounts and use the likelihood ratio test to test
the hypothesis that α = 1.5 and θ = 7.8.
You are given:

(i) The maximum likelihood estimates are α^ = 1.4 and θ^ = 7.6.


(ii) The natural logarithm of the likelihood function evaluated at the maximum likelihood
estimates is -817.92.

(iii) Σln(xi + 7.8) = 607.64.

Determine the result of the test.


(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

13.34 (4, 11/05, Q.25 & 2009 Sample Q.235) (2.9 points) You are given:
(i) A random sample of losses from a Weibull distribution is:
595 700 789 799 1109

(ii) At the maximum likelihood estimates of θ and τ, Σ ln(f(xi)) = -33.05.

(iii) When τ = 2, the maximum likelihood estimate of θ is 816.7.


(iv) You use the likelihood ratio test to test the hypothesis
H0 : τ = 2.

H1 : τ ≠ 2.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 level of significance.
(B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance.
(C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance.
(D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance.
(E) Reject H0 at the 0.01 level of significance.

13.35 (4, 5/07, Q.14) (2.5 points) You are given:


(i) Twenty claim amounts are randomly selected from a Pareto distribution with α = 2 and unknown θ.
(ii) The maximum likelihood estimate of θ is 7.0.

(iii) Σ ln(xi + 7.0) = 49.01.

(iv) Σ ln(xi + 3.1) = 39.30.

You use the likelihood ratio test to test the hypothesis that θ = 3.1.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 significance level.
(B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level.
(C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level.
(D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level.
(E) Reject H0 at the 0.01 significance level.

13.36 (CAS ST 5/14, Q.14) (2.5 points) You are given the following information about two loss
severity distributions fit to a sample of 275 closed claims:
• For the Exponential distribution, the natural logarithm of the likelihood function evaluated
at the maximum likelihood estimate is -828.37.
• For the Weibull distribution, the natural logarithm of the likelihood function evaluated
at the maximum likelihood estimate is -826.23.
• The Exponential distribution is a subset of the Weibull distribution.
Calculate the significance level at which the Weibull distribution provides a better fit than the
Exponential distribution.
A. Less than 0.5%
B. At least 0.5%, but less than 1.0%
C. At least 1.0%, but less than 2.5%
D. At least 2.5%, but less than 5.0%
E. At least 5.0%

Solutions to Problems:

13.1. D. The Pareto is a special case of the Generalized Pareto, with one less parameter.
Twice the difference in the loglikelihoods is 2{-2585 -(-2587)} = 4. Since there is a difference of one
in the number of parameters we compare to the Chi-Square Distribution with one degree of
freedom. Since 3.841 < 4 < 5.024 we reject at 5% and do not reject at 2.5%.
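For those who like to check this arithmetic with software, here is a rough Python sketch of the likelihood ratio test computation used in this and the following solutions, assuming scipy is available; the function name is just illustrative.

    from scipy.stats import chi2

    def likelihood_ratio_test(loglik_restricted, loglik_unrestricted, added_parameters):
        # Test statistic: twice the difference in the maximum loglikelihoods.
        statistic = 2.0 * (loglik_unrestricted - loglik_restricted)
        # Compare to a Chi-Square Distribution with degrees of freedom equal to
        # the difference in the number of fitted parameters.
        return statistic, chi2.sf(statistic, df=added_parameters)

    # Solution 13.1: Generalized Pareto (-2585) versus its special case the Pareto (-2587).
    print(likelihood_ratio_test(-2587, -2585, 1))   # statistic 4.0, p-value about 4.5%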

13.2. A. 1. We test the hypothesis H0 that the Generalized Pareto Distribution is an appropriate
model versus the alternative hypothesis H1 that the Transformed Beta Distribution is an appropriate
model for this data. The Generalized Pareto is a special case of the Transformed Beta, with one
less parameter. Twice the difference in the loglikelihoods is: 2{-2582 -(-2585)} = 6. Since there is a
difference of one in the number of parameters we compare to the Chi-Square Distribution with one
degree of freedom. Since 5.024 < 6 < 6.635, we reject at 2.5% and do not reject at 1%. Thus
statement #1 is True. (The Transformed Beta is a better model than the Generalized Pareto at the
2.5% significance level. However, the Generalized Pareto is a better model than the Transformed
Beta at the 1% significance level.)
2. We test the hypothesis H0 that the Pareto Distribution is an appropriate model versus the
alternative hypothesis H1 that the Transformed Beta Distribution is an appropriate model for this
data. The Pareto is a special case of the Transformed Beta, with two less parameters. Twice the
difference in the loglikelihoods is: 2{-2582 -(-2587)} = 10. Since there is a difference of two in the
number of parameters we compare to the Chi-Square Distribution with two degrees of freedom.
Since 9.210 < 10 < 10.597 we reject at 1% and do not reject at 0.5%. Thus statement #2 is False.
3. The “Holmes” Distribution is a Pareto with alpha fixed at 2 and is a special case of the
Transformed Beta, with three less parameters. Twice the difference in the loglikelihoods is:
2{-2582 -(-2588)} = 12. Since there is a difference of three in the number of parameters we
compare to the Chi-Square Distribution with three degrees of freedom.
Since 11.345 < 12 < 12.838 we reject at 1% and do not reject at 0.5%. Thus statement #3 is
False. (Statement #3 would be true at a 0.5% significance level. The Transformed Beta is a better
model than the Holmes Distribution at the 1% significance level. However, the Holmes Distribution is
a better model than the Transformed Beta at the 0.5% significance level.)
Comment: I made up the name “Holmes Distribution” solely for the purposes of these questions. It
is a special case of the Pareto for alpha = 2, fixed.

13.3. E. The Holmes Distribution is a special case of the Pareto, with one less parameter. Twice
the difference in the loglikelihoods is 2{-2587 -(-2588)} = 2.
Since there is a difference of one in the number of parameters we compare to the Chi-Square
Distribution with one degree of freedom. Since 2 < 2.706 we do not reject at 10%.
Comment: The Pareto Distribution is not a better model than the Holmes Distribution at a 10%
significance level. (It is therefore also true that the Pareto Distribution is not a better model than the
Holmes Distribution at a 5% significance level, etc.)

13.4. D. The Holmes Distribution is a special case of the Generalized Pareto, with two less
parameters. Twice the difference in the loglikelihoods is 2{-2585 -(-2588)} = 6.
Since there is a difference of two in the number of parameters we compare to the Chi-Square
Distribution with two degrees of freedom.
Since 5.991 < 6 < 7.378, we reject at 5% and do not reject at 2.5%.

13.5. C. The Weibull is a special case of the Transformed Gamma, with one less parameter. Twice
the difference in the loglikelihoods is: 2{-2583 -(-2586)} = 6.
Since there is a difference of one in the number of parameters we compare to the
Chi-Square Distribution with one degree of freedom.
Since 5.024 < 6 < 6.635 we reject at 2.5% and do not reject at 1%.

13.6. A. The Exponential is a special case of the Transformed Gamma, with two less parameters.
Twice the difference in the loglikelihoods is 2{-2583 -(-2592)} = 18.
Since there is a difference of two in the number of parameters we compare to the Chi-Square
Distribution with two degrees of freedom. Since 10.597 < 18 we reject at 0.5%.

13.7. A. The Exponential is a special case of the Weibull, with one less parameter.
Twice the difference in the loglikelihoods is 2{-2586 -(-2592)} = 12.
Since there is a difference of one in the number of parameters we compare to the
Chi-Square Distribution with one degree of freedom. Since 7.879 < 12 we reject at 0.5%.

13.8, 13.9, & 13.10. f(x) = exp[-(ln(x) - µ)^2 / (2σ^2)] / {x σ √(2π)}.

ln f(x) = -0.5{(ln(x)-µ)2 /σ2 } - ln(σ) - ln(x) - (1/2)ln(2π).


Σ ln f(xi) = -0.5{Σ(ln(xi)-µ)2 /σ2 } - nln(σ) - Σln(xi) - (n/2)ln(2π).
Set the partial derivatives of the sum of loglikelihoods equal to zero:
∂/∂σ Σ ln[f(xi)] = Σ(ln(xi)-µ)^2/σ^3 - n/σ = 0. ∂/∂µ Σ ln[f(xi)] = Σ(ln(xi)-µ)/σ^2 = 0.

⇒ Σ(ln(xi)-µ) = 0. ⇒ µ = Σln(xi) / n = {ln(100) + ln(500) + ln(2000) + ln(10000)}/ 4 = 6.908.


Therefore σ = {Σ(ln(xi)-µ)^2 / n}^0.5 = √2.887 = 1.699.
For µ = 6.908 and σ = 1.699, loglikelihood: -.5{Σ(ln(xi)-µ)2 /σ2 } - nln(σ) - Σln(xi) - (n/2)ln(2π) =

-.5{Σ(ln(xi) - 6.908)^2 /1.699^2} - 4ln(1.699) - Σln(xi) - 2ln(2π) =


-2.00 - 2.12 - 27.63 - 3.676 = -35.43.
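As a rough check (assuming numpy and scipy are available), the fitted parameters and loglikelihood above can be reproduced as follows; scipy parameterizes the LogNormal by s = σ and scale = exp(µ).

    import numpy as np
    from scipy.stats import lognorm

    x = np.array([100, 500, 2000, 10000])
    mu = np.log(x).mean()                             # 6.908
    sigma = np.sqrt(np.mean((np.log(x) - mu)**2))     # sqrt(2.887) = 1.699
    loglik = lognorm.logpdf(x, s=sigma, scale=np.exp(mu)).sum()
    print(mu, sigma, loglik)                          # loglikelihood about -35.43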

13.11. D. For µ = 5 and σ = 1.5, loglikelihood:


-0.5{Σ(ln(xi)-µ)2 /σ2 } - nln(σ) - Σln(xi) - (n/2)ln(2π) =

-0.5{Σ(ln(xi) - 5)^2 /1.5^2} - 4ln(1.5) - Σln(xi) - 2ln(2π) = -5.80 - 1.62 - 27.63 - 3.675 = -38.73.
The test statistic is twice the difference in the loglikelihoods: (2){-35.43 - (-38.73)} = 6.60.
The difference in the number of parameters is: 2 - 0 = 2.
So we compare the statistic to the Chi-Square Distribution with 2 degrees of freedom.
5.991 < 6.60 < 7.378. ⇒ Reject at 5%, but not at 2.5%.
Comment: Similar to 4, 11/03, Q.28.

13.12. D. For an Exponential Distribution, ln f(x) = -x/θ - ln(θ).


Assuming θB = θE/1.3, and θG = θE /1.6, then the loglikelihood is:

Σ over Bay {-1.3 xi / θE - ln(θE / 1.3)} + Σ over Empire {-xi / θE - ln(θE)} + Σ over Granite {-1.6 xi / θE - ln(θE / 1.6)} =

-3000000(1.3)/θE - 3000ln(θE/1.3) - 7500000/θE - 6000ln(θE) - 1125000(1.6)/θE - 1500ln(θE/1.6)


= -13,200,000/θE - 10500ln(θE) + 3000ln(1.3) + 1500ln(1.6).

Setting the partial derivative of the loglikelihood with respect to θE equal to zero:

0 = 13,200,000/θE2 - 10500/θE. θE = 13,200,000/10500 = 1257.14.


The corresponding maximum loglikelihood is:
-13,200,000/1257.14 - 10500ln(1257.14) + 3000ln(1.3) + 1500ln(1.6) = -83942.13.
Separate estimate of θ for Bay Stateʼs Exponential Distribution, θ = 1000.
The corresponding maximum loglikelihood is: -3,000,000/1000 - 3000ln(1000) = -23723.27.
Separate estimate of θ for Empire Stateʼs Exponential Distribution, θ = 1250.
The corresponding maximum loglikelihood is: -7,500,000/1250 - 6000ln(1250) = -48785.39.
Separate estimate of θ for Granite Stateʼs Exponential Distribution, θ = 750.
The corresponding maximum loglikelihood is: -1,125,000/750 - 1500ln(750) = -11430.11.
Unrestricted loglikelihood is: - 23723.27 - 48785.39 - 11430.11 = -83938.77.
Twice the difference in the loglikelihoods: (2){-83938.77 - (-83942.13)} = 6.72.
The restriction is two dimensional, so compare to the Chi-Square with two degrees of freedom.
Since 5.991 < 6.72 < 7.378, we reject H0 at 5% and do not reject H0 at 2.5%.

13.13. A. For alpha fixed, maximum likelihood is equal to the method of moments.
X = (40 + 150 + 230 + 400 + 770)/5 = 318. θ = X /α = 318/3 = 106.

f(x) = e^(-x/θ) x^2 / (2θ^3). Σln f(xi) = Σ{-xi/θ + 2 ln(xi) - ln2 - 3ln(θ)} = -Σxi/θ + 2 Σln(xi) - nln2 - 3nln(θ).

For θ = 106 the loglikelihood is: -Σxi/106 + 2 Σln(xi) - 5ln2 - 15ln106 =


-15 + 53.55 - 3.47 - 69.95 = -34.87.

13.14. D. For θ = 200 the loglikelihood is: -Σxi/200 + 2 Σln(xi) - 5ln2 - 15ln200 =
-7.95 + 53.55 - 3.47 - 79.47 = -37.34.
The test statistic is twice the difference in the loglikelihoods: (2){-34.87 - (-37.34)} = 4.94.
The difference in the number of parameters is: 1 - 0 = 1.
So we compare the statistic to the Chi-Square Distribution with 1 degree of freedom.
3.841 < 4.94 < 5.024. ⇒ Reject at 5%, but not at 2.5%.
Comment: Similar to 4, 11/03, Q.28.
In the likelihood ratio test, we determine whether fitting a parameter from the data makes a significant
difference. When we are given alpha and theta, we are not estimating any parameters from the data.
When we are given alpha = 3, we are only estimating one parameter, theta, from the data.

13.15. A. Use the likelihood ratio test.


Twice the difference in the loglikelihoods is: (2){-2067.83 - (-2072.02)} = 8.38.
Fixing the mean is a one dimensional restriction, so we compare to the Chi-Square with one degree
of freedom. 8.38 > 7.879, so we reject H0 at 1/2%.
Comment: Similar to Example 16.9 in Loss Models.

13.16. C. f(x) = e-x/θ/θ. ln f(x) = -x/θ - ln(θ). Loglikelihood is: -Σxi/θ - nln(θ).

Separate estimate of θ for Rural Exponential Distribution, θ = 100, same as the method of
moments.
The corresponding maximum loglikelihood is: -500,000/100 - 5000ln(100) = -28025.85.
Separate estimate of θ for Urban Exponential Distribution, θ = 125.
The corresponding maximum loglikelihood is: -1,250,000/125 - 10000ln(125) = -58283.14.
Restricted by H0 , θU = 1.2θR, the loglikelihood for the combined sample is:

-500,000/θR - 5000ln(θR) -1,250,000/(1.2θR) - 10000ln(1.2θR).

Setting the partial derivative with respect to θR equal to zero, and solving:

θR = (500000 + 1250000/1.2)/(5000 + 10000) = 102.78. θU = (1.2)(102.78) = 123.33.


The corresponding maximum loglikelihood is:
-500,000/102.78 - 5000ln(102.78) -1,250,000/123.33 - 10000ln(123.33) = -86311.76.
Unrestricted loglikelihood is: -28025.85 - 58283.14 = -86308.99.
Twice the difference in the loglikelihoods: (2){-86308.99 - (-86311.76)} = 5.54.
The restriction is one dimensional, so compare to the Chi-Square with one degree of freedom.
Since 5.024 < 5.54 < 6.635, we reject H0 at 2.5% and do not reject H0 at 1%.
Comment: The restricted model has one parameter, while the unrestricted model has 2 parameters.
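Here is a rough Python sketch of this solution's arithmetic, assuming scipy is available; the totals are those given in the problem (5000 Rural claims summing to 500,000 and 10,000 Urban claims summing to 1,250,000).

    import numpy as np
    from scipy.stats import chi2

    def exp_loglik(total, n, theta):
        # Loglikelihood of n Exponential claims with the given total.
        return -total / theta - n * np.log(theta)

    # Unrestricted: fit each state separately; theta equals the sample mean.
    unrestricted = exp_loglik(500000, 5000, 100) + exp_loglik(1250000, 10000, 125)
    # Restricted by H0: theta_U = 1.2 theta_R.
    theta_R = (500000 + 1250000 / 1.2) / (5000 + 10000)    # 102.78
    restricted = exp_loglik(500000, 5000, theta_R) + exp_loglik(1250000, 10000, 1.2 * theta_R)
    stat = 2 * (unrestricted - restricted)                  # about 5.5
    print(stat, chi2.sf(stat, df=1))                        # reject at 2.5%, not at 1%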

13.17. For σ = 1, f(x) = exp[-(x - µ)^2 / 2] / √(2π).
The loglikelihood is: -Σ(xi - µ)^2 / 2 - (n/2)ln(2π).
The maximum likelihood µ = X̄, with corresponding loglikelihood: -Σ(xi - X̄)^2 / 2 - (n/2)ln(2π).
For µ = 0, the loglikelihood is: -Σxi^2 / 2 - (n/2)ln(2π).
Twice the difference in the loglikelihoods is: Σxi^2 - Σ(xi - X̄)^2 = Σ(2xi X̄ - X̄^2) = n X̄^2.
If H0 is true, X̄ is Normal with mean zero and standard deviation 1/√n.
Therefore, √n X̄ is a Unit Normal, with mean zero and standard deviation 1.
Therefore, n X̄^2 is the square of a Unit Normal, a Chi-Square Distribution with 1 degree of freedom.
Comment: Beyond what you should be asked on your exam.
Unlike the usual situation, where the test statistic is approximately Chi-Square, with the
approximation improving as n gets larger, here it actually follows a Chi-Square Distribution.
σ could have taken any fixed positive value, rather than 1.

13.18. A. f(x) = α θ^α / (θ + x)^(α + 1) = α 10^α / (10 + x)^(α + 1).


ln f(x) = ln(α) + α ln(10) − (α+1)ln(10 + x).
Loglikelihood = Σ ln (f(xi)) = Σ { ln(α) + α ln(10) − (α+1)ln(10 + xi) }

= 500 ln(α) + 500α ln(10) - 1485(α+1).


Setting the derivative with respect to α equal to zero:

0 = 500/α + 500 ln(10) - 1485. ⇒ α^ = 500 / {1485 - 500ln(10)} = 1.498.


The corresponding loglikelihood is:
500 ln(1.498) + (500)(1.498) ln(10) - (1485)(2.498) = -1782.83.
The loglikelihood corresponding to θ = 10 and α = 1.7 is:
500 ln(1.7) + (500)(1.7) ln(10) - (1485)(2.7) = -1786.99.
The likelihood ratio test statistic is: (2){-1782.83 - (-1786.99)] = 8.32.
We are comparing a situation with one fitted parameter versus zero fitted parameters,
or 1 - 0 = 1 degree of freedom in the Chi-Square table.
8.32 > 7.879, so we reject H0 at 1/2%.
Comment: Similar to 4, 5/07, Q.25.

13.19. Of the one parameter distributions, the Exponential has the best loglikelihood at -3023.88.
Of the two parameter distributions, the Gamma has the best loglikelihood at -3021.05.
Applying the likelihood ratio test to the Gamma versus the Exponential, twice the difference is:
(2){-3021.05 - (-3023.88)} = 5.66 ≤ 6.635, the 1% critical value for 1 degree of freedom.
Thus we do not reject the simpler Exponential in favor of the Gamma.
Of the three parameter distributions, the Transformed Gamma has the best loglikelihood at
-3016.98.
Applying the likelihood ratio test to the Transformed Gamma versus the Exponential, twice the
difference is: (2)(-3016.98 - -3023.88 ) = 13.80 > 9.210 the 1% critical value for 2 degrees of
freedom. Thus we reject the simpler Exponential in favor of the Transformed Gamma.
The Transformed Gamma is selected as best.
Comment: For a given number of parameters, the distribution with the larger loglikelihood is the
better fit. Applying the likelihood ratio test to the Transformed Gamma versus the Gamma, twice the
difference is (2)(-3016.98 - -3021.05) = 8.14 > 6.635 the 1% critical value for 1 degree of freedom.
Thus we reject the simpler Gamma in favor of the Transformed Gamma.
It may turn out that applying the likelihood ratio test does not show a clear winner as it did here.
The Schwarz Bayesian Criterion does not have this potential problem.

13.20. B. f(x) = e-x/θ/θ. loglikelihood: -Σxi/θ - n ln(θ).

Maximum likelihood is equal to the method of moments: θ = X .


The maximum loglikelihood is: -Σxi/ X - n ln( X ) = -n{1 + ln( X )}.

For State A separately, θ = X = 52.


The maximum loglikelihood is: -10{1 + ln(52)} = -49.5124.
For State B separately, θ = X = 116.
The maximum loglikelihood is: -10{1 + ln(116)} = -57.5359.
For the two states combined, θ = X = 84.
The maximum loglikelihood is: -20{1 + ln(84)} = -108.6163.
Twice the difference in loglikelihoods: 2{-(49.5124 + 57.5359) - (-108.6163)} = 3.136.
The difference in the number of parameters is 1. Therefore we compare to a Chi-Square
Distribution with 1 degree of freedom. Since 2.706 < 3.136 < 3.841,
reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance.

13.21. B. The test statistic is twice the difference in the loglikelihoods:


(2)( -1031.30 - (-1034.71)) = 6.82. The difference in the number of parameters is: 2 - 1 = 1.
So we compare the statistic to the Chi-Square Distribution with 1 degree of freedom.
6.635 < 6.82 < 7.879. ⇒ Reject at 1%, but not at 0.5%.

13.22. C. f(x) = θ^α e^(-θ/x) / {Γ(α) x^(α+1)}. ln f(x) = α ln(θ) - θ/x - lnΓ(α) - (α + 1)ln(x).

loglikelihood = Nαln(θ) - θΣ1/xi - N lnΓ(α) - (α + 1)Σln(xi)

= (100)αln(θ) - θ(0.309383) - 100lnΓ(α) - (α + 1)(594.968).

For α = 4 and θ = 1293, the loglikelihood is:

(100)(4)ln(1293) - (1293)(0.309383) - 100ln6 - (4 + 1)(594.968) = -688.160.


The test statistic is twice the difference in the loglikelihoods:
(2)(L1 - L0 ) = (2){-686.084 - (-688.160)} = 4.152.
The difference in the number of parameters is: 2 - 1 = 1.
So we compare the statistic to the Chi-Square Distribution with 1 degree of freedom.
3.841 < 4.152 < 5.024. ⇒ Reject at 5%, but not at 2.5%.

13.23. C. The maximum likelihood fit is θ = X = 120,000/2500 = 48.


f(x) = exp[-x/θ] / θ. lnf(x) = -x/θ - ln(θ).
Thus the loglikelihood is: -120,000/θ - 2500 ln(θ).
The maximum loglikelihood is: -120,000/48 - 2500 ln(48) = -12,178.003.
The loglikelihood for θ = 50 is: -120,000/50 - 2500 ln(50) = -12,180.058.
The Likelihood Ratio test statistic is: (2) {-12,178.003 - (-12,180.058)} = 4.110.
θ = 50 is a special case of that with θ varying; the test has: 1 - 0 = 1 degree of freedom.
3.841 < 4.110 < 5.024. Reject at 5%, do not reject at 2.5%.
Comment: Using a computer, the p-value is 4.3%.
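The p-value quoted in the comment can be reproduced with a few lines of Python (a sketch, assuming scipy is available):

    import numpy as np
    from scipy.stats import chi2

    n, total = 2500, 120000
    loglik = lambda theta: -total / theta - n * np.log(theta)
    stat = 2 * (loglik(total / n) - loglik(50))   # 2 {L(48) - L(50)}, about 4.11
    print(stat, chi2.sf(stat, df=1))              # p-value about 4.3%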

13.24. For the Inverse Gaussian distribution with θ fixed, maximum likelihood is equal to the method
of moments: µ = X̄.
Alternately, f(x) = {θ / (2πx^3)}^0.5 exp[-θ({x − µ}/µ)^2 / (2x)].
ln f(x) = 0.5 ln(θ) - 0.5 ln(2π) - 1.5 ln(x) - θ({x − µ}/µ)^2 / (2x).
Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5 Σ ln(xi) - (θ/2) Σ(xi/µ^2 - 2/µ + 1/xi).
Set the partial derivative of the loglikelihood with respect to µ equal to zero:
∂/∂µ Σ ln[f(xi)] = -(θ/2) Σ(-2xi/µ^3 + 2/µ^2) = 0.
⇒ Σ 2/µ^2 = Σ 2xi/µ^3. Therefore, nµ = Σ xi. ⇒ µ = Σ xi / n = X̄.
The corresponding maximum loglikelihood is:
(n/2) ln(θ) - (n/2) ln(2π) - 1.5 Σ ln(xi) - (θ/2) Σ(xi/X̄^2 - 2/X̄ + 1/xi).
The loglikelihood corresponding to µ = µ0 is:
(n/2) ln(θ) - (n/2) ln(2π) - 1.5 Σ ln(xi) - (θ/2) Σ(xi/µ0^2 - 2/µ0 + 1/xi).
Twice the difference in the loglikelihoods is:
θ {Σ(xi/µ0^2 - 2/µ0) - Σ(xi/X̄^2 - 2/X̄)} = 13 {nX̄/µ0^2 - 2n/µ0 - nX̄/X̄^2 + 2n/X̄}
= {13n / (X̄ µ0^2)} {X̄^2 - 2 X̄ µ0 - µ0^2 + 2 µ0^2} = {13n / (X̄ µ0^2)} (X̄ - µ0)^2.

13.25. f(x) = τ (θ/x)^τ exp[-(θ/x)^τ] / x.
ln f(x) = ln(τ) - (τ + 1) ln(x) + τ ln(θ) - (θ/x)^τ.
For τ = 2, the loglikelihood is:
4 ln(2) - 3 ln(0.1) - 3 ln(0.3) - 3 ln(0.6) - 3 ln(0.8) + 8 ln(θ) - θ^2 (0.1^-2 + 0.3^-2 + 0.6^-2 + 0.8^-2).
Set the derivative of the loglikelihood with respect to θ equal to zero:
8/θ = (2θ)(115.451). ⇒ θ = 0.1861.
Thus the maximum loglikelihood is:
4 ln(2) - 3 ln(0.1) - 3 ln(0.3) - 3 ln(0.6) - 3 ln(0.8) + 8 ln(0.1861) - (0.1861^2)(115.451) = -1.956.
Comment: For an Inverse Weibull distribution with τ fixed, the maximum likelihood θ is:
θ = {N / Σ xi^(-τ)}^(1/τ).

13.26. A. The loglikelihood is:


4 ln(τ) - (τ + 1) ln[(0.1)(0.3)(0.6)(0.8)] + 4τ ln(θ) - (0.1^-τ + 0.3^-τ + 0.6^-τ + 0.8^-τ) θ^τ.
For θ = 0.2277 and τ = 1.2433, the loglikelihood is:
4 ln(1.2433) - (1.2433 + 1) ln[(0.1)(0.3)(0.6)(0.8)] + (4)(1.2433) ln(0.2277)
- (0.1^-1.2433 + 0.3^-1.2433 + 0.6^-1.2433 + 0.8^-1.2433) (0.2277^1.2433) = -0.976.
The likelihood ratio statistic is twice the difference: (2){-0.976 - (-1.956)} = 1.960.
There is a difference in fitted parameters of: 2 - 1 = 1.
Consulting the Chi-Square Table for one degree of freedom,
1.960 < 2.706. Do not reject H0 at 10%.
Comment: Similar to 4, 11/05, Q.25.

13.27. The maximum likelihood fit is θ = X = 1250/5 = 250.


f(x) = exp[-x/θ] / θ. lnf(x) = -x/θ - ln(θ).
Thus the loglikelihood is: -1250/θ - 5ln(θ).
The maximum loglikelihood is: -1250/250 - 5ln(250) = -32.6073.
θ = θ0 is a special case of that with θ varying; the test has: 1 - 0 = 1 degree of freedom.
The 5% critical value for the Chi-Square is 3.841.
Thus we reject the null hypothesis when: (2) {-32.6073 - (-1250/θ0 - 5ln[θ0 ])} > 3.841.

⇒ 250/θ0 + ln[θ0 ] > 6.906.


Here is a graph of 250/θ0 + ln[θ0 ]:

[Graph omitted: the curve 250/θ0 + ln[θ0] plotted over θ0 from 100 to 750, with a horizontal line at the critical value 6.906.]
We will reject when θ0 is either much larger or much smaller than the maximum likelihood fit of 250.
Trying some values, 250/116 + ln[116] = 6.909 > 6.906.
250/699 + ln[699] = 6.907 > 6.906.
Thus we reject when either θ0 ≤ 116 or θ0 ≥ 699.

Comment: For example, if θ0 = 110, then the loglikelihood for θ0 is:


-1250/110 - 5 ln(110) = -34.8660.
The test statistic is: (2){-32.6073 - (-34.8660)} = 4.175 > 3.841, and we reject at 5%.
Usually one is given θ0 and you perform the test; however, these are the ideas behind the
Non-Normal Confidence Intervals to be discussed in a subsequent section.
A 95% confidence interval for θ is (116, 699); these are the values such that the loglikelihood ≥
maximum loglikelihood - (Pth percentile of the Chi-Square Dist.) / 2 = -32.6073 - 3.841/2 = -34.528.
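The two endpoints can also be found numerically; here is a rough sketch using scipy's root finder (the bracketing intervals are just convenient choices):

    import numpy as np
    from scipy.optimize import brentq

    # Endpoints where 250/theta + ln(theta) equals the critical value 6.906.
    g = lambda theta: 250 / theta + np.log(theta) - 6.906
    print(brentq(g, 50, 250), brentq(g, 250, 2000))
    # roughly 116 and 698; the solution above rounds the interval to (116, 699)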

13.28. B. For a Normal Distribution, maximum likelihood is equal to the method of moments.
X̄ = 658/100 = 6.58. ⇒ µ = 6.58.
Second moment is: 4927/100 = 49.27. ⇒ σ = √(49.27 - 6.58^2) = 2.444.
For a Normal Distribution, f(x) = exp[-(x - µ)^2 / (2σ^2)] / {σ √(2π)}.
ln[f(x)] = -(x - µ)^2 / (2σ^2) - ln[σ] - ln[2π]/2.
Thus in this case with n = 100, the loglikelihood is:
-{1/(2σ^2)} Σ (xi - µ)^2 - 100 ln[σ] - 50 ln[2π]
= -{1/(2σ^2)} Σ xi^2 + (µ/σ^2) Σ xi - 50 µ^2/σ^2 - 100 ln[σ] - 50 ln[2π]
= -2463.5/σ^2 + 658 µ/σ^2 - 50 µ^2/σ^2 - 100 ln[σ] - 50 ln[2π].


For the maximum likelihood fit, the loglikelihood is:
-2463.5/2.444^2 + (658)(6.58)/2.444^2 - (50)(6.58^2)/2.444^2 - 100 ln[2.444] - 50 ln[2π] = -231.261.
For µ = 6.5 and σ = 2, the loglikelihood is:
-2463.5/2^2 + (658)(6.5)/2^2 - (50)(6.5^2)/2^2 - 100 ln[2] - 50 ln[2π] = -235.959.
Thus the test statistic is: (2) {-231.261 - (-235.959)} = 9.396.
The difference in numbers of parameters is: 2 - 0 = 2.
For 2 degrees of freedom from the Chi-Square table,
the 1% critical value is 9.210, and the 0.5% critical value is 10.597.
Since 9.210 < 9.396 < 10.597, we reject H0 at 1% but not at 0.5%.

13.29. A. The test statistic is: (2) {-1042.920 - (-1050.031)} = 14.222.


The difference in numbers of parameters is: 3 - 0 = 3.
For 3 degrees of freedom from the Chi-Square table, the 0.5% critical value is 12.838.
Since 12.838 < 14.222, we reject H0 at 0.5%.

13.30. E. For the Weibull Distribution, f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x = τ x^(τ-1) exp[-(x/θ)^τ] / θ^τ.
Thus the loglikelihood is: n ln[τ] + (τ-1) Σ ln[xi] - Σ xi^τ / θ^τ - n τ ln[θ] =
100 ln[τ] + (τ-1)(856.948) - Σ xi^τ / θ^τ - 100 τ ln[θ].
For the maximum likelihood fit: θ^ = 10,967 and τ^ = 0.7805.
The corresponding loglikelihood is:
100 ln[0.7805] + (0.7805 - 1)(856.948) - 142,329 / 10,967^0.7805 - (100)(0.7805) ln[10,967] = -1038.955.
For τ = 1, we have an Exponential Distribution.
The maximum likelihood fit is θ = X̄ = 1,267,671 / 100 = 12,676.71.
The corresponding loglikelihood is: -1,267,671/12,676.71 - 100 ln[12,676.71] = -1044.752.
The test statistic is: (2){-1038.955 - (-1044.752)} = 11.594.
There are 2 - 1 = 1 degree of freedom.
Looking in the Chi-Square Table: 7.879 < 11.594. Thus reject H0 at 0.5%.
Comment: Using a computer, the p-value is 0.066%.
We are always comparing two models; one is a special case of the other.
In this question, we compare a Weibull to its special case, an Exponential (tau = 1);
in other words, we compare the best Weibull to the best Exponential.
It might be better to write that out in full, but that is what the conventional shorthand used in this
question by the Exam Committee and others means. In general, we compare the maximum
loglikelihood to the restricted maximum loglikelihood. The former in this case is taken over all
possible Weibulls, including those with tau = 1; the latter is taken over all Weibulls with tau = 1,
in other words all Exponentials.
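Here is a rough Python sketch of this solution, built from the summary statistics given above (assuming scipy is available):

    import numpy as np
    from scipy.stats import chi2

    sum_ln_x = 856.948
    def weibull_loglik(tau, theta, sum_x_to_tau):
        return 100*np.log(tau) + (tau - 1)*sum_ln_x - sum_x_to_tau/theta**tau - 100*tau*np.log(theta)

    full = weibull_loglik(0.7805, 10967, 142329)          # about -1038.96
    restricted = weibull_loglik(1, 12676.71, 1267671)     # the Exponential: about -1044.75
    stat = 2 * (full - restricted)                        # about 11.59
    print(stat, chi2.sf(stat, df=1))                      # p-value about 0.066%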

13.31. A. For the Weibull Distribution, f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x = τ x^(τ-1) exp[-(x/θ)^τ] / θ^τ.
Thus the loglikelihood is: n ln[τ] + (τ-1) Σ ln[xi] - Σ xi^τ / θ^τ - n τ ln[θ] =
100 ln[τ] + (τ-1)(856.948) - Σ xi^τ / θ^τ - 100 τ ln[θ].
For the maximum likelihood fit: θ^ = 10,967 and τ^ = 0.7805.
The corresponding loglikelihood is:
100 ln[0.7805] + (0.7805 - 1)(856.948) - 142,329 / 10,967^0.7805 - (100)(0.7805) ln[10,967] = -1038.955.
For τ = 2/3, the maximum likelihood fit is: θ = (Σ xi^τ / N)^(1/τ) = (46,711.7/100)^1.5 = 10,096.
The corresponding loglikelihood is:
100 ln[2/3] + (2/3 - 1)(856.948) - 46,711.7 / 10,096^(2/3) - (100)(2/3) ln[10,096] = -1040.854.
The test statistic is: (2){-1038.955 - (-1040.854)} = 3.798.
There are 2 - 1 = 1 degree of freedom.
Looking in the Chi-Square Table: 2.706 < 3.798 < 3.841. Thus reject H0 at 10% and not at 5%.
Comment: Using a computer, the p-value is 5.13%.
Similar to 4, 11/05, Q.25 (2009 Sample Q.235).

13.32. B. For the Weibull Distribution, f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x = τ x^(τ-1) exp[-(x/θ)^τ] / θ^τ.
Thus the loglikelihood is: n ln[τ] + (τ-1) Σ ln[xi] - Σ xi^τ / θ^τ - n τ ln[θ] =
100 ln[τ] + (τ-1)(856.948) - Σ xi^τ / θ^τ - 100 τ ln[θ].
For the maximum likelihood fit: θ^ = 10,967 and τ^ = 0.7805.
The corresponding loglikelihood is:
100 ln[0.7805] + (0.7805 - 1)(856.948) - 142,329 / 10,967^0.7805 - (100)(0.7805) ln[10,967] = -1038.955.
For θ = 10,000 and τ = 0.9, the corresponding loglikelihood is:
100 ln[0.9] + (0.9 - 1)(856.948) - 465,336 / 10,000^0.9 - (100)(0.9) ln[10,000] = -1042.049.
The test statistic is: (2){-1038.955 - (-1042.049)} = 6.188.
There are 2 - 0 = 2 degrees of freedom.
Looking in the Chi-Square Table: 5.991 < 6.188 < 7.378. Thus reject H0 at 5% and not at 2.5%.
Comment: Using a computer, the p-value is 4.53%.

13.33. C. f(x) = α θ^α (θ + x)^-(α + 1). ln f(x) = ln(α) + α ln(θ) - (α + 1) ln(θ + x).

loglikelihood = N ln(α) + N α ln(θ) - (α + 1)Σln(θ + xi).

For α = 1.5, θ = 7.8, and N = 200, the loglikelihood is:


(200)ln(1.5) + (200)(1.5)ln(7.8) - (2.5)(607.64) = -821.77.
The test statistic is twice the difference in the loglikelihoods: (2)(-817.92 - (-821.77)) = 7.70.
The difference in the number of fitted parameters is: 2 - 0 = 2.
So we compare the statistic to the Chi-Square Distribution with 2 degrees of freedom.
7.378 < 7.70 < 9.210. ⇒ Reject at 2.5%, but not at 1%.
Comment: When we fit both alpha and theta, then there are two parameters. When we instead fix
both alpha and theta, then there are no fitted parameters. The latter is a special case of the former.

13.34. C. f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x.
ln f(x) = ln(τ) + (τ - 1) ln(x) - τ ln(θ) - (x/θ)^τ.
For τ = 2 and θ = 816.7, ln f(x) = ln(2) + ln(x) - 2 ln(816.7) - (x/816.7)^2.
Loglikelihood is: 5 ln(2) + Σln(xi) - 10 ln(816.7) - Σ(xi/816.7)^2 =
3.466 + 33.305 - 67.053 - 5.000 = -35.28.
We are given that at the maximum likelihood estimate the loglikelihood is -33.05.
The likelihood ratio statistic is twice the difference: (2){-33.05 - (-35.28)} = 4.46.
There is a difference in number of fitted parameters of 2 - 1 = 1.
Consulting the Chi-Square Table for one degree of freedom,
3.841 < 4.46 < 5.024. Reject H0 at 5%, but not at 2.5%.
Comment: For τ = 2 known and fixed, you should be able to determine the maximum likelihood

value of θ yourself. θ = (Σ xi^τ / N)^(1/τ) = {(595^2 + 700^2 + 789^2 + 799^2 + 1109^2)/5}^(1/2) = 816.7.

Using a computer, the maximum likelihood Weibull has θ = 869.6 and τ = 4.757; given these fitted
parameters, you should be able to calculate the corresponding maximum loglikelihood of -33.05.
H0 is to use the simpler distribution, the Weibull with τ = 2 fixed.
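A short Python sketch of the restricted fit and the test statistic (assuming scipy is available):

    import numpy as np
    from scipy.stats import chi2

    x = np.array([595, 700, 789, 799, 1109])
    theta = np.sqrt(np.mean(x**2))                                    # 816.7, for tau = 2 fixed
    restricted = np.sum(np.log(2) + np.log(x) - 2*np.log(theta) - (x/theta)**2)   # about -35.28
    stat = 2 * (-33.05 - restricted)                                  # about 4.46
    print(theta, restricted, stat, chi2.sf(stat, df=1))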

13.35. E. f(x) = α θ^α (θ + x)^-(α + 1). ln f(x) = ln(α) + α ln(θ) - (α + 1) ln(θ + x).

loglikelihood = N ln(α) + N α ln(θ) - (α + 1)Σln(θ + xi) = (20)ln(2) + (20)(2) ln(θ) - (2 + 1)Σln(θ + xi).

For θ = 7.0, the loglikelihood is:

20 ln(2) + 40 ln(7) - 3 Σln(7 + xi) = 91.6993 - (3)(49.01) = -55.331.

For θ = 3.1, the loglikelihood is:

20 ln(2) + 40 ln(3.1) - 3 Σln(3.1 + xi) = 59.1190 - (3)(39.30) = -58.781.


The test statistic is twice the difference in the loglikelihoods: (2){-55.331 - (-58.781)} = 6.900.
The difference in the number of parameters is: 1 - 0 = 1.
So we compare the statistic to the Chi-Square Distribution with 1 degree of freedom.
6.635 < 6.900 < 7.879. ⇒ Reject at 1%, but not at 0.5%.
Comment: When we fix alpha and fit theta, then there is only one parameter. When we instead fix
both alpha and theta, then there are no fitted parameters. The latter is a special case of the former.

13.36. D. Likelihood ratio test statistic: (2) {(-826.23) - (-828.37)} = 4.28.


Since the Weibull has 2 parameters and the Exponential has one parameter,
the number of degrees of freedom is: 2 - 1 = 1.
Looking in the Chi-Square table, 3.84 < 4.28 < 5.02.
Thus the p-value is between 2.5% and 5.0%.
Comment: Using a computer, the p-value is 3.86%.

Section 14, Hypothesis Testing182

You should know how to apply hypothesis testing to the Chi-Square Test, Likelihood Ratio Test,
Kolmogorov-Smirnov Test, etc. It is also a good idea to know some of the general terminology.

Chi-Square Example:

The previously discussed application of the Chi-Square Statistic is an example of Hypothesis


Testing. For example, for the grouped data in Section 3, for the Transformed Gamma Distribution fit
by the Method of Maximum Likelihood, with parameters α = 3.2557, θ = 1861, and τ = 0.59689,
the Chi-Square statistic is 16.45.

The steps of hypothesis testing are:
1. Choose a level of significance. For example, a significance level of 1/2%.
2. Formulate the statistical model. For example, the grouped data in Section 3 is a random sample drawn from a single distribution.
3. Specify the null hypothesis H0 and the alternative hypothesis H1. For example, H0: the assumed distribution is a Transformed Gamma Distribution with α = 3.2557, θ = 1861, and τ = 0.59689. H1: the assumed distribution is not the above.
4. Select a test statistic whose behavior is known. For example, the Chi-Square Statistic computed above has approximately a Chi-Square Distribution with 5 degrees of freedom.183
5. Find the appropriate critical region. For example, the critical region or rejection region is χ2 > 16.750.184
6. Compute the test statistic on the assumption that H0 is true. For example, the test statistic is χ2 = 16.45.
7. Draw conclusions. If the test statistic lies in the critical region, then reject the null hypothesis. Here the test statistic is not in the critical region (since 16.45 ≤ 16.750), so we do not reject H0. We do not reject the fitted Transformed Gamma at 1/2%.

182
See Section 10.4 of Loss Models.
183
Assuming the null hypothesis is true. Degrees of freedom = 9 intervals - 1 - 3 fitted parameters = 5.
184
Consulting the Chi-Square table for 5 d.f. and P = 99.5% ⇔ significance level of 1/2%.
The interval from 0 to 16.750 is a 99.5% confidence interval for the Chi-Square Statistic if H0 is true.

If the test statistic had been instead 17, then we would have rejected the null hypothesis. If the
significance level had been 1% instead of 1/2%, then the critical region would have been instead
χ 2 > 15.086. The test statistic would have been in the critical region, since 16.45 > 15.086.
Thus we would have rejected the fitted Transformed Gamma at 1%.
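The critical values and the p-value in this example can also be looked up with software rather than the attached table; a rough sketch, assuming scipy is available:

    from scipy.stats import chi2

    print(chi2.ppf(0.995, df=5))   # 16.750, the 1/2% critical value for 5 degrees of freedom
    print(chi2.ppf(0.99, df=5))    # 15.086, the 1% critical value
    print(chi2.sf(16.45, df=5))    # p-value of the test statistic, a little under 1%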

Null Hypothesis:

In general, in hypothesis testing one tests the null hypothesis H0 versus an alternative
hypothesis H1 . It is important which hypothesis is H0 and which is H1 ; as will be discussed, they
are treated differently.

In the example above, the null hypothesis was that the grouped data in Section 3 was a random
sample drawn from a Transformed Gamma Distribution with α = 3.2557, θ = 1861, and τ = 0.59689.
The alternative hypothesis was that the grouped data in Section 3 was not a random sample drawn
from this Transformed Gamma Distribution. A large Chi-Square means it is unlikely H0 is true and
therefore we would reject H0 .

In an application to regression, the null hypothesis might be that a certain slope in a regression
model is zero, while the alternative hypothesis is that this slope is not zero.

If the universe of possibility is divided in a manner that includes a boundary, the null
hypothesis must include the boundary.185 For example, if µ is the mean of a Normal Distribution,
H0 might be µ ≥ 0, while H1 is µ < 0.

Note that hypothesis tests are set up to disprove something, H0 , rather than prove
something. In the Chi-Square example, the test is set up to disprove that the data came from a
certain Transformed Gamma Distribution.

For example, a dry sidewalk is evidence it did not rain. On the other hand a wet sidewalk might be
caused by rain or something else such as a sprinkler system. A wet sidewalk can not prove that it
rained, but a dry sidewalk is evidence that it did not rain.

Similarly, a large Chi-Square value is evidence that the data was not drawn from the given
distribution, and may lead one to reject the null hypothesis. On the other hand, a small
Chi-Square value results in one not rejecting the null hypothesis; a small Chi-Square value does not
prove the null hypothesis is true.

185
See page 199 of Loss Models.

We do not reject H0 unless there is sufficient evidence to do so. This is similar to the legal concept of
innocent (not guilty) until proven guilty. A trial does not prove one innocent.

Exercise: A LogNormal and a Pareto Distribution have each been fit to the same data, grouped into
11 intervals. The Chi-Square Statistics are 16.2 and 16.7 respectively.
What conclusions do you draw at a 2.5% significance level?
[Solution: There are 2 fitted parameters and 11 - 1 - 2 = 8 degrees of freedom. The critical value for
2.5% is 17.535. Since 16.2 ≤ 17.535 we do not reject the LogNormal.
Since 16.7 ≤ 17.535 we do not reject the Pareto.
Comment: The Chi-Square for the LogNormal is somewhat lower, and since there are the same
number of degrees of freedom, the p-value for the LogNormal is somewhat higher than for the
Pareto. The LogNormal is a somewhat better fit than the Pareto.]

In the above exercise, at a 2.5% significance level we do not reject either the LogNormal or the
Pareto fit. We were unable to reject two contradictory hypotheses that the data is a random sample
from a fitted Pareto or LogNormal Distribution. Clearly the data canʼt be a random sample from both
of these distributions. It would be somewhat troubling to “accept” two contradictory hypotheses.
Thus “do not reject H0 ” is more precise than and preferable to “accept H0 ”.
According to Loss Models, one should not use the term “accept H0 ”. Nevertheless, it is common
for actuaries, including perhaps some members of the exam committee, to use the terms “do not
reject H0 ” and “accept H0 ” synonymously.

For many actuaries in common usage: do not reject ⇔ accept.

Test Statistic:

A hypothesis test needs a test statistic whose distribution is known. In the above example, the
test statistic was the Chi-Square Statistic. In a subsequent example, the test statistic is the
Kolmogorov-Smirnov Statistic. In other tests, one would use the Normal Table, t, or F Tables.

Critical Values:

The critical values are the values used to decide whether to reject H0 . For example, in the above
Chi-Square test, the critical value (for 1/2% and 5 degrees of freedom) was 16.750.
We reject H0 if the Chi-Square statistic is greater than 16.750. The critical value(s) form the
boundary (other than ±∞) of the rejection or critical region.

rejection region ⇔ critical region ⇔ if the test statistic is in this region, then we reject H0.



Significance Level:

The significance level, α, of the test is a probability level selected prior to performing the test. In
the above Chi-Square example 1/2% was selected. Using the Chi-Square table attached to the
exam, one can perform tests at significance levels of 10%, 5%, 2.5%, 1%, and 1/2%. For example,
a significance level of 5% uses the column listed as P = 1 - 5% = 95%.

If Prob[test statistic will take on a value at least as unusual as the computed value | H0 is true] is less
than or equal to the significance level chosen, then we reject the H0 . If not, we do not reject H0 .

The result of any hypothesis test depends on the significance level chosen. Therefore, in practical
applications the choice of the significance level is usually important.

In testing the effectiveness of a new drug, a one-sided significance of 5% might be used.


The null hypothesis is that the new drug is not effective.

In order to establish experimental evidence for an elementary particle such as the Higgs Boson,
physicists use a one-sided significance level of: 1 / 3.5 million = Φ[-5].
The null hypothesis is that the particle does not exist.

Exercise: A Weibull Distribution has been fit to data grouped into 11 intervals.
The Chi-Square Statistic is 18.3.
What conclusions do you draw at different significance levels?
[Solution: There are 2 fitted parameters and 11 - 1 - 2 = 8 degrees of freedom. The critical values for
10%, 5%, 2.5%, 1%, and 1/2% shown in the Chi-Square Table for 8 degrees of freedom are:
13.362, 15.507, 17.535, 20.090, 21.955. Since 18.3 > 13.362, reject the Weibull at 10%. Since
18.3 > 15.507, reject the Weibull at 5%. Since 18.3 > 17.535, reject the Weibull at 2.5%. Since
18.3 ≤ 20.090, do not reject the Weibull at 1%. Since 18.3 ≤ 21.955, do not reject the Weibull at
1/2%.]

The results of this exercise would usually be reported as: reject the Weibull at 2.5%, do not reject
the Weibull at 1%. Since we reject at 2.5%, we also automatically reject at 5% and 10%. Since we
do not reject at 1%, we also automatically do not reject at 1/2%.

Types of Errors:

There are two important types of errors that can result when performing hypothesis testing:186

Type I Error: Reject H0 when it is true.
Type II Error: Do not reject H0 when it is false.

Exercise: A Chi-Square test was performed at a 1/2% significance level for 5 degrees of freedom.
The Chi-Square Statistic was greater than the critical value of 16.75. Therefore, H0 was rejected.
What is the probability of a Type I error?
[Solution: If H0 is true, then there is a 1/2% chance that the Chi-Square Statistic will be greater than
16.750, due to random fluctuation in the limited sample represented by the observed data; the
survival function at 16.750 of a Chi-Square Distribution with ν = 5 is 1/2%. The Chi-Square Statistic
was greater than 16.750, therefore the probability of a Type I error is less than 1/2%.]

Rejecting H0 at a significance level of α, means the probability of a Type I error is at most α.

We would like both a small probability of rejecting when we should not (making a Type I error), and
a small probability of failing to reject when we should (making a Type II error).
But there is a trade-off.
Reducing the probability of one type of error, increases the probability of the other type of error.
With more relevant data one can reduce the probability of both types of errors.

186
We are assuming you set everything up correctly. These errors are due to the random fluctuations present in all
data sets and the incomplete knowledge of the underlying risk process which led one to perform a hypothesis test
in the first place.

p-value:

The p-value = Prob[test statistic takes on a value less in agreement with H0 than its calculated value].

If the p-value is less than or equal to the chosen significance level, then we reject H0 .187

Exercise: For 2 degrees of freedom, the Chi-Square Statistic is 8. What is the p-value?
[Solution: Since the critical value for 2.5% is 7.378 and the critical value for 1% is 9.210,
and 7.378 < 8 < 9.210, the p-value is between 1% and 2.5%. Reject at 2.5%, do not reject at 1%.
Using a computer or noting that the Chi-Square Distribution with 2 degrees of freedom is an
Exponential Distribution with mean 2, the p-value is: S(8) = e^(-8/2) = 1.83%.]
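A one-line check, assuming scipy is available:

    import numpy as np
    from scipy.stats import chi2

    print(chi2.sf(8, df=2), np.exp(-8 / 2))   # both about 1.83%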

When applying hypothesis testing to test the fit of a distribution to data, the larger the
p-value the better the fit. Small p-values indicate a poor fit.

According to Loss Models:188


p-values > 10% do not indicate any support for the alternative hypothesis H1 , while
p-values < 1% indicate strong support for H1 .

187
Some sources instead say that we reject H0 if the p-value is less than the chosen significance level.
There should not be a case on the exam or in practical applications where the p-value is exactly equal to the
significance level.
188
See page 201 of Loss Models.

Power of a Test:

The power of a test is the probability of rejecting the null hypothesis, when H1 is true.

Prob[Type II error] = 1 - Power of the test = probability of failing to reject H0 when it is false.
Thus, everything else equal, large power of a test is good.

Assume that you have a data set with 100 values.


H0 is that the data is from a Normal Distribution with variance 30 and mean = 5.
H1 is that the data is a Normal Distribution with variance 30 and mean = 7.
What is the power of applying the Normal test at a 5% significance level?

We reject H0 if the observed mean is large; we perform a one-sided test.


The observed mean is Normally Distributed, with a variance of: 30/100 = 0.3.
If H0 is true, Prob[observed mean > x] = 1 - Φ[(x - 5)/√0.3].
Consulting the Normal Table, this probability is 5% when x = 5 + (1.645)√0.3 = 5.90.
Power = the probability of rejecting the null hypothesis when H1 is true =
Probability[Normal Distribution with variance 0.3 and mean = 7 is greater than 5.90] =
Φ[(7 - 5.90)/√0.3] = Φ(2.01) = 97.8%.

Exercise: What is the power of instead applying the Normal test at a 2.5% significance level?
[Solution: Performing a one-sided test, we reject when the observed mean is greater than:
5 + (1.960)√0.3 = 6.07. Power = Φ[(7 - 6.07)/√0.3] = Φ(1.70) = 95.5%.]
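Here is a rough Python sketch of these power calculations, assuming scipy is available:

    import numpy as np
    from scipy.stats import norm

    se = np.sqrt(30 / 100)                        # standard deviation of the observed mean
    for alpha in (0.05, 0.025):
        cutoff = 5 + norm.ppf(1 - alpha) * se     # reject H0 when the observed mean exceeds this
        power = norm.sf((cutoff - 7) / se)        # probability of rejecting when H1 is true
        print(alpha, cutoff, power)               # about (5.90, 97.8%) and (6.07, 95.5%)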

In general, all other things being equal, the smaller the significance level, the smaller the power of the
test. A smaller significance level results in less chance of rejecting H0 even though it is true, a Type I
error, but a greater chance of failing to reject H0 even though it is false, a Type II error.

Decision | H0 true | H0 false
Reject H0 | Type I Error ⇔ p-value | Correct ⇔ Power
Do not reject H0 | Correct ⇔ 1 - p-value | Type II Error ⇔ 1 - Power

In general, there is a trade-off between Type I and Type II errors. Making the probability of one
type of error smaller, usually makes the probability of the other type of error larger.

A hypothesis test is uniformly most powerful if it has the greatest power (largest probability of
rejecting H0 when it is false) of any test with the same (or smaller) significance level.189

We compare the power of tests for a given size data set. The larger the data set, the easier it is to
reject H0 when it is false; the larger the data set, the more powerful a given test.

Exercise: What is the power of applying the Normal test at a 2.5% significance level and assuming
200 data points?
[Solution: Performing a one-sided test, we reject when the observed mean is greater than:
5 + (1.960)√0.15 = 5.76. Power = Φ[(7 - 5.76)/√0.15] = Φ(3.20) = 99.93%.
Comment: Increasing the sample size from 100 to 200 increased the power of the test from 95.5%
to 99.93%.]

For example, when computing the Chi-Square statistic, the numerator has terms proportional to the
square of the sample size and the denominator has terms that are proportional to the sample size.
Thus as the sample size increases, if everything else stays the same, the value of the test statistic
will increase, while the critical values stay the same, and thus we are more likely to reject H0 when it is
false.190

In the case of the Kolmogorov-Smirnov test, the critical values decrease as one over the square root
of the sample size. Thus as the sample size increases, if everything else stays the same, the test
statistic is the same but the critical values are smaller, and thus we are more likely to reject H0 when it
is false.191

189
See Definition 10.8 in Loss Models.
190
See page 337 of Loss Models. The Anderson-Darling Statistic acts similarly. If the data set doubled in size, all else
being equal, the test statistic would double, while the critical value remained the same, and thus we are more likely to
reject H0 when it is false.
191
Questions will often involve determining the K-S Statistic for a data set of size 5, solely so that you can compute
the statistic under exam conditions. The K-S test would have little power if applied to such very small data sets.

Kolmogorov-Smirnov Example:

The application of the Kolmogorov-Smirnov Statistic, to be discussed subsequently, is another


important example of hypothesis testing.

For example, with 130 points as in the ungrouped data in Section 2, the critical values for the K-S
Statistic are:

Significance Level α: 0.20, 0.10, 0.05, 0.01.
Critical Value for n = 130: 0.0938, 0.107, 0.119, 0.143.

For the Pareto Distribution fit to the ungrouped data in Section 2 via Method of Moments,
α = 2.658 and θ = 518,304, the K-S Statistic is 0.131.

The critical value for 5% is: 1.36 / √130 = 0.119. Since 0.131 > 0.119, we can reject the fit of the
Pareto at a 5% significance level. On the other hand, the critical value for 1% is:
1.63 / √130 = 0.143 > 0.131, so one cannot reject the Pareto at the 1% significance level.

Mechanically, the K-S Statistic for the Pareto of 0.131 is bracketed by 0.119 and 0.143.
One rejects to the left and does not reject (accepts) to the right.
Reject at 5% and do not reject at 1%.
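The large-sample critical values in the table can be reproduced as follows; the coefficients 1.07, 1.36, and 1.63 appear in this section, and 1.22 is the corresponding standard large-sample coefficient for the 10% level (a sketch, assuming numpy is available):

    import numpy as np

    n = 130
    for level, coeff in (("20%", 1.07), ("10%", 1.22), ("5%", 1.36), ("1%", 1.63)):
        print(level, coeff / np.sqrt(n))
    # 0.094, 0.107, 0.119, 0.143; the computed statistic 0.131 exceeds the 5% value but not the 1% value.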

The steps of hypothesis testing are:

1. Choose a 5% significance level


2. Formulate a statistical model: The ungrouped data in Section 2 is a random sample
of independent draws from a single distribution.
3. Specify the Null Hypothesis:
The distribution is a Pareto with α = 2.658 and θ = 518,304.192
The alternative hypothesis is that it is not the above distribution.
4. Select the K-S Statistic (whose behavior is known.)
5. The critical region or rejection region is K-S ≥ 1.36 / √130 = 0.119.
6. The test statistic is computed as 0.131, as stated above.
7. Draw the conclusion, that we reject the null hypothesis;
since the computed statistic is in the critical region we reject the fit at 5%.

192
This is a Pareto Distribution fit to the ungrouped data in Section 2 via the Method of Moments.

If instead in Step 1 one had chosen a 1% significance level, then the critical region would have been
K-S ≥ 1.63 / √130 = 0.143. The computed statistic of 0.131 is outside this critical region; thus we do
not reject the Null Hypothesis at a 1% significance. We have not proven the Null Hypothesis is true;
rather the data doesn't contradict the Null Hypothesis at this significance level of 1%.

Fitted Distributions:

In this example, the parameters of the Pareto Distribution were estimated by fitting to the
ungrouped data in Section 2. When we then compare this fitted Pareto Distribution to this same
data, the Pareto Distribution matches the data better than it otherwise would, since fitting determines
parameters that produce a distribution that is close to the data.
Therefore, the Kolmogorov-Smirnov Statistic is smaller than it would have been if we had instead
picked the parameters of the Pareto in advance.

We reject the null hypothesis when the Kolmogorov-Smirnov statistic is large. Therefore, if one uses
the table of Kolmogorov-Smirnov critical values, one would have a lower probability of rejecting
H0 .193 This increases the probability of a Type II error, failing to reject when one should. It decreases
the probability of a Type I error, rejecting when one should not.

In general, when a distribution is fit to data, and no adjustment is made to the critical values, the
probability of a Type II error increases, while the probability of a Type I error decreases,
compared to using parameters specified in advance.194
This would be the case for the usual applications of the Kolmogorov-Smirnov and
Anderson-Darling tests, to be discussed subsequently.

For the Chi-Square test, as discussed previously, we would reduce the degrees of freedom by the
number of fitted parameters. This would adjust the critical values for the effect of fitting, and there is
no obvious impact on the probabilities of errors compared to using parameters specified in
advance.

193
Assuming one did not adjust the table of critical values of the effect of fitting parameters. There is no such
adjustment for the K-S Statistic discussed on the syllabus. One might use simulation to estimate this effect.
See “Mahlerʼs Guide to Simulation.”
194
See page 332 of Loss Models. See also statement A of 4, 5/05, Q.19.
2016-C-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/22/15, Page 458

Fitting to Half the Data and Comparing the Fit to the Other Half of the Data:195

One way around this potential problem with the use of the Kolmogorov-Smirnov test when
comparing a fitted distribution is to select at random half of the data. Then fit the distribution to this half
of the data. Then compare the fitted distribution to the remainder of the given data set.

For example, I selected at random half of the ungrouped data in Section 2:196
400, 2800, 4500, 10400, 14100, 15500, 19400, 22100, 29600, 32200, 32500, 39500, 39900,
41200, 42800, 45900, 49200, 54600, 56700, 59100, 62500, 63600, 66900, 68100, 68900,
72100, 80700, 84500, 91700, 96600, 106800, 113000, 115000, 117100, 126600, 127600,
128000, 131300, 134300, 135800, 146100, 150300, 171800, 183000, 209400, 212900,
225100, 233200, 244900, 253400, 284300, 395800, 437900, 442700, 463000, 469300,
571800, 737700, 766100, 846100, 920300, 981100, 1546800, 2211000, 2229700.

Then a Pareto Distribution was fit via maximum likelihood to these 65 values:
α = 1.824 and θ = 251,823.197

This fitted Pareto Distribution was then compared to the remaining 65 values from the ungrouped
data in Section 2, that had not been used in the fit.
The Kolmogorov-Smirnov Statistic was 0.079.198

For 65 data points, the 20% critical value is: 1.07/√65 = 0.133.
Since 0.079 < 0.133, we do not reject this maximum likelihood Pareto at 20%.

After performing the hypothesis test, if the fit is acceptable, one would then fit the distribution to the
entire original data set. In this case, the maximum likelihood Pareto for the entire data set in Section 2
has α = 1.702 and θ = 240,151, as discussed previously.

195
See Section 16.4 of Loss Models.
196
A random permutation of the numbers 1 to 130 was produced, using the function RandPermutation in
Mathematica. Then the first half of this random permutation was used to select the values from the original data set.
How to generate random permutations and subsets is discussed in a subsection of Section 4 of “Mahlerʼs Guide to
Simulation.” You almost certainly do not need to know how to do so for your exam.
197
When a Pareto was fit via maximum likelihood to all 130 data points, α = 1.702 and θ = 240,151.
198
For the maximum likelihood Pareto fit to all the data and then compared to all of this data, the Kolmogorov-Smirnov
Statistic was 0.059. With 1/2 the data, one would have expected a K-S Statistic here of about: (0.059)√2 = 0.083.

A Simulation Example of the Relationship of Sample Size and the Power of a Test:

Given some frequency data, we fit a Negative Binomial Distribution by Method of Moments.199
Then we compare the fitted distribution to the data and compute the Chi-Square Statistic.

In this example, I simulated from a two-point mixture of Poisson Distributions with means 0.1 and
0.5 and equal weight; at random we are equally likely to take a random draw from a Poisson with
λ = 10% or λ = 50%. Thus in this simulation experiment we know that the null hypothesis is not true;
the sample is not drawn from a Negative Binomial Distribution.

In applying the Chi-Square Goodness of fit test, here I had a final interval of 3 or more claims.
Thus there were four intervals and two fitted parameters. Thus the number of degrees of freedom is:
4 - 1 - 2 = 1. For 1 degree of freedom the 10% critical value is 2.706, while the 2.5% critical value is
5.024.

First I simulated 100 different samples each of size 1000, and recorded the Chi-Square statistics.
Here is a histogram of the results:

[Histogram omitted: the 100 Chi-Square statistics for samples of size 1000, with the axis marked at the critical values 2.706 and 5.024.]

In 87 out of 100 cases the statistic is less than 2.706. Thus at a 10% significance level, the power of
this test is only: 13/100 = 13%. In 98 out of 100 cases the statistic is less than 5.024. Thus at a
2.5% significance level, the power of this test is only: 2/100 = 2%.

199
We set rβ equal to the sample mean, and rβ(1+β) equal to the second moment minus the square of the mean.

Next I simulated 100 different samples each of size 10,000, and recorded the Chi-Square statistics.
Here is a histogram of the results:

[Histogram omitted: the 100 Chi-Square statistics for samples of size 10,000, with the axis marked at the critical values 2.706 and 5.024.]

In 65 out of 100 cases the statistic is less than 2.706. Thus at a 10% significance level, the power of
this test is only: 35/100 = 35%. In 84 out of 100 cases the statistic is less than 5.024. Thus at a
2.5% significance level, the power of this test is only: 16/100 =16%.

While the power of the test is greater for sample sizes of 10,000 rather than 1000, it is still not very
big; there is large probability of a Type II error.200 For a sample of this size, it is hard to distinguish a
Negative Binomial Distribution from a two-point mixture of Poissons.201

200
In this case, the mean frequency is 0.3. With a higher mean frequency, the power of the test for a given sample
size would be larger, all else being equal.
201
Recall that if one mixes Poissons via a Gamma, then one gets a Negative Binomial.

I simulated 100 different samples each of size 100,000, and recorded the Chi-Square statistics.
Here is a histogram of the results:

[Histogram omitted: the 100 Chi-Square statistics for samples of size 100,000, with the axis marked at the critical value 5.024.]

In none of 100 cases is the statistic less than 2.706. Thus at a 10% significance level, the estimated
power of this test is 100%.202 In 6 out of 100 cases the statistic is less than 5.024. Thus at a 2.5%
significance level, the power of this test is: 94/100 = 94%.203

This illustrates an important general feature: as the sample size increases the power of a statistical
test increases, all else being equal.

Also, we note that for smaller samples, it is quite common to fail to reject the null hypothesis; the
probability of a Type II error is large and the power of the test is small. However, failing to reject
does not imply that the null hypothesis is true. This is one reason why one should say “do not reject”
rather than “accept”.

202
The actual power of the test is large but somewhat less than 100%.
203
One would need more simulations in order to get a more precise estimate of the power.
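
The experiment above can be reproduced with a short script. The following is a minimal sketch in Python (my illustration, not part of the original text); the seed, the number of trials, and the grouping into the intervals 0, 1, 2, and 3 or more claims are assumptions chosen to mirror the example.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

def neg_bin_probs(r, beta, kmax):
    # Negative Binomial probabilities for k = 0, ..., kmax - 1, plus the tail for k >= kmax.
    k = np.arange(kmax)
    logp = (gammaln(r + k) - gammaln(r) - gammaln(k + 1)
            + r * np.log(1 / (1 + beta)) + k * np.log(beta / (1 + beta)))
    p = np.exp(logp)
    return np.append(p, 1 - p.sum())

def chi_square_stat(sample):
    # Fit a Negative Binomial by the Method of Moments, then compute the Chi-Square
    # statistic on the intervals 0, 1, 2, and 3 or more claims.
    n = len(sample)
    mean = sample.mean()
    second = (sample ** 2).mean()
    beta = (second - mean ** 2) / mean - 1      # r*beta = mean, r*beta*(1+beta) = variance
    r = mean / beta
    expected = n * neg_bin_probs(r, beta, 3)
    observed = np.array([np.sum(sample == 0), np.sum(sample == 1),
                         np.sum(sample == 2), np.sum(sample >= 3)])
    return np.sum((observed - expected) ** 2 / expected)

def estimated_power(sample_size, trials=100, critical=2.706):
    # Proportion of simulated samples for which the statistic exceeds the critical value.
    rejections = 0
    for _ in range(trials):
        lam = rng.choice([0.1, 0.5], size=sample_size)   # 50/50 mixture of Poisson means
        if chi_square_stat(rng.poisson(lam)) > critical:
            rejections += 1
    return rejections / trials

for n in (1000, 10000, 100000):
    print(n, estimated_power(n))
```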

Problems:

14.1 (2 points) You assume that the mean severity is $4000. An insurer wrote 1,000 exposures and
observed 85 claims. You observe an average severity of $5000 with an average variance of 35
million. You test the null hypothesis that the severity assumption is adequate. (This means that you
will reject the null hypothesis only if the assumptions are too low.) At what significance level do you
reject the null hypothesis? (Assume the observed variance is that of the populationʼs size of loss
distribution. Use the Normal Approximation.)
A. 10% B. 5% C. 2.5% D. 1% E. 0.5%

14.2 (2 points) Let H0 be the hypothesis that a particular set of claims are drawn from a Pareto
distribution with parameters α = 2 and θ = 1 million. Let H1 be the hypothesis that this set of claims

are drawn from a Pareto distribution with parameters α < 2 and θ = 1 million.
You then observe 5 claims all of which are greater than 1 million.
You reject the hypothesis H0 at which of the following levels of significance?
A. 10% B. 5% C. 1% D. 0.1% E. 0.01%

14.3 (2 points) One has fit both a Gamma and an Exponential Distribution to the same data via
maximum likelihood. The loglikelihood for the Gamma is: -1542.1. The loglikelihood for the
Exponential is: -1545.8. Perform a likelihood ratio test using the form and all of the terminology of
hypothesis testing.

14.4 (2 points)
Let H0 be the hypothesis that a particular set of claims are drawn from a distribution F.
Let H1 be the hypothesis that this set of claims are drawn from a distribution with a heavier righthand
tail than F.
You then observe 3 claims, all of which are greater than x.
How large does F(x) have to be, so that you reject the hypothesis H0 at 10% significance?
A. 30% B. 54% C. 73% D. 90% E. 97%

14.5 (1 point) Which of the following statements about hypothesis testing is false?
A. The p-value is the probability given H0 is true, that the test statistic takes on a value equal
to its calculated value or a value less in agreement with H0 (in the direction of H1 ).
B. The p-value is the chance of a Type II error.
C. If the p-value is less than the chosen significance level, then we reject H0 .
D. A p-value of less than 1% for a Chi-Square test of a loss distribution, indicates strong
support for the hypothesis that the sample did not come from this loss distribution.
E. None of the above statements is false.

14.6 (2 points)
Let H0 be the hypothesis that a particular set of claims are drawn from a distribution F.
Let H1 be the hypothesis that this set of claims are drawn from a distribution with a heavier righthand
tail than F.
You then observe 3 claims, the maximum of which is greater than x.
How large does F(x) have to be, so that you reject the hypothesis H0 at 10% significance?
A. 30% B. 54% C. 73% D. 90% E. 97%

14.7 (2 points) You are given the following:


• Loss sizes follow an Exponential distribution with mean θ.
• The null hypothesis, H0 : θ ≥ 1000, is tested against the alternative hypothesis, H1 : θ < 1000.
• 100 losses that sum to 83,000 were observed.
Using the sample mean as the test statistic, determine the p-value of this test.
Use the Normal Approximation.
A. 1/2% B. 1% C. 5% D. 10% E. 20%

14.8 (2 points) You assume that the frequency is given by a Poisson Distribution with a mean of
0.07. An insurer wrote 1,000 exposures and observed 85 claims. You test the
null hypothesis that the frequency assumption is adequate. (This means that you will reject the null
hypothesis only if the assumptions are too low.) At what significance level do you reject the null
hypothesis? (Use the Normal Approximation.)
A. 10% B. 5% C. 2.5% D. 1% E. None of A, B, C, or D.

14.9 (1 point) Briefly compare and contrast a Type I and a Type II error in the context of a spam
(junk) filter for email.

14.10 (1 point) A sample of size 5110 is drawn from a Normal Distribution with σ = 10.
The sum of the sample is 152,396. Test H0 : µ = 30 versus H1 : µ ≠ 30.
What is the p-value of this test?
A. Less than 0.01
B. At least 0.01, but less than 0.05
C. At least 0.05, but less than 0.10
D. At least 0.10, but less than 0.20
E. At least 0.20

14.11 (2 points) You are given the following:


• Loss sizes follow a Pareto distribution, with parameters α (unknown) and θ = 10.
• The null hypothesis, H0 : α = 2, is tested against the alternative hypothesis, H1 : α < 2.
• Two losses of sizes 15 and 25 are observed.
Determine which of the following statements is true.
A. H0 will be rejected at the 0.01 significance level.
B. H0 will be rejected at the 0.02 significance level, but not at the 0.01 level.
C. H0 will be rejected at the 0.05 significance level, but not at the 0.02 level.
D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level.
E. H0 will not be rejected at the 0.10 significance level.

14.12 (1 point) A sample of size 700 is drawn from a Normal Distribution with σ = 300.
The sum of the sample is 1.3 million. Test H0 : µ = 1850 versus H1 : µ > 1850.
What is the p-value of this test?
A. Less than 0.01
B. At least 0.01, but less than 0.05
C. At least 0.05, but less than 0.10
D. At least 0.10, but less than 0.20
E. At least 0.20

14.13 (3 points) Let X1 , X2 , . . . , X36 be a random sample from a normal distribution with mean µ

and variance 100. In testing the null hypothesis H0 : µ = 0 against the alternative hypothesis

H1 : µ = -8, the critical region is X̄ ≤ k.


If the significance level of the test is 1%, then what is the probability of a Type II error?
A. Less than 0.5%
B. At least 0.5%, but less than 1%
C. At least 1%, but less than 2.5%
D. At least 2.5%, but less than 5%
E. At least 5%

14.14 (3 points) Twenty observations from a LogNormal Distribution with σ = 1.5 are used to test
the null hypothesis H0 : µ = 8 against the alternative H1 : µ = 9.
Let H0 be rejected if the geometric average of the observations, (∏_{i=1}^{20} Xi)^(1/20), is greater than k.
If the significance level of this test is 10%, what is the probability of a Type II error?
A. Less than 2.5%
B. At least 2.5%, but less than 5%
C. At least 5%, but less than 10%
D. At least 10%, but less than 25%
E. At least 25%

14.15 (5 points) Medium Insurance Company (MIC) wants to determine which of its policyholders
have an expected loss ratio of more than 65%.
The annual loss ratio for each policyholder follows a Normal distribution, with σ = 15%.
Let H0 : µ = 65%, versus H1 : µ = 80%.
For each policyholder, the loss ratios for each of five years will be averaged.
H0 will be rejected if this average is greater than c.
Graph the probabilities of Type I and Type II errors, for c between 65% and 80%.

14.16 (3 points) Small Insurance Company (SIC) believes that it has underwritten each of its
accounts to an expected loss ratio of 70% or less. Consider this the null hypothesis.
SIC is reviewing its portfolio and decides that it will cancel any policyholder for whom it is 90%
certain that the expected loss ratio is more than 70%, based on a review of the average observed
loss ratio over the last n years.
The annual loss ratio for each policyholder follows a Normal distribution, with σ = 20%.
Let π(x, n) represent the power of this test where:
x = true expected loss ratio for a given policyholder
n = number of years used to compute the average observed loss ratio
Which of the following is the largest?
A. π(100%, 1) B. π(95%, 2) C. π(90%, 3) D. π(85%, 4) E. π(80%, 5)

14.17 (3 points) You are given the following information:


• A random variable X follows a lognormal distribution with parameter σ = 1 and unknown µ.
• H0 : µ = 6.
• H1 : µ < 6.
• N observations of X will be drawn from the lognormal distribution.
• The null hypothesis is rejected if X̄ < 600.
Using a normal approximation, calculate the minimum value of N which results in the probability of a
Type I error being less than 10%.
(A) Less than 260
(B) At least 260, but less than 270
(C) At least 270, but less than 280
(D) At least 280, but less than 290
(E) At least 290

14.18 (2, 5/83, Q.10) (1.5 points) Let X1 , X2 , . . . , X16 be a random sample from a normal
distribution with mean µ and variance 16. In testing the null hypothesis H0 : µ = 0 against the

alternative hypothesis H1 : µ = 1, the critical region is X̄ > k.


If the significance level (size) of the test is 0.03, then the respective values of k and the probability
of a Type II error for µ = 1 are:
A. 0.48, 0.02 B. 0.48, 0.97 C. 1.88, 0.19 D. 1.88, 0.81 E. 1.88, 0.97

14.19 (2, 5/85, Q.22) (1.5 points) Let p represent the proportion of defectives in a manufacturing
process. To test H0 : p ≤ 1/4 versus H1 : p > 1/4, a random sample of size 5 is taken from the
process. If the number of defectives is 4 or more, the null hypothesis is rejected.
What is the probability of rejecting H0 if p = 1/5?
A. 6/3125 B. 4/625 C. 21/3125 D. 3104/3125 E. 621/625

14.20 (2, 5/85, Q.23) (1.5 points) Let X have the density function
f(x) = (θ + 1)x^θ for 0 < x < 1.
The hypothesis H0 : θ = 1 is to be rejected in favor of H1 : θ = 2 if X > 0.90.
What is the probability of a Type I error?
A. 0.050 B. 0.095 C. 0.190 D. 0.810 E. 0.905

14.21 (2, 5/85, Q.42) (1.5 points) A researcher wants to test H0 : θ = 0 versus H1 : θ = 1, where θ is
a parameter of a population of interest. The statistic W, based on a random sample of the
population, is used to test the hypothesis. Suppose that under H0 , W has a normal distribution with
mean 0 and variance 1, and under H1 , W has a normal distribution with mean 4 and variance 1. If H0
is rejected when W > 1.50, then what are the probabilities of a Type I or Type II error respectively?
A. 0.07 and 0.01 B. 0.07 and 0.99 C. 0.31 and 0.01 D. 0.31 and 0.99 E. 0.93 and 0.99

14.22 (4, 5/87, Q.50) (1 point) Which of the following are true regarding hypothesis tests?
1. The test statistic has a probability of α of falling in the critical region when H0 is true,

where α is the level of significance.


2. One should reject the H0 when the test statistic falls outside of the critical region.
3. The fact that the test criteria is not significant proves that the null hypothesis is true.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3

14.23 (2, 5/88, Q.20) (1.5 points) A single observation X from the distribution with density function
f(x) = α/x^(α + 1), for x > 1, is used to test the null hypothesis H0 : α = 2 against the alternative
H1 : α = 4. Let H0 be rejected if X < k for some k.


If the probability of a Type I error is 3/4, what is the probability of a Type II error?
A. 1/16 B. 1/4 C. 7/16 D. 1/2 E. 15/16

14.24 (2, 5/88, Q.45) (1.5 points) One hundred random observations are taken from a Normal
Distribution, with mean µ and variance 4.
To test H0 : µ = 3 versus H1 : µ > 3, a critical region of the form X̄ > c is to be used.
What is the value of c such that the probability of a Type I error is 0.10?
A. 3.17 B. 3.26 C. 3.33 D. 3.51 E. 5.56

14.25 (4, 5/89, Q.52) (3 points) An insurer calculates its premium rates for a certain line of insurance
assuming a Poisson claim frequency with a mean = 0.01 and an expected claim severity of $1,500.
The insurer wrote 10,000 of these policies and observed 115 claims with an average severity of
$2,000 and an average standard deviation of $2,000.
Using a 5% level of significance, test the null hypothesis that the underlying assumptions are
adequate. (This means that the insurer will reject the null hypothesis only if his assumptions are too
low.)
(Note: Assume the sample standard deviation matches the population's claim severity standard
deviation. Use the normal approximation.)
A. Do not reject underlying frequency assumption, do not reject underlying severity assumption
B. Do not reject underlying frequency assumption, reject underlying severity assumption
C. Reject underlying frequency assumption, do not reject underlying severity assumption
D. Reject underlying frequency assumption, reject underlying severity assumption
E. Cannot be determined

14.26 (2, 5/90, Q.17) (1.7 points) Let X1 , X2 be a random sample from a Poisson distribution with
mean θ. The null hypothesis H0 : θ = 5 is to be tested against the alternative hypothesis H1 : θ ≠ 5

using the test statistic X̄ = (X1 + X2 )/2.
What is the probability of a Type I error if the critical region is |X̄ - 5| ≥ 4?
A. 1 - Σ_{y=2}^{8} e^-5 5^y / y!     B. 1 - Σ_{y=1}^{9} e^-5 5^y / y!
C. 1 - Σ_{y=2}^{8} e^-10 10^y / y!     D. 1 - Σ_{y=0}^{17} e^-10 10^y / y!
E. 1 - Σ_{y=3}^{17} e^-10 10^y / y!

14.27 (2, 5/90, Q.29) (1.7 points) Let X1 , X2 ,. . . . , Xn be a random sample from a normal
distribution with mean µ and variance 50. The null hypothesis H0 : µ = 10 is to be tested against the

alternative hypothesis H1 : µ = 15 using the critical region X̄ ≥ 13.75.


What is the smallest sample size required to ensure that the probability of a Type II error is less than
or equal to 0.31?
A. 2 B. 4 C. 5 D. 8 E. 20

14.28 (2, 5/92, Q.20) (1.7 points) Let p be the probability of success of a Bernoulli trial, and let X
be the number of successes in 4 trials. In testing the null hypothesis H0 : p = 0.50 against the
alternative hypothesis H1 : p = 0.25, the critical region is X ≤ 1.
What is the probability of a Type II error?
A. 27/128 B. 67/256 C. 5/16 D. 11/16 E. 189/256

14.29 (2, 5/92, Q.22) (1.7 points) Let X1 , X2 be a random sample from a distribution with density

function f(x) = θx^(θ-1) for 0 < x < 1, where θ > 0. The null hypothesis H0 : θ = 3 is tested against the

alternative hypothesis H1 : θ = 2 using the statistic Y = max(X1 , X2 ).


If the critical region is {Y: Y < 1/2}, then what is the probability of a Type I error?
A. 1/64 B. 1/20 C. 1/4 D. 3/4 E. 63/64

14.30 (1 point) In the previous question, what is the probability of a Type II error?

14.31 (2, 2/96, Q.16) (1.7 points) Let X be a single observation from a continuous distribution with
density f(x) = exp[-|x-θ|]/2, for -∞ < x < ∞.
The null hypothesis H0 : θ = 0 is tested against the alternative hypothesis H1 : θ = 1.
The null hypothesis is rejected if X > k. The probability of a Type I error is 0.05.
Calculate the probability of a Type II error.
A. 0.0184 B. 0.1359 C. 0.8641 D. 0.9500 E. .9816

14.32 (2, 2/96, Q.19) (1.7 points) Let X1 ,..., Xn and Y1 ,..., Yn be independent random samples
from normal distributions with means µX and µY and variances 2 and 4, respectively.

The null hypothesis H0 : µX = µY is rejected in favor of the alternate hypothesis H1 : µX > µY

if X̄ - Ȳ > k.
Determine the smallest value of n for which a test of significance level (size) 0.025
has power of at least 0.5 when µX = µY + 2.
A. 3 B. 4 C. 5 D. 6 E. 8

14.33 (2, 2/96, Q.20) (1.7 points) Five hypotheses are to be tested using five independent test
statistics. A common significance level (size) for each test is desired which ensures that the
probability of rejecting at least one hypothesis is 0.4, when all five hypotheses are true,
Determine the desired common significance level (size).
A. 0.040 B. 0.080 C. 0.097 D. 0.167 E. 0.400

14.34 (4B, 5/96, Q.31) (2 points) You are given the following:
• A portfolio consists of 100 identical and independent risks.
• The number of claims per year for each risk follows a Poisson distribution with mean θ.
• You wish to test the null hypothesis H0 : θ = 0.01 against
the alternative hypothesis H1 : θ > 0.01.
• The null hypothesis will be rejected if the number of claims for the entire portfolio in
the latest year is greater than or equal to 3.
Without using a normal approximation, determine the significance level of this test.
A. Less than 0.01
B. At least 0.01, but less than 0.05
C. At least 0.05, but less than 0.10
D. At least 0.10, but less than 0.20
E. At least 0.20

14.35 (4B, 5/97, Q.29) (3 points) You are given the following:
• A portfolio of independent risks is divided into two classes.
• The number of claims per year for each risk follows a Poisson distribution with mean θ,
where θ may vary by class, but does not vary within each class.
• The observed number of claims for the latest year has been recorded as follows:
Class Number of Risks Number of Claims
1 100 4
2 25 0
• For each class individually, you wish to test the null hypothesis H0 : θ = 0.10

against the alternative hypothesis H1 : θ < 0.10.


Determine which of the following statements is true.
A. H0 will be rejected at the 0.01 significance level for both classes.
B. H0 will be rejected at the 0.05 significance level for both classes,
but will be not be rejected at the 0.01 level for both classes.
C. H0 will be rejected at the 0.05 significance level for Class 1,
but will not be rejected at the 0.05 level for Class 2.
D. H0 will be rejected at the 0.05 significance level for Class 2,
but not be rejected at the 0.05 level for Class 1.
E. H0 will not be rejected at the 0.05 significance level for both classes.

14.36 (4B, 11/98, Q.7) (2 points) You are given the following:
• Claim sizes follow a Pareto distribution, with parameters α (unknown) and θ = 10,000.
• The null hypothesis, H0 : α = 0.5, is tested against the alternative hypothesis, H1 : α < 0.5.
• One claim of 9,600,000 is observed.
Determine which of the following statements is true.
A. H0 will be rejected at the 0.01 significance level.
B. H0 will be rejected at the 0.02 significance level, but not at the 0.01 level.
C. H0 will be rejected at the 0.05 significance level, but not at the 0.02 level.
D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level.
E. H0 will not be rejected at the 0.10 significance level.

14.37 (4B, 11/99, Q.15) (2 points) You are given the following:
• The annual number of claims follows a Poisson distribution with mean λ.
• The null hypothesis, H0 : λ = m, is to be tested against the alternative hypothesis,

H1 : λ < m, based on one year of data.


• The significance level must not be greater than 0.05.
Determine the smallest value of m for which the critical region could be nonempty.
A. Less than 0.5
B. At least 0.5, but less than 1.5
C. At least 1.5, but less than 2.5
D. At least 2.5, but less than 3.5
E. At least 3.5

14.38 (4B, 11/99, Q.30) (1 point) You wish to test the hypothesis that a set of data arises from a
given parametric distribution with given parameters. (Thus, no parameters are estimated from the
data.) Which of the following statements is true?
A. The value of the Chi-Square statistic depends on the endpoints of the chosen classes.
B. The value of the Chi-Square statistic depends on the number of parameters of the distribution.
C. The value of the Kolmogorov-Smirnov statistic depends on the endpoints of the chosen classes.
D. The value of the Kolmogorov-Smirnov statistic depends on the number of parameters of
the distribution.
E. None of the above statements is true.

14.39 (CAS3, 5/05, Q.24) (2.5 points)


Which of the following statements about hypothesis testing are true?
1. A Type I error occurs if H0 is rejected when it is true.
2. A Type II error occurs if H0 is rejected when it is true.
3. Type I errors are always worse than Type II errors.
A. 1 only B. 2 only C. 3 only D. 1 and 3 only E. 2 and 3 only

14.40 (CAS3, 11/05, Q.7) (2.5 points)


You are given the following information about an experiment:
• The population is from a normal distribution.
• Normal distribution values:
Φ(x) x
0.93 1.476
0.94 1.555
0.95 1.645
0.97 1.751
• H0 : µ = 10
• H1 : µ = 11
• σ² = 1
• The probability of Type I error is 0.05.
• The probability of Type II error is no more than 0.06.
Calculate the minimum sample size for the experiment.
A. 10 B. 11 C. 12 D. 13 E. 14

14.41 (CAS3, 11/07, Q.7) (2.5 points)


You are given the following information on a random sample:
• Y = X1 +...+ Xn where the sample size, n, is equal to 25 and the random variables are
independent and identically distributed
• Xi has a Poisson distribution with parameter λ
• H0 : λ = 0.1
• H1 : λ < 0.1
• The critical region to reject H0 is Y ≤ 3
Calculate the significance level of the test.
A. Less than 0.50
B. At least 0.50, but less than 0.60
C. At least 0.60, but less than 0.70
D. At least 0.70, but less than 0.80
E. At least 0.80

14.42 (CAS3L, 5/08, Q.5) (2.5 points) Let X be a random variable.


X is normally distributed with σ = 1.5 and either µ = 1 or µ = 5.
Consider the following hypotheses:
H0 : X is normally distributed with µ = 1 and σ = 1.5.

H1 : X is normally distributed with µ = 5 and σ = 1.5.


You perform hypothesis testing by observing one value of X and rejecting H0 if this value
exceeds k.
If the probability of a Type I error is 2.5%, calculate the probability of a Type II error.
A. Less than 10%
B. At least 10%, but less than 20%
C. At least 20%, but less than 30%
D. At least 30%, but less than 40%
E. At least 40%

14.43 (CAS3L, 5/11, Q.22) (2.5 points) You are given the following information:
• A gambler at a casino believes his probability of winning a game of chance is greater than
the probability projected by the casino management, which is 9/19.
• The gambler will make 1,000 consecutive bets and will either win $10
or lose $10 on each bet.
• H0 : The gambler's probability of winning is 9/19.
• H1 : The gambler's probability of winning is greater than 9/19.
• The null hypothesis is rejected if the gambler's net winnings exceed k.
Calculate the lowest value of k at which the probability of a Type I Error is less than 1.0%.
A. Less than $140
B. At least $140, but less than $170
C. At least $170, but less than $200
D. At least $200, but less than $230
E. At least $230

14.44 (CAS3L, 5/11, Q.23) (2.5 points) You are given the following information:
• A random variable X follows a geometric distribution with parameter β.
• H0 : β = 1.5
• H1 : β = 0.5
• You observe one value of X and reject the null hypothesis if the observed value is equal to
0 or 1.
Calculate the power (i.e. 1 - P(Type II Error)) of this test.
A. Less than 20%
B. At least 20%, but less than 40%
C. At least 40%, but less than 60%
D. At least 60%, but less than 80%
E. At least 80%

14.45 (CAS3L, 11/11, Q.20) (2.5 points) You are given the following hypothesis test:
• A random variable has a normal distribution with a known variance of 9.
• 10 observations are drawn from this distribution.
• H0 : µ ≤ 50
• HA: µ > 50
• The significance level of the test is α = 0.05.
When µ = 52, calculate the probability of a Type II error.
A. Less than 31%
B. At least 31%, but less than 32%
C. At least 32%, but less than 33%
D. At least 33%, but less than 34%
E. At least 34%

14.46 (CAS3L, 11/12, Q.22) (2.5 points) You are given the following:
• A random variable X follows a normal distribution with a known variance of 20.
• Sample size = 100.
• H0 : µ = 4

• H1 : µ = 5

• We reject the null hypothesis when X̄ ≥ 4.75.


Calculate the absolute value of the difference in the probabilities of Type I and Type II error.
A. Less than 5%
B. At least 5%, but less than 10%
C. At least 10%, but less than 15%
D. At least 15%, but less than 20%
E. At least 20%

14.47 (CAS3L, 11/13, Q.22) (2.5 points) You are given the following:
• X1 , ... , X100 is a random sample of size 100 from a Normal distribution,
with known variance equal to 25.
• H0 : µ = k
• H1 : µ = 1.2k
• Given the critical region selected, the probability of a Type I error is 0.0013
• Given the critical region selected, the probability of a Type II error is 0.9772
Calculate k.
A. Less than 1.50
B. At least 1.50, but less than 1.75
C. At least 1.75, but less than 2.00
D. At least 2.00, but less than 2.25
E. At least 2.25

14.48 (CAS ST, 5/14, Q.10) (2.5 points)


You are given the following information about a hypothesis test:
• The data is from a Normal Distribution with variance 25
• H0 : µ = 4
• H1 : µ = 7
• H0 is rejected when X̄ ≥ 7.25
• The sample size is 80.
Calculate the probability of making a Type II error.
A. Less than 0.35
B. At least 0.35, but less than 0.45
C. At least 0.45, but less than 0.55
D. At least 0.55, but less than 0.65
E. At least 0.65

Solutions to Problems:

14.1. A. The variance of the mean severity for 85 claims is: 35 million/85 = 411,765.
The chance that we would observe an average severity of 5000 or more is:
1 - Φ[(5000 - 4000)/√411,765] = 1 - Φ(1.56) = 1 - 0.941 = 0.059 < 0.10.
Thus we reject at 10%. Since 0.059 > 0.05, we do not reject at 5%.
Alternately, the total observed losses are: (85)(5000) = 425,000.
The expected losses for 85 claims are: (85)(4000) = 340,000.
The variance of the sum of 85 claims is: (85)(35 million) = 2975 million.
Thus the chance of observing 425,000 or more of losses is:
1 - Φ[(425,000 - 340,000)/√(2975 million)] = 1 - Φ(1.56) = 1 - 0.941 = 0.059 < 0.10.
Thus we reject at 10%, but not at 5%.

14.2. D. If H0 is true, then the probability that a claim exceeds $1 million = {θ/(θ+x)}^α = 1/2² = 0.25.
Thus the probability that five claims all exceed $1 million is: 0.25⁵ = 0.098%. We can reject at a
0.1% significance level since there is less than a 0.1% chance of rejecting the null hypothesis H0
when it is in fact true. We can not reject at a 0.01% level since 0.098% > 0.01%.
Comment: The p-value is 0.098%.

14.3. H0 is that the data is a random draw from the simpler Exponential Distribution, a special case
of the Gamma Distribution. H1 is that the data is a random draw from the Gamma Distribution, a
generalization of the Exponential Distribution with one more parameter α.
The test statistic is twice the difference in the loglikelihoods: (2){-1542.1 - (-1545.8)} = 7.4.
The test statistic follows a Chi-Square distribution with 1 degree of freedom, the difference in the
number of fitted parameters for the Exponential and Gamma.
For significance levels of 5%, 2.5%, 1%, and 1/2% the critical values are: 3.84, 5.02, 6.64, and 7.88.
Since 7.4 > 6.64, reject H0 in favor of H1 at a 1% significance level.

(At a 1% level, the rejection or critical region is χ2 > 6.64.)


Since 7.4 ≤ 7.88, do not reject H0 in favor of H1 at a 1/2% significance level.
(At a 1/2% level, the rejection or critical region is χ2 > 7.88.)

The p-value of the test is between 1/2% and 1%.

14.4. B. Prob[observation | H0 true] = S(x)³. We would reject H0 at 10% if this probability were
less than 10%. S(x)³ < 0.1. ⇒ S(x) < 0.464. ⇔ F(x) > 0.536.

14.5. B. The p-value = Prob[Type I error] = Prob[rejecting H0 when it is true]. ⇒


Statement B is false. In general, a small p-value indicates a bad fit. Loss Models states that a
p-value of less than 1% indicates strong support for H1 , the alternative hypothesis that the data
sample did not come from this loss distribution. ⇒ Statement D is true.

14.6. E. Assuming the data is drawn from F, the distribution function of the maximum is F(x)³.
Prob[observation | H0 true] = 1 - F(x)³. We would reject H0 at 10% if this probability were less
than 10%. 1 - F(x)³ < 0.1. ⇒ F(x)³ > 0.9. ⇔ F(x) > 0.965.

14.7. C. The sum of 100 independent, identically distributed Exponentials has mean 100θ and
variance 100θ². Thus the average has mean θ and variance θ²/100.
Assuming θ = 1000, Prob[X̄ ≤ 83,000/100] ≅ Φ[(830 - 1000)/100] = Φ[-1.7] = 4.46%.
Comment: Similar to Exercise 10.13 in Loss Models.
We work with the value of θ in H0 closest to H1.
If θ > 1000, Prob[X̄ ≤ 830] is smaller than 4.46%.
For example, if θ = 1100, Prob[X̄ ≤ 830] ≅ Φ[(830 - 1100)/110] = Φ[-2.45] = 0.71%.
A graph of Prob[X̄ ≤ 830], as a function of theta:
[Graph: Prob[X̄ ≤ 830] versus θ from 1000 to 1300, decreasing from 4.46% at θ = 1000 to well below 1% for θ above 1100.]

14.8. B. The expected number of claims is: (1000)(0.07) = 70. The variance of the number of claims
is 70. The standard deviation is: √70 = 8.37. The chance that we would observe 85 or more claims
from the assumed Poisson is: 1 - Φ[(84.5 - 70)/8.37] = 1 - Φ(1.73) = 0.0418.
0.05 > 0.0418 > 0.025 ⇒ we reject at 5%, and do not reject at 2.5%.

14.9. The null hypothesis is that a given email is not spam (junk). If the filter labels as spam an
email that is not spam, then it has made a Type I error. If instead, the filter fails to label as spam an
email that is spam, then it has made a Type II error.
Comment: Since you really do not want to miss an important email, the filter should be designed so
that it is more worried about making a Type I error than a Type II error.
In the case of a test screening for a type of cancer, the null hypothesis is that the cancer is not
present. If the test detects cancer when it is not there (a false positive), then it has made a Type I
error. If instead, the test fails to detect a cancer when it is present, then it has made a Type II error.
In this case, the doctor should be more worried about making Type II errors than Type I errors.

14.10. E. X̄ = 152,396/5110 = 29.8231. Z = (29.8231 - 30)/(10/√5110) = -1.265.
This is a two-sided test, with p-value: 2{1 - Φ(1.265)} = 2(1 - 0.8971) = 20.6%.
[Diagram: standard Normal density with both tails beyond ±1.265 shaded; each tail has probability 10.3%.]
Comment: If instead H1 : µ < 30, then the p-value is: Φ(-1.265) = 1 - 0.8971 = 10.3%.
[Diagram: standard Normal density with the left tail below -1.265 shaded; probability 10.3%.]
If instead H1 : µ > 30, then the p-value is: 1 - Φ(-1.265) = Φ(1.265) = 0.8971 = 89.7%.
Do not reject H0!
[Diagram: standard Normal density with the area to the right of -1.265 shaded; probability 89.7%.]
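
The three p-values in this solution can be checked numerically. A small Python sketch (my illustration, not part of the guide; the helper phi stands in for a Normal table):

```python
from math import erf, sqrt

def phi(x):
    # Standard Normal distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

n, sigma, mu0 = 5110, 10, 30
xbar = 152396 / n
z = (xbar - mu0) / (sigma / sqrt(n))   # about -1.265

print(2 * (1 - phi(abs(z))))   # H1: mu != 30 (two-sided), about 0.206
print(phi(z))                  # H1: mu < 30, about 0.103
print(1 - phi(z))              # H1: mu > 30, about 0.897
```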

14.11. B. The alternative hypothesis is α < 2, which is a heavier-tailed Pareto, with a higher
probability of a very large loss.
Thus we reject the null hypothesis when we observe large losses.
We compute the chance of the given observation or a more extreme observation:
the chance of a loss of size 15 or more times the chance of a loss of size 25 or more.
The survival function of the Pareto for α = 2 is: S(x) = 1/(1 + x/10)².
S(15) = 16%. S(25) = 8.16%.
Prob[this observation or something more extreme in the direction of H1 | H0]
= Prob[first loss ≥ 15 and second loss ≥ 25 | H0] =
Prob[first loss ≥ 15 | H0] Prob[second loss ≥ 25 | H0] = S(15)S(25) = (16%)(8.16%) = 1.3%.
Thus since 2% > 1.3% > 1%, we reject H0 at 2% and do not reject at 1%.
Comment: Similar to Q. 14.36 (4B, 11/98, Q.7). One can not use the likelihood ratio test, since we
do not have one distribution that is a special case of another.

14.12. E. X̄ = 1,300,000/700 = 1857.14. Z = (1857.14 - 1850)/(300/√700) = 0.63.
This is a one-sided test, with p-value: 1 - Φ(0.63) = 26.43%.
[Diagram: standard Normal density with the area to the right of 0.63 shaded; probability 26.43%.]

14.13. B. 1% = Prob[X̄ ≤ k | µ = 0] = Φ[k/√(100/36)] = Φ[0.6k]. ⇒ -2.326 = 0.6k. ⇒ k = -3.877.
Prob[Type II error] = Prob[fail to reject | µ = -8] = Prob[X̄ > -3.877 | µ = -8]
= 1 - Φ[(-3.877 - (-8))/√(100/36)] = 1 - Φ[2.474] = 0.67%.

14.14. B. Let Y = (∏_{i=1}^{20} Xi)^(1/20) be the geometric average.
Then ln(Y) = Σ ln(Xi)/20, which is Normal with mean µ and σ = 1.5/√20 = 0.3354.
Prob[Type I error] = Prob[lnY > ln k | µ = 8] = 1 - Φ[(ln k - 8)/0.3354] = 10%.
⇒ 1.282 = (ln k - 8)/0.3354. ⇒ k = 4582.
Prob[Type II error] = Prob[lnY ≤ ln 4582 | µ = 9] = Φ[(ln 4582 - 9)/0.3354] = Φ[-1.700] = 4.5%.

14.15. The average loss ratio has mean µ, and standard deviation 15/√5 = 6.708.
Prob[Type I Error] = Prob[Loss Ratio > c | µ = 65] = 1 - Φ[(c - 65)/6.708] = Φ[9.690 - c/6.708].
Prob[Type II Error] = Prob[Loss Ratio ≤ c | µ = 80] = Φ[(c - 80)/6.708] = Φ[c/6.708 - 11.93].
[Graph: the probabilities of Type I and Type II errors as functions of c between 65% and 80%; the Type I error probability decreases and the Type II error probability increases as c increases.]
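
To reproduce the requested graph, here is a brief Python sketch (my addition; the grid of critical values and the use of scipy are assumptions):

```python
import numpy as np
from scipy.stats import norm

sd = 15 / np.sqrt(5)                    # standard deviation of the 5-year average, about 6.708
c = np.linspace(65, 80, 151)            # candidate critical values between 65% and 80%
type1 = 1 - norm.cdf((c - 65) / sd)     # reject when the average exceeds c, given mu = 65%
type2 = norm.cdf((c - 80) / sd)         # fail to reject, given mu = 80%

idx = np.argmin(np.abs(type1 - type2))  # where the two error probabilities are equal
print(c[idx], type1[idx])               # near c = 72.5%, each error probability is about 13%
```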

14.16. B. Over n years, the average loss ratio has mean µ, variance σ²/n, and standard deviation 20/√n.
We will reject when the observed loss ratio > c.
We want there to be at most a 10% chance that if µ = 70, we would see a loss ratio of c or more.
10% = Prob[Loss Ratio > c | µ = 70] = 1 - Φ[(c - 70)/(20/√n)].
⇒ (c - 70)/(20/√n) = 1.282. ⇒ c = 70 + 1.282(20/√n). Reject when loss ratio > 70 + 25.64/√n.
Power(x, n) = Prob[Reject H0 | µ = x] = Prob[y > 70 + 25.64/√n | µ = x] =
1 - Φ[(70 + 25.64/√n - x)/(20/√n)] = 1 - Φ[(3.5 - x/20)√n + 1.282].
π(100%, 1) = 1 - Φ[(3.5 - 5)√1 + 1.282] = 1 - Φ[-0.218] = Φ[0.218] = 58.6%.
π(95%, 2) = 1 - Φ[(3.5 - 4.75)√2 + 1.282] = 1 - Φ[-0.486] = Φ[0.486] = 68.7%.
π(90%, 3) = 1 - Φ[(3.5 - 4.5)√3 + 1.282] = 1 - Φ[-0.450] = Φ[0.450] = 67.4%.
π(85%, 4) = 1 - Φ[(3.5 - 4.25)√4 + 1.282] = 1 - Φ[-0.218] = Φ[0.218] = 58.6%.
π(80%, 5) = 1 - Φ[(3.5 - 4)√5 + 1.282] = 1 - Φ[0.164] = Φ[-0.164] = 43.5%.
Comment: Similar to CAS3, 5/06, Q.6. There is no need to look up values in the Normal Table in
order to determine which one is biggest.
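
The five powers can also be evaluated directly from the formula above; a minimal Python sketch (mine, not the author's):

```python
from math import erf, sqrt

def phi(x):
    # Standard Normal distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(x, n):
    # x = true expected loss ratio in percent, n = number of years averaged
    return 1 - phi((3.5 - x / 20) * sqrt(n) + 1.282)

for x, n in [(100, 1), (95, 2), (90, 3), (85, 4), (80, 5)]:
    print(x, n, round(power(x, n), 3))   # pi(95%, 2) is the largest, about 0.687
```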

14.17. E. If H0 is true, then the LogNormal Distribution has a mean of: exp[6 + 1²/2] = exp[6.5] = 665.14,
second moment of: exp[(2)(6) + (2)(1²)] = exp[14], and variance of: e^14 - e^13 = 760,191.
If H0 is true, then X̄ has a mean of 665.14 and a variance of: 760,191/N.
Thus using the Normal Approximation, if H0 is true:
Prob[X̄ < 600] = Φ[(600 - 665.14)/√(760,191/N)] = Φ[-0.07471√N].
The probability of a Type I error is:
Prob[Reject | H0] = Prob[X̄ < 600 | µ = 6] = Φ[-0.07471√N].
In order for this probability to be less than 10%, we need: 0.07471√N > 1.282.
Therefore, N > 1.282²/0.07471² = 294.45. Smallest N is 295.
Comment: Similar to CAS3L, 11/10, Q.21.

14.18. D. X̄ is Normal with mean µ and variance: 16/16 = 1.
The probability of rejecting if µ = 0 is: 1 - Φ(k) = 0.03. ⇒ k = 1.88.
Prob[Type II error] = Prob[not rejecting | µ = 1] = Φ(1.88 - 1) = Φ(0.88) = 0.81.

14.19. C. Prob[4 or more defectives | p = 1/5] = (5)(1/5)⁴(4/5) + (1/5)⁵ = 21/3125.

14.20. C. f(x) = (θ + 1)x^θ. F(x) = x^(θ+1).
Prob[Type I error] = Prob[reject | H0] = Prob[X > 0.90 | θ = 1] = 1 - 0.9² = 0.19.

14.21. A. Prob[Type I Error] = Prob[Rejecting H0 | H0 is true] = 1 - Φ(1.5/1) = 0.0668.

Prob[Type II Error] = Prob[Not Rejecting H0 | H1 is true] = Φ[(1.5-4)/1] = Φ(-2.5) = 0.0062.

14.22. A. 1. True. This is the definition of the significance level. For example, for a Chi-Square Test,
a 1% significance level means that there is a 1% chance that χ² > critical value when H0 is true.
2. False. One should not reject the null hypothesis (at the given level of significance) when the test
statistic falls outside of the critical region. 3. False. The fact that the test criterion is not significant merely
tells us that the data do not contradict the null hypothesis, rather than proving that H0 is true.

14.23. A. f(x) = αx^-(1+α), for x > 1. F(x) = 1 - x^-α, for x > 1.
3/4 = Prob[Type I Error] = Prob[reject when should not] = Prob[X < k | H0] =
Prob[X < k | α = 2] = 1 - k^-2. ⇒ 1/4 = 1/k². ⇒ k = 2.
Prob[Type II Error] = Prob[do not reject when should] = Prob[X ≥ 2 | H1] =
Prob[X ≥ 2 | α = 4] = 2^-4 = 1/16.
Comment: A Single Parameter Pareto Distribution, with θ = 1.

14.24. B. 0.10 = Prob[Type I error] = Prob[Reject when H0 true] = Prob[X̄ > c | µ = 3] =
1 - Φ[(c - 3)/√(4/100)]. ⇒ 0.90 = Φ[(c - 3)/0.2]. ⇒ (c - 3)/0.2 = 1.282. ⇒ c = 3.256.



14.25. B. Mean number of claims is (10,000)(0.01) = 100. For 10,000 independent policies each
with a variance of 0.01, the variance is (10,000)(0.01) = 100.
Thus the chance of 115 claims or more is approximately: 1 - Φ[(114.5 - 100)/√100] =
1 - Φ(1.45) = 1 - 0.9265 = 7.35%.
Since 7.35% > 5%, we can not reject the frequency hypothesis at 5%.
If the standard deviation is $2000 for a single claim, then for the average of 115 claims it is
2000/√115 = 186.5. The expected mean severity is 1500.
Prob[observed severity ≥ 2000] ≅ 1 - Φ[(2000 - 1500)/186.5] = 1 - Φ(2.68) = 1 - 0.9963 = 0.37%.
Since 0.37% < 5%, we can reject the severity hypothesis at 5%.
Comment: Note the use of the continuity correction in approximating the discrete frequency
distribution by a Normal, even though using 114.5 rather than 115 makes no difference to the
solution in this case.

14.26. E. If H0 is true, X1 + X2 is Poisson with mean 10.
|X̄ - 5| ≥ 4. ⇔ X̄ ≤ 1 or X̄ ≥ 9. ⇔ X1 + X2 ≤ 2 or X1 + X2 ≥ 18.
Prob[reject when H0 true] = 1 - Prob[2 < X1 + X2 < 18] = 1 - Σ_{y=3}^{17} e^-10 10^y / y!.

14.27. D. Probability of a Type II Error is: Prob[failing to reject | H1 true] =
Prob[X̄ < 13.75 | H1 true] = Φ[(13.75 - 15)/√(50/n)] = Φ(-0.1768√n).
Set this probability equal to 0.31: Φ(-0.1768√n) = 0.31. ⇒ -0.1768√n = -0.50. ⇒ n = 8.
Thus the smallest possible n is 8.
Comment: For n = 8, the probability of a Type II Error is: Φ[(13.75 - 15)/√(50/8)] = Φ[-0.5] = 0.3085.

14.28. B. Prob[Type II error] = Prob[failing to reject when H0 false] = Prob[X > 1 | H1] =
1 - Prob[X = 0 or X = 1 | p = 0.25] = 1 - 0.75⁴ - (4)(0.25)(0.75³) = 67/256.

14.29. A. f(x) = θx^(θ-1). F(x) = x^θ.
Prob[Type I error] = Prob[rejecting H0 when H0 is true] = Prob[max(X1, X2) < 1/2 | θ = 3] =
Prob[X1 < 1/2 | θ = 3] Prob[X2 < 1/2 | θ = 3] = (1/2)³ (1/2)³ = 1/64.

14.30. Prob[Type II error] = Prob[not rejecting H0 | H1] = 1 - Prob[max(X1, X2) < 1/2 | θ = 2] =
1 - Prob[X1 < 1/2 | θ = 2] Prob[X2 < 1/2 | θ = 2] = 1 - (1/2)² (1/2)² = 1 - 1/16 = 0.9375.
Comment: Power = Prob[rejecting H0 | H1] = 1/16.
If the critical region is {Y: Y < c}, then power = c⁴ and significance = c⁶. Not a good test!

14.31. C. By integrating f(x) from x to ∞: S(x) = exp[-(x - θ)]/2, for x > θ.
0.05 = Prob[Type I error] = Prob[rejecting H0 when it is true] = Prob[X > k | θ = 0] = exp[-k]/2.
⇒ 0.1 = e^-k. ⇒ k = 2.303.
Prob[Type II error] = Prob[not rejecting H0 | H1] = Prob[X < 2.303 | θ = 1] =
1 - exp[-(2.303 - 1)]/2 = 1 - e^-1.303/2 = 0.8641.

14.32. D. X̄ - Ȳ is Normal with mean µX - µY and variance 2/n + 4/n = 6/n.
When µX = µY, X̄ - Ȳ has mean 0. When µX = µY + 2, X̄ - Ȳ has mean 2.
0.025 = Prob[Rejecting H0 when it is true] = Prob[X̄ - Ȳ > k | µX = µY] = 1 - Φ[k/√(6/n)].
⇒ k/√(6/n) = 1.96. ⇒ k = 4.801/√n.
0.5 ≤ Prob[Rejecting H0 when µX = µY + 2] = Prob[X̄ - Ȳ > 4.801/√n | µX = µY + 2] =
1 - Φ[{(4.801/√n) - 2}/√(6/n)].
⇒ {(4.801/√n) - 2}/√(6/n) ≤ 0. ⇒ 4.801/√n ≤ 2. ⇒ n ≥ 5.76. ⇒ n = 6.

14.33. C. For each test independently, α = Prob[Reject when hypothesis is true].
0.6 = Prob[rejecting none of the hypotheses when they are all true] = (1 - α)⁵. ⇒ α = 0.097.

14.34. C. If H0 is true, each individual has a distribution which is Poisson with mean 0.01.
The sum of independent Poisson variables is Poisson; thus the portfolio has a Poisson Distribution
with a mean of (100)(0.01) = 1.
For such a Poisson the chance of x claims is given by: f(x) = e^-1 1^x / x! = e^-1 / x!.
The chance of 3 or more claims (and thus rejecting the null hypothesis) is:
1 - {f(0) + f(1) + f(2)} = 1 - {e^-1 + e^-1 + e^-1/2} = 1 - 2.5e^-1 = 1 - 0.920 = 0.080.
Comment: The significance level α of the test is the chance of rejecting the null hypothesis when it is
in fact true. If one used the Normal Approximation, the chance of 3 or more claims is approximately:
1 - Φ[(2.5 - 1)/√1] = 1 - Φ(1.5) = 1 - 0.9332 = 0.0668.

14.35. C. Adding up independent Poisson Distributions gives a Poisson Distribution with the sum
of the individual parameters. Therefore each class has a Poisson Distribution.
If each risk in Class 1 has a mean of θ1, then Class 1 has a mean of 100θ1. If θ1 = 0.1, then Class 1
has a mean of 10. Thus observing only 4 claims might lead one to believe that θ1 < 0.1.
For a Poisson Distribution with mean λ, the chance of observing 4 or fewer claims is:
e^-λ{1 + λ + λ²/2 + λ³/6 + λ⁴/24}. If θ1 = 0.1, then λ = 100θ1 = 10, and the chance of 4 or fewer
claims is 644e^-10 = 0.029. Thus for Class 1 we reject H0 at the 0.05 significance level, but do not
reject H0 at the 0.01 significance level.
If θ2 = 0.1, then Class 2 has a Poisson with a mean of (25)(0.1) = 2.5. Thus observing only 0 claims
might lead one to believe that θ2 < 0.1. For a Poisson Distribution with mean λ, the chance of
observing 0 (or fewer) claims is: e^-λ. If θ2 = 0.1, then λ = 2.5, and the chance of 0 claims is e^-2.5 = 0.082.
Thus for Class 2 we do not reject H0 at the 0.05 significance level.

14.36. C. S(9,600,000) = {θ/(θ + 9,600,000)}^α = (10,000/9,610,000)^0.5 = 3.2%.


Thus since 5% > 3.2% > 2%, we reject H0 at 5% and do not reject at 2%.
Comments: For smaller α, the Pareto is heavier-tailed and there is more chance of getting a claim as
large as 9.6 million or larger. For example, if instead α = 0.3, then S(9,600,000) = 12.7%, and we
would not reject at 10% the hypothesis that α = 0.3. The more extreme the observation, the more
likely we are to reject the null hypothesis. If for example, the single claim observed had been
instead 50 million, then the chance of observing such a large claim if α = 0.5 would be only 1.4%,
so we would in that case reject at 2% and do not reject at 1%.

14.37. D. If the observed number of claims is too low, then we will reject the null hypothesis that
λ = m in favor of the alternative hypothesis that λ is smaller than m. Assume for example m = 5 and
we observe only 1 claim. Then given the null hypothesis the chance that we could have observed 1
or fewer claims is e^-5 + 5e^-5 = 4.0%.
Thus we could reject the null hypothesis at a 5% significance level.
In general we would reject the null hypothesis at a 5% significance level if we observe c claims and
F(c) ≤ 5%. The critical region consists of those values of c for which we would reject at the given
significance level. For example, for m = 5, the critical region for a 5% significance level is {0, 1}.
Now the critical region will always be empty if we can not reject when we observe zero claims.
This will be the case if F(0) = e^-m > significance level = α.
Thus the critical region is empty if m < -ln(α).
If α = 0.05 then the critical region is empty for m < -ln(0.05) = 2.996.
If α < 0.05, then -ln(α) > 2.996. Thus for example, if α = 0.01, then the critical region is empty for
m < -ln(0.01) = 4.605.
So over all α ≤ 0.05, the smallest m for which the critical region can be nonempty is m = 2.996 and
α = 5%. The corresponding critical region is {0}.

14.38. A. Statement A is true. If one groups the data differently the value of the Chi-Square
statistic changes. Statement B is false. The value of the Chi-Square statistic never depends directly
on the number of parameters. Statement C is false, since we should not group the data in order to
calculate the K-S Statistic. Statement D is false, because the K-S Statistic never depends directly on
the number of parameters.
Comment: As discussed in my section on the Chi-Square test, there are different rules of thumb
one can use to group data prior to calculating the Chi-Square statistic. On your exam, use whichever
rule the question tells you to use.
Note that even if the value of the Chi-Square statistic changes, this may or may not affect your
conclusion as to whether you do not reject or reject the hypothesis at a given significance level. If
one had fit parameters, then the number of degrees of freedom of the Chi-Square statistic would
have been reduced by the number of fitted parameters; however, the value of the Chi-Square
statistic never depends directly on the number of parameters, whether estimated from data or not.

14.39. A. Statement 1 is true.


A Type II error occurs if H0 is not rejected when it is false. ⇒ Statement 2 is false.
Depending on the situation being modeled, either type of error can be worse.
Comment: From a purely statistical point of view, one wants to avoid both types of errors, and
neither is inherently worse. However, for a given sample size, decreasing the probability of one
type of error increases the probability of the other type of error.

14.40. B. Since H1 : µ = 11 > 10, we reject H0 when X̄ > c.
We are given that 5% = Prob[Type I error] = Prob[reject | H0] = Prob[X̄ > c | µ = 10] =
1 - Φ[(c - 10)/(1/√n)].
⇒ (c - 10)/(1/√n) = 1.645. ⇒ √n = 1.645/(c - 10).
We want 6% ≥ Prob[Type II error] = Prob[do not reject | H1] = Prob[X̄ ≤ c | µ = 11] =
Φ[(c - 11)/(1/√n)].
⇒ (c - 11)/(1/√n) = -1.555. ⇒ √n = -1.555/(c - 11).
Combining the two equations: 1.645/(c - 10) = -1.555/(c - 11). ⇒ c = 10.514.
⇒ n = (1.645/0.514)² = 10.24. So we need a sample size of at least 11.

14.41. D. The sum of 25 independent Poisson Distributions is another Poisson with 25 times the
mean. Significance level of the test = Prob[reject when H0 is true]
= Prob[number of claims ≤ 3 from a Poisson with mean (25)(0.1) = 2.5] = f(0) + f(1) + f(2) + f(3)
= e^-2.5 (1 + 2.5 + 2.5²/2 + 2.5³/6) = 75.8%.
Comment: One would not usually perform a statistical test with a significance level of 76%!
Significance levels are usually something like 5% or 1%.
Since H1 : λ < 0.1, it makes sense to reject when the observed number of claims is small.

14.42. C. 2.5% = Probability of a Type I error = Prob[Reject | H0 ] = Prob[X > k | µ = 1] =

1 - Φ[(k - 1)/1.5]. ⇒ (k - 1)/1.5 = 1.960. ⇒ k = 3.94.


Probability of a Type II error = Prob[Do not reject | H1 ] = Prob[X ≤ 3.94 | µ = 5] =
Φ[(3.94 - 5)/1.5] = Φ[-0.71] = 24%.

14.43. D. Let x be the number of games won by the gambler.
Then his net winnings are: (10){x - (1000 - x)} = 20x - 10,000.
If H0 is true, then X is Binomial with m = 1000 and q = 9/19,
with mean 9000/19, and variance: (1000)(9/19)(10/19) = 90,000/361.
The 99th percentile of the Standard Normal Distribution is 2.326.
Thus the smallest number of wins at which the chance of a Type I error is 1% is:
9000/19 + 2.326√(90,000/361) = 510.41.
This corresponds to net winnings of: (20)(510.41) - 10,000 = $208.2.
We can reject when net winnings exceed $208.2.
Comment: We make a Type I Error if we reject H0 when H0 is true.
We are assuming the outcomes of the bets are independent of each other.
For example, if he wins 505 out of 1000 times, his net winnings are 100.
Let us assume we reject when his net winnings are greater than 100.
Then using the continuity correction, the probability of a Type I Error is:
Prob[more than 505 wins | p = 9/19] = 1 - Φ[(505.5 - 9000/19)/√(90,000/361)] = 1 - Φ[2.015] = 2.2% > 1%.
As we increase k, we reject H0 in fewer cases, and are less likely to make a Type I Error.

14.44. E. Power = Prob[Reject H0 | H1] = Prob[X = 0 or X = 1 | β = 0.5] = 1/1.5 + 0.5/1.5² = 88.9%.
Comment: Significance = Prob[Reject H0 | H0] = Prob[X = 0 or X = 1 | β = 1.5] = 1/2.5 + 1.5/2.5² = 64%.


With a sample size of only one, one can not get both a large power and a small significance.

14.45. C. X̄ has mean µ, and variance: 9/10 = 0.9. Reject when X̄ > c.
5% = Significance level = probability of rejecting when H0 is true =
Prob[X̄ > c | µ = 50] = 1 - Φ[(c - 50)/√0.9]. ⇒ 1.645 = (c - 50)/√0.9. ⇒ c = 51.56.
Probability of a Type II error when µ = 52 is:
Prob[failing to reject | µ = 52] = Prob[X̄ < 51.56] = Φ[(51.56 - 52)/√0.9] = Φ[-0.464] = 32.14%.

Comment: In order to determine the significance level of the test, we use µ = 50, the boundary of
the null hypothesis closest to the alternate hypothesis.
The power function at µ = 52 is: 1 - 32.14% = 67.86%.

14.46. E. Prob[Type I error] = Prob[X̄ ≥ 4.75 | µ = 4] = 1 - Φ[(4.75 - 4)/√(20/100)] = 1 - Φ[1.677] = 4.68%.
Prob[Type II error] = Prob[X̄ < 4.75 | µ = 5] = Φ[(4.75 - 5)/√(20/100)] = Φ[-0.559] = 28.81%.
|4.68% - 28.81%| = 24.13%.

14.47. E. Given the form of the alternative hypothesis, the critical region is of the form: X̄ > c.
X̄ is Normal with mean µ and variance 25/100 = 0.25.
0.0013 = Prob[Type I Error] = Prob[X̄ > c | µ = k] = 1 - Φ[(c - k)/0.5]. ⇒ (c - k)/0.5 = 3.0. ⇒ c = 1.5 + k.
0.9772 = Prob[Type II Error] = Prob[X̄ < c | µ = 1.2k] = Φ[(c - 1.2k)/0.5]. ⇒ (c - 1.2k)/0.5 = 2.0.
⇒ c = 1 + 1.2k.
Thus, 1.5 + k = 1 + 1.2k. ⇒ k = 2.5. ⇒ c = 4.

14.48. E. X̄ is Normal with variance 25/80.
Prob[Type II Error] = Prob[Fail to reject | µ = 7] = Prob[X̄ < 7.25 | µ = 7] =
Φ[(7.25 - 7)/√(25/80)] = Φ[0.447] = 67.3%.

Section 15, The Schwarz Bayesian Criterion204

The Likelihood Ratio Test can be used when one distribution is a special case (or limit) of the other.
Similar tests can be performed even when this is not the case.

By fitting more parameters we can increase the maximum loglikelihood, but there is the potential
problem of overfitting the data. One can avoid this by penalizing those fits with more parameters.
The Schwarz Bayesian Criterion is an example of such a “penalized loglikelihood value”; it penalizes for model complexity.

One adjusts the loglikelihoods by subtracting in each case the penalty:


(number of fitted parameters) ln(number of data points) / 2.

penalty = (r/2) ln(n) = r ln(√n), where r = number of parameters and n = number of data points.

One then compares these penalized loglikelihoods directly; larger is better.

Exercise: Both a Pareto (2 parameters) and a Transformed Gamma Distribution


(3 parameters) have been fit to the same 200 points via the Method of Maximum Likelihood.
The loglikelihoods are -820.09 for the Transformed Gamma and -822.43 for the Pareto.
Use the Schwarz Bayesian Criterion to compare the fits.
[Solution: The penalized likelihood value for the Transformed Gamma is:
-820.09 - (3)ln(200)/2 = -828.04. The penalized likelihood value for the Pareto is:
-822.43 - (2)ln(200)/2 = -827.73. Since -827.73 > -828.04, the Pareto is the better fit.]
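
As a quick check of the arithmetic in the exercise, the penalized loglikelihoods can be computed directly. A small Python sketch (my addition; the function name sbc is an assumption):

```python
from math import log

def sbc(loglikelihood, num_params, num_points):
    # Schwarz Bayesian Criterion penalized loglikelihood: larger is better.
    return loglikelihood - num_params * log(num_points) / 2

print(sbc(-820.09, 3, 200))   # Transformed Gamma: about -828.04
print(sbc(-822.43, 2, 200))   # Pareto: about -827.73, so the Pareto is preferred
```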

Note that the distribution with more parameters receives a larger penalty to its loglikelihood. This is
consistent with the principle of parsimony. We avoid using more parameters unless it is worthwhile.

Also the penalties are larger for larger data sets; the increase is as per the log of the size of the data
set.205 The improvement in the fit from adding an extra parameter has to be larger in order to be
worthwhile in the case of a larger data set.

204
See Section 16.5.3 of Loss Models.
“Estimating the Dimension of a Model,” by Gideon Schwarz, The Annals of Statistics, 1978, Vol. 6, No. 2.
205
The likelihood ratio test does not use the size of the data set, although since it relies on an asymptotic result the
likelihood ratio test should not be applied to very small data sets.

Testing Other Hypotheses:

As with the likelihood ratio test, one can use the Schwarz Bayesian Criterion in order to test other
hypotheses. For example, one can test hypotheses involving restrictions on the relationships of the
parameters of the distributions of two related data sets, such as in the following previously
discussed example.

Phil and Sylvia are competitors in the light bulb business.


You were able to test 20 of Philʼs bulbs and 10 of Sylviaʼs bulbs:
Number of Bulbs Total Lifetime Average Lifetime
Phil 20 20,000 1000
Sylvia 10 15,000 1500
You assume that the distribution of the lifetime (in hours) of a light bulb is Exponential.

Using maximum likelihood, separately estimating θ for Philʼs Exponential Distribution,


θ = 1000 with corresponding maximum loglikelihood: -20000/1000 - 20ln(1000) = -158.155.

Separately estimating θ for Sylviaʼs Exponential Distribution, θ = 1500.


The corresponding maximum loglikelihood is: -15000/1500 - 10ln(1500) = -83.132.

Using maximum likelihood applied to all the data, estimating θ for Philʼs Exponential Distribution
restricted by Sylviaʼs claim that her light bulbs burn twice as long as Philʼs,
θP = (20000 + 15000/2)/(20 + 10) = 917. θS = 2θP = 1834.
The maximum loglikelihood is: -20000/917 - 20ln(917) - 15000/1834 - 10ln(1834) = -241.554.

The unrestricted maximum loglikelihood is: -158.155 - 83.132 = -241.287, somewhat better than
the restricted maximum loglikelihood of -241.554.

Let the null hypothesis H0 be that Sylviaʼs light bulbs burn twice as long as Philʼs. Let the alternative
H1 be that H0 is not true. Then we can use the Schwarz Bayesian Criterion as follows.

The penalty is: (number of fitted parameters) ln(number of data points)/2 =


(number of fitted parameters)ln(30)/2 = (number of fitted parameters)(1.701).

For the unrestricted model we have two parameters, and the penalized loglikelihood is:
-241.287 - (2)(1.701) = -244.689.
For the restricted model we have one parameter, and the penalized loglikelihood is:
-241.554 - (1)(1.701) = -243.255.
Since -243.255 > -244.689, the unrestricted model is not significantly better than the restricted
model, and we do not reject H0 .

One reason we did not reject Sylviaʼs claim was due to the small sample size.

Exercise: Redo the above example with the following data:


Number of Bulbs Total Lifetime Average Lifetime
Phil 2000 2,000,000 1000
Sylvia 1000 1,500,000 1500
[Solution: Separate estimate of θ for Philʼs Exponential Distribution, θ = 1000.
The corresponding maximum loglikelihood is: -2,000,000/1000 - 2000ln(1000) = -15815.51.
Separate estimate of θ for Sylviaʼs Exponential Distribution, θ = 1500.
The corresponding maximum loglikelihood is: -1,500,000/1500 - 1000ln(1500) = -8313.22.
Restricted by Sylviaʼs claim, θP = (2,000,000 + 1,500,000/2)/(2000 + 1000) = 917.

θS = 2θP = 1834. The maximum loglikelihood is:


-2,000,000/917 - 2000ln(917) - 1,500,000/1834 - 1000ln(1834) = -24,155.38.
Unrestricted loglikelihood is: -15,815.51 - 8313.22 = -24,128.73.
Unrestricted penalized loglikelihood is: -24,128.73 - (2)ln(3000)/2 = -24,136.74.
Restricted penalized loglikelihood is: -24,155.38 - (1)ln(3000)/2 = -24,159.38.
-24,136.74 > -24,159.38, the unrestricted model is significantly better, and we reject H0 .]
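
Both versions of the light bulb example follow the same computation, sketched below in Python (my illustration; the helper names are mine and the inputs are the totals and counts given above).

```python
from math import log

def exp_loglik(total, n, theta):
    # Loglikelihood of n Exponential observations with the given total, evaluated at mean theta.
    return -total / theta - n * log(theta)

def penalized_logliks(phil_total, phil_n, sylvia_total, sylvia_n):
    # Unrestricted model: separate maximum likelihood thetas (2 fitted parameters).
    unrestricted = (exp_loglik(phil_total, phil_n, phil_total / phil_n)
                    + exp_loglik(sylvia_total, sylvia_n, sylvia_total / sylvia_n))
    # Restricted by H0: theta_Sylvia = 2 * theta_Phil (1 fitted parameter).
    theta_p = (phil_total + sylvia_total / 2) / (phil_n + sylvia_n)
    restricted = (exp_loglik(phil_total, phil_n, theta_p)
                  + exp_loglik(sylvia_total, sylvia_n, 2 * theta_p))
    n = phil_n + sylvia_n
    return unrestricted - 2 * log(n) / 2, restricted - 1 * log(n) / 2

print(penalized_logliks(20000, 20, 15000, 10))          # about (-244.7, -243.3): do not reject H0
print(penalized_logliks(2000000, 2000, 1500000, 1000))  # about (-24137, -24159): reject H0
```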

BIC (Bayesian Information Criterion):206

The Bayesian Information Criterion (BIC) is mathematically equivalent to the Schwarz Bayesian Criterion.
The BIC can be used to compare a number of models all fit via maximum likelihood to the same data.207
The model with the smallest BIC is preferred.

For a particular model:


BIC = (-2) (maximum loglikelihood) + (number of parameters) ln(number of data points).

206
See Applied Regression Analysis and Generalized Linear Models, by John Fox, not on the syllabus.
207
Including multiple regressions (with Normal errors) and Generalized Linear Models.

In the Schwarz Bayesian Criterion, one adjusts the loglikelihoods by subtracting in each case the
penalty: (number of fitted parameters) ln(number of data points) / 2.
One then compares these penalized loglikelihoods directly; larger is better.
For a particular model, when the BIC is smaller this penalized loglikelihood is bigger and vice-versa.
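
As a quick illustration of this equivalence, the following Python sketch (with made-up loglikelihoods for two hypothetical models) shows that the penalized loglikelihood and the BIC always rank models identically, since BIC = -2 times the penalized loglikelihood.

```python
from math import log

def penalized_loglik(loglik, k, n):
    # Schwarz Bayesian Criterion as in Loss Models: larger is better.
    return loglik - k * log(n) / 2

def bic(loglik, k, n):
    # Bayesian Information Criterion: smaller is better.
    return -2 * loglik + k * log(n)

# Two hypothetical models fit to the same 300 observations: (loglikelihood, # parameters).
models = {"Model A": (-2583, 3), "Model B": (-2589, 2)}
for name, (ll, k) in models.items():
    print(name, round(penalized_loglik(ll, k, 300), 1), round(bic(ll, k, 300), 1))
# Model A: -2591.6 and 5183.1;  Model B: -2594.7 and 5189.4.
# Since BIC = -2 x (penalized loglikelihood), the two criteria always rank models identically.
```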

BIC is based on Bayes Theorem. We assume that the two models being compared are equally
likely a priori. BIC is an approximation when the “unit-information” prior is assumed.208

If for two models the difference in BIC is b, then the ratio of the posterior probabilities of the two
models is approximately e^(b/2). For example, if the difference is 6, then the model with the smaller
BIC is more likely by a ratio of e^3 ≈ 20; the posterior probability of the model with the smaller BIC
is approximately 20/21, while that for the model with the larger BIC is approximately 1/21.

It is common to interpret differences in BIC between two models as follows:209

Difference in BIC     Posterior Probability of the       Evidence for the
                      Model with the Smaller BIC         Model with the Smaller BIC
0 - 2                 50% to 73%                          “Weak”
2 - 6                 73% to 95%                          “Positive”
6 - 10                95% to 99%                          “Strong”
> 10                  more than 99%                       “Conclusive”
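
The percentages in this table follow directly from the e^(b/2) posterior odds; a short Python check:

```python
from math import exp

def posterior_prob_smaller_bic(b):
    # b = BIC of the worse model minus BIC of the better model, b >= 0.
    # The approximate posterior odds for the smaller-BIC model are e^(b/2) to 1.
    return exp(b / 2) / (1 + exp(b / 2))

for b in (0, 2, 6, 10):
    print(b, round(posterior_prob_smaller_bic(b), 3))
# 0 -> 0.5, 2 -> 0.731, 6 -> 0.953, 10 -> 0.993, matching the ranges in the table above.
```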

AIC (Akaike Information Criteria):210

The Akaike Information Criteria is also used to compare a bunch of models all fit via maximum
likelihood to the same data. The model with the smallest AIC is preferred. For a particular model:
AIC = (-2) (maximum loglikelihood) + (2) (number of parameters).

Both BIC and AIC involve penalizing models for using more parameters. In the case of BIC the
penalty increases with the sample size, while in the case of AIC it does not. Both are useful for
testing non-nested models, but they are based on somewhat different assumptions.211
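
For concreteness, here is a minimal Python sketch of the two criteria side by side; with 100 data points, for example, the BIC penalty per parameter on the loglikelihood scale is ln(100)/2 ≈ 2.30, versus 1 for the AIC.

```python
from math import log

def aic(loglik, k):
    # Akaike Information Criterion: smaller is better.
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    # Bayesian Information Criterion: the penalty grows with the sample size n.
    return -2 * loglik + k * log(n)

# A hypothetical two-parameter model with loglikelihood -221.2 fit to 100 observations:
print(aic(-221.2, 2), bic(-221.2, 2, 100))   # about 446.4 and 451.6
```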

208
The “unit-information” prior is a Multivariate Normal Distribution of the parameters, with mean equal to the
maximum likelihood estimates of the parameters.
209
The difference in penalized loglikelihoods, as per the Schwarz Bayesian Criterion in Loss Models, would have the
opposite sign and twice the magnitude of the difference in BIC.
210
See Applied Regression Analysis and Generalized Linear Models, by John Fox, not on the syllabus.
211
One can use the likelihood ratio test for nested models, so while BIC or AIC could be used they are not needed.

Problems:

15.1 (2 points) Two distributions have been fit via maximum likelihood to the same data. One of
the distributions is a special case of the other, with one fewer parameter. If the result of applying the
likelihood ratio test at a 1/2% significance level and the Schwarz Bayesian Criterion are the same,
regardless of the specific values of the loglikelihoods, then how many points are in the data set?
A. 1500 B. 2000 C. 2500 D. 3000 E. 3500

Use the following information for the next 2 questions:


• Various distributions have each been fit to the same set of 300 data points via
the method of Maximum Likelihood.
• The loglikelihoods are as follows:
Distribution Number of Parameters Loglikelihood
Transformed Beta 4 -2582
Transformed Gamma 3 -2583
Generalized Pareto 3 -2585
Inverse Gaussian 2 -2589
LogNormal 2 -2590
Exponential 1 -2592

15.2 (2 points) Based on the Schwarz Bayesian Criterion penalized likelihood value discussed in
Loss Models, rank the following three models from best to worst:
Transformed Beta, Transformed Gamma, Inverse Gaussian.
A. Transformed Beta, Transformed Gamma, Inverse Gaussian
B. Transformed Beta, Inverse Gaussian, Transformed Gamma
C. Transformed Gamma, Transformed Beta, Inverse Gaussian
D. Inverse Gaussian, Transformed Beta, Transformed Gamma
E. None of the above.

15.3 (2 points) Based on the Schwarz Bayesian Criterion penalized likelihood value discussed in
Loss Models, rank the following three models from best to worst:
Generalized Pareto, LogNormal, Exponential.
A. Generalized Pareto, LogNormal, Exponential
B. Generalized Pareto, Exponential, LogNormal
C. LogNormal, Generalized Pareto, Exponential
D. Exponential, Generalized Pareto, LogNormal
E. None of the above.

15.4 (2 points) A two-point mixture of Inverse Gaussian Distributions (5 parameters) and an


Inverse Transformed Gamma (3 parameters) are each fit via maximum likelihood to the same 50
values. The likelihood for the fitted two-point mixture of Inverse Gaussian Distributions is
2.1538 x 10-169. The likelihood for the fitted Inverse Transformed Gamma is 8.4492 x 10-171.
Based on the Schwarz Bayesian Criterion, which of these two is the better fit?

15.5 (4 points) You are given the following data on the number of claims during a year on 421,240
motor vehicle insurance policies.
Number of Claims per Policy Number of Policies
0 370,412
1 46,545
2 3,935
3 317
4 28
5 3
6 or more 0
A mixture of two Poissons is fit to this data with result:
λ 1 = 0.103, λ2 = 0.366, and the weight to the first Poisson is p = 0.89, with corresponding
loglikelihood of -171,133.5.
Use the Schwarz Bayesian Criterion to compare this fit to that of the maximum likelihood Geometric
Distribution.

15.6 (3 points) You have the following data from the state of West Carolina:
Region Number of Claims Aggregate Losses Average Size of Loss
Rural 5000 500,000 100
Urban 10,000 1,250,000 125
You assume that the distribution of sizes of claims is exponential.
Based on data from other states, you assume that the mean claim size for Urban insureds is 1.2
times that for Rural insureds.
Let H0 be that the mean claim size in West Carolina for Urban is 1.2 times that for Rural.
Using the Schwarz Bayesian Criterion, test the hypothesis H0 .

15.7 (3 points) Use the following information:


• Various frequency distributions have each been fit to the same set of 500 data points via
the method of Maximum Likelihood.
• The loglikelihoods are as follows:
Distribution Number of Parameters Loglikelihood
Poisson 1 -932.11
Negative Binomial 2 -928.80
Compound Poisson-Binomial 3 -926.77
Compound Negative Binomial-Binomial 4 -924.43
2 Point Mixture of Binomial and Negative Binomial 5 -919.62
Based on the Schwarz Bayesian Criterion penalized loglikelihood value, which is the best model?
A. Poisson B. Negative Binomial C. Compound Poisson-Binomial
D. Compound Negative Binomial-Binomial
E. Two Point Mixture of Binomial and Negative Binomial

15.8 (4, 11/00, Q.10) (2.5 points) You are given:


(i) Sample size = 100
(ii) The negative loglikelihoods associated with five models are:
Model Number Of Parameters Negative Loglikelihood
Generalized Pareto 3 219.1
Burr 3 219.2
Pareto 2 221.2
Lognormal 2 221.4
Inverse Exponential 1 224.2
(iii) The form of the penalty function is r ln(n)/2.
Which of the following is the best model, using the Schwarz Bayesian Criterion?
(A) Generalized Pareto (B) Burr (C) Pareto (D) Lognormal (E) Inverse Exponential
Comment: The original question has been rewritten in order to match the current syllabus.

15.9 (4, 11/06, Q.22 & 2009 Sample Q.266) (2.9 points)
Five models are fitted to a sample of n = 260 observations with the following results:
Model Number of Parameters Loglikelihood
I 1 - 414
II 2 - 412
III 3 - 411
IV 4 - 409
V 6 - 409
Determine the model favored by the Schwarz Bayesian criterion.
(A) I (B) II (C) III (D) IV (E) V

Solutions to Problems:

15.1. C. Let d = difference in loglikelihoods. Using the likelihood ratio test, one consults the Chi-
Square Table for one degree of freedom and rejects the simpler distribution at 1/2% if and only if
2d > 7.88. ⇔ d > 3.94.
Using the Schwarz Bayesian Criterion, with one extra parameter the penalized loglikelihood for the
more complicated distribution is better if and only if d > ln(n)/2.
The two results are always the same if 3.94 = ln(n)/2. ⇒ n = e^7.88 = 2644.

15.2. C. One adjusts the loglikelihoods by subtracting in each case:


(number of fitted parameters) ln(number of data points)/2. In this case with 300 points, ln(number of
data points)/2 = ln(300)/2 = 2.852.
The penalized likelihood values are: Transformed Beta: -2582 -(4)(2.852) = -2593.4.
Transformed Gamma: -2583 -(3)(2.852) = -2591.6.
Inverse Gaussian: -2589 -(2)(2.852) = -2594.7. Thus the models from best to worst are:
Transformed Gamma, Transformed Beta, Inverse Gaussian.

15.3. B. One adjusts the loglikelihoods by subtracting in each case:


(number of fitted parameters) ln(number of data points)/2. In this case with 300 points, ln(number of
data points)/2 = ln(300)/2 = 2.852.
The penalized likelihood values are: Generalized Pareto: -2585 - (3)(2.852) = -2593.6.
LogNormal: -2590 - (2)(2.852) = -2595.7. Exponential: -2592 - (1)(2.852) = -2594.9.
Thus the models from best to worst are: Generalized Pareto, Exponential, LogNormal.

15.4. The loglikelihood for the two-point mixture of Inverse Gaussian Distributions is:
ln(2.1538 x 10^(-169)) = ln(2.1538) - 169 ln(10) = -388.37.
Penalized loglikelihood is: -388.37 - (5/2)ln(50) = -398.15.
The loglikelihood for the Inverse Transformed Gamma is:
ln(8.4492 x 10^(-171)) = ln(8.4492) - 171 ln(10) = -391.61.
Penalized loglikelihood is: -391.61 - (3/2)ln(50) = -397.48. -397.48 > -398.15.
Thus based on the Schwarz Bayesian Criterion, the Inverse Transformed Gamma is better.
Comment: Based on “Efficient Stochastic Modeling”, by Yvonne C. Chueh, in Contingencies,
January/February 2005.

15.5. For the Geometric Distribution, the method of maximum likelihood is equal to the method of
moments. β̂ = X̄ = {(0)(370412) + (1)(46545) + (2)(3935) + (3)(317) + (4)(28) + (5)(3)}/421240
= 0.13174. f(0) = 1/(1+β). ln f(0) = -ln(1+β) = -ln(1.13174) = -0.123756.
f(x+1) = f(x) β/(1+β). ln f(x+1) = ln f(x) + ln(0.13174/1.13174) = ln f(x) - 2.150681.
Loglikelihood is: (370412)(-0.123756) + (46545)(-2.274437) + (3935)(-4.425118)
+ (317)(-6.575799) + (28)(-8.726480) + (3)(-10.877161) = -171,478.7.
The penalty for the Schwarz Bayesian Criterion is:
(# of parameters)ln(n)/2 = (# of parameters)ln(421240)/2 = (# of parameters)6.5.
Penalized loglikelihood for the Geometric: -171478.7 - 6.5 = -171,485.2.
Penalized loglikelihood for the mixture of two Poissons: -171,133.5 - (3)(6.5) = -171,153.
The penalized loglikelihood for the mixture of two Poissons is larger and therefore the mixture of two
Poissons is a (much) better fit to this data than the Geometric Distribution.
Comment: The mixture of Poissons has three parameters, λ1, λ2, and p.
The data is taken from page 45 of Risk Theory, by Beard, Pentikainen, and Pesonen. See also
Tables 14.7 and 16.18 in Loss Models.
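As a check on the arithmetic in this solution, the following Python sketch recomputes the Geometric loglikelihood and the two penalized loglikelihoods; the variable names are illustrative.

```python
from math import log

counts = {0: 370412, 1: 46545, 2: 3935, 3: 317, 4: 28, 5: 3}
n = sum(counts.values())                                    # 421,240 policies

# Maximum likelihood (= method of moments) Geometric: beta = sample mean.
beta = sum(k * c for k, c in counts.items()) / n            # about 0.13174
# Geometric probabilities: f(k) = beta^k / (1 + beta)^(k+1).
loglik_geometric = sum(c * (k * log(beta) - (k + 1) * log(1 + beta))
                       for k, c in counts.items())          # about -171,479

penalty = log(n) / 2                                        # about 6.5 per parameter
print(loglik_geometric - 1 * penalty)    # Geometric: about -171,485
print(-171133.5 - 3 * penalty)           # two-point Poisson mixture: about -171,153
# The mixture has the (much) larger penalized loglikelihood, so it is the better model.
```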

15.6. For an Exponential Distribution, ln f(x) = -x/θ - ln(θ). Loglikelihood is: -Σxi/θ - nln(θ).

Separate estimate of θ for Rural Exponential Distribution, θ = 100, same as the method of
moments. The maximum loglikelihood is: -500,000/100 - 5000ln(100) = -28025.85.
Separate estimate of θ for Urban Exponential Distribution, θ = 125.
The corresponding maximum loglikelihood is: -1,250,000/125 - 10000ln(125) = -58283.14.
Restricted by H0 , θU = 1.2θR, the loglikelihood for the combined sample is:

-500,000/θR - 5000ln(θR) -1,250,000/(1.2θR) - 10000ln(1.2θR).

Setting the partial derivative with respect to θR equal to zero, and solving:

θR = (500,000 + 1,250,000/1.2) / (5000 + 10,000) = 102.78. θU = (1.2)(102.78) = 123.33.


The corresponding maximum loglikelihood is:
-500,000/102.78 - 5000ln(102.78) -1,250,000/123.33 - 10000ln(123.33) = -86311.76.
Unrestricted loglikelihood is: -28025.85 - 58283.14 = -86,308.99.
The penalty is: (number of fitted parameters) ln(number of data points)/2.
Unrestricted penalized loglikelihood is: -86308.99 - (2)ln(15000)/2 = -86,318.61.
Restricted penalized loglikelihood is: -86311.76 - (1)ln(15000)/2 = -86,316.57.
Since -86,318.61 < -86,316.57, the unrestricted model is not significantly better than the restricted model,
and we do not reject H0 .
Comment: See 4, 11/00, Q. 34.

15.7. B. Using the Schwarz Bayesian Criterion one adjusts the loglikelihoods by subtracting in
each case: (number of fitted parameters) ln(number of data points)/2 =
(number of fitted parameters) ln(500)/2 = (number of fitted parameters)(3.107).
Model                        # of Parameters   Loglikelihood   Penalty   Penalized Loglikelihood
Poisson 1 -932.11 3.107 -935.22
Negative Binomial 2 -928.80 6.214 -935.01
Compound Pois.-Bin. 3 -926.77 9.321 -936.09
Comp. Neg. Bin. - Bin. 4 -924.43 12.428 -936.86
Mixed Bin. and Neg. Bin. 5 -919.62 15.535 -935.15
The largest penalized loglikelihood is that for the Negative Binomial.

15.8. C. Using the Schwarz Bayesian Criterion one adjusts the loglikelihoods by subtracting in
each case: (number of fitted parameters) ln(number of data points)/2 =
(number of fitted parameters) ln(100)/2 = (number of fitted parameters)(2.303).
Model                        # of Parameters   Loglikelihood   Penalty   Penalized Loglikelihood
Gen. Pareto 3 -219.1 6.909 -226.01
Burr 3 -219.2 6.909 -226.11
Pareto 2 -221.2 4.606 -225.81
LogNormal 2 -221.4 4.606 -226.01
Inverse Expon. 1 -224.2 2.303 -226.50
The largest penalized loglikelihood is that for the Pareto.
Comment: From best to worst, the models are: Pareto, LogNormal and Generalized Pareto tied,
Burr, Inverse Exponential. Since they have the same number of parameters and the Pareto has a
larger loglikelihood, the Pareto is better than the LogNormal. Since they have the same number of
parameters and the Generalized Pareto has a larger loglikelihood, the Generalized Pareto is better
than the Burr.

15.9. A. Using the Schwarz Bayesian Criterion one adjusts the loglikelihoods by subtracting in
each case: (number of fitted parameters) ln(number of data points)/2 =
(number of fitted parameters) ln(260)/2 = (number of fitted parameters)(2.780).
Model                        # of Parameters   Loglikelihood   Penalty   Penalized Loglikelihood
I 1 -414.00 2.78 -416.78
II 2 -412.00 5.56 -417.56
III 3 -411.00 8.34 -419.34
IV 4 -409.00 11.12 -420.12
V 6 -409.00 16.68 -425.68
The largest penalized loglikelihood is that for the Model I.
Comment: Since Model V has the same loglikelihood as Model IV, but more parameters, Model V
is inferior to Model IV.

Section 16, Kolmogorov-Smirnov Test, Basics212

The Kolmogorov-Smirnov Statistic is a formal way to compare the fitted/assumed and empirical
distribution functions, which can be applied to ungrouped data.
The Kolmogorov-Smirnov Statistic is computed for ungrouped data by finding the maximum
absolute difference between the empirical distribution function and the fitted/assumed distribution
function:

D = max over x of | (empirical distrib. function at x) - (theoretical distrib. function at x) |.

This maximum occurs just before or just after one of the observed points, where the empirical
distribution function has a jump.213 By definition the Kolmogorov-Smirnov Statistic is greater than or
equal to zero.214 The smaller the Kolmogorov-Smirnov statistic the better the fit between the
distribution and the observed data.215

212
See Section 16.4.1 in Loss Models.
213
This maximum difference between any two distributions is sometimes referred to as the maximum discrepancy
between the two distributions.
214
Therefore, one performs a one-sided test, rather than a two-sided test as with the use of the Normal distribution.
215
Instead of testing the fit of a single sample to a distribution, one can perform a similar test to check whether two
samples of data come from the same distribution. This test is sometimes also referred to as a Kolmogorov-Smirnov
test. One takes the maximum difference of the two empirical distribution functions. But then one compares to a
different statistic than the one considered in Loss Models.

Exercise: An Exponential distribution with θ = 1000 is compared to the following data set:
197, 325, 981, 2497. What is the value of the Kolmogorov-Smirnov (K-S) statistic?
[Solution: The K-S statistic is 0.2225, from the absolute difference at or just after 325.
                            Empirical         Absolute Value of
   x       Assumed F(x)     Distribution      Assumed - Empirical
                            0
                                              0.1788
  197        0.1788
                                              0.0712
                            0.25
                                              0.0275
  325        0.2775
                                              0.2225
                            0.5
                                              0.1251
  981        0.6251
                                              0.1249
                            0.75
                                              0.1677
 2497        0.9177
                                              0.0823
                            1
For example, at 324.999 the assumed Exponential Distribution is: 1 - e^(-324.999/1000) = 0.2775.
The empirical distribution at 324.999 is 1/4. The absolute difference at 324.999 is:
|0.2775 - 0.25| = 0.0275. At 325, the fitted Exponential Distribution is: 1 - e^(-325/1000) = 0.2775.
The empirical distribution at 325 is 2/4.
Therefore, the absolute difference at 325 is: |0.2775 - 0.5| = 0.2225.
Comment: While the Exponential Distribution is continuous at 325, and therefore F(324.999) ≅
F(325), the empirical distribution function has a jump discontinuity of size 1/4 at 325, as well as at
each of the other observed points.]

Note the way the spreadsheet in the above solution was set up. Leave three blank lines between
each of the observed values, where the observed values are ranked from smallest to largest. Put
next to each observed value the fitted/assumed distribution function at that value.

On the line between each observed value, put the empirical distribution function. The empirical
distribution function starts at zero before the first observation and increases each time by
1/(number of points), up to one after the last observation. In the case of a repeated value in the data
set, there would be a double jump in the empirical distribution function.216

216
See for example, 4B, 5/95, Q.11.
If a value shows up three times there will be a triple jump in the empirical distribution function, etc.

Then the absolute differences are taken between all of the adjacent rows. For example, I took
|0.6251 - 0.75| = 0.1249 and |0.9177 - 0.75| = 0.1677.

In this case with four points, there are a total of eight comparisons. Going down this column of
absolute differences, the K-S statistic is the largest value, in this case 0.2225.

Some people may prefer to arrange this same calculation somewhat differently.217
Let F*(x) be the assumed distribution, Fn (x-) be the empirical distribution function just before x,
and Fn (x) be the empirical distribution function at x.

Then we compute both |Fn (x-)- F*(x)| and |Fn (x)- F*(x)| :

   x      |Fn(x-) - F*(x)|      Fn(x-)      F*(x)      Fn(x)      |Fn(x) - F*(x)|
197 0.1788 0 0.1788 0.25 0.0712
325 0.0275 0.25 0.2775 0.5 0.2225
981 0.1251 0.5 0.6251 0.75 0.1249
2497 0.1677 0.75 0.9177 1 0.0823

As previously, the K-S statistic is the largest absolute difference, D = 0.2225.218
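
This bookkeeping is easy to automate. Here is a minimal Python sketch (the function name is illustrative) that checks the empirical distribution function just before and just after each observation, reproducing D = 0.2225 for this exercise:

```python
from math import exp

def ks_statistic(data, cdf):
    # Maximum absolute difference between the empirical distribution function and the
    # assumed cdf, checked just before and just after each observed value.
    xs = sorted(data)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        worst = max(worst, abs(i / n - f), abs((i + 1) / n - f))
    return worst

# The exercise above: an Exponential with theta = 1000 versus {197, 325, 981, 2497}.
print(ks_statistic([197, 325, 981, 2497], lambda x: 1 - exp(-x / 1000)))   # 0.2225
```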

217
See Table 16.3 in Loss Models.
218
We check both columns of absolute differences.

A Weibull Example:

For the ungrouped data in Section 2 there are 130 points. Thus one must compute the absolute
difference between the empirical and fitted distribution 260 times, twice the number of points.
For example, the 78th point is 146,100. Thus the empirical distribution function is 77 / 130 = 0.5923
just before 146,100 and 78 / 130 = 0.6000 just after 146,100.
There is a jump discontinuity at 146,100.

For the Weibull distribution fit to this data via maximum likelihood with parameters θ = 231,158 and
τ = 0.6885, F(146,100) = 1 - exp[-(146100/231158)^0.6885] = 0.5177. Thus just before 146,100
the absolute difference between empirical and fitted is: 0.5923 - 0.5177 = 0.0746. Just after
146,100 the absolute difference between empirical and fitted is: 0.6000 - 0.5177 = 0.0823.

It turns out that the maximum such absolute difference occurs just after the 77th point, 135,800.
The empirical distribution is 0.5923, the fitted distribution is 0.5001, and the absolute difference is
0.0922. Thus 0.0922 is the Kolmogorov-Smirnov Statistic for the Weibull distribution fit by
maximum likelihood to the ungrouped data in Section 2.

Here is a larger portion of the calculation of the Kolmogorov-Smirnov Statistic for the Weibull
distribution fit by maximum likelihood:
   x        Fitted Weibull    Empirical Distribution    Difference just before x    Difference at x
122,000 0.4748 0.5077 0.0252 0.0329
123,100 0.4769 0.5154 0.0308 0.0385
126,600 0.4835 0.5231 0.0319 0.0396
127,300 0.4848 0.5308 0.0383 0.0460
127,600 0.4853 0.5385 0.0454 0.0531
127,900 0.4859 0.5462 0.0526 0.0603
128,000 0.4861 0.5538 0.0601 0.0678
131,300 0.4921 0.5615 0.0618 0.0695
132,900 0.4950 0.5692 0.0666 0.0743
134,300 0.4975 0.5769 0.0718 0.0795
134,700 0.4982 0.5846 0.0788 0.0865
135,800 0.5001 0.5923 0.0845 0.0922
146,100 0.5177 0.6000 0.0746 0.0823
150,300 0.5246 0.6077 0.0754 0.0831
171,800 0.5574 0.6154 0.0502 0.0579
173,200 0.5595 0.6231 0.0559 0.0636
177,700 0.5659 0.6308 0.0572 0.0649
183,000 0.5732 0.6385 0.0576 0.0653
183,300 0.5736 0.6462 0.0649 0.0726
190,100 0.5827 0.6538 0.0634 0.0711
209,400 0.6071 0.6615 0.0467 0.0544

Difference between Empirical and Fitted Distributions, D(x) plot:

Let D(x) = empirical distribution - fitted/assumed distribution.219


Then a good fit would have D(x) of small magnitude for all x.
K-S Statistic = max over x of | D(x) |.
The smaller the Kolmogorov-Smirnov Statistic, the better the fit.

One can usefully graph, D(x), the difference function.


For example, here is the difference between the empirical distribution function and the maximum
likelihood Weibull Distribution, for the ungrouped data in Section 2:220

[Graph: D(x) plotted against the size of loss x (in thousands, log scale from 1 to 1000); the vertical axis runs from -0.10 to 0.10.]

For example, 140,000 is between 135,800 and 146,100, the 77th and 78th values out of 130.
The fitted Weibull distribution at 140,000 is: 1 - exp[-(140000/231158)^0.6885] = 0.5074.
Therefore, D(140,000) = 77/130 - 0.5074 = 0.0849.

Graphically the K-S statistic is the maximum distance this difference curve, the plot of
D(x), gets from the x-axis, either above or below. In this case this maximum distance is
0.0922, which occurs at 135,800. In the region from about 100,000 to 200,000 there is a poor fit;
the empirical distribution function increases quickly in this region due to a large number of claims
reported in this size category.
219
See Section 16.3 and Figures 16.4 and 16.5 in Loss Models. D stands for difference, deviation or discrepancy.
220
While the computer has drawn the graph as continuous, there is a jump discontinuity at each of the data points.

The Kolmogorov-Smirnov Statistic has a clear and simple graphical interpretation. It is the maximum
distance D(x) gets from the x-axis, or equivalently the maximum distance between the graphs of the
empirical and the fitted distribution functions.

The larger the Kolmogorov-Smirnov Statistic, the worse the fit. ⇔


The further the difference curve gets from the x-axis, the worse the fit.

For curves fit to the ungrouped data in Section 2 by the Method of Maximum Likelihood,
the values of the Kolmogorov-Smirnov Statistic are:221

K-S Statistic K-S Statistic


LogLogistic 0.047 LogNormal 0.082
ParaLogistic 0.049 Weibull 0.092
Pareto 0.059 Gamma 0.132 reject fit at 5%
Gen. Pareto 0.060 Exponential 0.240 reject fit at 1%
Burr 0.060 Inverse Gaussian 0.373 reject fit at 1%
Trans. Gamma 0.064

Hypothesis Testing:

Loss Models states that the Kolmogorov-Smirnov test should only be used on individual
data.222

One can use the Kolmogorov-Smirnov Statistic to test whether some ungrouped data was drawn
from an assumed distribution. For the Kolmogorov-Smirnov Statistic the critical value is
inversely proportional to the square root of the number of points.
Here is a table of critical values:223 224 225

Significance Level = α:    0.20       0.10       0.05       0.01
Critical Value = c:        1.07/√n    1.22/√n    1.36/√n    1.63/√n
221
How to determine the significance levels is discussed below. See the Section on fitting to ungrouped data via
maximum likelihood for the parameters of the fitted distributions.
222
See page 428. As discussed in the next section, one can get bounds on the K-S statistic for grouped data.
223
If needed to answer an exam question, this or a similar table will be included within the question.
224
These critical values to more accuracy plus some additional ones: 20% 1.0727, 10% 1.2238, 5% 1.3581,
2.5% 1.4802, 1% 1.6276, 0.5% 1.7308, 0.1% 1.9495.
225
These critical values are approximate and should be good for 15 or more data points. For smaller sample sizes, the
critical values are somewhat larger than those given by these formulas. For example, here are 20% critical values for
various small sample sizes (do not divide by √n): 4 0.494, 5 0.446, 6 0.411, 7 0.381, 8 0.358, 9 0.339, 10 0.322.
Similar 5% critical values: 4 0.624, 5 0.564, 6 0.521, 7 0.486, 8 0.457, 9 0.432, 10 0.411.
Taken from “Kolmogorov-Smirnov: A Goodness of Fit Test for Small Samples,” by J. Romeu.

These critical values, d/√n, for significance level α are determined from the asymptotic
distribution of the maximum absolute deviation, by finding the value d such that:

α = 2 Σ_{r=1}^{∞} (-1)^(r-1) exp(-2r²d²).
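
Summing this series at the tabulated constants recovers the corresponding significance levels to within rounding; a short Python check:

```python
from math import exp

def significance(d, terms=100):
    # Asymptotic significance level when the critical value is d / sqrt(n).
    return 2 * sum((-1) ** (r - 1) * exp(-2 * r * r * d * d) for r in range(1, terms + 1))

for d in (1.07, 1.22, 1.36, 1.63):
    print(d, round(significance(d), 3))
# 1.07 -> 0.202, 1.22 -> 0.102, 1.36 -> 0.049, 1.63 -> 0.010
```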

While these critical values are often used by actuaries when comparing data to distributions fit to that
same data, the correct critical values in this situation are smaller than those in the table.226 If we used
fitted parameters, then the distribution would seem to match the data better and the K-S statistic
would be smaller, since we picked those parameters that worked well. Therefore, the K-S critical
values to which we compare would also have to be smaller.227

The distribution of the K-S Statistic is independent of the loss distribution.228 Thus we can use a
single table to apply the K-S test, regardless of whether the distribution to which we compare is an
Exponential, Pareto, etc. In order to enter the table, we only need to know the number of data
points, n.

So for example, with 130 points as in the ungrouped data in Section 2,


the critical values for the K-S Statistic are:

Significance Level = α:         0.20      0.10     0.05     0.01
Critical Value for n = 130:     0.0938    0.107    0.119    0.143

For the Gamma Distribution, the K-S Statistic is 0.132.

Since 0.132 > 0.119, we can reject the fit of the Gamma at a 5% significance level. On the other
hand, the critical value for 1% is: 1.63/√130 = 0.143 > 0.132, so one can not reject the Gamma at
the 1% significance level.
Mechanically, the K-S Statistic for the Gamma of 0.132 is bracketed by 0.119 and 0.143.
One rejects to the left and does not reject to the right.
Reject at 5%, and do not reject at 1%.

For the Pareto, the K-S Statistic is 0.059. Since 0.059 < 0.0938, we do not reject the fit of the
Pareto at 20%.
226
See page 332 of Loss Models, “Mahlerʼs Guide to Simulation,” and 4, 11/05, Q.34 (2009 Sample Q.244).
There is no simple adjustment based on the number of fitted parameters.
227
This is the same reason why we reduce the number of degrees of freedom by the number of fitted parameters
when performing the Chi-Square Goodness of Fit Test.
228
In my problems, I give a demonstration of why the K-S Statistic is distribution free.

Exercise: A distribution has been fit to 10,000 claims. The Kolmogorov-Smirnov


(K-S) statistic is 1.30%. Test the hypothesis that the data came from this distribution.
[Solution: With 10,000 points, the critical values for the K-S Statistic are:
Significance Level = α:            0.20     0.10     0.05     0.01
Critical Value for n = 10,000:     1.07%    1.22%    1.36%    1.63%
Since 1.22% < 1.30% < 1.36%, we reject the hypothesis at 10% and do not reject at 5%.]
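
Here is a small Python helper (the dictionary simply encodes the table of critical values above) that carries out this bracketing mechanically:

```python
from math import sqrt

# Significance level -> c, where the critical value is c / sqrt(n).
KS_COEFFICIENTS = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}

def ks_conclusion(D, n):
    # Returns the smallest tabulated significance level at which the fit is rejected,
    # or None if the fit is not rejected even at 20%.
    rejected = [alpha for alpha, c in KS_COEFFICIENTS.items() if D > c / sqrt(n)]
    return min(rejected) if rejected else None

print(ks_conclusion(0.0130, 10000))   # 0.1: reject at 10%, but do not reject at 5%
print(ks_conclusion(0.059, 130))      # None: do not reject, even at 20%
```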

Graphical version of Rejection:

Here is the difference between the empirical distribution function and the maximum likelihood Pareto
Distribution, for the ungrouped data in Section 2:

[Graph: D(x) for the maximum likelihood Pareto, plotted against x (in thousands, log scale from 1 to 1000), with horizontal reference lines at ±0.0938; the vertical axis runs from -0.10 to 0.10.]

The maximum absolute difference is 0.059, the K-S statistic; the Pareto fits this data well.

The critical value for 130 data points and 20% significance level is 0.0938. The critical region or
rejection region is outside the horizontal lines at ±0.0938.

The difference curve stays between the horizontal lines at ±0.0938, the critical value for 20%, and
therefore we do not reject the fit (we accept the fit) at 20%.

For the Weibull, the K-S Statistic is 0.0922. Since 0.0922 < 0.0938, we (just barely) do not reject
the fit of the Weibull at a 20% significance level.

Below is shown the difference between the empirical distribution function and the maximum
likelihood Weibull Distribution. Since the difference curve gets further from the x-axis than was the
case for the Pareto Distribution, the Weibull is not as good a fit.

[Graph: D(x) for the maximum likelihood Weibull, plotted against x (in thousands, log scale from 1 to 1000), with horizontal reference lines at ±0.0938; the vertical axis runs from -0.10 to 0.10.]

The difference curve (barely) stays between the horizontal lines at ±0.0938, the critical value for
20%, and therefore we do not reject the fit (we accept the fit) at 20%.

We note that for small sizes of loss, the empirical distribution is less than the fitted Weibull
Distribution, and therefore the fitted Weibull has a thicker lefthand tail than the empirical.229

For large sizes of loss, the empirical distribution is somewhat less than the fitted Weibull Distribution;
the empirical survival function is somewhat greater than the fitted survival function. Therefore the fitted
Weibull Distribution has a somewhat thinner righthand tail than the empirical. Since there are only four
losses of size greater than 2 million in the ungrouped data set in Section 2, it is difficult to draw
conclusions about the righthand tail.

229
For example, for the fitted Weibull with θ = 231,158 and τ = 0.6885, F(10000) = 10.9%, while there are 8 out of
130 values in the data set ≤ 10,000, for an Empirical Distribution Function at 10,000 of: 8/130 = 6.2%.

For the Exponential, the K-S Statistic is 0.240. Since 0.240 > 0.143, we reject the fit of the
Exponential at a 1% significance level. Below is shown the difference between the empirical
distribution function and the maximum likelihood Exponential Distribution:

[Graph: D(x) for the maximum likelihood Exponential, plotted against x (in thousands, log scale from 1 to 1000), with horizontal reference lines at ±0.143; the vertical axis runs from -0.2 to 0.2.]

Since the difference curve gets far from the x-axis, the Exponential is not a good fit. One can reject
the Exponential distribution at 1%, since it goes outside the band around the x-axis formed by the
1% significance line, y = ±0.143.

The rejection or critical region for 1% is outside the horizontal lines at y = ±.143. If the difference curve
anywhere enters that critical region, then we reject at 1%.

Tails of Distributions:

D(x) plots can be used to compare the tails of a distribution to those of the data.

The righthand tail, refers to large losses, in other words as x approaches infinity.
Prob[X > c] = S(c).
Thus the Survival Function measures how much probability is in the righthand tail.

If the model has more probability in the righthand tail than the data, then the righthand tail of the
model distribution is said to be too thick. For the righthand tail, that means for large x the Survival
Function of the model is larger than the empirical survival function.
Thus, if the righthand tail of the model distribution is too thick,
then D(x) = empirical distribution - assumed distribution
= assumed survival function - empirical survival function
is positive in the righthand tail.

Here is an example, where the model has too thick of a righthand tail compared to the data:230
[Graph: D(x) for such an example, positive in the righthand tail.]

If the model has less probability in the righthand tail than the data, then the righthand tail of the model
distribution is said to be too thin. For the righthand tail, that means for large x the Survival Function of
the model is smaller than the empirical survival function. Thus, if the righthand tail of the model
distribution is too thin, then D(x) is negative in the righthand tail.

230
Both the model and the data have a distribution function of one at infinity, so D(∞) = 0.

The lefthand tail, refers to small losses, in other words as x approaches zero, (or in the case of the
Normal Distribution negative infinity.)
Prob[X ≤ c] = F(c).
Thus the Distribution Function measures how much probability is in the lefthand tail.

If the model has more probability in the lefthand tail than the data, then the lefthand tail of the model
distribution is said to be too thick. For the lefthand tail, that means for small x the Distribution Function
of the model is larger than the empirical distribution function. Thus, if the lefthand tail of the model
distribution is too thick, then D(x) is negative in lefthand tail.

Here is an example, where the model has too thick of a lefthand tail compared to the data:231

[Graph: D(x) for such an example, negative in the lefthand tail.]

If the model has less probability in the lefthand tail than the data, then the lefthand tail of the model
distribution is said to be too thin. For the lefthand tail, that means for small x the Distribution Function
of the model is smaller than the empirical distribution function. Thus, if the lefthand tail of the model
distribution is too thin, then D(x) is positive in the lefthand tail.

Righthand Tail of the Model is Too Thick. ⇔ D(x) is Positive for large x.

Righthand Tail of the Model is Too Thin. ⇔ D(x) is Negative for large x.

Lefthand Tail of the Model is Too Thick. ⇔ D(x) is Negative for small x.

Lefthand Tail of the Model is Too Thin. ⇔ D(x) is Positive for small x.

In all cases, we compare the probability in the tail for the model and the data.
231
Both the loss model and the data have a distribution function of zero at zero, so D(0) = 0.
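
Here is a tiny Python illustration, using a made-up five-point data set compared to an Exponential model with the same mean, of how the sign of D(x) reflects these tail comparisons:

```python
from math import exp

# A hypothetical five-point data set with one very large value, compared to an
# Exponential model whose mean theta = 22 matches the sample mean.
data = [1, 2, 3, 4, 100]
F = lambda x: 1 - exp(-x / 22)                         # model distribution function
Fn = lambda x: sum(d <= x for d in data) / len(data)   # empirical distribution function
D = lambda x: Fn(x) - F(x)

print(round(D(0.5), 3))   # -0.022 < 0 for small x: the model's lefthand tail is too thick
print(round(D(50), 3))    # -0.097 < 0 for large x: the model's righthand tail is too thin
```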

Area Under the Difference Graph, D(x):

D(x) = empirical distribution - fitted/assumed distribution = Fn (x) - F(x) = S(x) - Sn (x).

The sample mean, X̄, is equal to the integral of the empirical survival function, Sn(x).
The mean of a distribution, E[X], is equal to the integral of its survival function, S(x).232

Therefore, the integral of D(x) is: E[X] - X̄.


In other words, the area under the difference graph, counting areas below the x-axis as negative, is
the difference between the mean of the distribution and the sample mean.

Exercise: One observes losses of sizes: 3, 6, 12, 15.


Assume a distribution uniform from zero to 20. Graph D(x).
[Solution: F(x) = x/20, x < 20.
Fn(x) = 0 for x < 3;  0.25 for 3 ≤ x < 6;  0.50 for 6 ≤ x < 12;  0.75 for 12 ≤ x < 15;  1 for x ≥ 15.
The graph of D(x) = Fn (x) - F(x):

[Graph: D(x) for 0 ≤ x ≤ 20, with jumps at the observed values 3, 6, 12, and 15.]

Comment: There are jump discontinuities in the empirical distribution function at each of the
observed points, and therefore also in the difference graph, D(x).]
232
Assuming the distribution has support starting at zero.

Exercise: Compute the area under D(x), counting areas below the x-axis as negative.
[Solution: From 0 to 3 we have a triangle below the x axis with area: (3)(3/20)/2 = 9/40.

[Graph: D(x) for 0 ≤ x ≤ 20; the vertical axis runs from -0.15 to 0.25.]

From 3 to 5 we have a triangle above the x axis with area: (2)(2/20)/2 = 4/40.
From 5 to 6 we have a triangle below the x axis with area: (1)(1/20)/2 = 1/40.
From 6 to 10 we have a triangle above the x axis with area: (4)(4/20)/2 = 16/40.
From 10 to 12 we have a triangle below the x axis with area: (2)(2/20)/2 = 4/40.
From 12 to 15 we have a triangle above the x axis with area: (3)(3/20)/2 = 9/40.
From 15 to 20 we have a triangle above the x axis with area: (5)(5/20)/2 = 25/40.
The area below D(x) is: (-9 + 4 - 1 + 16 - 4 + 9 + 25)/40 = 1.]

For the uniform distribution from 0 to 20, E[X] = 10.


For the sample: 3, 6, 12, 15, X̄ = 9.
Thus for this example, E[X] - X̄ = 10 - 9 = 1, which is indeed the area under D(x).
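
This identity is easy to confirm numerically; a short Python sketch integrating D(x) on a fine grid for this example:

```python
def Fn(x, data=(3, 6, 12, 15)):
    # Empirical distribution function of the sample.
    return sum(d <= x for d in data) / len(data)

def F(x):
    # Uniform distribution on (0, 20).
    return min(x / 20, 1.0)

# Integrate D(x) = Fn(x) - F(x) over (0, 20) on a fine grid (midpoint rule).
steps = 20000
h = 20 / steps
area = sum((Fn((i + 0.5) * h) - F((i + 0.5) * h)) * h for i in range(steps))
print(round(area, 3))   # 1.0 = E[X] - sample mean = 10 - 9
```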

Distributions fit via Method of Moments:

For Curves fit to the ungrouped data in Section 2 by the Method of Moments, the values of the
Kolmogorov-Smirnov Statistic are:233

K-S Statistic
Inverse Gaussian 0.099 reject fit at 20%
LogNormal 0.100 reject fit at 20%
Pareto 0.131 reject fit at 5%
Weibull 0.158 reject fit at 1%
Exponential 0.240 reject fit at 1%
Gamma 0.275 reject fit at 1%

Different Criteria For Choosing Between Fits of Distributions:

We have covered a number of different ways of deciding which distribution best fits a given data
set.

Criterion Good Fit

Chi-Square Small234

Kolmogorov-Smirnov Statistic Small

Anderson-Darling Statistic235 Small

Likelihood or Loglikelihood Large

Penalized Loglikelihood236 Large

233
See the Section on fitting via the Method of Moments for the parameters of the fitted distributions.
234
One wants a large corresponding “p-value”, which is the Survival Function of the Chi-Square Distribution (for the
appropriate number of degrees of freedom) at the value of the Chi-Square Statistic.
235
No longer on the syllabus. To be discussed subsequently.
236
The Schwarz Bayesian Criterion.

Graphing the Difference of Two Empirical Distribution Functions:

One can also graph the difference between two different empirical distributions.
For example, here is a comparison of fire damage for protected and unprotected buildings.237

Frame, Protected Distribution - Unprotected Distribution
[Graph: plotted against percent of value on a log scale from 0.1 to 100; the vertical axis runs from about -0.06 to 0.02.]

Brick, Protected Distribution - Unprotected Distribution
[Graph: plotted against percent of value on a log scale from 0.1 to 100; the vertical axis runs from about -0.04 to 0.01.]

237
The data were taken from “Rating by Layer of Insurance,” by Ruth E. Salzmann, PCAS 1963. For four different
classes of building, she shows the number of fire losses of size less than or equal to a given percent of value of the
building. These data were previously displayed in my section on ogives.

Approaches to Selecting Models:238

Loss Models refers to looking at the above items as “score based approaches” to selecting a
model.

Score Based Approaches:


1. Chi-Square Statistic
2. p-value of Chi-Square Test
3. Kolmogorov-Smirnov Statistic
4. Anderson-Darling Statistic
5. Likelihood or Loglikelihood
6. Schwarz Bayesian Criterion

Which distribution fits better may depend on which criterion one uses. Remember the principle of
parsimony, which says one should not use more parameters than are necessary.
Also Loss Models suggests that one limit the number of different models considered. If one has a
vast array of models, one of them will happen to fit the data, even if it would not help to predict the
future.

Loss Models summarizes its advice as:239


1. Use a simple model if possible.
2. Restrict the universe of possible models.

In addition to the “score based approaches”, there are graphical techniques, such as ogives,
histograms, graphs of difference functions, p-p plots, graphical comparisons of mean excess losses,
etc. Loss Models includes these graphical approaches in what it calls “judgement based
approaches” to selecting a model.

Judgement Based Approaches:


1. Reviewing graphs.
2. Focusing on important items for the particular application.240
3. Relying on models that have worked well in similar situations in the past.241

In many cases, score based approaches will narrow down the viable models to a few, and then
judgement would be required in order to decide between these good candidates.
238
See Section 16.5 of Loss Models.
239
See Section 16.5.1 of Loss Models.
240
For example, when pricing excess insurance, one would focus on the righthand tail of a size of loss distribution.
241
An actuary is not expected to reinvent the wheel each time he estimates a quantity. If fitting a Pareto distribution
using maximum likelihood has worked well for many years for pricing your insurerʼs excess insurance, then you are
likely to do so again this year without necessarily checking alternatives. Also modeling judgments are often based in
whole or in part on reading the actuarial literature describing how other actuaries have modeled similar situations.

Problems:

For the following questions, use the following table for the Kolmogorov-Smirnov statistic.
α:    0.20       0.10       0.05       0.025      0.01
c:    1.07/√n    1.22/√n    1.36/√n    1.48/√n    1.63/√n

16.1 (1 point) A distribution has been fit to 1000 claims. The Kolmogorov-Smirnov (K-S) statistic is
0.035. Which of the following is true?
A. Do not reject the fit at 20%.
B. Do not reject the fit at 10%. Reject the fit at 20%.
C. Do not reject the fit at 5%. Reject the fit at 10%.
D. Do not reject the fit at 1%. Reject the fit at 5%.
E. Reject the fit at 1%.

16.2 (3 points) A Pareto distribution with parameters α = 1.5 and θ = 1000 is compared to the
following five claims: 179, 352, 918, 2835, 6142.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
A. less than 0.24
B. at least 0.24 but less than 0.25
C. at least 0.25 but less than 0.26
D. at least 0.26 but less than 0.27
E. at least 0.27

16.3 (3 points) A LogNormal distribution with parameters µ = 7.72 and σ = 0.944 was fit to the
following five claims: 410, 1924, 2635, 4548, 6142.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
A. less than 0.15
B. at least 0.15 but less than 0.20
C. at least 0.20 but less than 0.25
D. at least 0.25 but less than 0.30
E. at least 0.30

16.4 (3 points) For f(x) = 2e^(-2x), using the method of inversion, five claims are simulated using
random numbers: 0.280, 0.673, 0.372, 0.143, 0.961. Compute the Kolmogorov-Smirnov Statistic.
A. less than 0.13
B. at least 0.13 but less than 0.16
C. at least 0.16 but less than 0.19
D. at least 0.19 but less than 0.22
E. at least 0.22

16.5 (3 points) For f(x) = 3,000 / (10 + x)^4, using the method of inversion, five claims are simulated
using random numbers: 0.280, 0.673, 0.372, 0.143, 0.961.
Compute the Kolmogorov-Smirnov Statistic.
A. less than 0.13
B. at least 0.13 but less than 0.16
C. at least 0.16 but less than 0.19
D. at least 0.19 but less than 0.22
E. at least 0.22

16.6 (2 points) Given the distribution f(x) = 5x^4, 0 < x < 1,


and the sample: 0.90, 0.70, 0.75, 0.80, 0.50, 0.65,
what is the value of the Kolmogorov-Smirnov Statistic?
A. less than 0.35
B. at least 0.35 but less than 0.40
C. at least 0.40 but less than 0.45
D. at least 0.45 but less than 0.50
E. at least 0.50

16.7 (1 point) Using the result of the previous question, which of the following is true?
A. Do not reject the fit at 20%.
B. Do not reject the fit at 10%. Reject the fit at 20%.
C. Do not reject the fit at 5%. Reject the fit at 10%.
D. Do not reject the fit at 1%. Reject the fit at 5%.
E. Reject the fit at 1%.

16.8 (4 points) You observe the following 10 claims:


241,513 231,919 105,310 125,152 116,472
110,493 139,647 220,942 161,964 105,829
A Distribution Function: F(x) = 1 - (x/100000)^(-2.8), x > 100,000, has been fit to this data.
Determine the value of the Kolmogorov-Smirnov Statistic.
A. less than 0.13
B. at least 0.13 but less than 0.16
C. at least 0.16 but less than 0.19
D. at least 0.19 but less than 0.22
E. at least 0.22

16.9 (3 points) You observe 5 claims of different sizes.


A distribution function is compared to this data.
What is the smallest possible value of the Kolmogorov-Smirnov Statistic?
(A) 0 (B) 0.05 (C) 0.10 (D) 0.15 (E) 0.20

16.10 (2 points) Below is shown a graph of D(x), the difference between the empirical distribution
function and a fitted distribution function.

[Graph: D(x) for x from 0 to 40,000; the vertical axis runs from -0.08 to 0.08, with gridlines every 0.01.]

Which of the following statements are true?


1. The left tail of the fitted distribution is too thick.
2. The right tail of the fitted distribution is too thick.
3. The Kolmogorov-Smirnov Statistic is between 0.05 and 0.06.
A. 1 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D

16.11 (4 points) For the following four losses: 40, 150, 230, 400, and an Exponential Distribution
as per Loss Models, which of the following values of θ has the best value of the
Kolmogorov-Smirnov (K-S) goodness of fit statistic?
A. 190 B. 210 C. 230 D. 250 E. 270

16.12 (1 point) Below is shown a graph of D(x), the difference between the empirical distribution
function and a fitted distribution function.
[Graph: D(x) for x from 0 to 20,000; the vertical axis runs from -0.2 to 0.2, with gridlines every 0.01.]

What is the value of the Kolmogorov-Smirnov Statistic?


A. 0.12 B. 0.14 C. 0.16 D. 0.18 E. 0.20

16.13 (1 point) According to Loss Models, which of the following are score-based approaches to
selecting models?
1. Comparing Kolmogorov-Smirnov Statistics.
2. Examining graphs of the difference between the empirical distribution and the models.
3. Comparing p-values of the Chi-Square Statistics.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C or D

16.14 (2 point) For a single data set, for various continuous distributions on [0, 1], below are shown
graphs of D(x). In which case is the mean of the distribution greater than the sample mean?
[Five graphs of D(x) on the interval [0, 1], labeled A, B, C, D, and E.]

16.15 (2 points) A Uniform distribution on 0 to 10 is compared to the following data set:


1, 1, 3, 3, 6, 6, 6, 7, 9, 9. What is the value of the Kolmogorov-Smirnov (K-S) statistic?
A. 0.10 B. 0.15 C. 0.20 D. 0.25 E. 0.30

Use the following information for the next two questions:


(i) The following are six observed claim amounts:
700 1200 2000 4600 6000 9500
(ii) An Exponential Distribution is fit to this data via maximum likelihood.

16.16 (1 point) Determine D(5999), the difference function at 5999.


A. -0.13 B. -0.11 C. -0.09 D. -0.07 E. -0.05

16.17 (1 point) Determine D(6000), the difference function at 6000.


A. 0.02 B. 0.04 C. 0.06 D. 0.08 E. 0.10

16.18 (2 points) You are testing the null hypothesis that f(x) = x/18 for 0 < x < 6.
A random sample of size 5 is drawn in which 2 values are equal to 2, and 3 values are equal to 4.
What is the conclusion of the Kolmogorov-Smirnov test?
A. Do not reject H0 at 20%.
B. Do not reject H0 at 10%. Reject H0 at 20%.
C. Do not reject H0 at 5%. Reject H0 at 10%.
D. Do not reject H0 at 1%. Reject H0 at 5%.
E. Reject H0 at 1%.

16.19 (3 points) You are given a random sample of observations: 1500, 2500, 4000.
You test the hypothesis that the data was drawn from an Inverse Gamma Distribution
with α = 3 and θ = 6000.
Calculate the Kolmogorov-Smirnov test statistic.
(A) 0.24 (B) 0.26 (C) 0.28 (D) 0.30 (E) 0.32

16.20 (2 points) A random sample of size one from the uniform distribution on (0 , 1) is compared
to the uniform distribution on (0 , 1).
Determine the distribution of the Kolmogorov-Smirnov test statistic.

16.21 (3 points) There are four data points: 10, 20, 30, 40.
You are given the following graph of D(x):

[Graph: D(x) for x from 0 to 60; the vertical axis runs from -0.6 to 0.]

Which of the following is F(x)?


A. Uniform Distribution from 0 to 60.
B. Exponential Distribution with θ = 25.
C. Weibull Distribution with τ = 0.4 and θ = 7.5.
D. Pareto Distribution with α = 4 and θ = 75.
E. LogNormal Distribution with µ = 2 and σ = 1.1.

16.22 (1 point) The following table of critical values for the Kolmogorov-Smirnov (K-S)
goodness-of-fit test is appropriate for a sample size of ten.
10% 5% 2% 1%
0.36866 0.40925 0.45662 0.48893
Comparing a distribution to 10 data points, the K-S statistic is 0.415.
What conclusion should you draw?
A. Do not reject H0 at 10%.
B. Do not reject H0 at 5%. Reject H0 at 10%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 1%. Reject H0 at 2%.
E. Reject H0 at 1%.

16.23 (4, 5/86, Q.5) (3 points) Given the sample 1, 1.5, 2, 2.5, and 2.75, you wish to test the
goodness of fit of the distribution with a probability density function: f(x) = 2x/9, 0 ≤ x ≤ 3.
What is the value of the Kolmogorov-Smirnov (K-S) goodness of fit statistic?
A. Less than 0.11
B. At least 0.11, but less than 0.13
C. At least 0.13, but less than 0.15
D. At least 0.15, but less than 0.17
E. 0.17 or more.

16.24 (4, 5/87, Q.61) (3 points) Assume that the random variable X has the probability density
function: f(x;θ) = θ + 2(1 - θ)x , 0 ≤ x ≤ 1, with parameter θ, 0 ≤ θ ≤ 2.
What is the Kolmogorov-Smirnov statistic to test the fit of the distribution with θ = 0 given the
following sample? 0.45, 0.5, 0.55, 0.75
A. Less than 0.20
B. At least 0.20, but less than 0.35
C. At least 0.35, but less than 0.50
D. At least 0.50, but less than 0.65
E. 0.65 or more.

16.25 (160, 5/87, Q.5) (2.1 points) Two lives are observed, beginning at t = 0.
One dies at t1 = 5; the other dies at t2 = 9.
The survival function S(t) = 1 - t/10 is hypothesized.
Calculate the Kolmogorov-Smirnov statistic.
(A) 0.4 (B) 0.5 (C) 0.6 (D) 0.7 (E) 0.8

16.26 (4, 5/88, Q.57) (2 points) Given the distribution f(x) = θx^(θ-1), 0 < x < 1, θ = 3, and the
sample 0.7, 0.75, 0.8, 0.5, 0.65, what is the value of the Kolmogorov-Smirnov (K-S) statistic?
A. Less than 0.1
B. At least 0.1, but less than 0.2
C. At least 0.2, but less than 0.3
D. At least 0.3, but less than 0.4
E. 0.4 or more

16.27 (4, 5/90, Q.49) (2 points) The following observations: 1.7, 1.6, 1.6, 1.9 are taken from a
random sample. You wish to test the goodness of fit of a distribution with probability density
function given by f(x) = x/2, for 0 ≤ x ≤ 2. Using the Kolmogorov-Smirnov statistic, you should:
A. Do not reject at both the 0.01 level and the 0.10 level
B. Do not reject at the .01 level but reject at the 0.10 level
C. Do not reject at the .10 level but reject at the .01 level
D. Reject at both the .01 level and the .10 level
E. Cannot be determined

16.28 (160, 5/90, Q.17) (2.1 points) From a laboratory study of nine lives, you are given:
(i) The times of death are 1, 2, 4, 5, 5, 7, 8, 9, 9.
(ii) It has been hypothesized that the underlying distribution is uniform from 0 to 11.
Calculate the Kolmogorov-Smirnov (K-S) statistic for the hypothesis.
(A) 0.12 (B) 0.14 (C) 0.16 (D) 0.18 (E) 0.20

16.29 (4B, 11/92, Q.26) (2 points) You are given the following:
• You have segregated 25 losses into 4 classes based on size of loss.
• A Pareto distribution with known parameters is believed to fit the observed data.
• The chi-square statistic and the Kolmogorov-Smirnov statistic have been calculated
to test the distribution's goodness of fit.
Which of the following are true regarding these two statistics?
1. The chi-square statistic has an approximate chi-square distribution with 4 degrees of freedom.
2. The critical value of the Kolmogorov-Smirnov statistic, c, is inversely proportional to
the square root of the sample size.
3. Calculating the Kolmogorov-Smirnov statistic required testing at most 8 values.
A. 1 only B. 2 only C. 3 only D. 1, 2 only E. 1, 3 only

16.30 (4B, 11/93, Q.10) (3 points) A random sample of 5 claims x1,..., x5 is taken from the
probability density function f(xi) = α θ^α / (θ + xi)^(α+1), α, θ, xi > 0. In ascending order the observations are:
43, 145, 233, 396, 775. Suppose the parameters are α = 1.0 and θ = 400.
Determine the Kolmogorov-Smirnov statistic for the fitted distribution.
A. Less than 0.050
B. At least 0.050, but less than 0.140
C. At least 0.140, but less than 0.230
D. At least 0.230, but less than 0.320
E. At least 0.320

16.31 (4B, 5/95, Q.11) (2 points) Given the sample 0.1, 0.4, 0.8, 0.8, 0.9 you wish to test the
goodness of fit of the distribution with a probability density function given by
f(x) = (1 + 2x) / 2, 0 ≤ x ≤ 1. Determine the Kolmogorov-Smirnov goodness of fit statistic.
A. Less than 0.15
B. At least 0.15, but less than 0.20
C. At least 0.20, but less than 0.25
D. At least 0.25, but less than 0.30
E. At least 0.30

Use the following information for the next 2 questions:


• A random sample of 20 observations of a random variable X yields the following values:
0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0
• The null hypothesis, H0 , is that X has a uniform distribution on the interval [0, 20].

16.32 (4B, 11/95, Q.9) (2 points)


Determine the value of the Kolmogorov-Smirnov statistic used to test H0 .
A. Less than 0.075
B. At least 0.075, but less than 0.125
C. At least 0.125, but less than 0.175
D. At least 0.175, but less than 0.225
E. At least 0.225

16.33 (4B, 11/95, Q.10) (1 point) Which of the following statements is true?
A. H0 will be rejected at the 0.01 significance level.
B. H0 will be rejected at the 0.05 significance level but not at the 0.01 level.
C. H0 will be rejected at the 0.10 significance level but not at the 0.05 level.
D. H0 will be rejected at the 0.20 significance level but not at the 0.10 level.
E. H0 will not be rejected at the 0.20 significance level.

Use the following information for the next 2 questions:


• Claim sizes follow a lognormal distribution with parameters µ and σ .
• A random sample of five claims yields the values 0.1, 0.5, 1.0, 2.0, and 10.0 (in thousands).

16.34 (4B, 11/98, Q.3) (2 points) Determine the maximum likelihood estimate of σ.
A. Less than 1.6
B. At least 1.6, but less than 1.8
C. At least 1.8, but less than 2.0
D. At least 2.0, but less than 2.2
E. At least 2.2

16.35 (4B, 11/98. Q.4) (2 points)


Determine the value of the Kolmogorov-Smirnov statistic using the maximum likelihood estimates.
A. Less than 0.07
B. At least 0.07, but less than 0.09
C. At least 0.09, but less than 0.11
D. At least 0.11, but less than 0.13
E. At least 0.13

16.36 (4, 5/00, Q.11) (2.5 points) The size of a claim for an individual insured follows an inverse
exponential distribution with the following probability density function:
f(x | θ) = θ e^(-θ/x) / x², x > 0.
The parameter θ has a prior distribution with the following probability density function:
g(θ) = e^(-θ/4) / 4, θ > 0.
For a particular insured, the following five claims are observed:
1 2 3 5 13
Determine the value of the Kolmogorov-Smirnov statistic to test the goodness of fit of f(x l θ = 2).
(A) Less than 0.05
(B) At least 0.05, but less than 0.10
(C) At least 0.10, but less than 0.15
(D) At least 0.15, but less than 0.20
(E) At least 0.20

16.37 (4, 5/01, Q.12) (2.5 points) You are given the following random observations:
0.1 0.2 0.5 1.0 1.3
You test whether the sample comes from a distribution with probability density function:
f(x) = 2/(1+x)^3, x > 0. Calculate the Kolmogorov-Smirnov statistic.
(A) 0.01 (B) 0.06 (C) 0.12 (D) 0.17 (E) 0.19

16.38 (4, 11/02, Q.17 & 2009 Sample Q. 40) (2.5 points) You are given:
(i) A sample of claim payments is: 29 64 90 135 182
(ii) Claim sizes are assumed to follow an exponential distribution.
(iii) The mean of the exponential distribution is estimated using the method of moments.
Calculate the value of the Kolmogorov-Smirnov test statistic.
(A) 0.14 (B) 0.16 (C) 0.19 (D) 0.25 (E) 0.27

16.39 (4, 11/04, Q.22 & 2009 Sample Q.149) (2.5 points) If the proposed model is
appropriate, which of the following tends to zero as the sample size goes to infinity?
(A) Kolmogorov-Smirnov test statistic
(B) (Removed as this statement referred to the Anderson-Darling test)
(C) Chi-square goodness-of-fit test statistic
(D) Schwarz Bayesian adjustment
(E) None of (A), (C) or (D)

16.40 (4, 11/04, Q.38 & 2009 Sample Q.160) (2.5 points)
You are given a random sample of observations:
0.1 0.2 0.5 0.7 1.3
You test the hypothesis that the probability density function is: f(x) = 4/(1+x)5 , x > 0.
Calculate the Kolmogorov-Smirnov test statistic.
(A) Less than 0.05
(B) At least 0.05, but less than 0.15
(C) At least 0.15, but less than 0.25
(D) At least 0.25, but less than 0.35
(E) At least 0.35

16.41 (4, 5/05, Q.1 & 2009 Sample Q.172) (2.9 points) You are given:
(i) A random sample of five observations from a population is:
0.2 0.7 0.9 1.1 1.3
(ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis, H0 , that the
probability density function for the population is:
f(x) = 4/(1 + x)5 , x > 0.
(iii) Critical values for the Kolmogorov-Smirnov test are:
Level of Significance: 0.10 0.05 0.025 0.01
Critical Value: 1.22/√n 1.36/√n 1.48/√n 1.63/√n
Determine the result of the test.
(A) Do not reject H0 at the 0.10 significance level.
(B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level.
(C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level.
(D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level.
(E) Reject H0 at the 0.01 significance level.

16.42 (4, 5/05, Q.19 & 2009 Sample Q.189) (2.9 points)
Which of the following statements is true?
(A) For a null hypothesis that the population follows a particular distribution,
using sample data to estimate the parameters of the distribution tends to decrease
the probability of a Type II error.
(B) The Kolmogorov-Smirnov test can be used on individual or grouped data.
(C) (Removed as this statement referred to the Anderson-Darling test)
(D) For a given number of cells, the critical value for the chi-square goodness-of-fit test
becomes larger with increased sample size.
(E) None of (A), (B), or (D) is true.

16.43 (4, 11/05, Q.34 & 2009 Sample Q.244) (2.9 points)
Which of statements (A), (B), (C), and (D) is false?
(A) The chi-square goodness-of-fit test works best when the expected number of
observations varies widely from interval to interval.
(B) For the Kolmogorov-Smirnov test, when the parameters of the distribution in the
null hypothesis are estimated from the data, the probability of rejecting the null
hypothesis decreases.
(C) For the Kolmogorov-Smirnov test, the critical value for right censored data should
be smaller than the critical value for uncensored data.
(D) (Removed as this statement referred to the Anderson-Darling test)
(E) None of (A), (B), (C) is false.

16.44 (4, 5/07, Q.20) (2.5 points) You use the Kolmogorov-Smirnov goodness-of-fit test to
assess the fit of the natural logarithms of n = 200 losses to a distribution with distribution function F*.
You are given:
(i) The largest value of |Fn (x) - F*(x)| occurs for some x between 4.26 and 4.42.
(ii) Observed x F*(x) Fn (x-) Fn (x)
4.26 0.584 0.505 0.510
4.30 0.599 0.510 0.515
4.35 0.613 0.515 0.520
4.36 0.621 0.520 0.525
4.39 0.636 0.525 0.530
4.42 0.638 0.530 0.535
(iii) Commonly used large-sample critical values for this test are 1.22/√n for α = 0.10,
1.36/√n for α = 0.05, 1.52/√n for α = 0.02, and 1.63/√n for α = 0.01.
Determine the result of the test.
(A) Do not reject H0 at the 0.10 significance level.
(B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level.
(C) Reject H0 at the 0.05 significance level, but not at the 0.02 significance level.
(D) Reject H0 at the 0.02 significance level, but not at the 0.01 significance level.
(E) Reject H0 at the 0.01 significance level

Solutions to Problems:

16.1. B. For 1000 points, the critical values for the K-S stat. are 1.22/√1000 = 0.0386 for 10%
and 1.07/√1000 = 0.0338 for 20%. 0.0338 < 0.035 < 0.0386.
Thus one can reject at 20% but does not reject at 10%.

16.2. D. At each of the observed claim sizes, compute the values of the fitted Pareto distribution:
F(x) = 1 - {θ/(θ+x)}α = 1 - {1000/(1000 +x)}1.5.
So for example, F(352) = 1 - {1000/(1000 + 352)}^1.5 = 0.3639.
Then compare each fitted probability to the empirical distribution function just before and just after
each observed claim value. (Thus there are twice 5, or 10 comparisons.)
The largest absolute difference is: F(2835) - 0.6 = 0.8668 - 0.6 = 0.2668 = K-S statistic.
X        Fitted F(X)    Empirical (just before, just after)    |Fitted - Empirical| (before, after)
179        0.2189         0,   0.2                               0.2189, 0.0189
352        0.3639         0.2, 0.4                               0.1639, 0.0361
918        0.6235         0.4, 0.6                               0.2235, 0.0235
2835       0.8668         0.6, 0.8                               0.2668, 0.0668
6142       0.9476         0.8, 1                                 0.1476, 0.0524

An alternate way to arrange this same calculation, as per Table 16.3 in Loss Models:
x Absolute Fn(x-) F*(x) Fn(x) Absolute
difference difference
179 0.2189 0 0.2189 0.2 0.0189
352 0.1639 0.2 0.3639 0.4 0.0361
918 0.2235 0.4 0.6235 0.6 0.0235
2835 0.2668 0.6 0.8668 0.8 0.0668
6142 0.1476 0.8 0.9476 1 0.0524
The K-S Statistic is 0.2668, the maximum absolute difference, checking in both columns.
Fn (x-) is the empirical distribution function just before x.
F*(x) is the assumed distribution function at x.
Fn (x) is the empirical distribution function at x.
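The arithmetic in these tables is mechanical, so it may help to see it written out as a short program. The following Python sketch is my own illustration and is not part of the original solution; the function name ks_statistic and the generic cdf argument are choices made only for this example.

    # Sketch: K-S statistic for ungrouped data versus a completely specified distribution.
    # For each observed value, compare the assumed F(x) to the empirical distribution
    # function just before and just after that value; keep the largest absolute difference.
    def ks_statistic(data, cdf):
        data = sorted(data)
        n = len(data)
        worst = 0.0
        for i, x in enumerate(data):
            f = cdf(x)
            worst = max(worst, abs(f - i / n), abs(f - (i + 1) / n))
        return worst

    # Pareto with alpha = 1.5 and theta = 1000, as in this problem:
    pareto_cdf = lambda x: 1.0 - (1000.0 / (1000.0 + x)) ** 1.5
    print(ks_statistic([179, 352, 918, 2835, 6142], pareto_cdf))   # about 0.2668

With tied observations the extra interior comparisons generated by this loop are harmless, since they can never exceed the comparisons at the ends of the jump in the empirical distribution function.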

Comment: The Pareto Distribution (shown dashed) and the Empirical Distribution Function:
[Graph omitted.]

A graph of the Empirical Distribution Function minus Fitted Pareto Distribution:


[Graph omitted.]

The maximum absolute difference of 0.2668 occurs just before 2835.



16.3. C. At each of the observed claim sizes, compute the values of the fitted LogNormal
distribution. F(x) = Φ[{ln(x) − µ} / σ] = Φ[{ln(x) - 7.72} / 0.944].
So for example, F(410) = Φ[(ln(410) - 7.72)/0.944] = Φ(-1.80) = 1 - 0.9641 = 0.0359.
Then compare each fitted probability to the empirical distribution function just before and just after
each observed claim value. (Thus there are twice 5 or 10 comparisons.) The largest absolute
difference is F(1924) - 20% = 0.4325 - 0.2 = 0.2325 = K-S statistic.
X       (ln(X) - 7.72)/0.944    Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
410          -1.80                0.0359         0,   0.2                     0.0359, 0.1641
1924         -0.17                0.4325         0.2, 0.4                     0.2325, 0.0325
2635          0.17                0.5675         0.4, 0.6                     0.1675, 0.0325
4548          0.74                0.7704         0.6, 0.8                     0.1704, 0.0296
6142          1.06                0.8554         0.8, 1                       0.0554, 0.1446

16.4. E. An Exponential Distribution with (inverse) scale parameter of 2,


F(x) = 1 - e-2x. For the inversion method let y = F(x), 1 - y = e-2x , x = -ln(1-y) / 2.
Thus for y = 0.280, 0.673, 0.372, 0.143, 0.961; x = 0.1643, 0.5589, 0.2326, 0.0772, 1.6221.
Sort the values of x from lowest to highest.
One then computes the values for the exponential distribution at these values of x, and compares
them to the empirical distribution function.
Arranging things in the following pattern aids the comparison, which should take place twice for each
point, once “just before” the value of x and once “just after”.
The maximum absolute difference occurs just after the third claim, at which the empirical distribution
function is 3/5 = 0.6 but the fitted distribution is 0.372.
The K-S statistic is this maximum absolute difference of 0.228.

x         Fitted F(x)    Empirical (before, after)    Absolute Difference (before, after)
0.0772      0.143          0,   0.2                     0.143, 0.057
0.1643      0.280          0.2, 0.4                     0.080, 0.120
0.2326      0.372          0.4, 0.6                     0.028, 0.228
0.5589      0.673          0.6, 0.8                     0.073, 0.127
1.6221      0.961          0.8, 1                       0.161, 0.039

Comment: The method of inversion is discussed in “Mahlerʼs Guide to Simulation.”



16.5. E. A Pareto Distribution with α = 3 and θ = 10, F(x) = 1 - (1+ x/10)-3.


For the inversion method let y = F(x), 1 - y = (1+ x/10)-3 . x = 10{ (1-y)-1/3 -1 }.
Thus for y = 0.280, 0.673, 0.372, 0.143, 0.961; x = 1.157, 4.515, 1.677, 0.528, 19.488.
x         Fitted F(x)    Empirical (before, after)    Absolute Difference (before, after)
0.528       0.143          0,   0.2                     0.143, 0.057
1.157       0.280          0.2, 0.4                     0.080, 0.120
1.677       0.372          0.4, 0.6                     0.028, 0.228
4.515       0.673          0.6, 0.8                     0.073, 0.127
19.488      0.961          0.8, 1                       0.161, 0.039
Comment: The method of inversion is discussed in “Mahlerʼs Guide to Simulation.”
Note that by the definition of the inversion method the values of the distribution function at the five
simulated claim sizes is given by the five random numbers. Thus performing the inversion was
unnecessary; one need not even compute the values of x.
One computes the K-S statistic by comparing the given random numbers, corresponding to the
values of the distribution function, to the values of the empirical distribution, which at the observed
claim sizes takes on the values 1/5, 2/5, 3/5, 4/5 and 1. Thus one compares the given random
numbers (ordered from smallest to largest) to the uniform distribution.
By this “trick” one sees that the K-S statistic is really independent of the particular distribution.
If you do not follow this important idea, see the previous problem in which the method inversion is
applied with the same random numbers, but to an Exponential Distribution; one gets the same
answer for the K-S Statistic as here.
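A minimal Python sketch of this idea (my own illustration, not part of the original text) is below. Applying the inversion method with the same random numbers to the Exponential of the previous problem, to the Pareto of this problem, or to the uniform distribution itself gives the same K-S statistic of about 0.228.

    import math

    def ks_statistic(data, cdf):
        data = sorted(data)
        n = len(data)
        return max(max(abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
                   for i, x in enumerate(data))

    uniforms = [0.280, 0.673, 0.372, 0.143, 0.961]

    # Inversion: solve u = F(x) for the Exponential of 16.4 and the Pareto of 16.5.
    exp_cdf = lambda x: 1.0 - math.exp(-2.0 * x)
    pareto_cdf = lambda x: 1.0 - (1.0 + x / 10.0) ** -3.0
    exp_data = [-math.log(1.0 - u) / 2.0 for u in uniforms]
    pareto_data = [10.0 * ((1.0 - u) ** (-1.0 / 3.0) - 1.0) for u in uniforms]

    # All three comparisons yield the same K-S statistic, about 0.228.
    print(ks_statistic(exp_data, exp_cdf))
    print(ks_statistic(pareto_data, pareto_cdf))
    print(ks_statistic(uniforms, lambda u: u))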

16.6. E. The Distribution function is F(x) = x5 , 0 ≤ x ≤ 1. Note that one has to rank the sample data
from smallest to largest in order to compute the empirical distribution.
x        Fitted F(x)    Empirical (before, after)    Absolute Difference (before, after)
0.500      0.031          0,     0.167                 0.031, 0.135
0.650      0.116          0.167, 0.333                 0.051, 0.217
0.700      0.168          0.333, 0.500                 0.165, 0.332
0.750      0.237          0.500, 0.667                 0.263, 0.429
0.800      0.328          0.667, 0.833                 0.339, 0.506
0.900      0.590          0.833, 1                     0.243, 0.410

16.7. C. For 6 data points, the critical values for the K-S stat. are 1.63/√6 = 0.665 for 1%,
1.36/√6 = 0.555 for 5%, 1.22/√6 = 0.498 for 10%, and 1.07/√6 = 0.437 for 20%.
0.498 < 0.506 < 0.555 . Thus one can reject at 10%, but does not reject at 5%.

16.8. D.
x           Fitted F(x)    Empirical (before, after)    Absolute Difference (before, after)
105,310       0.135          0,   0.1                     0.135, 0.035
105,829       0.147          0.1, 0.2                     0.047, 0.053
110,493       0.244          0.2, 0.3                     0.044, 0.056
116,472       0.348          0.3, 0.4                     0.048, 0.052
125,152       0.466          0.4, 0.5                     0.066, 0.034
139,647       0.607          0.5, 0.6                     0.107, 0.007
161,964       0.741          0.6, 0.7                     0.141, 0.041
220,942       0.891          0.7, 0.8                     0.191, 0.091
231,919       0.905          0.8, 0.9                     0.105, 0.005
241,513       0.915          0.9, 1                       0.015, 0.085

16.9. C. Let the five values from smallest to largest be: x1 , x2 , x3 , x4 , x5 .


In order to compute the K-S Statistic, the first two comparisons we need to make are:
|F(x1 ) - 0| and |F(x1 ) - 0.2|. At least one of these is greater than or equal to 0.1.
If F(x1 ) = 0.1, then each of these absolute differences is 0.1.
If F(x1 ) = 0.1, F(x2 ) = 0.3, F(x3 ) = 0.5, F(x4 ) = 0.7, and F(x5 ) = 0.9, then all of the comparisons result
in 0.1. The smallest possible value of the Kolmogorov-Smirnov Statistic is 0.1.
Comment: With N distinct values, the smallest possible K-S Statistic is 1/(2N).

16.10. E. For small x, the empirical distribution function is less than the fitted distribution.
The left tail of the fitted distribution is too thick; #1 is true. For large x, the empirical distribution function
is greater than the fitted distribution. For large x, the empirical survival function is less than the fitted
survival function. The right tail of the fitted distribution is too thick; #2 is true.
For some small x, the difference is about -.07. The K-S Statistic is the largest absolute value of the
difference, which is about .07; #3 is false.
Comment: Items 1 and 2 are similar to 4, 11/01, Q.6, involving p-p plots. The graph was based on
1000 simulated values from a Weibull with τ = 1/2 and θ = 1000, and a Lognormal Distribution fit to
this simulated data via maximum likelihood.

16.11. D. As computed below, the K-S statistics are: 0.296, 0.260, 0.229, 0.202, 0.227.
θ = 250, has the smallest and therefore the best K-S statistic.
Exponential with θ = 190:
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
40       0.190          0,    0.25                   0.190, 0.060
150      0.546          0.25, 0.50                   0.296, 0.046
230      0.702          0.50, 0.75                   0.202, 0.048
400      0.878          0.75, 1                      0.128, 0.122

Exponential with θ = 210:
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
40       0.173          0,    0.25                   0.173, 0.077
150      0.510          0.25, 0.50                   0.260, 0.010
230      0.666          0.50, 0.75                   0.166, 0.084
400      0.851          0.75, 1                      0.101, 0.149

Exponential with θ = 230:
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
40       0.160          0,    0.25                   0.160, 0.090
150      0.479          0.25, 0.50                   0.229, 0.021
230      0.632          0.50, 0.75                   0.132, 0.118
400      0.824          0.75, 1                      0.074, 0.176

Exponential with θ = 250:
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
40       0.148          0,    0.25                   0.148, 0.102
150      0.451          0.25, 0.50                   0.201, 0.049
230      0.601          0.50, 0.75                   0.101, 0.149
400      0.798          0.75, 1                      0.048, 0.202

Exponential with θ = 270:
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
40       0.138          0,    0.25                   0.138, 0.112
150      0.426          0.25, 0.50                   0.176, 0.074
230      0.573          0.50, 0.75                   0.073, 0.177
400      0.773          0.75, 1                      0.023, 0.227
Comment: Note that the method of moments and maximum likelihood fit, θ = 205, does not have
the best K-S statistic. Which θ is “best” depends on what criterion you use.

16.12. D. The Kolmogorov-Smirnov Statistic is the maximum distance from the x-axis, either
above or below, of the difference curve, in this case about 0.18.

16.13. B. Examining graphs is a judgment-based approach, according to Prof. Klugman.



16.14. D. The area under D(x), counting areas below the x-axis as negative, is the difference
between the mean of the distribution and the sample mean.
Therefore, when the mean of the distribution is greater than the sample mean, the area under the
D(x) graph is positive, which is graph D.
Alternately, D(x) = empirical distribution - assumed distribution.
Therefore, for graph D, the sample has more probability for small x than does the assumed
distribution. Therefore, the assumed distribution has more large values than the sample.
Therefore, for graph D, the mean of the distribution is greater than the sample mean.
Comment: A data set of size 1000 was simulated from a Beta Distribution with a = 2, b = 2, and
θ = 1. Then the empirical distribution function for this data was compared to:
A. Beta Distribution with a = 2, b = 2, and θ = 1, with mean 1/2.
B. Beta Distribution with a = 1, b = 2, and θ = 1, with mean 1/3.
C. Beta Distribution with a = 3, b = 3, and θ = 1, with mean 1/2.
D. Beta Distribution with a = 2, b = 1, and θ = 1, with mean 2/3.
E. Uniform Distribution from 0 to 1, a Beta Distribution with a = 1, b = 1, and θ = 1, with mean 1/2.
In graphs A, C, and E, the area under D(x) is close to zero, and E[X] is approximately equal to the sample mean.
In graph B, the area under D(x) is negative, and E[X] is less than the sample mean.

16.15. C. The K-S statistic is 0.2, from the absolute difference just before 6.
X     Assumed F(X)    Empirical (before, after)    |Assumed - Empirical| (before, after)
1        0.1            0,   0.2                     0.1, 0.1
3        0.3            0.2, 0.4                     0.1, 0.1
6        0.6            0.4, 0.7                     0.2, 0.1
7        0.7            0.7, 0.8                     0.0, 0.1
9        0.9            0.8, 1                       0.1, 0.1
Comment: Some data points show up more than once in the data base, resulting in less work for us.

16.16. B. & 16.17. C. For the Exponential Distribution, maximum likelihood equals method of

moments: θ^ = (700 + 1200 + 2000 + 4600 + 6000 + 9500)/6 = 4000.


D(x) = empirical distribution - fitted/assumed distribution.
D(5999) = 4/6 - (1 - e-5999/4000) = 2/3 - 0.7768 = -0.110.
D(6000) = 5/6 - (1 - e-6000/4000) = 5/6 - 0.7769 = 0.056.
Comment: The empirical distribution function, and thus D(x), has a jump discontinuity at each of the
observed values.
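As a check on this kind of calculation, here is a small Python sketch (my own illustration, not part of the original solution) that reproduces D(x) just before and at 6,000 for the fitted Exponential:

    import math

    data = [700, 1200, 2000, 4600, 6000, 9500]
    theta = sum(data) / len(data)                      # method of moments = maximum likelihood: 4000

    def fitted(x):
        return 1.0 - math.exp(-x / theta)              # Exponential distribution function

    def empirical(x):
        return sum(1 for d in data if d <= x) / len(data)

    print(empirical(5999) - fitted(5999))   # about -0.110: D(x) just before 6000
    print(empirical(6000) - fitted(6000))   # about  0.056: D(x) at 6000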

16.18. C. The distribution function is: F(x) = x2 /36, 0 < x < 6. F(2) = 1/9. F(4) = 4/9.
The Empirical Distribution Function is 0.4 at 2 and 1 at 4.
Thus the four comparisons are:
|1/9 - 0| = 1/9, |1/9 - 0.4| = 0.2889, |4/9 - 0.4| = 0.0444, |4/9 - 1| = 0.5556.
Thus the K-S statistic is 0.5556.
α:   0.20       0.10       0.05       0.025      0.01
c:   1.07/√n   1.22/√n   1.36/√n   1.48/√n   1.63/√n
Customizing the above table for n = 5, the 10% critical value is: 1.22/√5 = 0.5456.
The 5% critical value is: 1.36/√5 = 0.6082.
0.5456 < 0.5556 < 0.6082. ⇒ Reject at 10% but not at 5%.
Comment: The probability of drawing a sample of size 5 from a continuous distribution and getting
only two different values is virtually zero.

16.19. A. As per Theorem A.1 in Loss Models, for integer α: Γ(α ; x) = 1 - Σ_{i=0}^{α-1} x^i e^(-x) / i!.
The Inverse Gamma Distribution has F(x) = 1 - Γ(α; θ/x). Thus for integer α, F(x) = Σ_{i=0}^{α-1} (θ/x)^i e^(-θ/x) / i!.
F(1500) = e^(-4) (1 + 4 + 4^2/2) = 0.2381. F(2500) = e^(-2.4) (1 + 2.4 + 2.4^2/2) = 0.5697.
F(4000) = e^(-1.5) (1 + 1.5 + 1.5^2/2) = 0.8088. As computed below, the K-S statistic is 0.238.
x       Assumed F(x)    Empirical (before, after)    Absolute Difference (before, after)
1500      0.2381          0,      0.3333               0.238, 0.095
2500      0.5697          0.3333, 0.6667               0.236, 0.097
4000      0.8088          0.6667, 1                    0.142, 0.191
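For readers who want to verify these values, here is a short Python sketch (my own, not part of the original solution) of the finite-sum form of the Inverse Gamma distribution function; the parameters α = 3 and θ = 6000 are the values implied by the exponents θ/x = 4, 2.4, and 1.5 used in the solution.

    import math

    def inverse_gamma_cdf(x, alpha=3, theta=6000.0):
        # F(x) = sum over i = 0 to alpha-1 of (theta/x)^i e^(-theta/x) / i!, for integer alpha
        y = theta / x
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(alpha))

    for x in (1500, 2500, 4000):
        print(x, round(inverse_gamma_cdf(x), 4))   # 0.2381, 0.5697, 0.8088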

16.20. Let the data point be x, 0 < x < 1.


For the uniform distribution: F(x) = x.
Just before x the empirical distribution function is zero.
At x the empirical distribution function jumps to one.
Therefore, the K-S statistic is: Max[|0 - x|, |1 - x|] = Max[x, 1- x].
For x ≤ 1/2, the K-S Statistic is: 1 - x. For 0 < x ≤ 1/2, 1 - x is uniformly distributed from 1/2 to 1.
For x ≥ 1/2, the K-S Statistic is: x. For 1 > x ≥ 1/2, x is uniformly distributed from 1/2 to 1.
Thus the K-S Statistic has density 2 from 1/2 to 1.

16.21. E. At 30, the empirical distribution function is 3/4.


D(30) = 3/4 - F(30).
Based on the graph D(30) ≅ -0.14, so F(30) = 0.75 + 0.15 = 0.90.
For the Uniform Distribution from 0 to 60, F(30) = 1/2.
For the Exponential Distribution with θ = 25, F(30) = 1 - e-30/25 = 0.70.
For the Weibull Distribution with τ = 0.4 and θ = 7.5, F(30) = 1 - exp[-(30/7.5)0.4] = 0.82.
For the Pareto Distribution with α = 4 and θ = 75, F(30) = 1 - (75/105)4 = 0.74.
For the LogNormal Distribution with µ = 2 and σ = 1.1, F(30) = Φ[(ln(30) - 2)/1.1] = Φ[1.27] = 0.898.
Thus of the choices, F(x) is the LogNormal Distribution.
Comment: There are other places one can check, to eliminate the choices other than E.
x Empirical Distribution Function LogNormal Distribution D(x)
10- 0 0.608 -0.608
10 0.25 0.608 -0.358
20- 0.25 0.817 -0.567
20 0.50 0.817 -0.317
30- 0.50 0.899 -0.399
30 0.75 0.899 -0.149
40- 0.75 0.938 -0.188
40 1.00 0.938 -0.062

16.22. C. 0.40925 < 0.415 < 0.45662. ⇒ Reject H0 at 5%; do not reject H0 at 2%.
Comment: Critical values taken from Practical Reliability Engineering, by Patrick D. T. OʼConnor and
Andre Kleyner.

16.23. D. f(x) = (2/9)x. ⇒ F(x) = x2 /9, 0 ≤ x ≤ 3.


At each of the observed claim sizes, compute the values of this distribution.
Then compare each fitted probability to the empirical distribution function just before and just after
each observed claim value. (Thus there are twice 5, or 10 comparisons.) The largest absolute
difference is: 1 - F(2.75) = 1 - 0.8403 = 0.1597 = K-S statistic.
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
1        0.1111         0,   0.2                     0.1111, 0.0889
1.5      0.2500         0.2, 0.4                     0.0500, 0.1500
2        0.4444         0.4, 0.6                     0.0444, 0.1556
2.5      0.6944         0.6, 0.8                     0.0944, 0.1056
2.75     0.8403         0.8, 1                       0.0403, 0.1597

16.24. C. For θ = 0, f(x) = 2x and thus F(x) = x2 .


At each of the observed claim sizes, compute the values of this distribution.
Then compare each fitted probability to the empirical distribution function just before and just after
each observed claim value. (Thus there are twice 4, or 8 comparisons.)
The largest absolute difference is: 0.75 - F(0.55) = 0.75 - 0.3025 = 0.4475 = K-S statistic.
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
0.45     0.2025         0,    0.25                   0.2025, 0.0475
0.50     0.2500         0.25, 0.50                   0.0000, 0.2500
0.55     0.3025         0.50, 0.75                   0.1975, 0.4475
0.75     0.5625         0.75, 1                      0.1875, 0.4375

16.25. B. Just before 5, the empirical distribution function is 0 and F(5) = 5/10 = 1/2.
Absolute difference is 1/2. At 5, the empirical distribution function is 1/2 and F(5) = 5/10 = 1/2.
Absolute difference is 0. Just before 9, the empirical distribution function is 1/2 and F(9) = 9/10 = 0.9.
Absolute difference is 0.4. At 9, the empirical distribution function is 1 and F(9) = 9/10 = 0.9.
Absolute difference is 0.1. Kolmogorov-Smirnov statistic is 1/2.

16.26. E. At each of the observed claim sizes, compute the values of the fitted distribution:
F(x) = xθ = x3 . Then compare each fitted probability to the empirical distribution function just before
and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.)
The largest absolute difference is: 1 - F(0.8) = 1 - 0.512 = 0.488.
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
0.50     0.1250         0,   0.2                     0.1250, 0.0750
0.65     0.2746         0.2, 0.4                     0.0746, 0.1254
0.70     0.3430         0.4, 0.6                     0.0570, 0.2570
0.75     0.4219         0.6, 0.8                     0.1781, 0.3781
0.80     0.5120         0.8, 1                       0.2880, 0.4880

16.27. B. Since we have 4 points, the critical values are 1.22/√4 = 0.610 and
1.63/√4 = 0.815. By integrating the density function, the distribution function is:
x2 / 4 for 0 ≤ x ≤ 2. One compares the Empirical and Fitted Distribution Functions just before and
just after each observed data point:
x      Fitted F(x)    Empirical (before, after)    Absolute Difference (before, after)
1.6      0.640          0,    0.50                   0.640, 0.140
1.7      0.723          0.50, 0.75                   0.223, 0.028
1.9      0.902          0.75, 1                      0.152, 0.098
The largest absolute difference of .64 occurs just before x = 1.6. F(1.59999) - 0 = 0.64.
We compare the K-S statistic of 0.64 to the critical values: 0.64 > 0.61, so that we reject at 10%.
0.64 ≤ 0.815 so we do not reject at 1%.
Comment: Choice C is impossible; whenever one rejects at the 1% significance level one
automatically also rejects at the 10% significance level. Note that since 1.6 occurs twice in the
sample, the Empirical Distribution Function jumps from 0 to 2/4 = 0.5 at x = 1.6, (we get a double
jump,) and we have to perform only 6 rather than 8 comparisons.

16.28. D. Note that since 5 appears twice in the data set, the empirical distribution jumps from 3/9
to 5/9.
t     Assumed F(t)    Empirical (before, after)    |Assumed - Empirical| (before, after)
1       0.0909          0,      0.1111               0.0909, 0.0202
2       0.1818          0.1111, 0.2222               0.0707, 0.0404
4       0.3636          0.2222, 0.3333               0.1414, 0.0303
5       0.4545          0.3333, 0.5556               0.1212, 0.1010
7       0.6364          0.5556, 0.6667               0.0808, 0.0303
8       0.7273          0.6667, 0.7778               0.0606, 0.0505
9       0.8182          0.7778, 1                    0.0404, 0.1818

16.29. B. 1. False. The Chi-Square Statistic has a number of degrees of freedom equal to 4 - 1 =
3. 2. True. 3. False. The K-S stat. is computed on the ungrouped data. Since there are 25 losses
and we usually have to test just before and just after each point, we would expect to have to test 50
values.
Comment: If the Pareto was fit to the data, then the degrees of freedom for the Chi-Square would
be two less for two fitted parameters: 4 - 1 - 2 = 1. It is not totally clear from the question whether
one has the 25 ungrouped losses. In any case, one should not apply the
K-S test to grouped data; thus statement #3 is still not true.

16.30. E. For the Pareto Distribution, F(x) = 1 - {θ/(θ+x)}α. For α = 1 and θ = 400,
F(x) = 1 - {400/(400+x)} = x / (400 + x). In order to compute the Kolmogorov-Smirnov statistic, one
must compare the Empirical and Fitted distributions “just before” and “just after” the observed
points. In this case the maximum absolute difference is 0.34 and occurs just after the fifth observed
point. The fitted F(775) = 0.6596, while the empirical distribution function is 1.
X     Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
43      0.0971         0,   0.2                     0.0971, 0.1029
145     0.2661         0.2, 0.4                     0.0661, 0.1339
233     0.3681         0.4, 0.6                     0.0319, 0.2319
396     0.4975         0.6, 0.8                     0.1025, 0.3025
775     0.6596         0.8, 1                       0.1404, 0.3404

16.31. E. F(x) = (x + x2 )/2, 0 ≤ x ≤ 1. In order to compute the Kolmogorov-Smirnov statistic, one


must compare the Empirical and Fitted distributions “just before” and “just after” the observed
points. In this case the maximum absolute difference is 0.320 and occurs just before the third
observed point. The fitted F(0.8) = 0.720, or if you prefer F(0.799999) = 0.720, while the empirical
distribution function is 2/5 = 0.4.
X      Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
0.1      0.055          0,   0.2                     0.055, 0.145
0.4      0.280          0.2, 0.4                     0.080, 0.120
0.8      0.720          0.4, 0.8                     0.320, 0.080
0.9      0.855          0.8, 1                       0.055, 0.145
Comment: Since the point 0.8 appears twice in the data, there are (2) (4) = 8 comparisons (rather
than 2 times 5 = 10 comparisons) and the empirical distribution function jumps from 2/5 = 0.4 to
4/5 = 0.8 at 0.8 (rather than a jump of 1/5.)

16.32. E. At x = 5 the fitted distribution is 5/20 = 0.25 while the empirical distribution function is 0.5.
At x = 15 the fitted distribution is 15/20 = 0.75 while the empirical distribution function is 1. The K-S
Statistic is maximum over all x of:
| empirical distribution - fitted distribution | = |0.5-0.25| = |1-0.75| = 0.25.
The absolute difference attains its maximum at x = 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. The
easiest way to see this is via a graph. Hereʼs a graph of the empirical and assumed uniform
distribution functions:

Hereʼs a graph of the difference between the empirical distribution function and the assumed uniform
distribution function:

One can also compare the assumed and empirical distributions just before and just after each data
point. A portion of that calculation can be seen below:
x     Assumed F(x)    Empirical (before, after)    Absolute Difference (before, after)
4.5     0.225           0.400, 0.450                 0.175, 0.225
5       0.250           0.450, 0.500                 0.200, 0.250
6       0.300           0.500, 0.550                 0.200, 0.250
...
14      0.700           0.900, 0.950                 0.200, 0.250
15      0.750           0.950, 1.000                 0.200, 0.250
16      0.800           1.000, 1.000                 0.200, 0.200

16.33. D. For 20 observations the critical values are 1.07/√20, etc.:


significance level 0.20 0.10 0.05 0.01
critical value 0.239 0.273 0.304 0.364
One compares the K-S Statistic from the previous question of 0.25 to these critical values.
Since 0.25 > 0.239 we reject at the 0.20 significance level.
Since 0.25 ≤ 0.273 we do not reject at the 0.10 significance level.

16.34. A. The Method of Maximum Likelihood applied to the LogNormal is the same as the
Method of Maximum Likelihood applied to the underlying Normal Distribution.
In turn the Method of Maximum Likelihood applied to the Normal is the same as the Method of
Moments. The average of the log claim sizes is 6.9078 and the second moment of the log claim
sizes is 50.03. Thus the variance of the log claim sizes is: 50.03 - 6.9078^2 = 2.312.
σ2 = 2.312, thus σ = 1.521.
x ln(x) square of ln(x)
100 4.6052 21.2076
500 6.2146 38.6214
1000 6.9078 47.7171
2000 7.6009 57.7737
10000 9.2103 84.8304
6.9078 50.0300
Alternately, for a LogNormal Distribution, the log density is:
ln f(x) = -0.5 ({ln(x)−µ} /σ)2 - ln(x) - ln(σ) - 0.5 ln(2π).
The loglikelihood for n points xi is:
Σ {-0.5 ({ln(xi)−µ} /σ)2 - ln(xi)} - n ln(σ) - 0.5n ln(2π).

Taking the partial derivatives with respect to µ and σ and setting them equal to zero:

0 = Σ{ln(xi)−µ} /σ2, or Σln(xi) = nµ. 0 = Σ {ln(xi)−µ}2 /σ3 - n/σ, or Σ {ln(xi)−µ}2 = nσ2.

Thus µ = (1/n)Σ ln(xi) and σ2 = (1/n)Σ {ln(xi)−µ}2 . Then proceed as above.
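The same fit can be reproduced with a few lines of Python (my own sketch, not part of the original solution): µ is the average of the log claim sizes and σ is the square root of their biased (divide by n) variance.

    import math

    claims = [100, 500, 1000, 2000, 10000]
    logs = [math.log(c) for c in claims]
    n = len(logs)
    mu = sum(logs) / n                                         # maximum likelihood mu
    sigma = math.sqrt(sum((L - mu) ** 2 for L in logs) / n)    # maximum likelihood sigma (divide by n)
    print(round(mu, 4), round(sigma, 3))                       # about 6.9078 and 1.521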



16.35. E. The maximum likelihood estimate is µ = 6.908 and σ = 1.521. (This was the solution to
the previous exam question.) At each of the observed claim sizes, compute the values of the fitted
LogNormal distribution: F(x) = Φ[{ln(x)-µ} / σ].
So for example, F(2000) = Φ[{ln(x)-µ} / σ] = Φ[{ln(2000)-6.908} / 1.521] = Φ[0.46] = 0.6772.
Then compare each fitted probability to the empirical distribution function just before and just after
each observed claim value. (Thus there are twice 5, or 10 comparisons.)
The largest absolute difference is 0.1345 = K-S statistic.
X        ln(X)    (ln(X) - µ)/σ    Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
100      4.605       -1.51           0.0655         0,   0.2                     0.0655, 0.1345
500      6.215       -0.46           0.3228         0.2, 0.4                     0.1228, 0.0772
1000     6.908        0.00           0.5000         0.4, 0.6                     0.1000, 0.1000
2000     7.601        0.46           0.6772         0.6, 0.8                     0.0772, 0.1228
10000    9.210        1.51           0.9345         0.8, 1                       0.1345, 0.0655

16.36. D. F(x) = e-θ/x = e-2/x, an Inverse Exponential Distribution.


The K-S statistic is 0.1679.
X     Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
1       0.1353         0,   0.2                     0.1353, 0.0647
2       0.3679         0.2, 0.4                     0.1679, 0.0321
3       0.5134         0.4, 0.6                     0.1134, 0.0866
5       0.6703         0.6, 0.8                     0.0703, 0.1297
13      0.8574         0.8, 1                       0.0574, 0.1426
Comment: We are told to test the goodness of fit of f(x | θ = 2), therefore, one makes no use of
g(θ). Rather than compute a mixed distribution over all θ, θ is set equal to 2.
The information on g(θ) was used to answer the previous exam question, 4, 5/00, Q.10.

16.37. E. By integrating the density, F(x) = 1 - (1+x)-2.


The K-S statistic is the largest absolute difference: 0.189.
X      Assumed F(X)    Empirical (before, after)    |Assumed - Empirical| (before, after)
0.1      0.1736          0,   0.2                     0.1736, 0.0264
0.2      0.3056          0.2, 0.4                     0.1056, 0.0944
0.5      0.5556          0.4, 0.6                     0.1556, 0.0444
1.0      0.7500          0.6, 0.8                     0.1500, 0.0500
1.3      0.8110          0.8, 1                       0.0110, 0.1890
Comment: This is a Pareto Distribution with α = 2 and θ = 1.

16.38. E. θ = observed mean = (29 + 64 + 90 + 135 + 182)/5 = 100.


At each of the observed claim sizes, compute the values of the fitted Exponential distribution:
F(x) = 1 - e-x/100. Then compare each fitted probability to the empirical distribution function just
before and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.)
The largest absolute difference is F(64) - 0.2 = 0.4727 - 0.2 = 0.2727 = K-S statistic.
X     Fitted F(X)    Empirical (before, after)    |Fitted - Empirical| (before, after)
29      0.2517         0,   0.2                     0.2517, 0.0517
64      0.4727         0.2, 0.4                     0.2727, 0.0727
90      0.5934         0.4, 0.6                     0.1934, 0.0066
135     0.7408         0.6, 0.8                     0.1408, 0.0592
182     0.8380         0.8, 1                       0.0380, 0.1620

16.39. A. If H0 is true, the probability that the K-S test statistic will be greater than the critical value is
by definition the corresponding significance level. The critical values for the Kolmogorov-Smirnov all
contain 1/√n, where n is the sample size. Thus as the sample size goes to infinity the critical values
go to zero. Let δ and ε be very small values. Let the critical value corresponding to ε/2 be c/√n.
Then take c/√n = δ, or n = (c/δ)^2. Prob[K-S ≥ δ | sample size ≥ (c/δ)^2] ≤ ε/2. Therefore, the
probability that the K-S test statistic is larger than δ can be made less than ε, for sufficiently large n.
The K-S test statistic tends to zero as the sample size goes to infinity.
For the Chi-square goodness-of-fit test statistic, the critical values are independent of the sample
size; the test statistics does not tend to zero.
The Schwarz Bayesian adjustment is: (number of fitted parameters) ln(number of data points) / 2,
which goes to infinity as the sample size goes to infinity.
Comment: The original exam question had: (B) Anderson-Darling test statistic.
For the Anderson-Darling test statistic the critical values are independent of the sample size; the test
statistic does not tend to zero.
As the sample size goes to infinity, the empirical distribution function approaches the underlying
distribution from which the sample was drawn. Therefore, if the distribution we are comparing to is
this underlying distribution, in other words if the proposed model is appropriate, then the empirical
distribution function approaches the model distribution, as the sample size goes to infinity. As the
sample size goes to infinity, the Kolmogorov-Smirnov test statistic tends to zero. When computing
the Chi-square goodness-of-fit test statistic, for a given interval, the Expected does approach the
Observed as a percent. In other words,
|Expected - Observed|/ Expected goes to zero for each interval. However, the contribution from
each interval, (Expected - Observed)2 / Expected, does not decline.
Notice the square in the numerator. If H0 is true, the sum of the contributions, the Chi-Square
Statistic, approaches a Chi-Square Distribution. A key idea is that the critical values for the Chi-
Square are not a function of the number of data points, as opposed to the number of intervals. For
example, let us assume 5 degrees of freedom. Then the critical value at 5% is 11.07. Thus there is a
5% chance that the Chi-Square Statistic will be greater than 11.07, for large amounts of data, if H0 is

true. Prob[χ2 > 11.07 | n = 1 million] = 5%. Prob[χ2 > 11.07 | n = 100 million] = 5%.

Prob[χ2 > 11.07 | n = 10,000 million] = 5%.


Clearly the Chi-Square Statistic is not going to zero as n approaches infinity.
You might look at some of the simulations of p-values in “Mahlerʼs Guide to Simulation.”
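A quick way to see the behavior of the K-S statistic under H0 is a small simulation. The Python sketch below is my own illustration (the sample sizes and seed are arbitrary): it draws uniform samples and tests them against the uniform distribution, and the statistic shrinks roughly like 1/√n.

    import math
    import random

    def ks_statistic(data, cdf):
        data = sorted(data)
        n = len(data)
        return max(max(abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
                   for i, x in enumerate(data))

    random.seed(1)
    for n in (100, 10000, 1000000):
        sample = [random.random() for _ in range(n)]    # drawn from the hypothesized uniform(0, 1)
        d = ks_statistic(sample, lambda u: u)
        print(n, round(d, 4), round(d * math.sqrt(n), 2))   # D shrinks; D * sqrt(n) stays of order 1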

16.40. E. F(x) = 1 - 1/(1 + x)4 , x > 0.


x      Assumed F(x)    Empirical (before, after)    Absolute Difference (before, after)
0.1      0.317           0,   0.2                     0.317, 0.117
0.2      0.518           0.2, 0.4                     0.318, 0.118
0.5      0.802           0.4, 0.6                     0.402, 0.202
0.7      0.880           0.6, 0.8                     0.280, 0.080
1.3      0.964           0.8, 1                       0.164, 0.036

16.41. D. At each of the observed values, compute the values of the assumed distribution:
F(x) = 1 - 1/(1 + x)4 . Then compare each to the empirical distribution function just before and just
after each observed value.
The largest absolute difference is: |F(0.7) - 0.2| = |0.8803 - 0.2| = 0.6803.
X      Assumed F(X)    Empirical (before, after)    |Assumed - Empirical| (before, after)
0.2      0.5177          0,   0.2                     0.5177, 0.3177
0.7      0.8803          0.2, 0.4                     0.6803, 0.4803
0.9      0.9233          0.4, 0.6                     0.5233, 0.3233
1.1      0.9486          0.6, 0.8                     0.3486, 0.1486
1.3      0.9643          0.8, 1                       0.1643, 0.0357
The critical values are: 1.22/√5 = 0.546, 0.608, 0.662, 0.729.
0.662 < 0.680 < 0.729. ⇒ Reject at 2.5% and do not reject at 1%.

16.42. E. Loss Models states in Section 16.4, “When the parameters are estimated from the data,
the test statistic tends to be smaller than it would have been had the parameter values been
prespecified. That is because the method itself tries to choose parameters that produce a
distribution that is close to the data. In that case, the test becomes approximate. Because rejection of
the null hypothesis occurs for large values of the test statistic (for the K-S and Chi-Square tests), the
approximation tends to increase the probability of a Type II error while decreasing the probability of
a Type I error. Among the tests presented here, only the chi-square test has a built-in correction for
this situation. Modifications for the other tests have been developed, but they will not be presented
here.” Thus statement A is false.
Loss Models states that the K-S test should only be used on individual data. (In a footnote at page
448 it mentions that it is possible to modify the K-S test for use with grouped data.) Thus B is false.
For the chi-square goodness-of-fit test, the critical values depend on the number of degrees of
freedom, which depend in turn on the number of intervals rather than the sample size.
Thus statement D is false.
Comment: The original exam question had: (C) The Anderson-Darling test tends to place more
emphasis on a good fit in the middle rather than in the tails of the distribution.
The Anderson-Darling test tends to place more emphasis on a good fit in the tails of the distribution
rather than in the middle of the distribution. Thus statement C is false.
In the case of the Chi-Square or K-S statistics, using the sample data to estimate the parameters of
the distribution produces a better than expected match between assumed and expected, and
therefore tends to decrease the statistic. If one did not somehow compensate for this, (for example
by reducing the degrees of freedom in the Chi-Square by the number of fitted parameters), then
the probability of rejecting (at a specific significance level) would decline. Therefore, the probability
of rejecting when H0 is true would go down; the probability of a Type I error would decline. The
probability of failing to reject when H0 is false would go up; the probability of a Type II error would
increase. In practical applications, one should compensate for using the sample data to estimate the
parameters of the distribution. (Loss Models states that when the data set is used to estimate the
parameters of the null hypothesis distribution, the correct critical values for the Kolmogorov-Smirnov
test are smaller.) If one adjusts properly, it is not obvious why the probability of a Type II error
would change. The K-S statistic can be calculated for individual data, and the critical values to apply
a hypothesis test to this statistic have been tabulated. (There are separate tables to apply for small
samples, not covered on your exam.) If one has grouped data, a maximum and minimum for the
K-S statistic can be determined. In some situations, such as when one has many narrow
intervals and lots of data, this is sufficient to allow one to usefully apply a statistical hypothesis
using these same K-S critical values. However, usually one does not apply the K-S test to
grouped data. One could run a simulation to come up with critical values to apply to a
particular grouped data situation with many narrow intervals, that an actuary might be
frequently encountering. Such a test, using the maximum observed absolute difference, would
not be called the K-S test, although it would clearly be somehow related. The power of such a
test would be less than the K-S test applied to the same data in ungrouped form.

16.43. A. Loss Models states that, the chi-square goodness-of-fit test works best when the
expected number of observations are about equal from interval to interval. Thus Statement A is
false. Loss Models states that, when the data is used to estimate the parameters of the distribution,
the correct critical values for the Kolmogorov-Smirnov test decrease. One rejects the null hypothesis
when the test statistic is greater than the critical values. Therefore, if one does not revise the critical
values, the probability of rejecting the null hypothesis decreases.
More generally, Loss Models states that when the parameters are estimated from the data, if one
does not revise the critical values, the probability of a Type I error is lowered, since the probability
of rejecting the null hypothesis decreases. Statement B is not false.
Loss Models states that when u < ∞, in other words when the data is right censored, the (correct)
critical values for the Kolmogorov-Smirnov test are smaller. Statement C is true.
Comment: The original exam question had:
(D) The Anderson-Darling test does not work for grouped data.
As with the Kolmogorov-Smirnov test, the Anderson-Darling test is designed to work with
ungrouped data. Statement D is not false.
See Section 16.4 of Loss Models.
The writer of this question searched several pages of the textbook to find verbal statements to test.
While this type of question is not common on this exam, there have been others such as
4, 5/05, Q.19.

16.44. D. We take all the absolute differences |Fn (x) - F*(x)| and |Fn (x-) - F*(x)|, and find the
largest one. The K-S statistic is 0.111.
x Absolute Fn(x-) F*(x) Fn(x) Absolute
difference difference
4.26 0.0790 0.505 0.584 0.510 0.0740
4.30 0.0890 0.510 0.599 0.515 0.0840
4.35 0.0980 0.515 0.613 0.520 0.0930
4.36 0.1010 0.520 0.621 0.525 0.0960
4.39 0.1110 0.525 0.636 0.530 0.1060
4.42 0.1080 0.530 0.638 0.535 0.1030
Customize the table for n = 200:
α 10% 5% 2% 1%
c 0.086 0.096 0.107 0.115
Since 0.107 < 0.111 < 0.115, we reject H0 at 2% and do not reject at 1%.
Comment: I have no idea why the question said “the fit of the natural logarithms of n = 200 losses
to a distribution.” I think it meant to say, “the fit of n = 200 losses to a distribution.”
F*(x) is the assumed distribution, Fn (x-) is the empirical distribution function just before x,
and Fn (x) is the empirical distribution function at x. See Table 16.3 in Loss Models.
We only look between 4.26 and 4.42, since we are given that the maximum absolute difference
occurs in that region; we are not given any information on what happens outside that region.

Section 17, Kolmogorov-Smirnov Test, Advanced

In this section, additional ideas related to the Kolmogorov-Smirnov Statistic are discussed:
how to compute the K-S Statistic when data is truncated and/or censored, confidence intervals for the
underlying distribution function, how to get bounds on the K-S Statistic when one has grouped data,
and the critical values for small samples.

Censoring:

One can also compute the Kolmogorov-Smirnov Statistic in the case of censoring from above. Let u
be the censorship point from above.
Then the Kolmogorov-Smirnov Statistic is defined as:

D = Max over x ≤ u of | (empirical distrib. function at x) - (theoretical distrib. function at x) |

Exercise: An Exponential distribution with θ = 1000 is compared to the following data set censored
from above at 2000: 197, 325, 981, 2000.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
[Solution: The K-S statistic is 0.2225, from the absolute difference at or just after 325.
X      Assumed F(X)    Empirical (before, after)    |Assumed - Empirical| (before, after)
197      0.1788          0,    0.25                   0.1788, 0.0712
325      0.2775          0.25, 0.50                   0.0275, 0.2225
981      0.6251          0.50, 0.75                   0.1251, 0.1249
2000     0.8647          0.75 (just before only)      0.1147
Comment: In this case the K-S statistic was the same as in a similar exercise without censorship. If
prior to censorship the maximum departure had occurred for large losses, the K-S statistic would
have been smaller after censorship.]
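A Python sketch of the censored calculation (my own illustration, not part of the original text) is below: comparisons are made only at and below the censorship point u, with a single comparison just before u.

    import math

    def ks_statistic_censored(data, cdf, u):
        data = sorted(data)
        n = len(data)
        worst = 0.0
        for i, x in enumerate(data):
            if x >= u:
                # at the censorship point, only the comparison just before u is made
                worst = max(worst, abs(cdf(u) - i / n))
                break
            worst = max(worst, abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
        return worst

    exp_cdf = lambda x: 1.0 - math.exp(-x / 1000.0)
    print(ks_statistic_censored([197, 325, 981, 2000], exp_cdf, 2000))   # about 0.2225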

With censoring from above, we only make comparisons below the censorship point,
including one just before the censorship point.

At the censorship point, both the empirical distribution and the fitted/assumed distribution after
adjusting it for censorship, are unity; the difference is zero. In the presence of censoring from
above, the critical values should be somewhat smaller; because we are unable to make
comparisons above the censorship point, there is less opportunity for random fluctuation to
produce a large K-S statistic.242

The assumption we are testing is that the unlimited losses came from a certain distribution.
When we have data censored from above, the distribution function jumps to 1 at the censorship
point.

For example, with a policy limit of 25,000, losses of size 25,000 or greater all appear in our data
base at 25,000.

If for example the unlimited data followed an Exponential Distribution with mean 100,000,
F(25,000) for this Exponential Distribution is: 1 - e-0.25 = 0.221. Yet the empirical distribution
function based on data simulated from this Exponential censored from above at 25,000 would be
one at 25,000. We would expect to see about 78% of the losses listed in our simulated censored
data base as 25,000!

A comparison at 25,000 would tell us nothing about whether our assumption is true. It would be
inappropriate to compare 1 to F(25000) for a distribution such as the Exponential that has not been
altered for censoring.243

Truncation:

One can also compute the Kolmogorov-Smirnov Statistic in the case of truncation from below. Let t
be the truncation point from below.
Then the Kolmogorov-Smirnov Statistic is defined as:

D = Max over x ≥ t of | (empirical distrib. function at x) - (theoretical distrib. function at x) |

Where the theoretical distribution has been altered for the effects of truncation. 244

242
See page 332 of Loss Models and “Mahlerʼs Guide to Simulation.”
243
One could compare 1 to 1, the theoretical distribution at the censorship point after adjusting for the effects of
censoring. This is similar to comparing the empirical and theoretical distributions at infinity in the absence of
censoring. Taking the absolute difference between 1 and 1, would not affect the Kolmogorov-Smirnov Statistic.
244
See “Mahlerʼs Guide to Loss Distributions,” or Section 16.2 of Loss Models.

Exercise: Losses prior to truncation are assumed to follow an Exponential distribution with θ = 1000.
This assumption is compared to the following data set truncated from below at 250:
325, 981, 2497.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
[Solution: After truncation from below at 250,
G(x) = {F(x) - F(250)} / S(250) = {e^(-250/1000) - e^(-x/1000)} / e^(-250/1000) = 1 - e^(-(x-250)/1000), x > 250.


X      Assumed F(X) after Truncation    Empirical (before, after)    |Assumed - Empirical| (before, after)
325          0.0723                       0,      0.3333               0.0723, 0.2611
981          0.5186                       0.3333, 0.6667               0.1852, 0.1481
2497         0.8943                       0.6667, 1                    0.2276, 0.1057
The K-S statistic is 0.2611, from the absolute difference at or just after 325.]

Truncation and Censoring:

Let t be the truncation point from below, and u be the censorship point from above.
F* is the fitted/assumed distribution, after altering a ground-up, unlimited size of loss distribution for
the effects of any truncation from below.245 Fn is the empirical distribution.

The Kolmogorov-Smirnov Statistic is defined as:246

D = Max over t ≤ x ≤ u of | Fn(x) - F*(x) |
In the absence of truncation and censoring, this reduces to the previously discussed definition of the
Kolmogorov-Smirnov Statistic.

245
What I have previously called G(x).
246
See Section 16.4.1 in Loss Models.

Exercise: Losses prior to truncation and censoring are assumed to follow a Weibull distribution with
θ = 3000 and τ = 1/2.
This assumption is compared to the following data set truncated from below at 1000 and censored
from above at 10,000: 1219, 1737, 2618, 3482, 4825, 6011, 10,000, 10,000, 10,000, 10,000.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
[Solution: After truncation from below at 1000,
F*(x) = {F(x) - F(1000)} / S(1000) = {exp[-√(1000/3000)] - exp[-√(x/3000)]} / exp[-√(1000/3000)]
= 1 - 1.7813 exp[-√(x/3000)], 1000 < x < 10,000.


X       Assumed F(X) after Truncation    Empirical (before, after)    |Assumed - Empirical| (before, after)
1219          0.0583                       0,   0.1                     0.0583, 0.0417
1737          0.1677                       0.1, 0.2                     0.0677, 0.0323
2618          0.3001                       0.2, 0.3                     0.1001, 0.0001
3482          0.3935                       0.3, 0.4                     0.0935, 0.0065
4825          0.4989                       0.4, 0.5                     0.0989, 0.0011
6011          0.5675                       0.5, 0.6                     0.0675, 0.0325
10000         0.7130                       0.6 (just before only)       0.1130
The K-S statistic is 0.1130, from the absolute difference just before 10,000.]
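The same calculation can be sketched in Python (my own illustration, not part of the original text): the ground-up Weibull is first altered for truncation from below at t, and comparisons run only up to the censorship point u, with a single comparison just before u.

    import math

    def weibull_cdf(x, theta=3000.0, tau=0.5):
        return 1.0 - math.exp(-((x / theta) ** tau))

    t, u = 1000.0, 10000.0
    # distribution altered for truncation from below at t (what the text calls F*)
    F_star = lambda x: (weibull_cdf(x) - weibull_cdf(t)) / (1.0 - weibull_cdf(t))

    data = sorted([1219, 1737, 2618, 3482, 4825, 6011, 10000, 10000, 10000, 10000])
    n = len(data)
    worst = 0.0
    for i, x in enumerate(data):
        if x >= u:
            worst = max(worst, abs(F_star(u) - i / n))   # comparison just before u only
            break
        worst = max(worst, abs(F_star(x) - i / n), abs(F_star(x) - (i + 1) / n))
    print(round(worst, 4))   # about 0.1130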

Confidence Interval for the Distribution Function:

The Kolmogorov-Smirnov Statistic can be used to place error bars around the empirical distribution.
In this case of the ungrouped data from Section 2 with 130 data points, the critical value for 1% is
0.143.

Prob[Max | Empirical Distribution - Actual Underlying Distribution Function | > 0.143] = 1%.

There is a 99% probability that the actual distribution function from which this data was drawn is within
±0.143 of the empirical distribution function, for all x.

Below are shown bands of ±0.143 around the empirical distribution function (thick line).

[Graph omitted.]

The bands are restricted to never go below 0 or above 1, as for any distribution function.
99% of the time, the distribution will remain everywhere inside these bands.
Only 1 time in a 100 will the true distribution lie outside for any size of loss.

One could get a confidence interval for the distribution function at one specific size of loss, by using
the variance of the empirical distribution function. However, this confidence band gotten via the K-S
Statistic applies to all sizes of loss simultaneously.

As shown below, the 10% critical value of 0.107 would lead to narrower error bars that would contain
the true distribution 100% - 10% = 90% of the time:

[Graph omitted.]

In general, if the Kolmogorov-Smirnov critical value is c for a significance level of α, then


with a probability of 1 - α :
{empirical distribution function - c} ≤ F(x) ≤ {empirical distribution function + c}

For the ungrouped data in Section 2, there are 130 claims. Therefore, the critical value for the K-S
Statistic at 20% is: 1.07/√130 = 9.38%. Thus at any point the actual distribution function F(x)
underlying the risk process that presumably generated this data, whatever it is, has a 80% chance of
being within ±9.38% of the observed value. Note that this result does not depend on any
assumption about the form of F(x)!

For example, the empirical distribution function for the ungrouped data in Section 2 is 60.00% at
150,000.247 Thus there is an 80% chance that F(150000) is within the interval 60.00% ± 9.38% or
[0.5062, 0.6938]. Similarly, the empirical distribution function is 94.62% at
1 million.248 Thus there is an 80% chance that F(1 million) is within the interval 94.62% ± 9.38%.

247
Of the 130 claims in Section 2, 78 are less than or equal to 150,000. 78/130 = 0.6.
248
Of the 130 claims in Section 2, 123 are less than or equal to 1,000,000. 123/130 = 0.9462.

However, since any distribution function never exceeds unity, the 80% confidence interval for
F(1 million) is: [0.8524, 1]. The same adjustment would need to be made for confidence intervals
that would go below zero, since the distribution function is always greater than or equal to zero.

Below are shown the 99% confidence intervals, as well as the maximum likelihood Exponential
Distribution (dashed). Since the Exponential Distribution gets outside of the 99% confidence bands,
we reject it at 1%.249

[Graph omitted.]

Exercise: You observe a set of 1000 claims. Ranked from smallest to largest, the 721st claim is
$24,475, while the 722nd claim is $25,050. Using the K-S Statistic, determine a 95% confidence
interval for the value of the underlying distribution function at $25,000.
[Solution: The observed value of the distribution function at $25000 is: 721 / 1000 = 0.721. The
critical value for the K-S statistic at 5% is: 1.36/√1000 = 0.043.
So the 95% confidence interval is: 0.721 ± 0.043 = (0.678, 0.764). ]

To get a confidence interval around the empirical distribution function for the underlying distribution
from which the data was drawn:
80% confidence interval ⇔ ± 20% critical value of K-S Statistic

90% confidence interval ⇔ ± 10% critical value of K-S Statistic

95% confidence interval ⇔ ± 5% critical value of K-S Statistic

99% confidence interval ⇔ ± 1% critical value of K-S Statistic


Restrict the interval to be at least 0 and no more than 1.
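A short Python sketch of such a band (my own illustration, not part of the original text; the constants 1.07, 1.22, 1.36, 1.63 are the large-sample critical values used throughout this section):

    import math

    def ks_confidence_band(empirical_value, n, constant=1.36):
        # constant = 1.07, 1.22, 1.36, 1.63 for 80%, 90%, 95%, 99% bands respectively
        c = constant / math.sqrt(n)
        return max(0.0, empirical_value - c), min(1.0, empirical_value + c)

    # the exercise above: 721 of 1000 claims are at or below $25,000
    print(ks_confidence_band(0.721, 1000))   # about (0.678, 0.764)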
249
This is just the same information as presented in a previous difference graph, arranged in a different manner.

This result of using the K-S Statistic to get confidence intervals for the Distribution Function is
very powerful. It has great practical value when one is dealing with the center of the
Distribution function or one has many data points. For example, with 10,000 data points the
critical value for a 1% significance level is 1.63%. So one can construct narrow bands around
the empirical distribution function in this case. Another way to think about it, with enough data
there is sometimes no reason to fit a distribution rather than just use the empirical distribution.
In contrast, for the ungrouped data in Section 2 with 130 data points, in the tail of the
distribution the error bars constructed by the Kolmogorov-Smirnov Statistic are not useful.

Grouped Data:

The Kolmogorov-Smirnov Distribution is designed to work with individual data, in other words
ungrouped data. However, for grouped data one can compute the maximum absolute difference
between the fitted and empirical distributions. While this maximum discrepancy between the two
distributions is useful to test the relative closeness of the different fits, the same statistical test for the
Kolmogorov-Smirnov Statistic as applied with ungrouped data does not apply, since one is only
testing the difference at the endpoints of the intervals.250

For example for the Burr distribution251 fit via maximum likelihood to the grouped data in Section 3,
the maximum absolute difference between fitted and empirical distribution functions is .0014 and is
calculated as follows:
Bottom of Top of # claims Burr Empirical Absolute
Interval Interval in the F(upper) F(upper) Difference
$ Thous. $ Thous. Interval
0 5 2208 0.2202 0.2208 0.0006
5 10 2247 0.4464 0.4455 0.0009
10 15 1701 0.6170 0.6156 0.0014
15 20 1220 0.7364 0.7376 0.0012
20 25 799 0.8175 0.8175 0.0000
25 50 1481 0.9652 0.9656 0.0004
50 75 254 0.9909 0.9910 0.0001
75 100 57 0.9970 0.9967 0.0003
100 Infinity 33 1.0000 1.0000 0.0000
10000

250
As the intervals become narrower the distinction between grouped and ungrouped data is less important.
251
The Burr Distribution, F(x) = 1- {1/(1+(x/θ)γ)}α, fit via maximum likelihood has parameters:
α = 3.9913, θ = 40,467 and γ = 1.3124.

For the curves fit via Maximum Likelihood to the grouped data in Section 3 the results are:252

Maximum Absolute Difference


Fitted vs. Empirical Distribution
Burr 0.0014
Generalized Pareto 0.0037
Transformed Gamma 0.0063
Gamma 0.0171
Weibull 0.0199
LogNormal 0.0274

The Burr distribution is once again the best fit to the grouped data in Section 3, while none of the two
parameter curves fit this grouped data well. When a curve fits as badly as the Weibull, one can in
fact reject the fit. The Kolmogorov-Smirnov Statistic is at least 0.0199, since the Kolmogorov-
Smirnov Statistic would be the maximum, taken over more points, of the absolute difference.
Therefore, since the critical value at a 1% significance level for 10,000 points is 0.0163, we can in fact
reject the Weibull at a 1% significance level.

In any particular situation involving grouped data, one can put bounds on the K-S Statistic.
The K-S Statistic is at least the maximum absolute discrepancy between the empirical and
theoretical Distribution Functions. For example, in the case of the Burr, the K-S Statistic is at least
0.0014 as shown above.

252
See the Section on fitting via maximum likelihood to grouped data, for the parameters of the fitted distributions.

One can also put an upper bound on the K-S Statistic, by putting bounds on what the Empirical
Distribution Function would look like if the data had been ungrouped.
For example in the case of the Burr, by checking just before and just after each endpoint:253 254

x                 Empirical Distribution            Theoretical       Absolute Differences, Theoretical vs. Empirical
                  Maximum         Minimum           Distribution      vs. Emp. Maximum      vs. Emp. Minimum
0.001 0.2208 0 0 0.2208 0
4999.999 0.2208 0 0.2202 0.0006 0.2202
5000.001 0.4455 0.2208 0.2202 0.2253 0.0006
9999.999 0.4455 0.2208 0.4464 0.0009 0.2256
10,000.001 0.6156 0.4455 0.4464 0.1692 0.0009
14,999.999 0.6156 0.4455 0.6170 0.0014 0.1715
15,000.001 0.7376 0.6156 0.6170 0.1206 0.0014
19999.999 0.7376 0.6156 0.7364 0.0012 0.1208
20,000.001 0.8175 0.7376 0.7364 0.0811 0.0012
24,999.999 0.8175 0.7376 0.8175 0.0000 0.0801
25,000.001 0.9656 0.8175 0.8175 0.1481 0.0000
49999.999 0.9656 0.8175 0.9652 0.0004 0.1477
50,000.001 0.9910 0.9656 0.9652 0.0258 0.0004
74,999.999 0.9910 0.9656 0.9909 0.0001 0.0253
75,000.001 0.9967 0.9910 0.9909 0.0058 0.0001
99,999.999 0.9967 0.9910 0.9970 0.0003 0.0060
100,000.001 1.0000 0.9967 0.9970 0.0030 0.0003
1 trillion255 1.0000 0.9967 1.0000 0.0000 0.0033

This results in absolute differences of up to 0.2256, resulting from the maximum possible empirical
distribution just below 10,000 of 0.2208 versus the theoretical distribution there of 0.4464.

One can arrange this calculation in the form of a spreadsheet. It makes use of the same values for the
theoretical and empirical distribution functions as used in the previous spreadsheet used to compute
the lower bound for the K-S Statistic. However, now one offsets the Empirical Distribution Function
both up a row and down a row compared to the Theoretical Distribution Function. Then one has to
compute two columns of absolute differences and search for the largest value in either of the two
columns.

253
The Burr Distribution, F(x) = 1 - 1/{1 + (x/θ)^γ}^α, fit via maximum likelihood has parameters:
α = 3.9913, θ = 40,467 and γ = 1.3124.
254
While the computation is somewhat mind-numbing, some clever people can pick out the likely places where the
largest values will result. In real-world applications one would have this computation programmed, while on the exam
one would hopefully have very few intervals.
255
Any very large number such that the theoretical distribution function is very close to unity.

The spreadsheet to calculate the maximum possible K-S Statistic looks as follows for the maximum
likelihood Burr Distribution256 versus the Grouped Data from Section 3:

Endpoint        Absolute Difference     Empirical          Burr             Empirical          Absolute Difference
of Interval     for Maximum             Distribution       Distribution     Distribution       for Maximum
$ Thous.        K-S Stat.               (offset down)                       (offset up)        K-S Stat.
0 0.0000 0.2208 0.2208
5 0.2202 0.0000 0.2202 0.4455 0.2253
10 0.2256 0.2208 0.4464 0.6156 0.1692
15 0.1715 0.4455 0.6170 0.7376 0.1206
20 0.1208 0.6156 0.7364 0.8175 0.0811
25 0.0799 0.7376 0.8175 0.9656 0.1481
50 0.1477 0.8175 0.9652 0.9910 0.0258
75 0.0253 0.9656 0.9909 0.9967 0.0058
100 0.0060 0.9910 0.9970 1.0000 0.0030
Infinity 0.0033 0.9967 1.0000

Based on the above spreadsheet and the previous spreadsheet, in the case of the Burr, the
Kolmogorov-Smirnov statistic is at least 0.0014 and at most 0.2256. If there were many narrow
intervals, this technique would allow one to get a fairly good estimate of the K-S statistic.
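
Here is a brief sketch in Python of these two bounds (the function and variable names are mine; the
Burr parameters and the grouped claim counts are those quoted above for the data in Section 3):

def burr_cdf(x, alpha=3.9913, theta=40467.0, gamma=1.3124):
    # Burr Distribution: F(x) = 1 - {1/(1 + (x/theta)^gamma)}^alpha
    return 1.0 - (1.0 / (1.0 + (x / theta) ** gamma)) ** alpha

# Interval endpoints (in dollars) and the number of claims in each interval; the last interval is open.
endpoints = [0, 5000, 10000, 15000, 20000, 25000, 50000, 75000, 100000]
counts = [2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33]
n = sum(counts)

# Empirical and fitted distribution functions at every endpoint, treating "infinity" as a final endpoint.
emp = [0.0]
for c in counts:
    emp.append(emp[-1] + c / n)                      # 0, 0.2208, 0.4455, ..., 0.9967, 1.0
fit = [burr_cdf(e) for e in endpoints] + [1.0]       # F*(0), ..., F*(100,000), F*(infinity) = 1

# Lower bound: the largest discrepancy at the interval endpoints themselves.
lower_bound = max(abs(f - e) for f, e in zip(fit, emp))

# Upper bound: compare the fitted distribution at each endpoint with the empirical distribution
# offset one row down and one row up, as in the spreadsheets above.
upper_bound = 0.0
for j in range(len(fit)):
    if j > 0:
        upper_bound = max(upper_bound, abs(fit[j] - emp[j - 1]))
    if j + 1 < len(emp):
        upper_bound = max(upper_bound, abs(fit[j] - emp[j + 1]))

print(round(lower_bound, 4), round(upper_bound, 4))   # approximately 0.0014 and 0.2256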

The same computation can be understood graphically. Here the Burr Distribution (dashed) is
compared to the minimum possible empirical distribution function (solid), which assumes that in each
interval all of the claims occur at the upper end:

[Graph: probability (0 to 1) versus size of loss (0 to 100,000), comparing the fitted Burr (dashed) to the
minimum possible empirical distribution function (solid).]
256
The Burr Distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, fit via maximum likelihood has parameters:
α = 3.9913, θ = 40,467, and γ = 1.3124.



Here the Burr Distribution (dashed) is compared to the maximum possible empirical distribution
function (solid), which assumes that in each interval all of the claims occur at the lower end:

[Graph: probability (0 to 1) versus size of loss (0 to 100,000), comparing the fitted Burr (dashed) to the
maximum possible empirical distribution function (solid).]

Here are the differences between the fitted Burr Distribution minus the minimum possible Empirical
Distribution Function (above the x-axis), and the fitted Burr Distribution minus the maximum possible
Empirical Distribution Function (below the x-axis):

[Graph: differences versus size of loss (0 to 100,000); the differences range from about -0.2 to 0.2.]

The maximum absolute difference, at the marked point (10000, 0.2256), corresponds to the upper
bound on the K-S statistic of 0.2256, computed previously.

Exercise: Compute the bounds on the Kolmogorov-Smirnov statistic for the Weibull
distribution fit via maximum likelihood to the Grouped Data in Section 3.
Note this Weibull has parameters: θ = 16,184 and τ = 1.0997.
[Solution:

Bottom of Top of # claims Weibull Empirical Absolute


Interval Interval in the F(upper) F(upper) Difference
$ Thous. $ Thous. Interval
0 5 2208 0.2403 0.2208 0.0195
5 10 2247 0.4451 0.4455 0.0004
10 15 1701 0.6014 0.6156 0.0142
15 20 1220 0.7170 0.7376 0.0206
20 25 799 0.8007 0.8175 0.0168
25 50 1481 0.9685 0.9656 0.0029
50 75 254 0.9955 0.9910 0.0045
75 100 57 0.9994 0.9967 0.0027
100 Infinity 33 1.0000 1.0000 0.0000
10000

Thus the K-S Statistic is at least 0.0206.


Endpoint        Absolute Difference     Empirical          Weibull          Empirical          Absolute Difference
of Interval     for Maximum             Distribution       Distribution     Distribution       for Maximum
$ Thous.        K-S Stat.               (offset down)                       (offset up)        K-S Stat.
0 0.0000 0.2208 0.2208
5 0.2403 0.0000 0.2403 0.4455 0.2052
10 0.2243 0.2208 0.4451 0.6156 0.1705
15 0.1559 0.4455 0.6014 0.7376 0.1362
20 0.1014 0.6156 0.7170 0.8175 0.1005
25 0.0631 0.7376 0.8007 0.9656 0.1649
50 0.1510 0.8175 0.9685 0.9910 0.0225
75 0.0299 0.9656 0.9955 0.9967 0.0012
100 0.0084 0.9910 0.9994 1.0000 0.0006
Infinity 0.0033 0.9967 1.0000

Thus the K-S Statistic is at most 0.2403.]

Loss Models states that the Kolmogorov-Smirnov test should only be used on individual data,
in other words on ungrouped data. However, as has been discussed, if one has grouped data, a
maximum and minimum for the Kolmogorov-Smirnov statistic can be determined.

In some situations, such as when one has many narrow intervals and a large amount of data,
one can usefully apply a statistical hypothesis test using the same Kolmogorov-Smirnov critical
values as are used for ungrouped data. However, usually one does not apply the
Kolmogorov-Smirnov test to grouped data.

One could run a simulation in order to determine critical values to apply to a particular grouped
data situation with many narrow intervals that an actuary might frequently encounter.257
Such a test, using the maximum observed absolute difference, would not be called the
Kolmogorov-Smirnov test, although it would clearly be closely related. The power of such a
test would be less than that of the Kolmogorov-Smirnov test applied to the same data in ungrouped
form.

Comparing Two Empirical Distributions:258

Similar techniques to those discussed can be used to test the hypothesis that two data sets were
each drawn from the same distribution of unknown type.

The test statistic is:

D = Max over x of | (first empirical distrib. function at x) - (second empirical distrib. function at x) |.

As previously, the maximum occurs just before or just at one of the data points.
However, now we have to check at each of the data points in each of the two data sets.

Let n1 be the size of the first data set, and n2 be the size of the second data set.
The critical values can be approximated for large data sets by letting n = n1 n2 / (n1 + n2 ),
and using the previously given table of critical values for the K-S Statistic.

Exercise: If the sample sizes are 30 and 50, what is the critical value for 10%?
[Solution: n = (30)(50)/80 = 18.75. The critical value is: 1.22/√18.75 = 0.282.]

Thus, with sample sizes of 30 and 50, if D calculated as above were greater than 0.282, one would
reject at 10% the null hypothesis that the two data sets were drawn from the same distribution.
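
As a small sketch in Python (the function names are mine; the two samples would be whatever data sets
are being compared), the statistic D and the approximate critical value can be computed as:

from math import sqrt

def edf(sample, x):
    # empirical distribution function of the sample, evaluated at x
    return sum(1 for v in sample if v <= x) / len(sample)

def two_sample_D(a, b):
    # both empirical distribution functions are constant between the pooled data points,
    # so it suffices to evaluate the absolute difference at those points
    return max(abs(edf(a, x) - edf(b, x)) for x in sorted(set(a) | set(b)))

def approx_critical_value(n1, n2, coefficient=1.22):
    # coefficient 1.22 is the 10% entry of the K-S table; n = n1*n2/(n1 + n2)
    return coefficient / sqrt(n1 * n2 / (n1 + n2))

print(round(approx_critical_value(30, 50), 3))   # about 0.282, matching the exercise above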

257
See “Mahlerʼs Guide to Simulation.”
258
Not on the syllabus.

K-S Critical Values for Small Samples:

As has been discussed, one can obtain K-S critical values for sample size n by using tables such as
the following:259

α      0.20        0.10        0.05        0.025       0.01
c      1.07/√n     1.22/√n     1.36/√n     1.48/√n     1.63/√n

This is an asymptotic approximation, which gets better as the sample size increases.
For small samples, for some practical applications, this approximation may not be good enough.

In such cases, one can use statistical tables which are available in various textbooks.
For example, here is a table of (exact) critical values for sample sizes of 1 to 10:260

Level of significance
n 0.10 0.05 0.02 0.01
1 0.95000 0.97500 0.99000 0.99500
2 0.77639 0.84189 0.90000 0.92929
3 0.63604 0.70760 0.78456 0.82900
4 0.56522 0.62394 0.68887 0.73424
5 0.50945 0.56328 0.62718 0.66853
6 0.46799 0.51926 0.57741 0.61661
7 0.43607 0.48342 0.53844 0.57581
8 0.40962 0.45427 0.50654 0.54179
9 0.38746 0.43001 0.47960 0.51332
10 0.36866 0.40925 0.45662 0.48893

Exercise: For sample sizes of 5 and 10, compare the 10% critical value from the asymptotic
approximation to that from the above table of exact values.
[Solution: For n = 5, 1.22/√5 = 0.546, compared to 0.50945.
For n = 10, 1.22/√10 = 0.386, compared to 0.36866.]
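
For example, here is a small Python sketch (the variable names are mine) comparing the asymptotic
10% critical value 1.22/√n, the refined approximation mentioned in footnote 259, and the exact values
from the table above:

from math import sqrt

exact_10_percent = {5: 0.50945, 10: 0.36866}   # from the table of exact critical values above

for n in (5, 10):
    asymptotic = 1.22 / sqrt(n)
    refined = 1.22 / (sqrt(n) + 0.12 + 0.11 / sqrt(n))   # the correction from footnote 259
    print(n, round(asymptotic, 4), round(refined, 4), exact_10_percent[n])

# The refined approximation (about 0.507 and 0.368) is much closer to the exact values.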

259
Somewhat more precise critical values can be obtained by taking c/(√n + 0.12 + 0.11/√n) rather than c/√n.
See page 105 of Goodness-of-Fit Techniques by DʼAgostino and Stephens.
260
From Practical Reliability Engineering, Fifth Edition, by Patrick D. T. OʼConnor and Andre Kleyner.

K-S Distribution for Small Samples:

For a sample size of 10, here is a graph of the K-S survival function:261

[Graph: the K-S survival function S(d), for d from 0 to 0.5.]

For example, S(0.36866) = 10%, so 0.36866 is the 10% critical value.


For a sample size of 10, here is a graph of the density of the K-S statistic:

[Graph: the density f(d) of the K-S statistic, for d from 0 to 0.6.]
261
Computed using a technique not on the syllabus, to be discussed subsequently.

For a sample size of 10, here is a comparison between the exact survival function and that based on
the asymptotic approximation, Prob[√n Dn ≤ x] ≈ 1 - 2 Σi=1 to ∞ (-1)^(i-1) exp[-2 i² x²]:

[Graph: S(d) for the exact distribution and for the asymptotic approximation, for d from 0.35 to 0.50.]

Here is a similar comparison, but for a sample size of 100:

[Graph: S(d) for the exact distribution and for the asymptotic approximation, for a sample size of 100,
for d from 0.12 to 0.20.]

Computing the K-S Distribution Function:262

We wish to compute Prob[Dn ≤ d]. Here is an example for sample size 10.

Let k = 1 + [dn], where [ ] denotes the greatest integer (floor) function.
Let h = k - dn, 0 ≤ h < 1, so that d = (k - h)/n. Also let m = 2k - 1.

For example, if d = 0.36866 and n = 10, then k = 4, m = 7, and h = 0.3134.

Then we need to compute the m by m matrix H, which has the following form for m = 7:

      [ (1-h)/1!                           1            0            0            0            0           0        ]
      [ (1-h^2)/2!                         1/1!         1            0            0            0           0        ]
      [ (1-h^3)/3!                         1/2!         1/1!         1            0            0           0        ]
H =   [ (1-h^4)/4!                         1/3!         1/2!         1/1!         1            0           0        ]
      [ (1-h^5)/5!                         1/4!         1/3!         1/2!         1/1!         1           0        ]
      [ (1-h^6)/6!                         1/5!         1/4!         1/3!         1/2!         1/1!        1        ]
      [ {1 - 2h^m + max(0, 2h-1)^m}/m!     (1-h^6)/6!   (1-h^5)/5!   (1-h^4)/4!   (1-h^3)/3!   (1-h^2)/2!  (1-h)/1! ]

For h = 0.3134:

      [ 0.6866         1             0             0             0            0          0      ]
      [ 0.45089        1             1             0             0            0          0      ]
      [ 0.161536       0.5           1             1             0            0          0      ]
H =   [ 0.0412647      0.166667      0.5           1             1            0          0      ]
      [ 0.00830814     0.0416667     0.166667      0.5           1            1          0      ]
      [ 0.00138757     0.00833333    0.0416667     0.166667      0.5          1          1      ]
      [ 0.000198295    0.00138757    0.00830814    0.0412647     0.161536     0.45089    0.6866 ]

Then Prob[Dn ≤ d] is the k-k element of: n! (H/n)^n.

In this case, we want the 4-4 element of: 10! (H/10)^10.

We compute in this manner that Prob[D10 ≤ 0.36866] = 0.899997,
so that for a sample size of 10, the 10% critical value is 0.36866.263
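
Here is a brief Python sketch of this matrix computation (the function names are mine); it reproduces
Prob[D10 ≤ 0.36866] ≅ 0.9:

from math import floor, factorial

def ks_cdf(d, n):
    # Prob[Dn <= d] via the matrix method described above
    k = int(floor(d * n)) + 1
    h = k - d * n
    m = 2 * k - 1
    # start with H[i][j] = 1/(i-j+1)! on and below the first superdiagonal, 0 above it
    H = [[1.0 / factorial(i - j + 1) if i - j + 1 >= 0 else 0.0 for j in range(m)] for i in range(m)]
    # adjust the first column, the last row, and the bottom-left corner
    for i in range(m):
        H[i][0] -= h ** (i + 1) / factorial(i + 1)
        H[m - 1][i] -= h ** (m - i) / factorial(m - i)
    if 2 * h - 1 > 0:
        H[m - 1][0] += (2 * h - 1) ** m / factorial(m)
    # Prob[Dn <= d] is n! times the k-k element of (H/n)^n
    def matmul(A, B):
        return [[sum(A[r][s] * B[s][c] for s in range(m)) for c in range(m)] for r in range(m)]
    Hn = [[H[r][c] / n for c in range(m)] for r in range(m)]
    P = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]   # identity matrix
    for _ in range(n):
        P = matmul(P, Hn)
    return factorial(n) * P[k - 1][k - 1]

print(ks_cdf(0.36866, 10))   # approximately 0.8999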

262
Not on the syllabus.
Based on “Evaluating Kolmogorovʼs Distribution,” by George Marsaglia, Wai Wan Tsang, and Jingbo Wang,
Journal of Statistical Software.
263
All computations were done in Mathematica.

Problems:

For the following questions, use the following table for the Kolmogorov-Smirnov statistic.
α      0.20        0.10        0.05        0.025       0.01
c      1.07/√n     1.22/√n     1.36/√n     1.48/√n     1.63/√n

17.1 (2 points) You observe a set of 10,000 claims. Ranked from smallest to largest, the 9700th
claim is $1,096,000, while the 9701st claim is $1,112,000.
Use the Kolmogorov-Smirnov statistic in order to determine a 90% confidence interval for the value
of the underlying distribution function at $1,100,000.
A. [0.94, 1.00]
B. [0.9537, 0.9863]
C. [0.9564, 0.9836]
D. [0.9578, 0.9822]
E. [0.9593, 0.9807]

17.2 (2 points) You are given the following information about a random sample:
(i) The sample size equals five.
(ii) Two of the sample observations are known to exceed 50, and the remaining three
observations are 20, 30 and 45.
Ground up unlimited losses are assumed to follow an Exponential Distribution with θ = 65.
Calculate the value of the Kolmogorov-Smirnov test statistic.
A. Less than 0.20
B. At least 0.20, but less than 0.22
C. At least 0.22, but less than 0.24
D. At least 0.24, but less than 0.26
E. At least 0.26

17.3 (4 points) You observe the following five ground-up claims from a data set that is truncated
from below at 100: 120 150 190 260 580
Loss sizes prior to truncation are assumed to follow a Loglogistic distribution with γ = 2 and θ = 150.
Calculate the value of the Kolmogorov-Smirnov test statistic.
A. Less than 0.16
B. At least 0.16, but less than 0.18
C. At least 0.18, but less than 0.20
D. At least 0.20, but less than 0.22
E. At least 0.22

17.4 (2 points) You observe the following 10 claims:


241,513 231,919 105,310 125,152 116,472
110,493 139,647 220,942 161,964 105,829

Use the Kolmogorov-Smirnov statistic in order to determine an 80% confidence interval for the value
of the underlying distribution function at $150,000.
A. [0.24, 0.96] B. [0.26, 0.94] C. [0.28, 0.92] D. [0.30, 0.90] E. [0.32, 0.88]

17.5 (3 points) Losses truncated from below at 50 are of size: 64, 90, 132, 206.
Loss sizes prior to truncation are assumed to follow an Exponential distribution with θ = 100.
Calculate the value of the Kolmogorov-Smirnov test statistic.
(A) 0.19 (B) 0.21 (C) 0.23 (D) 0.25 (E) 0.27

17.6 (3 points) With a deductible of 500 and a maximum covered loss of 5000, the following five
payments are made: 179, 352, 968, 1421, 4500.
Loss sizes prior to the effect of the deductible and maximum covered loss are assumed to follow a
Pareto Distribution with parameters α = 2 and θ = 1000.
Calculate the value of the Kolmogorov-Smirnov test statistic.
(A) 0.19 (B) 0.21 (C) 0.23 (D) 0.25 (E) 0.27

17.7 (3 points) Losses prior to censoring are assumed to follow a Weibull distribution
with θ = 3000 and τ = 2.
This assumption is compared to the following data set censored from above at 5000:
737, 1618, 2482, 3003, 5000.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
(A) 0.17 (B) 0.19 (C) 0.21 (D) 0.23 (E) 0.25

17.8. You observe the following 35 losses:


6 17 30 48 78 198 514
7 18 34 49 103 227 546
11 19 38 53 124 330 750
14 25 40 60 140 361 864
15 29 41 63 192 421 1638
Use the Kolmogorov-Smirnov statistic in order to get a 90% confidence band for the distribution
from which this data was drawn.
Which of the following is the resulting interval for the value of the underlying distribution function at
200?
A. [0.47, 1.00] B. [0.49, 0.99] C. [0.51, 0.97] D. [0.53, 0.95] E. [0.55, 0.93]

17.9 (3 points) Data has been left truncated at 100: 140, 260, 450, 700, 1100.
Prior to truncation, the assumed distribution is a Pareto with α = 3 and θ = 1000.
Calculate the value of the Kolmogorov-Smirnov test statistic.
(A) 0.16 (B) 0.18 (C) 0.20 (D) 0.22 (E) 0.24

17.10 (4, 5/88, Q.58) (2 points) A random sample was taken from an unknown distribution
function F(x):
Rank x Rank x Rank x Rank x
1 3 5 12 9 30 13 99
2 4 6 13 10 40 14 105
3 9 7 15 11 55 15 115
4 10 8 20 12 78 16 129
Using the Kolmogorov-Smirnov statistic, calculate a 90% confidence band for F(x) at x = 40.
A. The lower bound of the confidence band is less than 0.30
and the upper bound of the confidence band is greater than 0.95
B. The lower bound of the confidence band lies between 0.30 and 0.35
and the upper bound of the confidence band lies between 0.90 and 0.95
C. The lower bound of the confidence band lies between 0.35 and 0.40
and the upper bound of the confidence band lies between 0.80 and 0.90
D. The lower bound of the confidence band lies between 0.40 and 0.45
and the upper bound of the confidence band lies between 0.70 and 0.80
E. The lower bound of the confidence band lies between 0.45 and 0.50
and the upper bound of the confidence band lies between 0.60 and 0.70

17.11 (4B, 11/93, Q.22) (2 points) A random sample, x1 , ..., x20 is taken from a probability
distribution function F(x).
1.07, 1.07, 1.12, 1.35, 1.48, 1.59, 1.60, 1.74, 1.90, 2.02,
2.05, 2.07, 2.15, 2.16, 2.21, 2.34, 2.75, 2.80, 3.64, 10.42
Determine the 90% confidence band for F(x) at x = 1.50 using the Kolmogorov-Smirnov statistic.
A. (-0.023, 0.523)
B. (0, 0.523)
C. (0, 0.804)
D. (1.070, 1.900)
E. (1.070, 2.020)

17.12 (4B, 5/97, Q.28 & Course 4 Sample Exam 2000, Q. 23) (2 points)
You are given the following:
• Forty (40) observed losses have been recorded in thousands of dollars and
are grouped as follows:
Interval Number of
($000) Losses
(1, 4/3) 16
[4/3, 2) 10
[2, 4) 10
[4, ∞) 4
• The null hypothesis, H0 , is that the random variable X underlying the observed
losses, in thousands, has the density function f(x) = 1/x², 1 < x < ∞.
Since exact values of the losses are not available, it is not possible to compute the exact value of
the Kolmogorov-Smirnov statistic used to test H0 . However, it is possible to put bounds on the
value of this statistic. Based on the information above, determine the smallest possible value and
the largest possible value of the Kolmogorov-Smirnov statistic used to test H0 .
A. Smallest possible value = 0.10, Largest possible value = 0.25
B. Smallest possible value = 0.10, Largest possible value = 0.40
C. Smallest possible value = 0.15, Largest possible value = 0.25
D. Smallest possible value = 0.15, Largest possible value = 0.40
E. Smallest possible value = 0.25, Largest possible value = 0.40

17.13 (4B, 5/99, Q.12) (2 points) You are given the following:
• One hundred claims greater than 3,000 have been recorded as follows:
Interval Number of Claims
(3,000, 5,000] 6
(5,000, 10,000] 29
(10,000, 25,000] 39
(25,000, ∞) 26
• Claims of 3,000 or less have not been recorded.
• The null hypothesis, H0 , is that claim sizes follow a Pareto distribution,
with parameters α = 2 and θ = 25,000.
Since exact values of the claims are not available, it is not possible to compute the exact value of
the Kolmogorov-Smirnov statistic used to test H0 . However, it is possible to put bounds on the
value of this statistic. Referring to the information above, determine the smallest possible value of
the Kolmogorov-Smirnov statistic used to test H0 .
A. Less than 0.03
B. At least 0.03, but less than 0.06
C. At least 0.06, but less than 0.09
D. At least 0.09, but less than 0.12
E. At least 0.12

Solutions to Problems:

17.1. D. The value of the empirical distribution function at $1,100,000 is 9700 / 10000 = 0.9700.
The critical value for the K-S stat. at 10% is 1.22/√10,000 = 0.0122.
So the 90% confidence interval is 0.9700 ± 0.0122.
Comment: Note that since the K-S statistic is never negative, the 90% confidence interval uses the
critical value for 10%.
The variance of the empirical distribution function at 1.1 million is approximately:
(0.97)(1 - 0.97)/10,000 = 0.00000291. Thus a 90% confidence interval for the empirical distribution
function at 1.1 million is approximately:
0.9700 ± 1.645 √0.00000291 = 0.9700 ± 0.0028. This is much narrower than the confidence
interval gotten via the K-S Statistic, which applies to all sizes of loss simultaneously.

17.2. E. F(x) = 1 - exp[-(x/65)], x < 50.


      x         Assumed F(x)    Empirical just below x    Empirical at x    |Assumed - Empirical|
     20            0.2649               0.0                    0.2              0.2649, 0.0649
     30            0.3697               0.2                    0.4              0.1697, 0.0303
     45            0.4996               0.4                    0.6              0.0996, 0.1004
 50 (just below)   0.5366               0.6                                     0.0634
The largest absolute difference is: |0.2649 - 0| = 0.2649 = K-S statistic.
Comment: We make the final comparison just before the censorship point of 50:
|0.5366 - 0.6| = 0.0634.
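A brief Python sketch of this comparison (the names are mine; the data and the Exponential assumption
are those of the problem):

from math import exp

data = [20, 30, 45]      # the three uncensored observations; two more exceed the censoring point of 50
censor = 50
n = 5
F = lambda x: 1 - exp(-x / 65.0)

diffs = []
for i, x in enumerate(sorted(data)):
    diffs.append(abs(F(x) - i / n))          # versus the empirical distribution just below x
    diffs.append(abs(F(x) - (i + 1) / n))    # versus the empirical distribution at x
diffs.append(abs(F(censor) - len(data) / n))   # final comparison just before the censoring point
print(max(diffs))   # about 0.2649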

17.3. B. Prior to truncation the distribution function is:


F(x) = (x/150)² / {1 + (x/150)²} = x²/(22,500 + x²).
F(100) = 0.3077. S(100) = 0.6923.
After truncation from below at 100 the distribution function is:
G(x) = {F(x) - F(100)} / S(100) = (F(x) - 0.3077)/0.6923.
At each of the observed loss sizes, compute the values of G(x).
Then compare each value to the empirical distribution function just before and just after each
observed loss. The largest absolute difference is 0.161 = K-S statistic.
      x       F(x)      G(x)     Empirical just below x    Empirical at x    |G(x) - Empirical|
     100     0.3077    0.0000                                    0.0
     120     0.3902    0.1192            0.0                     0.2             0.1192, 0.0808
     150     0.5000    0.2778            0.2                     0.4             0.0778, 0.1222
     190     0.6160    0.4454            0.4                     0.6             0.0454, 0.1546
     260     0.7503    0.6393            0.6                     0.8             0.0393, 0.1607
     580     0.9373    0.9094            0.8                     1.0             0.1094, 0.0906

17.4. B. Since 6 claims out of 10 are less than or equal to 150,000, the value of the empirical
distribution function at $150,000 is 6/10 = 0.6. The critical value for the K-S Statistic at 20% is:
1.07/(100.5) = 0.34. So the 80% confidence interval is: 0.60 ± 0.34.
Comment: Since the K-S Stat. is never negative, the 80% confidence interval uses the critical value
for 20%.
1.07/ n is the 20% critical value from the K-S table.
You will be provided the K-S Table in any question where you need it.

17.5. B. Prior to truncation the distribution function is: F(x) = 1 - e^(-x/100).
After truncation from below at 50 the distribution function is:
G(x) = {F(x) - F(50)}/S(50) = {e^(-50/100) - e^(-x/100)}/e^(-50/100) = 1 - e^(-(x-50)/100).
At each of the observed loss sizes, compute the values of G(x).
Then compare each value to the empirical distribution function just before and just after each
observed loss. The largest absolute difference is 0.2101 = K-S statistic.
      x        G(x)     Empirical just below x    Empirical at x    |G(x) - Empirical|
      0       0.0000                                   0.00              0.0000
     64       0.1306            0.00                   0.25              0.1306, 0.1194
     90       0.3297            0.25                   0.50              0.0797, 0.1703
    132       0.5596            0.50                   0.75              0.0596, 0.1904
    206       0.7899            0.75                   1.00              0.0399, 0.2101

17.6. C. Prior to truncation and censoring the distribution function is:
F(x) = 1 - {1000/(1000 + x)}^2.
After truncation from below at 500 and censoring from above at 5000 the distribution function is:
G(x) = {F(x) - F(500)}/S(500) = 1 - S(x)/S(500) = 1 - {1000/(1000 + x)}^2 / {1000/1500}^2 =
1 - {1500/(1000 + x)}^2, 500 < x < 5000.
Payments of: 179, 352, 968, 1421, 4500, correspond to losses of size: 679, 852, 1468, 1921,
5000 or more.
At each of the loss sizes below the censorship point of 5000, compute the values of G(x).
Then compare each value to the empirical distribution function just before and just after each
observed loss. We also compare the empirical distribution function and G(x) just before 5000.
The largest absolute difference is 0.2306 = K-S statistic.
       x           G(x)     Empirical just below x    Empirical at x    |G(x) - Empirical|
       0          0.0000                                   0.0               0.0000
      679         0.2019            0.0                    0.2               0.2019, 0.0019
      852         0.3440            0.2                    0.4               0.1440, 0.0560
     1468         0.6306            0.4                    0.6               0.2306, 0.0306
     1921         0.7363            0.6                    0.8               0.1363, 0.0637
 5000 (just below) 0.9375           0.8                                      0.1375

17.7. A. F(x) = 1 - exp[-(x/3000)^2], x < 5000.


       x         Assumed F(x)    Empirical just below x    Empirical at x    |Assumed - Empirical|
      737           0.0586               0.0                    0.2              0.0586, 0.1414
     1618           0.2524               0.2                    0.4              0.0524, 0.1476
     2482           0.4956               0.4                    0.6              0.0956, 0.1044
     3003           0.6329               0.6                    0.8              0.0329, 0.1671
 5000 (just below)  0.9378               0.8                                     0.1378
The largest absolute difference is: |0.6329 - 0.8| = 0.1671 = K-S statistic.
Comment: We make the final comparison just before the censorship point of 5000:
|0.9378 - 0.8| = 0.1378.

17.8. D. Since 26 claims out of 35 are less than or equal to 200, the empirical distribution function at
200 is 26/35 = 0.74. The critical value for the K-S Statistic at 10% is 1.22/√35 = 0.206.
So the 90% confidence interval is: 0.74 ± 0.21.

17.9. A. If there were a 100 deductible, then the non-zero payments would follow a Pareto with
α = 3 and θ = 1000 + 100 = 1100.
Compare this to the given sizes of loss minus 100: 40, 160, 350, 600, 1000.
  payment    Assumed Pareto    Empirical just below    Empirical at x    |Assumed - Empirical|
     40          0.1016                0.0                  0.2              0.1016, 0.0984
    160          0.3346                0.2                  0.4              0.1346, 0.0654
    350          0.5634                0.4                  0.6              0.1634, 0.0366
    600          0.7291                0.6                  0.8              0.1291, 0.0709
   1000          0.8563                0.8                  1.0              0.0563, 0.1437
The largest absolute difference is: |0.5634 - 0.4| = 0.1634 = K-S statistic.
Comment: I have compared the truncated and shifted data and distribution solely for convenience.
Comparing the truncated data and distribution would produce the same answer.

17.10. B. For the Kolmogorov-Smirnov Statistic, if the critical value is c for a significance level of α,
then with a probability of 1 - α:
{empirical distribution function - c} ≤ F(x) ≤ {empirical distribution function + c}.
For a probability of 90%, α = 0.10, so with n = 16, c = 1.22/√16 = 0.305.
The empirical distribution function at x = 40 is: 10/16 = 0.625.
The confidence interval is: 0.625 ± 0.305 = 0.32 to 0.93.

17.11. B. Since we want a 90% confidence interval, α = 0.10; for α = 0.10, there is only a 10%
chance that the K-S Statistic will be greater than the critical value. For 20 observed points and
α = 0.10, the critical value is: 1.22/√20 = 0.273. Thus the absolute value of the difference
between the empirical and actual underlying distribution functions is at most 0.273, with 90%
confidence.
(The K-S Stat. is the maximum over all values of x of this absolute difference.)
The empirical distribution at x = 1.50 is 5/20 = 0.25, since the fifth observed point is 1.48 and the
sixth observed point is 1.59.
Thus F(1.50) is 0.250 ± 0.273, which leads to a confidence interval of: (0 , 0.523).
Comment: Note that since any distribution function is always greater than or equal to zero and less
than or equal to one, the distribution function at 1.50 must not be negative. This is why choice A is
not correct. (Select the one best answer.)

17.12. D. The K-S statistic is defined as the maximum absolute difference of the empirical and
theoretical Distribution Functions. By integrating the given density function, the theoretical Distribution
Function is: F(x) = 1 - 1/x, 1 < x < ∞. Thus:
Empirical Theoretical Absolute
x Distribution Distribution Difference
1 0 0 0
4/3 0.40 0.25 0.15
2 0.65 0.50 0.15
4 0.90 0.75 0.15
∞ 1 1 0
Thus we observe absolute differences of 0, 0.15, 0.15, 0.15, and 0. Thus the K-S statistic is at least
0.15. However, it may be that the absolute difference is bigger at some point which is not the end
point of an interval. For example, we know that the empirical distribution function at 1.99999 is at
least 0.40 and at most 0.65. Thus the absolute difference of the empirical and theoretical distributions
there is at most the larger of: |0.65 - 0.50| = 0.15 and |0.40 - 0.50| = 0.10. Similarly, checking just before and
just after each endpoint:
x Empirical Distribution Theoretical Absolute Differences
Maximum Minimum Distribution Theoretical vs. Empirical
1.0001 0.40 0 0 0.40 0
1.3333 0.40 0 0.25 0.15 0.25
1.3334 0.65 0.40 0.25 0.40 0.15
1.9999 0.65 0.40 0.50 0.15 0.10
2.0001 0.90 0.65 0.50 0.40 0.15
3.9999 0.90 0.65 0.75 0.15 0.10
4.0001 1 0.90 0.75 0.25 0.15
extremely large 1 0.90 1.00 0 0.10
This results in absolute differences of up to 0.40 (the maximum possible empirical distribution just
above either 1, 4/3, or 2, versus the theoretical distribution there.)
This calculation can be arranged in a spreadsheet (similar to that used to get the minimum
K-S Statistic for grouped data) as follows:
Endpoint Absolute Empirical Theoretical Empirical Absolute
of Interval Difference Distribution Distribution Distribution Difference
$ Thous. For Maximum For Maximum
K-S Stat. K-S Stat.
1 0.0000 0.4000 0.4000
1.3333333 0.2500 0.0000 0.2500 0.6500 0.4000
2 0.1000 0.4000 0.5000 0.9000 0.4000
4 0.1000 0.6500 0.7500 1.0000 0.2500
Infinity 0.1000 0.9000 1.0000
Comments: If there were many narrow intervals, this technique would allow one to get a fairly good
estimate of the K-S statistic. One needs to offset the columns of Empirical Distributions one row up
and down and then get two columns of absolute differences.

17.13. C. The Kolmogorov-Smirnov statistic is the maximum over all x of the absolute difference
between the empirical and theoretical distribution functions. In this case we can only observe the
absolute difference at a few values of x. However, the K-S statistic must be greater than or equal to
the maximum of these observed absolute differences.
x Number of Claims Empirical Theoretical Absolute
≤ x and > 3000 Distribution Function Distribution Function Difference
3000 0 0 0.0000 0.0000
5000 6 0.06 0.1289 0.0689
10000 35 0.35 0.3600 0.0100
25000 74 0.74 0.6864 0.0536
Comment: Both distributions are for the data truncated from below at 3000.

Section 18, p-p Plots264 265

The K-S Statistic provides a formal technique of comparing the fitted and empirical distribution
functions. Plots of the difference function provide a graphical technique to compare empirical and
fitted distributions. Another graphical technique to analyze a fit is via a p-p plot.

In a p-p plot one graphs the fitted distribution versus the estimated percentile at that
data point.

For example, the ungrouped data in Section 2 has 130 values: 300, 400, 2800, ... , 4,802,200.

The Pareto fit by Maximum Likelihood to this data has parameters α = 1.702 and θ = 240,151.
F(300) = 1 - {240,151/(300 + 240,151)}^1.702 = 0.0021.

So corresponding to the loss of 300, we plot the point (1/131, 0.0021).

F(400) = 1 - {240,151/(400 + 240,151)}^1.702 = 0.0028.

Thus the second point plotted is (2/131, 0.0028).

F(4,802,200) = 1 - {240,151/(4,802,200 + 240,151)}^1.702 = 0.9944.

So corresponding to the loss of 4,802,200, we plot the point (130/131, 0.9944).

For the ungrouped data in Section 2, the 104th loss out of 130 is 406,900.
Thus 406,900 is the estimate of the 104/131 = 79.39th percentile. For the Maximum Likelihood
Pareto, F(406,900) = 1 - {240,151/(406,900 + 240,151)}^1.702 = 0.8149.
So corresponding to the loss of 406,900, we plot the point (0.7939, 0.8149).

We note that 0.7939, corresponding to a smoothed empirical percentile, and 0.8149, representing
the fitted Distribution Function, are close, indicating a reasonable fit at this portion of the distribution.

For the p-p plot a better fit occurs when the plotted points are close to the comparison line x = y.
264
See Example 16.3 and the prior text in Loss Models.
265
Personally, I find the type of exhibits discussed previously in which one graphs the difference between the
empirical and fitted distribution functions, much more useful in practical applications than p-p plots. The eye can
easily distinguish the differences from the horizontal x-axis, rather than comparing to a line at a 45 degree angle as in
the p-p plots. In addition, one can easily add the K-S critical values as horizontal lines in difference graphs, allowing
one to perform the K-S test graphically. Finally one can easily translate back to the sizes of loss, which are shown
right on the x-axis of the difference graph; this allows one to quickly pick out those size ranges in which the fit is not
as good. This would require the backup calculations that produced the p-p plot; it can not be done directly from the
p-p plot itself.

One could proceed similarly for each of the 130 losses in the ungrouped data set in Section 2.
For this maximum likelihood Pareto, the resulting p-p plot is:
[p-p plot: Fitted versus Sample, for the maximum likelihood Pareto, with the comparison line x = y.]

If one has a set of n losses, x1 , x2 , ... , xn , from smallest to largest, to which a Distribution F(x) has
been fit, then the p-p plot consists of the n points: ( i/(n+1), F(xi) ).
One also includes on the p-p plot, the comparison line x=y. The closer the plotted points
stay to the comparison line, the better the fit.
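
For instance, here is a small Python sketch of this construction (the function names are mine), using the
maximum likelihood Pareto quoted above and a purely hypothetical sample of five losses:

def pareto_cdf(x, alpha=1.702, theta=240151.0):
    return 1.0 - (theta / (theta + x)) ** alpha

def pp_points(losses, cdf):
    # p-p plot points: (i/(n+1), F(x_i)) for the losses sorted from smallest to largest
    y = sorted(losses)
    n = len(y)
    return [((i + 1) / (n + 1), cdf(x)) for i, x in enumerate(y)]

sample = [50000, 120000, 250000, 600000, 1500000]   # hypothetical losses, for illustration only
for p, f in pp_points(sample, pareto_cdf):
    print(round(p, 3), round(f, 4))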

Exercise: For the ungrouped data in Section 2, the 78th loss out of 130 is 146,100.
The Weibull fit by Maximum Likelihood to this data has parameters θ = 231,158 and τ = 0.6885.
What point would this produce on a p-p plot?
[Solution: 146,100 is the estimate of the 78/131 = 59.5th percentile.
For this Weibull, F(146,100) = 1 - exp[-(146,100/231,158)^0.6885] = 0.518.
Therefore, corresponding to the loss of 146,100, we plot the point (0.595, 0.518).]

The whole p-p plot for the maximum likelihood Weibull fit to the ungrouped data in Section 2 is:
[p-p plot: Fitted versus Sample, for the maximum likelihood Weibull, with the comparison line x = y.]

For small losses, the fitted Weibull Distribution function is larger than the empirical distribution
function.266 Therefore, the fitted Weibull distribution has too thick of a left tail.

For very large size losses, the fitted Weibull Distribution function is larger than the empirical
distribution function; for large losses the fitted survival function is smaller than the sample survival
function. Therefore, the fitted Weibull distribution has too thin of a right tail.

In the neighborhood of where the sample distribution is 0.5, the slope of the curve is less than 1, the
slope of the comparison line. Therefore, the fitted Weibull distribution has less probability than the
sample distribution, near the sample median.

266
The curved line is above the 45° comparison line.

By comparing the two p-p plots, one sees that the Weibull Distribution is a worse fit to this data than
the Pareto (thick),267 since the Weibullʼs furthest departure from the comparison line
x = y is larger than that of the Pareto:

[p-p plots: Fitted versus Sample, for the maximum likelihood Weibull and for the maximum likelihood
Pareto (thick), with the comparison line x = y.]

While we have only applied the p-p plot to ungrouped data, one can apply a similar technique
to grouped data. Of course one can only plot at the endpoints of the intervals. One plots the
empirical distribution at each endpoint versus the fitted distribution function at that endpoint.

267
This was seen previously by looking at the K-S statistics and graphs of the differences between the fitted and
empirical distributions.

Tails of Distributions:

One can use p-p plots to compare the tails of a fitted distribution and the data.268
For example, here is a p-p plot:

For small losses, the dashed line is below the 45° comparison line. Therefore, for small losses, the
fitted distribution function is smaller than the sample distribution function. Therefore, the fitted
distribution has too thin of a left tail.

For large size losses, the fitted distribution function is smaller than the sample distribution function; for
large losses the fitted survival function is larger than the sample survival function.

268
See 4, 11/01, Q.6.

Here is a graph of the survival functions of the sample (solid) and fitted (dashed), for very large
sizes of loss:

F(x) S(x)
Sample Fitted Sample Fitted
0.900 0.870 0.100 0.130
0.950 0.910 0.050 0.090
0.970 0.940 0.030 0.060
0.990 0.975 0.010 0.025
Fitted S(x) → 0 less quickly as x → ∞. ⇒ The fitted distribution has too thick of a right tail.

Here is an approximate graph of the difference of the sample and fitted distribution functions, as a
function of the sample distribution function:

This fitted distribution has too thin of a lefthand tail and too thick of a righthand tail.

In the neighborhood of where the sample distribution is 0.5, the slope of the curve is more than 1.
As the sample distribution (solid) increases from 0.4 to 0.6, the fitted distribution (dashed) increases
from about 0.34 to 0.65:

Therefore, the fitted distribution has more probability than the sample distribution, near the sample
median.

N versus N+1:

In some cases we use N, the number of data points, in the denominator, while in others we use N + 1:

Smoothed empirical estimate of percentiles ⇒ N+1 in the denominator.

p-p plots ⇒ N+1 in the denominator.

Empirical Distribution Function ⇒ N in the denominator.

Kolmogorov-Smirnov Statistic ⇒ N in the denominator.



Quantile-Quantile Plots:269

Quantile-Quantile plots (q-q plots) are similar to p-p plots. They can be used to either compare two
samples or data to a theoretical distribution. For example, we can compare the ungrouped data in
Section 2 with 130 data points to the maximum likelihood Pareto with α = 1.702 and θ = 240,151.

We estimate the percentiles (quantiles) as i/(n+1) = i/131, i = 1, 2, ..., 130.270


Then we find the corresponding quantiles for the Pareto, using the VaRp formula:
θ {(1 - p)^(-1/α) - 1} = 240,151 {(1 - i/131)^(-1/1.702) - 1}.

The ungrouped data in Section 2 has 130 values: 300, 400, 2800, ... , 4,802,200.
Thus the first plotted point is: (240,151 {(1 - 1/131)^(-1/1.702) - 1}, 300) = (1084, 300). The last
plotted point is: (240,151 {(1 - 130/131)^(-1/1.702) - 1}, 4,802,200) = (3,971,715, 4,802,200).

We add a 45 degree comparison line; a good match or fit would result in points close to this
comparison line. The q-q plot, with each quantile in millions is:

[q-q plot: sample quantiles versus theoretical quantiles, each in millions, with the 45 degree comparison line.]
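
A short Python sketch of the q-q construction (the function names are mine), again using the
maximum likelihood Pareto; the three losses shown are purely hypothetical:

def pareto_quantile(p, alpha=1.702, theta=240151.0):
    # VaR_p for the Pareto: theta * {(1-p)^(-1/alpha) - 1}
    return theta * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

def qq_points(losses):
    # q-q plot points: (theoretical quantile at i/(n+1), i-th smallest loss)
    y = sorted(losses)
    n = len(y)
    return [(pareto_quantile((i + 1) / (n + 1)), x) for i, x in enumerate(y)]

print(qq_points([1000, 5000, 20000]))   # hypothetical sample, for illustration only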
269
Not on the syllabus.
270
One could instead use (2i-1)/(2n).

Problems:

18.1 (2 points) The graph below shows a p-p plot of a fitted distribution compared to a sample.

[p-p plot: Fitted versus Sample, with the comparison line x = y.]

Which of the following is true?


1. The lefthand tail of the fitted distribution is too thin.
2. The righthand tail of the fitted distribution is too thin.
3. The fitted distribution has less probability around the sample median than does the
sample distribution.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. none of A, B, C, or D

18.2 (1 point) You are constructing a p-p plot. The 27th out of 83 losses from smallest to largest is
142. The fitted distribution is an Exponential with mean 677.
What point should one plot?
A. (0.321, 0.189) B. (0.325, 0.189) C. (0.321, 0.811) D. (0.325, 0.811)
E. None of the above

18.3 (3 points) A Pareto distribution with parameters α = 1.5 and θ = 1000 is being compared to
the following five claims: 179, 352, 918, 2835, 6142.
Construct the p-p plot.

18.4 (1 point) Which of the following p-p plots indicates the best model of the data?

[Five p-p plots of candidate models, labeled A through E, each plotting Fitted versus Sample.]

18.5. (3 points) You are given the following p-p plot:


[p-p plot: Fitted versus Sample.]

The plot is based on the sample:


10 15 20 30 50 70 100 150 200
Determine the fitted model underlying the p-p plot.
(A) Inverse Exponential with θ = 23.
(B) Pareto with α = 1 and θ = 50.
(C) Uniform on [0, 200].
(D) Exponential with mean 95.
(E) Normal with mean 50 and standard deviation 30.

18.6 (8 points) You are given 19 data points:


1258, 1636, 1652, 1814, 1853, 1860, 1895, 1947, 1950, 2009,
2020, 2029, 2103, 2123, 2127, 2139, 2246, 2335, 2770.
You wish to compare this data to a Normal Distribution with µ = 2000 and σ = 300.
With the aid of a computer, draw a p-p plot.

18.7 (2 points) You observe 40 values. The 11th value from smallest to largest is 166.
The 37th value from smallest to largest is 2000.
An exponential distribution is hypothesized for the data.
Let (q, r) be the coordinates of the p-p plot for a claim amount of 166. q - r = 0.0266.
Let (s, t) be the coordinates of the p-p plot for a claim amount of 2000. Determine s - t.

18.8 (2 points) A Weibull distribution with parameters τ = 0.8 and θ = 200 is being compared to the
following four losses: 19, 62, 98, 385.
Which of the following is the p-p plot?

[Four candidate p-p plots are shown, labeled A, B, C, and D.]
E. None of A, B, C, or D.

18.9 (4, 11/01, Q.6 & 2009 Sample Q.59) (2.5 points)
The graph below shows a p-p plot of a fitted distribution compared to a sample.

[p-p plot: Fitted versus Sample, with gridlines at intervals of 0.1.]

Which of the following is true?


(A) The tails of the fitted distribution are too thick on the left and on the right,
and the fitted distribution has less probability around the median than the sample.
(B) The tails of the fitted distribution are too thick on the left and on the right,
and the fitted distribution has more probability around the median than the sample.
(C) The tails of the fitted distribution are too thin on the left and on the right,
and the fitted distribution has less probability around the median than the sample.
(D) The tails of the fitted distribution are too thin on the left and on the right,
and the fitted distribution has more probability around the median than the sample.
(E) The tail of the fitted distribution is too thick on the left, too thin on the right,
and the fitted distribution has less probability around the median than the sample.

18.10 (4, 5/05, Q.5 & 2009 Sample Q.176) (2.9 points) You are given the following p-p plot:
[p-p plot: F(x) versus Fn(x).]

The plot is based on the sample:


1 2 3 15 30 50 51 99 100
Determine the fitted model underlying the p-p plot.
(A) F(x) = 1 - x^(-0.25), x ≥ 1
(B) F(x) = x / (1 + x), x ≥ 0
(C) Uniform on [1, 100]
(D) Exponential with mean 10
(E) Normal with mean 40 and standard deviation 40

18.11 (4, 11/05, Q.31 & 2009 Sample Q.241) (2.9 points) You are given:
(i) The following are observed claim amounts:
400 1000 1600 3000 5000 5400 6200
(ii) An exponential distribution with θ = 3300 is hypothesized for the data.
(iii) The goodness of fit is to be assessed by a p-p plot and a D(x) plot.
Let (s, t) be the coordinates of the p-p plot for a claim amount of 3000.
Determine (s - t) - D(3000).
(A) -0.12 (B) -0.07 (C) 0.00 (D) 0.07 (E) 0.12

Solutions to Problems:

18.1. D. For small losses, the fitted distribution function is smaller than the empirical distribution
function. (The curve is below the 45° comparison line.) Therefore, the fitted distribution has too thin
of a left tail.
For large size losses, the fitted distribution function is larger than the empirical distribution function; for
large losses the fitted survival function is smaller than the sample survival function. Therefore, the
fitted distribution has too thin of a right tail.
In the neighborhood of where the sample distribution is 0.5, the slope of the curve is less than 1.
(At x = 0.4, y is about 0.24; at x = 0.6, y is about 0.36. For a 0.2 change in x, y changes by only
about 0.12.) Therefore, the fitted distribution has less probability than the sample
distribution, near the sample median.
Comment: Similar to 4, 11/01, Q.6.

18.2. A. 142 is the estimate of the 27/(83+1) = 32.1th percentile.


For this Exponential, F(142) = 1 - exp(-142/677) = 0.189.
So corresponding to the loss of 142, plot the point (0.321 , 0.189).

18.3. For the Pareto Distribution, F(x) = 1 - {θ/(θ + x)}^α = 1 - {1000/(1000 + x)}^1.5.

F(179) = 0.2189. F(352) = 1 - {1000/(1000 + 352)}^1.5 = 0.3639.
F(918) = 0.6235. F(2835) = 0.8668. F(6142) = 0.9476.
One plots the points ( i/(n+1), F(xi) ), for i = 1 to n = 5.
The five plotted points are:
(1/6, 0.2189), (2/6, 0.3639), (3/6, 0.6235), (4/6, 0.8668), (5/6, 0.9476).

18.4. B. If the model is a good one, the points in the p-p plot will lie very close to the straight line
from (0, 0) to (1, 1). Models C and E are very poor. Of the remaining three, plot B appears to have
the points closest to the line.
Comment: These are all p-p plots to my ungrouped data in Section 2. Plot A is vs. the maximum
likelihood LogNormal with µ = 11.5875 and σ = 1.60326. Plot B is vs. the maximum likelihood
Pareto with α = 1.702 and θ = 240,151. Plot C is vs. the maximum likelihood Exponential with
θ = 312,675. Plot D is vs. the maximum likelihood Weibull with θ = 231,158 and τ = 0.6885.
Plot E is vs. a Pareto with α = 2 and θ = 150,000.

18.5. D. There are 9 values in the sample. Each plotted point should be: {i/10, F(xi)}.
The first plotted point is about {0.1 , 0.1}. For choice B, F(10) = 1 - 50/(50 + 10) = 1/6 = 0.166.
For choice C, F(10) = 10/200 = 0.05. Eliminating choices B and C.
The last plotted point is about {0.9 , 0.88}. For choice E, F(200) = Φ((200 - 50)/30) = Φ(5) ≅ 1.
Eliminating choice E.
The eighth plotted point is about {0.8, 0.8}. For choice A, F(150) = e^(-23/150) = 0.858.
Eliminating choice A.
Comment: Similar to 4, 5/05, Q.5.
One should check the plotted points versus choice D. They should be:
x        10      15      20      30      50      70      100     150     200
i/10     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
F(xi)    0.100   0.146   0.190   0.271   0.409   0.521   0.651   0.794   0.878
Here are p-p plots of the various choices:
(A) F(x) = e^(-23/x), x ≥ 0: [p-p plot]

(B) F(x) = x / (50 + x), x ≥ 0: [p-p plot]

(C) Uniform on [0, 200]: [p-p plot]

(D) Exponential with mean 95: [p-p plot]

(E) Normal with mean 50 and standard deviation 30: [p-p plot]

18.6. The first plotted point is: (1/20, Φ[(1258 - 2000)/300]) = (0.05, 0.0067).
The remaining plotted points are: (0.1, 0.1125), (0.15, 0.1230), (0.2, 0.2676), (0.25, 0.3121),
(0.3, 0.3204), (0.35, 0.3632), (0.4, 0.4299), (0.45, 0.4338), (0.5, 0.5120), (0.55, 0.5266),
(0.6, 0.5385), (0.65, 0.6343), (0.7, 0.6591), (0.75, 0.6640), (0.8, 0.6784), (0.85, 0.7939),
(0.9, 0.8679), (0.95, 0.9949).
The resulting p-p plot, including the 45 degree comparison line:
[p-p plot: Fitted versus Sample, with the 45 degree comparison line.]

Comment: The p-p plot indicates a relatively good match between the data and the distribution.
The data was simulated from the given Normal Distribution.
In this case, a Normal Probability Plot, as discussed on Exam MFE, would be more helpful than a
p-p plot.

18.7. q = 11/(40 + 1) = 11/41. r = 1 - e^(-166/θ). q - r = 11/41 - (1 - e^(-166/θ)) = 0.0266.
⇒ e^(-166/θ) = 0.7583. ⇒ θ = 600.
s = 37/(40 + 1) = 37/41. t = 1 - e^(-2000/600) = 0.9643. s - t = 37/41 - 0.9643 = -0.0619.

18.8. A. F(19) = 1 - exp[-(19/200)^0.8] = 0.141. F(62) = 1 - exp[-(62/200)^0.8] = 0.324.
F(98) = 1 - exp[-(98/200)^0.8] = 0.432. F(385) = 1 - exp[-(385/200)^0.8] = 0.815.
Thus we plot 4 points: (0.2, 0.141), (0.4, 0.324), (0.6, 0.432), (0.8, 0.815).

18.9. E. For small losses, the fitted distribution function is larger than the sample distribution function.
(The curved line is above the 45° comparison line.) Therefore, the fitted distribution has too thick
of a left tail.
For large size losses, the fitted distribution function is larger than the sample distribution function; for
large losses the fitted survival function is smaller than the sample survival function. Here is a graph of
the survival functions of the sample and fitted, for very large sizes of loss:
[Graph: survival functions of the sample and fitted distributions for very large sizes of loss; the sample
survival function lies above the fitted.]
F(x) S(x)
Sample Fitted Sample Fitted
0.890 0.930 0.110 0.070
0.900 0.945 0.100 0.055
0.920 0.965 0.080 0.035
0.970 0.990 0.030 0.010
Fitted S(x) → 0 more quickly as x → ∞. ⇒ The fitted distribution has too thin of a right tail.
In the neighborhood of where the sample distribution is 0.5, the slope of the curve is less than 1.
As the sample distribution increases from 0.4 to 0.6, the fitted distribution only increases from about
0.34 to 0.42. Therefore, the fitted distribution has less probability than the sample
distribution, near the (sample) median.
Comment: The slope is greater than one in the left hand tail. Therefore, the fitted density was larger
than the sample density in the left hand tail. The slope is less than one in the right hand tail.
Therefore, the fitted density was smaller than the sample density in the right hand tail.

18.10. A. There are 9 values in the sample. Each plotted point should be: {i/10, F(xi)}.
The first plotted point is {0.1, 0}. For choices B, D, and E, F(1) ≠ 0, so these choices are eliminated.
The last plotted point is about {0.9, 0.7}. For choice A, F(100) = 1 - 100^(-0.25) = 1 - 1/√10 = 0.684 ≅ 0.7.
For choice C, F(100) = 1 ≠ 0.7, eliminating choice C.
Comment: One could check more of the plotted points versus choice A. They should be:
x        1       2       3       15      30      50      51      99      100
i/10     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
F(xi)    0.000   0.159   0.240   0.492   0.573   0.624   0.626   0.683   0.684
For example, the fifth plotted point is at about {0.5, 0.58}, and F(30) = 1 - 30^(-0.25) = 0.573 ≅ 0.58.
Here are p-p plots of the various choices:

(A) F(x) = 1 - x^(-0.25), x ≥ 1: [p-p plot]

(B) F(x) = x / (1 + x), x ≥ 0: [p-p plot]

(C) Uniform on [1, 100]: [p-p plot]

(D) Exponential with mean 10: [p-p plot]

(E) Normal with mean 40 and standard deviation 40: [p-p plot]

18.11. B. The claim of size 3000 is the 4th out of 7, so the first coordinate of the p-p plot is:
4/(7 + 1) = 0.5.
For the Exponential, F(3000) = 1 - e^(-3000/3300) = 0.5971.
Thus the point corresponding to the claim of size 3000 in the p-p plot is: (0.5, 0.5971).
The D(x) plot is the difference graph, the difference between the empirical and theoretical distribution
functions. The empirical distribution function at 3000 is 4/7, while the theoretical distribution function is:
1 - e^(-3000/3300) = 0.5971. Therefore D(3000) = 4/7 - 0.5971 = -0.0257.
(s - t) - D(3000) = (0.5 - 0.5971) - (4/7 - 0.5971) = -0.0714.
Comment: Note the order of the difference in D(x): empirical - theoretical.
The p-p plot uses n + 1 in its denominator, while the empirical distribution function uses n.
Also note that the empirical distribution jumps to 4/7 at 3000, but is 3/7 just before 3000.
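A tiny Python sketch of this calculation (the variable names are mine; the data and the Exponential
assumption are those of the problem):

from math import exp

claims = [400, 1000, 1600, 3000, 5000, 5400, 6200]
n = len(claims)
F = lambda x: 1 - exp(-x / 3300.0)

x = 3000
rank = sorted(claims).index(x) + 1      # 3000 is the 4th claim from smallest to largest
s, t = rank / (n + 1), F(x)             # p-p plot coordinates: (4/8, 0.5971)
D_x = rank / n - F(x)                   # D(x) uses the empirical distribution, with n in the denominator
print(round((s - t) - D_x, 4))          # about -0.0714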
Here is the entire p-p plot:

[p-p plot: Fitted versus Sample.]

Here is the difference graph out to 10,000:


[Graph: D(x) versus x, for x from 0 to 10,000; D(x) ranges from about -0.20 to 0.15.]
The largest distance from the x-axis occurs just before 5000. At 4999.99, the empirical distribution
function is 4/7, while the Exponential Distribution is: 1 - e^(-4999.99/3300) = 0.780.
Therefore, the Kolmogorov-Smirnov Statistic is: |4/7 - 0.780| = 0.209.

Section 19, Anderson-Darling Test271

Similar to the Kolmogorov-Smirnov (K-S) statistic, the Anderson-Darling statistic also tests how well
an ungrouped data set is fit by a given distribution. The computation of the
Anderson-Darling statistic is somewhat different, and is based on giving more weight to differences
in either of the two tails.

No Truncation or Censoring:

In the absence of truncation or censoring, for a data set of size n, {y1 , y2 , ..., yn }, from smallest to
largest, the Anderson-Darling statistic, A², can be computed as:

A² = -n - (1/n) Σi=1 to n (2i - 1) {ln[F(yi)] + ln[S(yn+1-i)]} = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)].272

Exercise: An Exponential distribution with θ = 1000 was fit to the following four claims:
197, 325, 981, 2497. What is the value of the Anderson-Darling statistic?
[Solution: F(x) = 1 - e^(-x/1000). F(197) = 0.1788. ln(F(197)) = -1.7214.
S(x) = e^(-x/1000). ln(S(x)) = -x/1000.
(1)ln(F(197)) + (3)ln(F(325)) + (5)ln(F(981)) + (7)ln(F(2497)) =
(1)(-1.7214) + (3)(-1.2820) + (5)(-0.4699) + (7)(-0.0859) = -8.518.
(1)ln(S(2497)) + (3)ln(S(981)) + (5)ln(S(325)) + (7)ln(S(197)) =
(1)(-2.497) + (3)(-0.981) + (5)(-0.325) + (7)(-0.197) = -8.444.
    i     2i - 1      yi       F(yi)      ln F(yi)     S(yn+1-i)    ln S(yn+1-i)
    1        1        197      0.1788     -1.7214       0.0823        -2.497
    2        3        325      0.2775     -1.2820       0.3749        -0.981
    3        5        981      0.6251     -0.4699       0.7225        -0.325
    4        7       2497      0.9177     -0.0859       0.8212        -0.197
                                      Sum: -8.5185                Sum: -8.4440
Anderson-Darling statistic = -4 - (1/4)(-8.518 - 8.444) = 0.241.
Comment: For this situation, we have previously calculated the K-S statistic as 0.2225.]
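
Here is a brief Python sketch of this computation for the exercise above (the function names are mine):

from math import exp, log

def anderson_darling(data, cdf):
    # A^2 = -n - (1/n) * sum over i of (2i - 1) * {ln F(y_i) + ln S(y_{n+1-i})}
    y = sorted(data)
    n = len(y)
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (log(cdf(y[i - 1])) + log(1.0 - cdf(y[n - i])))
    return -n - total / n

exponential_cdf = lambda x: 1.0 - exp(-x / 1000.0)
print(anderson_darling([197, 325, 981, 2497], exponential_cdf))   # about 0.241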

Just as with the K-S statistic, the Anderson-Darling statistic is always positive, and large
values indicate a bad fit.273

271
No longer on the syllabus of this exam! See Section 16.4.2 of Loss Models.
272
See for example, Survival Models by Dick London, not on the syllabus.
273
Unlike the K-S statistic, the Anderson-Darling Statistic can be greater than 1.

Hypothesis Testing:

According to Loss Models, the critical values for the Anderson-Darling statistic are:274

significance level α: 10% 5% 1%


critical value c: 1.933 2.492 3.880

Thus for the above exercise with an Anderson-Darling statistic of 0.241, we do not reject the fit of
the Exponential Distribution at 10%.

Exercise: If the Anderson-Darling Statistic had been 3, what conclusion would we draw?
[Solution: 2.492 < 3 < 3.880. Reject to the left and do not reject to the right.
Reject at 5%, and do not reject the fit at 1%.]

Note that unlike the critical values for the Kolmogorov-Smirnov Statistic, the critical values for the
Anderson-Darling Statistic do not depend on the sample size.

Comparing the Anderson-Darling and Kolmogorov-Smirnov Statistics:

For various distributions fit by the Method of Maximum Likelihood to the ungrouped data in Section
2 with 130 losses,275 the values of the Anderson-Darling Statistic are:

Anderson-Darling Statistic K-S Statistic


Pareto 0.229 0.059
LogNormal 0.825 0.082
Weibull 1.222 0.092
Gamma 2.501 reject at 5% 0.132 reject at 5%
Exponential 11.313 reject at 1% 0.240 reject at 1%

The Pareto is an excellent fit, while the Exponential is a horrible fit. In this case, the ranking of the fits
and the results of hypothesis testing are the same for the Anderson-Darling and the Kolmogorov-
Smirnov statistics. In general, the Anderson-Darling statistic may give different results than the K-S
statistic, due to the Anderson-Darling applying more weight to the differences in the tails.276

274 See Section 16.4.1 of Loss Models. If asked to do hypothesis testing, this table should be included in the question. These critical values are not strictly applicable to small samples. While these critical values are often used when comparing data to distributions fit to that same data, the correct critical values in this case are smaller. When there is censoring from above, then the critical values need to be smaller.
275 See the Section on fitting to ungrouped data via maximum likelihood for the parameters of the fitted distributions.
276 Looking at the previously presented graphs of the difference functions, in the case of the Method of Maximum Likelihood applied to the ungrouped data in Section 2, the major discrepancies were in the range 100,000 to 200,000 rather than in the tails.

For many applications of loss distributions, we are very concerned with the fit in the right hand
tail, and thus the extra weight applied there in the computation of the Anderson-Darling statistic
is very useful. On the other hand, for many applications of loss distributions, we are
unconcerned with the fit in the left hand tail, and thus the extra weight applied there in the
computation of the Anderson-Darling statistic is counterproductive.

General Formula:

The Anderson-Darling Statistic is defined as:


A2 ≡ n ∫t to u {Fn(x) - F*(x)}² f*(x) / {F*(x) S*(x)} dx,

where F* is the model distribution to which we are comparing,277 Fn is the empirical distribution, n is
the number of data points, t is the truncation point from below, and u is the censorship point from
above.

Thus the Anderson-Darling statistic is small when the model distribution closely matches the empirical
distribution, and thus their squared difference is small.

Anderson-Darling statistic small ⇔ good fit.

Anderson-Darling statistic large ⇔ bad fit.

The Anderson-Darling statistic is a weighted average of the squared difference between
the empirical distribution function and the model distribution function.
The weights are 1/{F(x)S(x)}. F(x)S(x)/n is the variance of the empirical distribution function.
Thus the weights are inversely proportional to the variance of the empirical distribution function.

Near the middle the weights are close to 1 / {(1/2)(1/2)} = 4.
In the left hand tail, the weights are larger, for example 1 / {(1/10)(9/10)} = 11.1.
Similarly, in the right hand tail, the weights are larger, for example 1 / {(9/10)(1/10)} = 11.1.
Thus, the Anderson-Darling Statistic weights more heavily discrepancies in either tail.
277 After altering a ground-up, unlimited size of loss distribution for the effects of any truncation from below. What I have called G.

Using the fact that Fn(x) is constant on intervals, the above integral reduces to:278 279

A2 = -nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)],

where n is the number of data points, k is the number of data points which are not censored from
above, t is the truncation point from below, and u is the censorship point from above.
t = y0 < y1 < y2 ... < yk-1 < yk < yk+1 = u.

In the absence of truncation and censoring, the above formula becomes:280

A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]},

where Fn is the empirical distribution function, Fn (yi) = i/n, Sn is the empirical survival function,
S n (yi) = 1 - i/n, y0 = 0, and yn+1 = ∞.

Exercise: An Exponential distribution with θ = 1000 was fit to the following four claims:
197, 325, 981, 2497. Using the above formula, compute the Anderson-Darling statistic.
[Solution: F(x) = 1 - e-x/1000. S(x) = e-x/1000. ln(S(x)) = -x/1000.
n = number of data points = 4. Fn (yi) = i/4. Sn (yi) = 1 - Fn (yi) = 1 - i/4.

Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} = 0.5278.

i yi Sn(yi) lnS(yi) lnS(yi+1) contribution


0 0 1.0000 0.0000 -0.1970 0.1970
1 197 0.7500 -0.1970 -0.3250 0.0720
2 325 0.5000 -0.3250 -0.9810 0.1640
3 981 0.2500 -0.9810 -2.4970 0.0948
0.5278

278 See Section 16.4.2 of Loss Models. In the absence of censoring from above, yn+1 = ∞ and one should not include the final term in the first summation; otherwise one would be asked to take ln(0).
279 One can, instead of having the first term, have the final summation go to i = k+1, with yk+2 = ∞ and F(yk+2) = 1.
280 With no truncation, t = 0 = y0. With no censoring, k = n and u = ∞ = yn+1, and we do not include the final term in the sum involving lnS.
Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} = 0.5324.

i yi Fn(yi) lnF(yi) lnF(yi+1) contribution


1 197 0.2500 -1.7214 -1.2820 0.0275
2 325 0.5000 -1.2820 -0.4699 0.2030
3 981 0.7500 -0.4699 -0.0859 0.2160
4 2497 1.0000 -0.0859 0.0000 0.0859
0.5324
Anderson-Darling statistic = -4 + (4)(0.5278) + (4)(0.5324) = 0.241.]

This matches the result previously obtained. One can show that in the absence of truncation and
censoring, the first formula I gave for the Anderson-Darling statistic, matches the more complicated
formula given in Loss Models.
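Here is my own Python sketch of this sum formula, allowing truncation from below at t and censoring from above at u (a study aid only, not exam technique); it reproduces the 0.241 above and can also be used to check the censored and truncated exercises that follow:

import math

def anderson_darling(uncensored, cdf, n, t=0.0, u=math.inf):
    # uncensored: the k non-censored observations, sorted ascending
    # cdf: the model F*, already adjusted for any truncation from below
    # n: total number of observations, including any censored at u
    F = lambda x: 1.0 if x == math.inf else cdf(x)   # F*(infinity) = 1
    k = len(uncensored)
    ys = [t] + list(uncensored) + [u]                # y_0 = t, ..., y_{k+1} = u
    s_sum = f_sum = 0.0
    for i in range(k + 1):                           # i = 0, ..., k
        if i == k and u == math.inf:
            continue                                 # no censoring: skip the term needing ln(0)
        s_sum += (1 - i/n)**2 * (math.log(1 - F(ys[i])) - math.log(1 - F(ys[i + 1])))
    for i in range(1, k + 1):                        # i = 1, ..., k
        f_sum += (i/n)**2 * (math.log(F(ys[i + 1])) - math.log(F(ys[i])))
    return -n * F(u) + n * s_sum + n * f_sum

expo = lambda x: 1.0 - math.exp(-x / 1000.0)         # Exponential, theta = 1000
print(anderson_darling([197, 325, 981, 2497], expo, n=4))        # about 0.241
print(anderson_darling([197, 325, 981], expo, n=4, u=2000.0))    # about 0.180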

Derivation of the formula for the Anderson-Darling Statistic:

Assume no truncation or censoring. Let x1 , ..., xn be the data from smallest to largest.
Let x0 and xn+1 be the lower and upper endpoints of the support of F.
Then the empirical distribution function is: Fn (x) = j/n for xj ≤ x < xj+1, j = 0 to n.

For xj ≤ x < xj+1:


n{Fn (x) - F(x)}2 f(x) /{F(x)S(x)} = n{j/n - F(x)}2 f(x) /{F(x)S(x)} =

(j2 /n) f(x) /{F(x)S(x)} - 2j f(x)/S(x) + nf(x)F(x)/S(x).

Let y = S(x). dy = -f(x) dx.

A2 ≡ n ∫ {Fn(x) - F(x)}² f(x) / {F(x)S(x)} dx

= Σ (j²/n) ∫ f(x) / {F(x)S(x)} dx - Σ 2j ∫ f(x)/S(x) dx + Σ n ∫ f(x)F(x)/S(x) dx

= - Σ (j²/n) ∫ 1/(y - y²) dy + 2 Σ j ∫ 1/y dy - n Σ ∫ (1 - y)/y dy,

where each sum is over the intervals [xj, xj+1), j = 0 to n.

∫ 1/(y - y²) dy = ∫ 1/y dy + ∫ 1/(1 - y) dy = ln(y) - ln(1 - y).

∫ (1 - y)/y dy = ∫ 1/y dy - ∫ dy = ln(y) - y.

At the top of each interval, y = S(xj+1). At the bottom of each interval, y = S(xj).
A2 = (-1/n) Σ j² {lnS(xj+1) - lnF(xj+1) - lnS(xj) + lnF(xj)} + 2 Σ j {lnS(xj+1) - lnS(xj)}

- n Σ {lnS(xj+1) - S(xj+1) - lnS(xj) + S(xj)}.

Assume for example that n = 3, with three data points from smallest to largest: x1 , x2 , x3 .

Then, A2 = (-1/3){lnS(x2) - lnF(x2) - lnS(x1) + lnF(x1) + 4(lnS(x3) - lnF(x3) - lnS(x2) + lnF(x2))
+ 9(lnS(x4) - lnF(x4) - lnS(x3) + lnF(x3))}
+ 2{lnS(x2 ) - lnS(x1 ) + 2(lnS(x3 ) - lnS(x2 )) + 3(lnS(x4 ) - lnS(x3 ))}
- 3{lnS(x1 ) - S(x1 ) - lnS(x0 ) + S(x0 ) + lnS(x2 ) - S(x2 ) - lnS(x1 ) + S(x1 )
+ lnS(x3 ) - S(x3 ) - lnS(x2 ) + S(x2 ) + lnS(x4 ) - S(x4 ) - lnS(x3 ) + S(x3 )}
= -3lnS(x4 ) + (5/3)lnS(x3 ) + lnS(x2 ) + (1/3) lnS(x1 )
+ 3 lnF(x4 ) - (5/3)lnF(x3 ) - lnF(x2 ) - (1/3) lnF(x1 )
+ 6lnS(x4 ) - 2lnS(x3 ) - 2lnS(x2 ) - 2lnS(x1 )
- 3 lnS(x4 ) + 3 lnS(x0 ) + 3 S(x4 ) - 3 S(x0 )
= -3 + (-1/3){lnS(x3 ) + 3lnS(x2 ) + 5 lnS(x1 )} + (-1/3){lnF(x1 ) + 3lnF(x2 ) + 5 lnF(x3 )}.281

This is of the stated form:


A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] = -n - (1/n) Σi=1 to n (2i - 1) {ln[F(yi)] + ln[S(yn+1-i)]}.

Censoring:282

Here is an example of the computation of the Anderson-Darling Statistic, when there is censorship
from above at u.

A2 = -nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)].

k = the number of data points which are not censored from above.
Therefore, the summations in the Anderson-Darling statistic are only taken over those data points
that have not been censored from above. Also the first term is minus the number of data points
multiplied by the value of the model distribution function F* at the censorship point.
281 Using the facts that: S(x4) = 0, S(x0) = 1, lnS(x0) = ln(1) = 0, and lnF(x4) = ln(1) = 0.
282 See the second half of example 16.6 in Loss Models.

Exercise: An Exponential distribution with θ = 1000 is compared to the following data set censored
from above at 2000: 197, 325, 981, 2000.
What is the value of the Anderson-Darling statistic?
[Solution: F*(x) = 1 - e-x/1000, x < 2000. S*(x) = e-x/1000, x < 2000. ln(S(x)) = -x/1000.
n = number of data points = 4. k = number of data points not censored from above = 3.
Fn (yi) = i/4. Sn (yi) = 1 - Fn (yi) = 1 - i/4. y4 = u = 2000.
Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] = 0.4967.

i yi Sn(yi) lnS*(yi) lnS*(yi+1) contribution


0 0 1.0000 0.0000 -0.1970 0.1970
1 197 0.7500 -0.1970 -0.3250 0.0720
2 325 0.5000 -0.3250 -0.9810 0.1640
3 981 0.2500 -0.9810 -2.0000 0.0637
4 2000
0.4967

Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)] = 0.4130.

i yi Fn(yi) lnF*(yi) lnF*(yi+1) contribution


1 197 0.2500 -1.7214 -1.2820 0.0275
2 325 0.5000 -1.2820 -0.4699 0.2030
3 981 0.7500 -0.4699 -0.1454 0.1825
4 2000 1.0000 -0.1454 0.0000
0.4130
F*(u) = 1 - e-2000/1000 = 0.8647.
Anderson-Darling statistic = -(4)(0.8647) + (4)(0.4967) + (4)(0.4130) = 0.180.]
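A quick machine check of this exercise, computing the two sums that appear in the tables above (a study aid only; values as in the solution):

import math
n = 4
ys = [0.0, 197.0, 325.0, 981.0, 2000.0]         # y_0 = 0, the k = 3 uncensored points, y_4 = u
lnS = lambda x: -x / 1000.0                     # ln S*(x) for the Exponential with theta = 1000
lnF = lambda x: math.log(1.0 - math.exp(-x / 1000.0))
s_sum = sum((1 - i/n)**2 * (lnS(ys[i]) - lnS(ys[i+1])) for i in range(0, 4))   # i = 0, ..., k
f_sum = sum((i/n)**2 * (lnF(ys[i+1]) - lnF(ys[i])) for i in range(1, 4))       # i = 1, ..., k
print(round(s_sum, 4), round(f_sum, 4))         # 0.4967 and 0.4130, matching the two tables
print(-n * (1 - math.exp(-2.0)) + n * (s_sum + f_sum))                          # about 0.180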

Truncation:283

Here is an example of the computation of the Anderson-Darling Statistic, when there is truncation
from below at t.

Exercise: Losses prior to truncation are assumed to follow an Exponential distribution with θ = 1000.
This assumption is compared to the following data set truncated from below at 250:
325, 981, 2497.
What is the value of the Anderson-Darling Statistic?
[Solution: After truncation from below at 250, F*(x) = {F(x) - F(250)} / S(250) =
(e-250/1000 - e-x/1000)/e-250/1000 = 1 - e-(x-250)/1000, x > 250.
ln(S*(x)) = -(x-250)/1000, x > 250.
n = number of data points = 3. k = number of data points not censored from above = 3.
Fn (yi) = i/3. Sn (yi) = 1 - Fn (yi) = 1 - i/3.
y 0 = t = truncation point = 250. y4 = ∞.
In the absence of censoring we do not include the last term in the first summation, which would
otherwise involve taking the log of infinity.
Σi=0 to 2 Sn(yi)² {ln[S*(yi)] - ln[S*(yi+1)]} = 0.5350.

i yi Sn(yi) lnS*(yi) lnS*(yi+1) contribution


0 250 1.0000 0.0000 -0.0750 0.0750
1 325 0.6667 -0.0750 -0.7310 0.2916
2 981 0.3333 -0.7310 -2.2470 0.1684
3 2497
0.5350
Σi=1 to 3 Fn(yi)² {ln[F*(yi+1)] - ln[F*(yi)]} = 0.5729.

i yi Fn(yi) lnF*(yi) lnF*(yi+1) contribution


1 325 0.3333 -2.6275 -0.6567 0.2190
2 981 0.6667 -0.6567 -0.1117 0.2422
3 2497 1.0000 -0.1117 0.0000 0.1117
0.5729
F*(u) = F*(∞) = 1.
Anderson-Darling statistic = -3 + (3)(0.5350) + (3)(0.5729) = 0.324.]
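Again, a short Python check of this truncated-data exercise (a study aid only):

import math
n, t = 3, 250.0
ys = [250.0, 325.0, 981.0, 2497.0]               # y_0 = t and the three reported losses
lnS = lambda x: -(x - t) / 1000.0                # ln S*(x) after truncation from below at 250
lnF = lambda x: math.log(1.0 - math.exp(-(x - t) / 1000.0))
s_sum = sum((1 - i/n)**2 * (lnS(ys[i]) - lnS(ys[i+1])) for i in range(0, 3))   # drop the i = n term
f_sum = sum((i/n)**2 * ((0.0 if i == 3 else lnF(ys[i+1])) - lnF(ys[i])) for i in range(1, 4))
print(round(s_sum, 4), round(f_sum, 4))          # about 0.5350 and 0.5729, matching the tables
print(-n + n * (s_sum + f_sum))                  # about 0.324; F*(u) = F*(infinity) = 1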

283 See the first half of example 16.6 in Loss Models.

Truncation and Censoring:

Here is an example involving both truncation from below and censoring from above.

Exercise: Losses prior to truncation and censoring are assumed to follow a Weibull distribution with
θ = 3000 and τ = 1/2. This assumption is compared to the following data set truncated from below
at 1000 and censored from above at 10,000:
1219, 1737, 2618, 3482, 4825, 6011, 10,000, 10,000, 10,000, 10,000.
What is the value of the Anderson Darling statistic?
[Solution: After truncation from below at 1000, F*(x) = {F(x) - F(1000)}/S(1000) =
(exp[-(1000/3000)^0.5] - exp[-(x/3000)^0.5]) / exp[-(1000/3000)^0.5] = 1 - 1.7813 exp[-(x/3000)^0.5], 1000 < x < 10,000.
n = number of data points = 10. k = number of data points not censored from above = 6.
Fn (yi) = i/10. Sn (yi) = 1 - Fn (yi) = 1 - i/10.
Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] = 0.5123.

i yi Sn(yi) S*(yi) lnS*(yi) lnS*(yi+1) contribution


0 1000 1.0000 1.0000 0.0000 -0.0601 0.0601
1 1219 0.9000 0.9417 -0.0601 -0.1836 0.1000
2 1737 0.8000 0.8323 -0.1836 -0.3568 0.1109
3 2618 0.7000 0.6999 -0.3568 -0.5000 0.0702
4 3482 0.6000 0.6065 -0.5000 -0.6909 0.0687
5 4825 0.5000 0.5011 -0.6909 -0.8382 0.0368
6 6011 0.4000 0.4325 -0.8382 -1.2484 0.0656
10000 0.2870
0.5123
Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)] = 0.2106.

i yi Fn(yi) F*(yi) lnF*(yi) lnF*(yi+1) contribution


1 1219 0.1000 0.0583 -2.8417 -1.7855 0.0106
2 1737 0.2000 0.1677 -1.7855 -1.2036 0.0233
3 2618 0.3000 0.3001 -1.2036 -0.9328 0.0244
4 3482 0.4000 0.3935 -0.9328 -0.6954 0.0380
5 4825 0.5000 0.4989 -0.6954 -0.5665 0.0322
6 6011 0.6000 0.5675 -0.5665 -0.3382 0.0822
10000 0.7130
0.2106
F*(u) = F*(10000) = 1 - 1.7813 exp[-(10000/3000)^0.5] = 0.7130.
Anderson-Darling statistic = -(10)(0.7130) + (10)(0.5123) + (10)(0.2106) = 0.099.]
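A final Python check, for the case with both truncation and censoring (a study aid only; the Weibull parameters are those assumed in the exercise):

import math
n, t, u = 10, 1000.0, 10000.0
uncensored = [1219.0, 1737.0, 2618.0, 3482.0, 4825.0, 6011.0]     # k = 6; four losses censored at u
S0 = lambda x: math.exp(-math.sqrt(x / 3000.0))                   # Weibull, theta = 3000, tau = 1/2
Sstar = lambda x: S0(x) / S0(t)                                   # after truncation from below at t
Fstar = lambda x: 1.0 - Sstar(x)
ys = [t] + uncensored + [u]
s_sum = sum((1 - i/n)**2 * math.log(Sstar(ys[i]) / Sstar(ys[i+1])) for i in range(0, 7))  # i = 0..k
f_sum = sum((i/n)**2 * math.log(Fstar(ys[i+1]) / Fstar(ys[i])) for i in range(1, 7))      # i = 1..k
print(round(s_sum, 4), round(f_sum, 4))          # about 0.5123 and 0.2106, matching the tables
print(-n * Fstar(u) + n * (s_sum + f_sum))       # about 0.099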

Problems:

19.1 (3 points) For the following four losses: 40, 150, 230, 400, and an Exponential Distribution
with θ = 200, what is the value of the Anderson-Darling statistic?
A. 0.05 B. 0.10 C. 0.20 D. 0.30 E. 0.40

19.2 (4 points) A Pareto distribution with parameters α = 1.5 and θ = 1000 was fit to the following
five claims: 179, 352, 918, 2835, 6142.
What is the value of the Anderson-Darling statistic?
A. less than 0.35
B. at least 0.35 but less than 0.40
C. at least 0.40 but less than 0.45
D. at least 0.45 but less than 0.50
E. at least 0.50

19.3 (2 points) A Weibull Distribution, F*(x), has been fit to 80 uncensored sizes of loss, yj.
Σj=0 to 79 {1 - Fn(yj)}² {ln[1 - F*(yj)] - ln[1 - F*(yj+1)]} = 0.4982.

Σj=1 to 80 Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} = 0.5398.

Use the following table for the Anderson-Darling statistic:


significance level α: 10% 5% 1%
critical value c: 1.933 2.492 3.880
Which of the following are true with respect this Weibull fit?
A. Do not reject the fit at 10%.
B. Do not reject the fit at 5%. Reject the fit at 10%.
C. Do not reject the fit at 1%. Reject the fit at 5%.
D. Reject the fit at 1%.
E. None of the above.

19.4 (3 points) A distribution was fit to the following 5 untruncated and uncensored losses:
410, 1924, 2635, 4548, and 6142.
The corresponding values of the fitted distribution are:
0.0355, 0.4337, 0.5659, 0.7720, and 0.8559.
What is the value of the Anderson-Darling statistic?
A. 0.15 B. 0.20 C. 0.25 D. 0.30 E. 0.35

Use the following information for the next two questions:


• There are five observed sizes of loss: x1 < x2 < x3 < x4 < x5 .
• A parametric distribution, F(x), has been fit to this data.
• F(x1 ) = 0.1. F(x2 ) = 0.3. F(x3 ) = 0.5. F(x4 ) = 0.7. F(x5 ) = 0.9.

19.5 (1 point) Compute the Kolmogorov-Smirnov Statistic.


A. 0.09 B. 0.10 C. 0.11 D. 0.12 E. 0.13

19.6 (2 points) Compute the Anderson-Darling Statistic.


A. 0.09 B. 0.10 C. 0.11 D. 0.12 E. 0.13

Use the following information for the next two questions:


• There are five observed sizes of loss: x1 < x2 < x3 < x4 < x5 .
• A parametric distribution, F(x), has been fit to this data.
• F(x1 ) = 0.1. F(x2 ) = 0.3. F(x3 ) = 0.6. F(x4 ) = 0.7. F(x5 ) = 0.9.

19.7 (1 point) Compute the Kolmogorov-Smirnov Statistic.


A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21

19.8 (2 points) Compute the Anderson-Darling Statistic.


A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21

Use the following information for the next two questions:


• There are five observed sizes of loss: x1 < x2 < x3 < x4 < x5 .
• A parametric distribution, F(x), has been fit to this data.
• F(x1 ) = 0.1. F(x2 ) = 0.3. F(x3 ) = 0.5. F(x4 ) = 0.7. F(x5 ) = 0.8.

19.9 (1 point) Compute the Kolmogorov-Smirnov Statistic.


A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21

19.10 (2 points) Compute the Anderson-Darling Statistic.


A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21

19.11 (1 point) A distribution has been fit to data. The Anderson-Darling Statistic is 2.11.
Use the following table for the Anderson-Darling statistic:
significance level α: 10% 5% 1%
critical value c: 1.933 2.492 3.880
Which of the following are true with respect this fit?
A. Do not reject the fit at 10%.
B. Do not reject the fit at 5%. Reject the fit at 10%.
C. Do not reject the fit at 1%. Reject the fit at 5%.
D. Reject the fit at 1%.
E. None of the above.

19.12 (6 points) You observe the following 10 losses truncated from below at 100,000:
241,513 231,919 105,310 125,152 116,472
110,493 139,647 220,942 161,964 105,829
A Distribution Function: F(x) = 1 - (x/100000)-2.8, x > 100,000, has been fit to this data.
Determine the value of the Anderson-Darling statistic.
A. 0.1 B. 0.2 C. 0.3 D. 0.4 E. 0.5

19.13 (4 points) Losses prior to truncation and censoring are assumed to follow a Weibull
distribution with θ = 3000 and τ = 2.
This assumption is compared to the following data set censored from above at 5000:
737, 1618, 2482, 3003, 5000. What is the value of the Anderson-Darling statistic?
(A) 0.16 (B) 0.18 (C) 0.20 (D) 0.22 (E) 0.24

19.14 (1 point) Which of the following statements are true?


1. For the Kolmogorov-Smirnov test, if the sample size were to double,
with each number showing up twice instead of once,
the test statistic would double and the critical values would remain unchanged.
2. For the Anderson-Darling test, if the sample size were to double,
with each number showing up twice instead of once,
the test statistic would double and the critical values would remain unchanged.
3. For the Likelihood Ratio test, if the sample size were to double,
with each number showing up twice instead of once,
the test statistic would double and the critical values would remain unchanged.
A. 1 B. 2 C. 3 D. 1, 2 E. 2, 3

19.15 (4 points) Losses truncated from below at 50 are of size: 64, 90, 132, 206.
Loss sizes prior to truncation are assumed to follow an Exponential distribution with θ = 100.
Calculate the value of the Anderson-Darling statistic.
(A) 0.19 (B) 0.21 (C) 0.23 (D) 0.25 (E) 0.27

19.16 (4 points) With a deductible of 500 and a maximum covered loss of 5000, the following five
payments are made: 179, 352, 968, 1421, 4500.
Loss sizes prior to the effect of the deductible and maximum covered loss are assumed to follow a
Pareto Distribution with parameters α = 2 and θ = 1000.
Calculate the value of the Anderson-Darling statistic.
(A) 0.24 (B) 0.26 (C) 0.28 (D) 0.30 (E) 0.32

19.17 (3 points) Assume lifetimes are uniform on (0, 100).


You observe three deaths at ages: 40, 70, and 80.
Compute the Anderson-Darling Statistic.
(A) 0.51 (B) 0.54 (C) 0.57 (D) 0.60 (E) 0.63

19.18 (1 point) A certain distribution is being compared to data.


Let H0 be that the data was drawn from this distribution.
Let H1 be that the data was not drawn from this distribution.
The Anderson-Darling Statistic is 3.2.
Use the following table for the Anderson-Darling statistic:
significance level α: 10% 5% 1%
critical value c: 1.933 2.492 3.880
What are the probabilities of Type I and Type II errors?

19.19 (3 points) Assume lifetimes are uniform on (0, 100).


You observe a single death at age 60.
Compute the Anderson-Darling Statistic using
A2 = n ∫t to u {Fn(x) - F*(x)}² f*(x) / {F*(x) S*(x)} dx.

(A) 0.31 (B) 0.34 (C) 0.37 (D) 0.40 (E) 0.43

19.20 (3 points) Prior to truncation and censoring f(x) = 2x, 0 < x < 1.
Data has been left truncated at 0.3 and right censored at 0.7: 0.6, 0.7+.
Compute the value of the Anderson-Darling statistic.
A. less than 0.10
B. at least 0.10 but less than 0.12
C. at least 0.12 but less than 0.14
D. at least 0.14 but less than 0.16
E. at least 0.16

19.21 (3 points) Prior to the effect of a maximum covered loss, losses are assumed to follow a
LogNormal Distribution with µ = 11 and σ = 1.
Data has been collected from policies with a 50,000 maximum covered loss:
20,000, 30,000, 50,000+, 50,000+.
Compute the value of the Anderson-Darling statistic.
(A) 0.08 (B) 0.10 (C) 0.12 (D) 0.14 (E) 0.16

19.22 (3 points) Assume that the random variable x has the probability density function:
f(x) = 0.18 - 0.016x , 0 ≤ x ≤ 10.
Suppose that a sample is truncated at x = 6 so that values below this amount are excluded.
The sample is then observed to be: 7.0, 7.5, 8.0, 8.0.
Compute the value of the Anderson-Darling statistic.
(A) 0.7 (B) 0.8 (C) 0.9 (D) 1.0 (E) 1.1

19.23 (Course 160 Sample Exam #3, 1994, Q.12) (1.9 points) With respect to methods used
to test the acceptability of a fitted parametric model as a representation of the true underlying model,
according to Loss Models which of the following are true?
I. Let Ej be the expected number of observations in interval j.
Then the Chi-Square goodness of fit test works best when the Ej are about equal.
II. The Kolmogorov-Smirnov (K-S) statistic is the smallest absolute deviation between
the empirical and model distribution functions.
III. The Anderson-Darling statistic is a departure measure that weights
the expected squared deviations between the empirical and model distribution functions.
(A) I and II only (B) I and III only (C) II and III only (D) I, II and III
(E) The correct answer is not given by (A), (B), (C) or (D).

Solutions to Problems:

19.1. D. F(x) = 1 - e-x/200. F(40) = 0.1813. ln[F(40)] = -1.7078.


S(x) = e-x/200. ln[S(x)] = - x/200.
(1)ln[F(40)] + (3)ln[F(150)] + (5)ln[F(230)] + (7)ln[F(400)] = -6.5474.
(1)ln[S(400)] + (3)ln[S(230)] + (5)ln[S(150)] + (7)ln[S(40)] = -10.6.
i 2i - 1 yi F(yi) lnF(yi) S(yn+1-i) lnS(yn+1-i)
1 1 40 0.1813 -1.7078 0.1353 -2
2 3 150 0.5276 -0.6394 0.3166 -1.15
3 5 230 0.6834 -0.3807 0.4724 -0.75
4 7 400 0.8647 -0.1454 0.8187 -0.2
-6.5474 -10.6000
Anderson-Darling statistic = A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] =

- 4 - (1/4)(-6.5474 - 10.6) = 0.287.


Alternately, A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} =

-4 + (4){(12 )ln(1/0.8187) + (0.752 )ln(0.8187/0.4724) + (0.52 )ln(0.4724/0.3166) +


(0.252 )ln(0.3166/0.1353)} + (4){(0.252 )ln(0.5276/0.1813) + (0.52 )ln(0.6834/0.5276) +
(0.752 )ln(0.8647/0.6834) + (12 )ln(1/0.8647)}
= 0.287.

19.2. E. At each of the observed claim sizes, compute the values of the fitted Pareto distribution:
F(x) = 1 - {θ/(θ+x)}α = 1 - {1000/(1000 +x)}1.5.
So for example, F(352) = 1 - {1000/(1000 + 352)}1.5 = 0.3639.
(1)ln(F(179)) + (3)ln(F(352)) + (5)ln(F(918)) + (7)ln(F(2835)) + (9)ln(F(6142)) = -8.3984.
(1)ln(S(6142)) + (3)ln(S(2835)) + (5)ln(S(918)) + (7)ln(S(352)) + (9)ln(S(179)) = -19.2720.
i 2i - 1 yi F(yi) lnF(yi) S(yn+i-1) lnS(yn+i-1)
1 1 179 0.2189 -1.5193 0.0524 -2.9490
2 3 352 0.3639 -1.0109 0.1332 -2.0163
3 5 918 0.6235 -0.4724 0.3765 -0.9769
4 7 2835 0.8668 -0.1429 0.6361 -0.4524
5 9 6142 0.9476 -0.0538 0.7811 -0.2470
-8.3984 -19.2720
Anderson-Darling statistic = - 5 - (1/5)(-8.3984 - 19.2720) = 0.534.
Alternately, A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} =

-5 + (5){(12 )ln(1/0.7811) + (0.82 )ln(0.7811/0.6361) + (0.62 )ln(0.6361/0.3765) +


(0.42 )ln(0.3765/0.1332) + (0.22 )ln(0.1332/0.0524) + (0.22 )ln(0.3639/0.2189) +
(0.42 )ln(0.6235/0.3639) + (0.62 )ln(0.8668/0.6235) + (0.82 )ln(0.9476/0.8668) + (12 )ln(1/0.9476)}
= 0.534.

19.3. C. Since there is no censoring, censorship value = u = ∞,


the term for j = 80 drops out of the first summation, and
k = number of uncensored points = number of points = n = 80.
The Anderson-Darling Statistic is computed as:
-nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)]

= -80 + (80)(0.4982) + (80)(0.5398) = 3.04.


Since 2.492 < 3.04 < 3.880, we reject the fit at 5% and do not reject at 1%.

19.4. E. (1)ln(F(410)) + (3)ln(F(1924)) + (5)ln(F(2635)) + (7)ln(F(4548)) + (9)ln(F(6142)) =


-11.9029.
(1)ln(S(6142)) + (3)ln(S(4548)) + (5)ln(S(2635)) + (7)ln(S(1924)) + (9)ln(S(410)) = -14.8506.
i 2i - 1 yi F(yi) lnF(yi) S(yn+i-1) lnS(yn+i-1)
1 1 410 0.0355 -3.3382 0.1441 -1.9372
2 3 1924 0.4337 -0.8354 0.2280 -1.4784
3 5 2635 0.5659 -0.5693 0.4341 -0.8345
4 7 4548 0.7720 -0.2588 0.5663 -0.5686
5 9 6142 0.8559 -0.1556 0.9645 -0.0361
-11.9029 -14.8506
Anderson-Darling statistic = - 5 - (1/5)(-11.9029 - 14.8506) = 0.351.
Alternately, A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} =

-5 + (5){(12 )ln(1/0.9645) + (0.82 )ln(0.9645/0.5663) + (0.62 )ln(0.5663/0.4341) +


(0.42 )ln(0.4341/0.2280) + (0.22 )ln(0.2280/0.1441) + (0.22 )ln(0.4337/0.0355) +
(0.42 )ln(0.5659/0.4337) + (0.62 )ln(0.7720/0.5659) + (0.82 )ln(0.8559/0.7720) + (12 )ln(1/0.8559)}
= 0.351.
Comment: Based on a fitted LogNormal Distribution with parameters µ = 7.72 and σ = 0.944.

19.5. B. The Kolmogorov-Smirnov Statistic is 0.1.


Empirical Absolute Value of
X Fitted F(X) Distribution Fitted - Empirical
0
0.1
x1 0.1
0.1
0.2
0.1
x2 0.3
0.1
0.4
0.1
x3 0.5
0.1
0.6
0.1
x4 0.7
0.1
0.8
0.1
x5 0.9
0.1
1
Comment: This is the smallest possible K-S statistic for 5 unique data points.

19.6. E. (1)ln(F(x1 )) + (3)ln(F(x2 )) + (5)ln(F(x3 )) + (7)ln(F(x4 )) + (9)ln(F(x5 )) =


ln(0.1) + 3ln(0.3) + 5ln(0.5) + 7ln(0.7) + 9ln(0.9) = -12.825.
(1)ln(S(x5 )) + (3)ln(S(x4 )) + (5)ln(S(x3 )) + (7)ln(S(x2 )) + (9)ln(S(x1 )) =
ln(0.1) + 3ln(0.3) + 5ln(0.5) + 7ln(0.7) + 9ln(0.9) = -12.825.
Anderson-Darling statistic = A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] =

- 5 - (1/5)(-12.825 - 12.825) = 0.130.


Alternately, A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} =

-5 + (5){(12 )ln(1/0.9) + (0.82 )ln(0.9/0.7) + (0.62 )ln(0.7/0.5) + (0.42 )ln(0.5/0.3) + (0.22 )ln(.3/0.1)} +
(5){(0.22 )ln(0.3/0.1) + (0.42 )ln(0.5/0.3) + (0.62 )ln(0.7/0.5) + (0.82 )ln(0.9/0.7) + (12 )ln(1/0.9)} =
0.130.

19.7. D. The Kolmogorov-Smirnov Statistic is 0.2.


Empirical Absolute Value of
X Fitted F(X) Distribution Fitted - Empirical
0
0.1
x1 0.1
0.1
0.2
0.1
x2 0.3
0.1
0.4
0.2
x3 0.6
0.0
0.6
0.1
x4 0.7
0.1
0.8
0.1
x5 0.9
0.1
1

19.8. A. (1)ln(F(x1 )) + (3)ln(F(x2 )) + (5)ln(F(x3 )) + (7)ln(F(x4 )) + (9)ln(F(x5 )) =


ln(.1) + 3ln(.3) + 5ln(.6) + 7ln(.7) + 9ln(.9) = -11.914.
(1)ln(S(x5 )) + (3)ln(S(x4 )) + (5)ln(S(x3 )) + (7)ln(S(x2 )) + (9)ln(S(x1 )) =
ln(.1) + 3ln(.3) + 5ln(.4) + 7ln(.7) + 9ln(.9) = -13.941.
Anderson-Darling statistic = A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] =

- 5 - (1/5)(-11.914 -13.941) = 0.171.


Alternately, A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} =

-5 + (5){(12 )ln(1/.9) + (.82 )ln(.9/.7) + (.62 )ln(.7/.4) + (.42 )ln(.4/.3) + (.22 )ln(.3/.1)} + (5){(.22 )ln(.3/.1)
+ (.42 )ln(.6/.3) + (.62 )ln(.7/.6) + (.82 )ln(.9/.7) + (12 )ln(1/.9)} = 0.171.

19.9. D. The Kolmogorov-Smirnov Statistic is 0.2.


Empirical Absolute Value of
X Fitted F(X) Distribution Fitted - Empirical
0
0.1
x1 0.1
0.1
0.2
0.1
x2 0.3
0.1
0.4
0.1
x3 0.5
0.1
0.6
0.1
x4 0.7
0.1
0.8
0.0
x5 0.8
0.2
1

19.10. D. (1)ln(F(x1 )) + (3)ln(F(x2 )) + (5)ln(F(x3 )) + (7)ln(F(x4 )) + (9)ln(F(x5 )) =


ln(.1) + 3ln(.3) + 5ln(.5) + 7ln(.7) + 9ln(.8) = -13.885.
(1)ln(S(x5 )) + (3)ln(S(x4 )) + (5)ln(S(x3 )) + (7)ln(S(x2 )) + (9)ln(S(x1 )) =
ln(.2) + 3ln(.3) + 5ln(.5) + 7ln(.7) + 9ln(.9) = -12.132.
Anderson-Darling statistic = A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] =

-5 - (1/5)( -13.885 - 12.132) = 0.203.


Alternately, A2 = -n + n Σi=0 to n-1 Sn(yi)² {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F(yi+1)] - ln[F(yi)]} =

-5 + (5) {(12 )ln(1/.9) + (.82 )ln(.9/.7) + (.62 )ln(.7/.5) + (.42 )ln(.5/.3) + (.22 )ln(.3/.2)}
+ (5) {(.22 )ln(.3/.1) + (.42 )ln(.5/.3) + (.62 )ln(.7/.5) + (.82 )ln(.8/.7) + (12 )ln(1/.8)} = 0.203.
Comment: One of the values of the fitted distribution function differs by 0.1 from the optimal value.
While the K-S statistic is the same as in the previous set of questions, the Anderson Darling statistic
is not. In this case, the discrepancy was in the right hand tail rather than in the middle. Since the tails
get more weight, the Anderson-Darling statistic is larger here than in the previous set of questions.

19.11. B. 1.933 < 2.11 < 2.492 ⇒ reject the fit at 10% and do not reject at 5%.

19.12. D. This is a Single Parameter Pareto Distribution, set up to work directly with data truncated
from below. Anderson-Darling statistic = A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] =

- 10 - (1/10)(-43.5665 - 60.6007) = 0.417.


i 2i - 1 yi F(yi) lnF(yi) S(yn+i-1) lnS(yn+i-1)
1 1 105310 0.1349 -2.0035 0.0847 -2.4689
2 3 105829 0.1467 -1.9194 0.0949 -2.3554
3 5 110493 0.2438 -1.4116 0.1086 -2.2196
4 7 116472 0.3475 -1.0570 0.2592 -1.3502
5 9 125152 0.4665 -0.7626 0.3926 -0.9351
6 11 139647 0.6074 -0.4985 0.5335 -0.6282
7 13 161964 0.7408 -0.3000 0.6525 -0.4269
8 15 220942 0.8914 -0.1150 0.7562 -0.2794
9 17 231919 0.9051 -0.0997 0.8533 -0.1586
10 19 241513 0.9153 -0.0885 0.8651 -0.1449
-43.5665 -60.6007
Alternately, A2 = -nF*(u) + n Σi=0 to n-1 Sn(yi)² {ln[S*(yi)] - ln[S*(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F*(yi+1)] - ln[F*(yi)]}

n = number of data points = 10. k = number of data points not censored from above = 10.
Fn (yi) = i/10. Sn (yi) = 1 - Fn (yi) = 1 - i/10.
y 0 = t = truncation point = 100,000. y11 = ∞.
In the absence of censoring we do not include the last term in the first summation, which would
otherwise involve taking the log of infinity.
Σj=0 to 9 Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} = 0.6060.

i yi Sn(yi) lnS*(yi) lnS*(yi+1) contribution


0 100000 1.0000 0.0000 -0.1449 0.1449
1 105310 0.9000 -0.1449 -0.1586 0.0111
2 105829 0.8000 -0.1586 -0.2794 0.0773
3 110493 0.7000 -0.2794 -0.4269 0.0723
4 116472 0.6000 -0.4269 -0.6282 0.0725
5 125152 0.5000 -0.6282 -0.9351 0.0767
6 139647 0.4000 -0.9351 -1.3502 0.0664
7 161964 0.3000 -1.3502 -2.2196 0.0783
8 220942 0.2000 -2.2196 -2.3554 0.0054
9 231919 0.1000 -2.3554 -2.4689 0.0011
241513
0.6060
Σj=1 to 10 Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} = 0.4357.

i yi Fn(yi) lnF*(yi) lnF*(yi+1) contribution


1 105,310 0.1000 -2.0035 -1.9194 0.0008
2 105,829 0.2000 -1.9194 -1.4116 0.0203
3 110,493 0.3000 -1.4116 -1.0570 0.0319
4 116,472 0.4000 -1.0570 -0.7626 0.0471
5 125,152 0.5000 -0.7626 -0.4985 0.0660
6 139,647 0.6000 -0.4985 -0.3000 0.0715
7 161,964 0.7000 -0.3000 -0.1150 0.0907
8 220,942 0.8000 -0.1150 -0.0997 0.0098
9 231,919 0.9000 -0.0997 -0.0885 0.0091
10 241,513 1 -0.0885 0.0000 0.0885
0.4357
F*(u) = F*(∞) = 1.
Anderson-Darling statistic = -10 + (10)(.6060) + (10)(.4357) = 0.417.

19.13. C. F*(x) = 1 - exp[-(x/3000)2 ], x < 5000.


n = number of data points = 5. k = number of data points not censored from above = 4.
Fn (yi) = i/5. Sn (yi) = 1 - Fn (yi) = 1 - i/5. y5 = u = 5000.
Σj=0 to k Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} = 0.4714.

i yi Sn(yi) lnS*(yi) lnS*(yi+1) contribution


0 0 1.0000 0.0000 -0.0604 0.0604
1 737 0.8000 -0.0604 -0.2909 0.1475
2 1618 0.6000 -0.2909 -0.6845 0.1417
3 2482 0.4000 -0.6845 -1.0020 0.0508
4 3003 0.2000 -1.0020 -2.7778 0.0710
5000
0.4714
Σj=1 to k Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} = 0.5061.

i yi Fn(yi) lnF*(yi) lnF*(yi+1) contribution


1 737 0.2000 -2.8376 -1.3768 0.0584
2 1618 0.4000 -1.3768 -0.7019 0.1080
3 2482 0.6000 -0.7019 -0.4575 0.0880
4 3003 0.8000 -0.4575 -0.0642 0.2517
5000
0.5061
F*(u) = 1 - exp[-(5000/3000)2 ] = 0.9378.
A2 = -nF*(u) + n Σi=0 to k Sn(yi)² {ln[S*(yi)] - ln[S*(yi+1)]} + n Σi=1 to k Fn(yi)² {ln[F*(yi+1)] - ln[F*(yi)]} =

-(5)(.9378) + (5)(.4714) + (5)(.5061) = 0.199.



19.14. E. 1. False.
The Fitted and Empirical Distribution Functions would remain the same, as would the K-S Statistic.
However, the critical values go down as 1/√n, so they would be 1/√2 as large as before.
2. True. The Fitted and Empirical Distribution Functions would remain the same.
However, the formula for the Anderson-Darling Statistic has a factor of n, so that the statistic would
double.
The table of critical values for the Anderson-Darling Statistic does not depend on sample size.
3. True. The fitted parameters of the distributions would remain the same.
The same terms would enter into the sum that is the loglikelihood, except each term would show up
twice, resulting in twice the loglikelihood.
The test statistic, twice the difference in loglikelihoods, would be double what it was.
One would still have the number of degrees of freedom equal to the difference in the number of
parameters, and therefore, one would get the same critical values when one entered the Chi-Square
Table.
Comment: See Section 16.4 of Loss Models.

19.15. C. After truncation from below at 50, F*(x) = {F(x) - F(50)}/S(50) =


(e-50/100 - e-x/100)/e-50/100 = 1 - e-(x-50)/100, x > 50.
ln(S*(x)) = -(x-50)/100, x > 50.
n = number of data points = 4. k = number of data points not censored from above = 4.
Fn (yi) = i/4. Sn (yi) = 1 - Fn (yi) = 1 - i/4.
y 0 = t = truncation point = 50. y5 = ∞.
In the absence of censoring we do not include the last term in the first summation, which would
otherwise involve taking the log of infinity.
Σj=0 to 3 Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} = 0.4375.

i yi Sn(yi) lnS*(yi) lnS*(yi+1) contribution


0 50 1.0000 0.0000 -0.1400 0.1400
1 64 0.7500 -0.1400 -0.4000 0.1462
2 90 0.5000 -0.4000 -0.8200 0.1050
3 132 0.2500 -0.8200 -1.5600 0.0462
4 206
0.4375
Σj=1 to 4 Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} = 0.6199.

i yi Fn(yi) lnF*(yi) lnF*(yi+1) contribution


1 64 0.2500 -2.0353 -1.1096 0.0579
2 90 0.5000 -1.1096 -0.5806 0.1323
3 132 0.7500 -0.5806 -0.2359 0.1939
4 206 1.0000 -0.2359 0.0000 0.2359
0.6199
F*(u) = F*(∞) = 1.
A2 = -nF*(u) + n Σi=0 to n-1 Sn(yi)² {ln[S*(yi)] - ln[S*(yi+1)]} + n Σi=1 to n Fn(yi)² {ln[F*(yi+1)] - ln[F*(yi)]}

= -4 + (4)(0.4375) + (4)(0.6199) = 0.230.



19.16. D. Prior to truncation and censoring the distribution function is:


F(x) = 1 - {1000/(1000 + x)}2 .
After truncation from below at 500 and censoring from above at 5000 the distribution function is:
F*(x) = {F(x) - F(500)}/S(500) = 1 - S(x)/S(500) = 1 - {1000/(1000 + x)}2 /{1000/1500}2 =
1 - {1500/(1000 + x)}2 , 500 < x < 5000.
Payments of: 179, 352, 968, 1421, 4500, correspond to losses of size: 679, 852, 1468, 1921,
5000 or more.
n = number of data points = 5. k = number of data points not censored from above = 4.
Fn (yi) = i/5. Sn (yi) = 1 - Fn (yi) = 1 - i/5.
Σj=0 to k Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} = 0.6692.

i yi Sn(yi) S*(yi) lnS*(yi) lnS*(yi+1) contribution


0 500 1.0000 1.0000 0.0000 -0.2255 0.2255
1 679 0.8000 0.7981 -0.2255 -0.4216 0.1255
2 852 0.6000 0.6560 -0.4216 -0.9959 0.2067
3 1468 0.4000 0.3694 -0.9959 -1.3329 0.0539
4 1921 0.2000 0.2637 -1.3329 -2.7726 0.0576
5 5000 0.0625
0.6692
Σj=1 to k Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} = 0.3287.

i yi Fn(yi) F*(yi) lnF*(yi) lnF*(yi+1) contribution


1 679 0.2000 0.2019 -1.6002 -1.0671 0.0213
2 852 0.4000 0.3440 -1.0671 -0.4611 0.0970
3 1468 0.6000 0.6306 -0.4611 -0.3061 0.0558
4 1921 0.8000 0.7363 -0.3061 -0.0645 0.1546
5 5000 0.9375
0.3287
F*(u) = F*(5000) = 1 - {1500/(1000 + 5000)}2 = 0.9375.
Anderson-Darling statistic = A2 = -nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)] =

-(5)(.9375) + (5)(.6692) + (5)(.3287) = 0.302.


19.17. E. A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] =

-3 - (1/3){ln[F(40)S(80)] + 3ln[F(70)S(70)] + 5ln[F(80)S(40)]} =


-3 - (1/3){ln[(.4)(.2)] + 3ln[(.7)(.3)] + 5ln[(.8)(.6)]} = 0.626.

19.18. 2.492 < 3.2 < 3.880. Therefore, if one rejects H0 , the chance of making a Type I error is
between 5% and 1%. A Type II error would occur if we failed to reject H0 when it is not true.
There is no way to determine the probability of a Type II from the given information.

19.19. E. In this case, n = 1, t = 0, u = 100, Fn (x) = 0 for x < 60 and 1 for x ≥ 60, F*(x) = x/100, and
f*(x) = 1/100.
n ∫t to u {Fn(x) - F*(x)}² f*(x) / {F*(x) S*(x)} dx =

∫0 to 60 (0 - x/100)² (1/100) / {(x/100)(1 - x/100)} dx + ∫60 to 100 (1 - x/100)² (1/100) / {(x/100)(1 - x/100)} dx =

0.01 ∫0 to 60 x / (100 - x) dx + 0.01 ∫60 to 100 (100 - x) / x dx =

0.01 ∫0 to 60 {100 / (100 - x) - 1} dx + 0.01 ∫60 to 100 {100 / x - 1} dx =

(0.01) {100 ln(100/40) - 60 + 100 ln(100/60) - 40} = ln(2.5) - ln(0.6) - 1 = 0.427.


Comment: A2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] = -1 - {ln(0.6) + ln(0.4)} = 0.427.

19.20. C. Prior to truncation and censoring the distribution function is: F(x) = x2 , 0 < x < 1.
F(0.3) = 0.09.
After truncation from below at 0.3 the distribution function is:
F*(x) = {F(x) - F(0.3)}/S(0.3) = (x2 - 0.09)/0.91, 0.3 < x < 1.
n = number of data points = 2. k = number of data points not censored from above = 1.
Fn (yi) = i/2. Sn (yi) = 1 - Fn (yi) = 1 - i/2.
Σj=0 to k Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} =

S n (0.3)2 {ln(S*(0.3)) - ln(S*(0.6))} + Sn (0.6)2 {ln(S*(0.6)) - ln(S*(0.7))} =


12 {ln(1) - ln(0.7033)} + (1/2)2 {ln(0.7033) - ln(0.5604)} = 0.4088.
Σj=1 to k Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} =

Fn (0.6)2 {ln(F*(0.7)) - ln(F*(0.6))} = (1/2)2 {ln(0.4396) - ln(0.2967)} = 0.0983.

F*(u) = F*(0.7) = {0.72 - 0.09}/0.91 = 0.4396.


Anderson-Darling statistic = A2 = -nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)] =

-(2)(0.4396) + (2)(0.4088) + (2)(0.0983) = 0.1350.



19.21. E. For the LogNormal Distribution, F(20,000) = Φ[(ln(20,000) - 11)/1] = Φ[-1.10] = 0.1357.
F(30,000) = Φ[(ln(30,000) - 11)/1] = Φ[-0.69] = 0.2451.
F(50,000) = Φ[(ln(50,000) - 11)/1] = Φ[-0.18] = 0.4286.
n = number of data points = 4. k = number of data points not censored from above = 2.
Fn (yi) = i/4. Sn (yi) = 1 - Fn (yi) = 1 - i/4.
Σj=0 to k Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} =

S n (0)2 {ln(S*(0)) - ln(S*(20,000))} + Sn (20,000)2 {ln(S*(20,000)) - ln(S*(30,000))}


+ Sn (30,000)2 {ln(S*(30,000)) - ln(S*(50,000))} =

12 {ln(1) - ln(0.8643)} + (3/4)2 {ln(0.8643) - ln(0.7549)} + (1/2)2 {ln(0.7549) - ln(0.5714)} =


0.2916.
Σj=1 to k Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} =

Fn (20,000)2 {ln(F*(30,000)) - ln(F*(20,000))} + Fn (30,000)2 {ln(F*(50,000)) - ln(F*(30,000))} =


(1/4)2 {ln(0.2451) - ln(0.1357)} + (1/2)2 {ln(0.4286) - ln(0.2451)} = 0.1767.
F*(u) = F(50,000) = 0.4286.
Anderson-Darling statistic = A2 = -nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)] =

-(4)(0.4286) + (4)(0.2916) + (4)(0.1767) = 0.1588.



19.22. B. Prior to truncation the distribution function is: F(x) = 0.18x - 0.008x2 , 0 < x < 10.
F(6) = 0.792.
After truncation from below at 6 the distribution function is:
F*(x) = {F(x) - F(6)}/S(6) = (0.18x - 0.008x2 - 0.792)/0.208, 6 < x < 10.
F*(6) = 0. F*(7) = 0.3654. F*(7.5) = 0.5192. F*(8) = 0.6538.
n = number of data points = 4. k = number of data points not censored from above = 4.
Fn (yi) = i/4. Sn (yi) = 1 - Fn (yi) = 1 - i/4.
Σj=0 to k Sn(yj)² {ln[S*(yj)] - ln[S*(yj+1)]} =

S n (6)2 {ln(S*(6)) - ln(S*(7))} + Sn (7)2 {ln(S*(7)) - ln(S*(7.5))}


+ Sn (7.5)2 {ln(S*(7.5)) - ln(S*(8))} + Sn (8)2 {ln(S*(8)) - ln(S*(8))} =

12 {ln(1) - ln(0.6346)} + (3/4)2 {ln(0.6346) - ln(0.4808)} + (1/2)2 {ln(0.4808) - ln(0.3462)} + 0


= 0.6930,
where since there is no censoring we have left out the last term that would involve ln(S*(∞)) = ln(0).
Σj=1 to k Fn(yj)² {ln[F*(yj+1)] - ln[F*(yj)]} =

Fn (7)2 {ln(F*(7.5)) - ln(F*(7))} + Fn (7.5)2 {ln(F*(8)) - ln(F*(7.5))} + Fn (8)2 {ln(F*(8)) - ln(F*(8))}


+ Fn (8)2 {ln(F*(∞)) - ln(F*(8))} =

(1/4)2 {ln(0.5192) - ln(0.3654)} + (1/2)2 {ln(0.6538) - ln(0.5192)} + 0 + 12 {ln(1) - ln(.6538)}


= 0.5045
F*(u) = F*(∞) = 1.
Anderson-Darling statistic = A2 = -nF*(u) + n Σi=0 to k Sn(yi)² ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)² ln[F*(yi+1) / F*(yi)] =

-(4)(1) + (4)(0.6930) + (4)(0.5045) = 0.790.

19.23. B. Statement I is true. See Section 16.4.3 of Loss Models.


Statement II is false; the K-S statistic is the largest absolute deviation.
See Section 16.4.1 of Loss Models.
Statement III is true. See Section 16.4.2 of Loss Models.

Section 20, Percentile Matching Applied to Truncated Data

Assume we wish to fit a distribution to the ground-up total limits losses,284 but all we have is truncated
data. Then one can work with the truncated distribution function that corresponds to the distribution
function of the ground-up total limits losses. In this section, fitting via Percentile Matching in this
situation will be discussed. In subsequent sections, the Method of Moments and Maximum
Likelihood will be discussed for this situation.

If one has data truncated and shifted from below, where the payments after the application of a
deductible have been recorded, then one can translate to data truncated from below where the loss
amounts of the insured have been recorded, or vice versa.
When data is truncated from below at the value d, losses of size less than d are not in the reported
data base. As discussed previously, the distribution function is revised as follows:
G(x) = {F(x) - F(d)} / S(d), x > d.

One can apply percentile matching to data truncated from below by working with this revised
distribution function. One matches G(x) to the empirical distribution at a number of points equal to the
number of parameters of the chosen type of distribution.

For example, assume that the ungrouped data in Section 2 were truncated from below at $150,000;
only the 52 losses in the final two columns would be reported.

Then for example for the Exponential Distribution, for truncation from below at 150,000 the revised
Distribution Function is:
G(x) = {(1 - e-x/θ) - (1 - e-150,000/θ)} / {1 - (1 - e-150,000/θ)} = 1 - e-(x-150,000)/θ.

If we were to match the observed percentile p at the observed claim size x:


p = 1 - e-(x−150000)/θ. ⇒ θ = -(x - 150000) / {ln(1-p)}.

The 26th claim (out of the 52 losses greater than $150,000) is $406,900. Therefore matching at the
26/(1 + 52) percentile would imply θ = -256,900 / ln(1 - 26/53) = 3.81 x 10^5.

Exercise: Fit an exponential distribution to the data in Section 2 truncated at $150,000, via percentile
matching at the 40th observed claim.
[Solution: θ = -(766,100 - 150,000) / ln(1 - 40/53) = 4.38 x 10^5.]
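These percentile-matching computations are easy to script; here is a small Python sketch (the function name is mine, and it is only a study aid):

import math

def exp_theta_percentile_match(x, d, p):
    # Exponential fit to data truncated from below at d, matched at loss size x,
    # taken as the p-th smoothed empirical percentile
    return -(x - d) / math.log(1.0 - p)

print(exp_theta_percentile_match(406_900, 150_000, 26/53))   # about 381,000
print(exp_theta_percentile_match(766_100, 150_000, 40/53))   # about 438,000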

284 Unaffected by deductibles or maximum covered losses.

Problems:

20.1 (1 point) From a policy with a $1000 deductible, you observe 4 claims with the following
amounts paid by the insurer: $1000, $2000, $3000, $5000.
You fit an exponential distribution F(x) = 1 - e-x/θ to this data via percentile matching.
If the matching is performed at the $5000 claim payment, then the parameter θ is in which of the
following intervals?
A. less than 1500
B. at least 1500 but less than 2000
C. at least 2000 but less than 2500
D. at least 2500 but less than 3000
E. at least 3000

Use the following information for the next two questions:

From a policy with a $10,000 deductible, you observe 6 claims with the following amounts paid by
the insurer: $22,000, $28,000, $39,000, $51,000, $80,000, and $107,000.

20.2 (2 points) You assume the losses prior to the impact of the deductible follow a Pareto
Distribution with θ = 40,000.
Percentile matching is performed at the payment of size $51,000.
What is the fitted α parameter?
(A) 1.2 (B) 1.4 (C) 1.6 (D) 1.8 (E) 2.0

20.3 (4 points) You assume the losses prior to the impact of the deductible follow a LogNormal
Distribution with µ = 10.
Percentile matching is performed at the payment of size $80,000.
What is the fitted σ parameter?
(A) 1.2 (B) 1.4 (C) 1.6 (D) 1.8 (E) 2.0

20.4 (2 points) From a policy with a $50 deductible and a 70% coinsurance factor, you observe 9
claims with the following amounts paid by the insurer:
$28, $70, $105, $140, $203, $224, $350, $504, $665.
You assume the losses prior to the impact of the deductible and coinsurance factor follow an
Exponential Distribution F(x) = 1 - e-x/θ.
Percentile matching is performed at the payment of size $224.
What is the fitted θ?
A. 300 B. 325 C. 350 D. 400 E. 450

20.5 (3 points) You are given:


(i) There is a deductible of 100.
(ii) A random sample of 100 losses is distributed as follows:
Loss Range Number of Losses
(100 – 200] 31
(200 – 400] 39
(400 – 750] 15
(750 – 1000] 5
over 1000 10
FIt via percentile matching a Weibull Distribution to the payments.
Fit at the 70th and 90th percentiles.
Estimate the probability that a loss will exceed 2000.
A. less than 2.0%
B. at least 2.0% but less than 2.5%
C. at least 2.5% but less than 3.0%
D. at least 3.0% but less than 3.5%
E. at least 3.5%

20.6 (3 points) From a policy with a $100 deductible and an 80% coinsurance factor, you observe
14 claims with the following amounts paid by the insurer:
$40, $80, $80, $120, $160, $200, $320, $400, $560, $720, $1120, $1520, $3920, $7920.
You assume the losses prior to the impact of the deductible and coinsurance factor follow
a Loglogistic Distribution.
Percentile matching is performed at the payment of sizes of $80 and $400.
What is S(10,000) for the fitted Loglogistic Distribution?
A. 0.5% B. 1.0% C. 1.5% D. 2.0% E. 2.5%

20.7 (5 points) You have the following data on the frequency of large claims under Medicare:
Amount Annual Number of Claims Exceeding Amount per 1000 members
50,000 54.8
100,000 14.8
200,000 1.2
Fit a Weibull Distribution to the ground up claims using the above information.

Solutions to Problems:

20.1. E. The insurerʼs payments above a deductible is data truncated and shifted.
The distribution function for the data truncated and shifted at 1000 is
G(x) = {F(x+1000) - F(1000)} / {1-F(1000)} = {e-1000/θ - e-(x+1000)/θ } / e-1000/θ = 1 - e-x/θ.

The observed 4/(4+1) = 80th percentile is 5000. Therefore, G(5000) = 1 - e-5000/θ = 0.80.
Solving, θ = 3107.
Comment: Note that the exponential distribution is the same after truncation and shifting; this implies
its constant mean excess loss.

20.2. A. For the data truncated and shifted from below, G(x) = (F(x+d) - F(d))/S(d).
The payment of $51,000 is the 4th of 6 payments, so it corresponds to an estimate of the 4/7
percentile of G. F(d) = F(10,000) = 1 - (1 + 10,000/40,000)−α = 1 - 0.8α.
F(51000 + d) = F(61,000) = 1 - (1 + 61,000/40,000)−α = 1 - 0.396α.
Matching the observed and theoretical percentiles:
4/7 = G(51,000) = {F(61,000) - F(10,000)} / {1 - F(10,000)} = {0.8α - 0.396α} / 0.8α = 1 - 0.495α.
Solving, α = ln(3/7) / ln(0.495) = 1.2.
Alternately, for data truncated from below, G(x) = (F(x) - F(d))/{1 - F(d)}.
A payment of $51,000 corresponds to a loss of $61,000. The payment of $51,000 is the 4th of 6
payments, so it corresponds to an estimate of the 4/7 percentile of G.
F(10,000) = 1 - (1 + 10,000/40,000)−α = 1 - 0.8α.

F(61,000) = 1 - (1 + 61,000/40,000)−α = 1 - 0.396α.


Matching the observed and theoretical percentiles:
4/7 = G(61,000) = (F(61,000) - F(10,000)) / {1 - F(10,000)} = {0.8α - 0.396α} / (0.8)α = 1 - 0.495α.
Solving: α = ln(3/7) / ln(0.495) = 1.2.
Alternately, the non-zero payments follow another Pareto Distribution with the same α and
θ = (original θ) + d = 40,000 + 10,000 = 50,000.
G(51,000) = 1 - {50,000 / (50,000 + 51,000)}α = 1 - 0.495α.

Matching the observed and theoretical percentiles: 4/7 = 1 - 0.495α. ⇒ α = ln(3/7) / ln(0.495) = 1.2.
Comment: A Pareto with parameters α and θ, after truncating and shifting at d,
is another Pareto with parameters α and θ + d.

20.3. C. For the data truncated from below, G(x) = (F(x) - F(d))/{1 - F(d)}.
A payment of $80,000 corresponds to a loss of $90,000. The payment of $80,000 is the 5th of 6
payments, so it corresponds to an estimate of the 5/7 percentile of G.
Matching the observed and theoretical percentiles:
5/7 = G(90000) = (F(90000) - F(10000))/{1 - F(10000)} =
{Φ[(ln(90000) - 10)/σ] - Φ[(ln(10000) - 10)/σ]} / {1 - Φ[(ln(10000) - 10)/σ]} =
{Φ(1.408/σ) - Φ(-0.790/σ)} / {1 - Φ(-0.790/σ)} = {Φ(1.408/σ) + Φ(0.790/σ) - 1} / Φ(0.790/σ).
Plugging in the given values of σ, for σ = 1.6, G(90,000) is closest to the desired 5/7 = 0.714.
σ {Φ(1.408/σ) + Φ(.790/σ) - 1} / Φ(.790/σ)
1.2 Φ(1.17) + Φ(.66) - 1} /Φ(.66) = (.8790 + .7454 - 1)/.7454 = 0.838
1.4 Φ(1.01) + Φ(.56) - 1} /Φ(.56) = (.8438 + .7123 - 1)/.7123 = 0.781
1.6 Φ(0.88) + Φ(.49) - 1} /Φ(.49) = (.8106 + .6879 - 1)/.6879 = 0.725
1.8 Φ(0.78) + Φ(.44) - 1} /Φ(.44) = (.7823 + .6700 - 1)/.6700 = 0.675
2.0 Φ(0.70) + Φ(.40) - 1} /Φ(.40) = (.7580 + .6554 - 1)/.6554 = 0.631
Comment: Beyond what you are likely to be asked on your exam. One needs to solve
numerically; a more exact answer is σ = 1.642. If both µ and σ were allowed to vary, then one
would have to match at two percentiles. For example, if one matched at payments of $39,000
and $80,000, then solving numerically would give µ = 10.5 and σ = 1.26.

20.4. C. The $224 payment is 6 out of 9, and thus corresponds to an estimate of the 6/(9+1) =
60th percentile. Translate the $224 payment back to what it would have been in the absence of the
coinsurance factor: 224/0.7 = 320.
The insurerʼs payments excess of a deductible is data truncated and shifted.
The distribution function for the data truncated and shifted at 50 is:
G(x) = {F(x+50) - F(50)} / {1-F(50)} = {e-50/θ - e-(x +50)/θ } / e-50/θ = 1 - e-x/θ.
The observed 60th percentile corresponds to a payment of $320 after the deductible. Therefore,
we want G(320) = 1 - e-320/θ = .60. Solving, θ = -320/ln(.4) = $349.
Comment: Due to the memory property of the Exponential Distribution, ignoring the coinsurance,
the distribution of non-zero payments excess of the deductible is the same as that of the ground up
losses. For θ = 349, Prob[Loss ≤ 50] = 1 - e-50/349 = 0.1335,
and Prob[Loss ≤ 370] = 1 - e-370/349 = 0.6536.
For θ = 349, ignoring the coinsurance,
Prob[non-zero payment ≤ 320] = 1 - e-320/349 = .600 = (0.6536 - 0.1335)/(1 - 0.1335).
For θ = 349, including the 70% coinsurance, Prob[non-zero payment ≤ (320)(0.7) = 224] = 0.600.

20.5. C. The 70th and 90th percentiles of the data are losses of size 400 and 1000.
These correspond to payments of 300 and 900.
0.7 = 1 - exp[-(300/θ)τ]. ⇒ (300/θ)τ = 1.20397.

0.9 = 1 - exp[-(900/θ)τ]. ⇒ (900/θ)τ = 2.30259.

(900/300)τ = 1.9125. ⇒ τ = 0.5902. ⇒ θ = 219.0.


Prob[loss > 2000] = Prob[payment > 1900] = S(1900) = exp[-(1900/219)0.5902] = 2.79%.
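A short Python sketch of the two-percentile Weibull solution above (a study aid; variable names are mine):

import math
a = -math.log(1 - 0.7)            # (300/theta)^tau = 1.20397
b = -math.log(1 - 0.9)            # (900/theta)^tau = 2.30259
tau = math.log(b / a) / math.log(900 / 300)
theta = 300 / a**(1 / tau)
print(round(tau, 4), round(theta, 1))          # about 0.5902 and 219.0
print(math.exp(-(1900 / theta)**tau))          # about 0.028, i.e. 2.8%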

20.6. B. $80 shows up twice in the sample, so it corresponds to: 2.5/15.


$400 corresponds to smoothed empirical estimate of percentiles of 8/15.
A payment of $80 corresponds to a loss of: (80/0.8) + 100 = $200.
A payment of $400 corresponds to a loss of: (400/0.8) + 100 = $600.
For the Loglogistic, VaRp[X] = θ (p^-1 - 1)^(-1/γ).
Thus, 200 = θ (15/2.5 - 1)^(-1/γ), and 600 = θ (15/8 - 1)^(-1/γ).
Dividing the two equations: 3 = {(7/8) / 5}^(-1/γ). ⇒ γ = -ln(7/40) / ln(3) = 1.587.
⇒ θ = (600) (7/8)^(1/1.587) = 552.


For the Loglogistic, F(x) = (x/θ)^γ / {1 + (x/θ)^γ}. ⇒ S(x) = 1 / {1 + (x/θ)^γ}.
S(10,000) = 1 / {1 + (10,000/552)^1.587} = 1.0%.
Comment: Check. F(200) = (200/552)^1.587 / {1 + (200/552)^1.587} = 0.1664. 2.5/15 = 1/6 = 0.1667.
F(600) = (600/552)^1.587 / {1 + (600/552)^1.587} = 0.5330. 8/15 = 0.5333.

20.7. We are not given the survival functions, but we do know that:
S(100,000) / S(50,000) = 14.8/54.8, while S(200,000) / S(50,000) = 1.2/54.8.
For the Weibull Distribution, S(x) = exp[-(x/θ)τ]. Thus we have two equations in two unknowns:

14.8/54.8 = exp[-(100,000 / θ)τ] / exp[-(50,000 / θ)τ] = exp[(50,000τ - 100,000τ) / θτ].

1.2/54.8 = exp[-(200,000 / θ)τ] / exp[-(50,000 / θ)τ] = exp[(50,000τ - 200,000τ) / θτ].


Taking logs:
-1.3091 = (50,000τ - 100,000τ) / θτ.

-3.8214 = (50,000τ - 200,000τ) / θτ.


Dividing the two equations:
2.9191 = (50,000τ - 200,000τ) / (50,000τ - 100,000τ) = (1 - 4τ) / ( 1 - 2τ).
⇒ 2.9191 - 2.9191 2τ = 1 - 4τ.
Let x = 2τ; then x² - 2.9191x + 1.9191 = 0. ⇒ (x - 1)(x - 1.9191) = 0. ⇒ x = 1 or x = 1.9191.

x = 1 ⇒ 2τ = 1. ⇒ τ = 0, not a viable answer.

x = 1.9191 ⇒ 2τ = 1.9191. ⇒ τ = 0.9404. ⇒

θ0.9404 = (50,0000.9404- 200,0000.9404) / (-3.8214) = 18,419. ⇒ θ = 34,324.


Comment: Mathematically equivalent to percentile matching with data truncated from below at
50,000. For the truncated data, at 100,000 the distribution function is: 1 - 14.8/54.8 = 0.7299.
For the truncated data, at 200,000 the distribution function is: 1 - 1.2/54.8 = 0.9781.
For the Weibull distribution truncated from below at 50,000, at 100,000 the distribution function is:
1 - exp[-(100,000/34,324)0.9404] / exp[-(50,000/34,324)0.9404] = 0.7299.
For the Weibull distribution truncated from below at 50,000, at 200,000 the distribution function is:
1 - exp[-(200,000/34,324)0.9404] / exp[-(50,000/34,324)0.9404] = 0.9781.
Data taken from “Who Moved My Deductible?”, by Mark Troutman, in the May 2012 issue of
Contingencies.
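For those curious, a small Python check of this fit, following the algebra above (a study aid only):

import math
r1 = math.log(14.8 / 54.8)        # = (50000^tau - 100000^tau) / theta^tau
r2 = math.log(1.2 / 54.8)         # = (50000^tau - 200000^tau) / theta^tau
tau = math.log(r2 / r1 - 1) / math.log(2)      # since (1 - 4^tau)/(1 - 2^tau) = 1 + 2^tau
theta = ((50000**tau - 200000**tau) / r2) ** (1 / tau)
print(round(tau, 3), round(theta))             # about 0.940 and 34,300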

Section 21, Method of Moments Applied to Truncated Data

One can apply the Method of Moments to truncated data.

If the original data prior to truncation (ground up claim amounts) is assumed to follow F(x), then the
mean of the data truncated and shifted at d is e(d), the mean excess loss at d.
Average Payment per Payment = (E[X] - E[X ∧ d]) / S(d) = e(d).

If data is truncated from below, no shifting, then the average value is d more:

e(d) + d = ∫d to ∞ t f(t) dt / S(d).

Thus if you know the form of e(d), this helps in fitting by the method of moments to truncated data.

Truncation from Below:

For example assume that the losses in the absence of any deductible would follow a Pareto
distribution function with α = 3: F(x) = 1 - {θ /(θ +x)}3 .
Assume that the following data has been truncated from below at 5: 10, 15, 20, 30, and 50. Then
the method of moments can be used to estimate the parameter θ.

For the Pareto Distribution, the mean excess loss is: e(x) = (θ+x)/(α−1). For α = 3, e(5) = (θ+5)/2.
The average size of the data truncated from below at 5 is: e(5) + 5 = θ/2 + 7.5.
The observed mean for the data truncated from below is: (10+15+20+30+50) / 5 = 25.

Matching the observed mean to the theoretical mean: 25 = θ/2 + 7.5. Thus θ = (2)(25 - 7.5) = 35.

For the ungrouped data in Section 2 truncated from below at $150,000, the mean is 684,550.
For an Exponential Distribution, e(d) = θ. Thus for d = 150,000, e(d) + d = θ + 150000.

Therefore, fitting a ground-up Exponential Distribution to the ungrouped data in Section 2 truncated
at $150,000 via the method of moments: θ + 150,000 = 684,550.285
Therefore, θ = 684,550 - 150,000 = 534,550.

In general, for a truncation point of d from below, the method of moments applied to the Exponential
Distribution: θ = Σ (xi - d) / N = Σxi /N - d = (mean of the truncated data) - d.

285 We assume that the losses prior to truncation from below follow an Exponential Distribution.

Truncation from Below, Two Parameter Case:

Applying the method of moments to two parameter distributions is more complicated, since one
needs to match both the first and second moments.
The second moment can be computed as:

∫d to ∞ t² f(t) dt / S(d).

Such integrals are computed in the same manner as those for the moments and Limited Expected
Values. While these integrals are normally too complicated to be computed on the exam, one can
make use of the formulas for the limited moments provided in Appendix A of Loss Models.
The second moment of the data truncated from below can be put in terms of the Limited Second
Moment as follows:

E[(X ∧ d)^2] = ∫_0^d t^2 f(t) dt + S(d)d^2.

E[X^2] = ∫_0^∞ t^2 f(t) dt.

Therefore, ∫_d^∞ t^2 f(t) dt = ∫_0^∞ t^2 f(t) dt - ∫_0^d t^2 f(t) dt = E[X^2] + S(d)d^2 - E[(X ∧ d)^2].

Then the second moment of the data truncated from below is the above integral divided by S(d):


2nd moment of the data truncated from below at d = ∫_d^∞ t^2 f(t) dt / S(d) =

(E[X^2] + S(d)d^2 - E[(X ∧ d)^2]) / S(d) = (E[X^2] - E[(X ∧ d)^2]) / S(d) + d^2.

Exercise: What is the second moment for a LogNormal Distribution truncated from below at d?
[Solution: {E[X^2] + S(d)d^2 - E[(X ∧ d)^2]} / S(d) =
(exp(2µ + 2σ^2) - exp(2µ + 2σ^2) Φ[(ln(d) - µ - 2σ^2)/σ]) / {1 - Φ[(ln(d) - µ)/σ]} =
exp(2µ + 2σ^2) {1 - Φ[(ln(d) - µ - 2σ^2)/σ]} / {1 - Φ[(ln(d) - µ)/σ]}.]

Exercise: What are the first and second moments for a LogNormal Distribution with parameters
µ = 3.8 and σ = 1.5, truncated from below at 1,000?

[Solution: first moment = e(d) + d =
exp(µ + σ^2/2) {1 - Φ[(ln(d) - µ - σ^2)/σ]} / {1 - Φ[(ln(d) - µ)/σ]} =
e^4.925 {1 - Φ[0.5718]} / {1 - Φ[2.071]} = (137.7)(0.2837)/0.0191 = 2045.
second moment = exp(2µ + 2σ^2) {1 - Φ[(ln(d) - µ - 2σ^2)/σ]} / {1 - Φ[(ln(d) - µ)/σ]} =
e^12.1 {1 - Φ[-0.928]} / {1 - Φ[2.071]} = (179,872)(0.8233)/0.0191 = 7.7 million.]

Higher moments can be computed using the formula:286

nth moment of the data truncated from below at d = (E[X^n] + S(d)d^n - E[(X ∧ d)^n]) / S(d).

In order to apply the method of moments to a two parameter distribution and data truncated from
below, one has to compute the first and second moments for the given distribution type when the
small losses are not reported and then one has to match the equations for these moments to the
observed values and solve (numerically).

For example, assume you have data truncated from below at 1000, with the observed first moment
is 3000 and the observed second moment is 71 million. Assume a LogNormal Distribution would
have fit the data prior to truncation from below. Then to apply the method of moments in order to fit a
LogNormal Distribution, one writes down two equations, matching the observed and theoretical
values:287
exp(µ + σ^2/2) {1 - Φ[(ln 1000 - µ - σ^2)/σ]} / {1 - Φ[(ln 1000 - µ)/σ]} = 3000

exp(2µ + 2σ^2) {1 - Φ[(ln 1000 - µ - 2σ^2)/σ]} / {1 - Φ[(ln 1000 - µ)/σ]} = 71 million

These equations can be solved numerically,288 in this case µ = -0.112 and σ = 2.473.
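
For those curious how such a numerical solution might be carried out, here is a minimal Python sketch using scipy (this is only an illustration, not something required on the exam; the starting point for the solver is arbitrary and may need adjustment):

import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

d = 1000.0
ln_d = np.log(d)

def moment_equations(params):
    mu, sigma = params
    S_d = 1 - norm.cdf((ln_d - mu) / sigma)
    # first and second moments of a LogNormal truncated from below (unshifted) at d
    m1 = np.exp(mu + sigma**2 / 2) * (1 - norm.cdf((ln_d - mu - sigma**2) / sigma)) / S_d
    m2 = np.exp(2*mu + 2*sigma**2) * (1 - norm.cdf((ln_d - mu - 2*sigma**2) / sigma)) / S_d
    return [m1 - 3000, m2 - 71e6]   # match the observed first and second moments

mu_hat, sigma_hat = fsolve(moment_equations, x0=[1.0, 2.0])
print(mu_hat, sigma_hat)   # should come out near -0.11 and 2.47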

Fitting to Data Truncated and Shifted from Below:289

When data is truncated and shifted from below at the value d, losses of size less than d are not in
the reported data base, and larger losses have their reported values reduced by d.
The distribution function and the probability density functions are revised as follows:
G(x) = { F(x+d) - F(d) } / S(d), x > 0.
g(x) = f(x+d) / S(d), x > 0.

286
For n =1 this reduces to e(L) + L. For n = 2 it reduces to the formula derived above.
The current formula is derived in the exact same way.
287
The right hand sides are the first and second moments of a LogNormal Normal truncated from below at 1000.
288
While one can not be asked to solve these equations on the exam, in theory you could be asked to set them up.
289
One can always convert data truncated and shifted from below to data truncated (and unshifted) from below or
vice versa. Thus one can work in whichever context is easier for you.

For example, one could fit to data that is truncated and shifted from below via method of moments.
If the original data (ground-up claim amounts) is assumed to follow:
F(x) = 1 - {250,000/(250,000 + x)}^α, and if the data is truncated and shifted at 150,000, then
G(x) = [{1 - (250,000/(400,000 + x))^α} - {1 - (5/8)^α}] / [1 - {1 - (5/8)^α}] = 1 - {400,000/(400,000 + x)}^α, for x > 0.

This happens to be a Pareto with scale parameter = 400,000, and shape parameter equal to α.
Thus the theoretical mean of the data truncated and shifted at 150,000 is: 400,000 / (α-1).

If one truncates and shifts at 150,000 the ungrouped data in Section 2, then the observed mean
turns out to be 534,550. This is 150,000 less than the observed mean of 684,550 for this data
truncated from below at 150,000. Setting the observed and theoretical mean equal:
400,000 / (α-1) = 534,550. Thus α = 1.75.

In general, if the original data prior to truncation (ground up claim amounts) is assumed to follow F(x),
then the mean of the data truncated and shifted at d is e(d), the mean excess loss at d.
So in our example where F(x) = 1 - {250000/(250000+x)}α, the mean excess loss of this Pareto
Distribution is: e(x) = (θ+x)/(α-1). e(150000) =(250000 + 150000)/(α-1) = 400,000 / (α-1),
as determined above.

Coinsurance Factors:

Since a coinsurance factor is applied last, we can remove its effects and convert to a question not
involving coinsurance. For example, if one has a coinsurance factor of 80%, we can divide each
payment by 0.80 in order to convert it to what it would have been in the absence of a coinsurance
factor. Then the fitting can be performed in the usual manner.

Fitting to Data Truncated from Above:

As discussed in a prior section, when data is truncated from above at the value L, losses of size
greater than L are not in the reported data base. The distribution function and the probability density
functions are revised as follows:
G(x) = F(x) / F(L), x ≤ L.
g(x) = f(x) / F(L), x ≤ L.

Mathematically, data truncated from above is parallel to truncation from below, and similar techniques
apply. As discussed in a subsequent section, fitting loss distributions to censored data is more
common in actuarial applications, than fitting to data truncated from above.

The theoretical mean of data truncated from above at L is:290

∫_0^L x g(x) dx = ∫_0^L x f(x) / F(L) dx = ∫_0^L x f(x) dx / F(L) = {E[X ∧ L] - L S(L)} / F(L).

For example, if one assumes the original untruncated losses follow an Exponential,
F(x) = 1 - e^(-x/θ), then the mean of the data truncated from above at L is:

(E[X ∧ L] - L S(L)) / F(L) = (θ(1 - e^(-L/θ)) - L e^(-L/θ)) / (1 - e^(-L/θ)) =

θ - L e^(-L/θ)/(1 - e^(-L/θ)) = θ - L/(e^(L/θ) - 1).

For example, the mean of the grouped data in Section 3 if it were truncated from above at 20,000
is: $64,897,000 / 7376 = $8798. If we assume that the original untruncated losses follow an
Exponential Distribution: F(x) = 1 - e^(-x/θ), then the method of moments would say that:

θ - 20,000/(e^(20,000/θ) - 1) = 8798.

One could then numerically solve for θ ≅ 27,490.
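
A minimal Python sketch of that numerical solution (assuming scipy is available; the bracketing interval is just a guess wide enough to contain the root):

from math import exp
from scipy.optimize import brentq

L = 20000.0
observed_mean = 8798.0

def equation(theta):
    # mean of an Exponential truncated from above at L, minus the observed mean
    return theta - L / (exp(L / theta) - 1) - observed_mean

print(brentq(equation, 10000, 100000))   # roughly 27,490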

290
The mean of the data truncated from above is similar to the Limited Expected Value. However, the Limited
Expected value includes contributions from all size losses, while the data truncated from above excludes large
losses. The Limited Expected Value consists of two terms. The contribution from the small losses is what is included
in the data truncated from above. The other term of L S(L) is the contribution of large losses, which are not recorded
when data is truncated from above. Thus the numerator of the mean of the data truncated from above is: E[X ∧ L] - L
S(L). The denominator is F(L) since losses of size greater than L are not recorded.

Problems:

21.1 (2 points) The following five ground up losses have been observed from policies each with a
deductible of 10: 60, 75, 80, 105, and 130.
We assume that the untruncated data follows the distribution function: F(x) = 1 - {100/(100 + x)}^α.

The method of moments is used to fit this distribution.


What is the estimate of the parameter α?
A. less than 2.2
B. at least 2.2 but less than 2.3
C. at least 2.3 but less than 2.4
D. at least 2.4 but less than 2.5
E. at least 2.5

Use the following information for the next 3 questions:


From a policy with a $10,000 deductible, you observe 6 claims with the following amounts paid by
the insurer: $22,000, $28,000, $39,000, $51,000, $80,000, and $107,000.

21.2 (1 point) You fit an Exponential Distribution to this data via the method of moments.
What is the fitted θ parameter?
(A) 40,000 (B) 45,000 (C) 50,000 (D) 55,000 (E) 60,000

21.3 (2 points) You fit a Pareto Distribution with α = 4 to this data via the method of moments.
What is the fitted θ parameter?
A. less than 145,000
B. at least 145,000 but less than 150,000
C. at least 150,000 but less than 155,000
D. at least 155,000 but less than 160,000
E. at least 160,000

21.4 (5 points) You fit a LogNormal Distribution with σ = 1.5 to this data via the method of
moments. What is the fitted µ parameter?
A. 9.0 B. 9.2 C. 9.4 D. 9.6 E. 9.8

21.5 (3 points) The size of accidents is believed to follow a distribution function:


F(x) = 1 - (50,000/x)^q, for x > 50,000.
Suppose a sample of data is truncated from above at x = 150,000. The sample of 7 accidents is
then observed to be: 55,000, 60,000, 65,000, 70,000, 75,000, 80,000, and 90,000.
The parameter q is fit to this data using the method of moments.
Which of the following intervals contains the fitted value of q?
A. less than 2.4
B. at least 2.4 but less than 2.6
C. at least 2.6 but less than 2.8
D. at least 2.8 but less than 3.0
E. at least 3.0

21.6 (2 points) From a policy with a $50 deductible and a 70% coinsurance factor, you observe 9
claims with the following amounts paid by the insurer: $28, $70, $105, $140, $203, $224, $350,
$504, $665. You assume the losses prior to the impact of the deductible and coinsurance factor
follow an Exponential Distribution F(x) = 1 - e^(-x/θ).
Using the Method of Moments, what is the fitted θ?
A. less than 370
B. at least 370 but less than 380
C. at least 380 but less than 390
D. at least 390 but less than 400
E. at least 400

21.7 (2 points) Losses are uniform from 0 to ω.


After the application of a franchise deductible of 100, you observe the following four payments:
130, 370, 500, 750.
Determine ω using the method of moments.

21.8 (4, 5/87, Q.62) (2 points) Assume that the random variable x has the probability density
function:
f(x ; θ) = θ + 2(1 - θ)x , 0 ≤ x ≤ 1, with parameter θ, 0 ≤ θ ≤ 2.
Suppose that a sample is truncated at x = 0.60 so that values below this amount are excluded.
The sample is then observed to be
0.70, 0.75, 0.80, 0.80
Using the method of moments, what is the estimate for the parameter θ?
You may use ∫_0.6^1 x f(x; θ) dx = 0.5227 - 0.2027θ.
A. Less than 1.00
B. At least 1.00, but less than 1.30
C. At least 1.30, but less than 1.60
D. At least 1.60, but less than 1.90
E. 1.90 or more.

Solutions to Problems:

21.1. C. The distribution function for the data truncated from below at 10 is:
G(x) = {F(x) - F(10)} / {1 - F(10)} = {(100/110)^α - (100/(100+x))^α} / (100/110)^α
= 1 - {110/(100+x)}^α, x > 10.

The survival function is 1 from 0 to 10, and {110/(100+x)}^α from 10 to ∞.
Therefore, the mean of the truncated distribution is:

∫_0^10 1 dx + ∫_10^∞ {110/(100+x)}^α dx = 10 + 110/(α - 1).


Set the mean of the truncated distribution equal to the mean of the truncated data:
(60 + 75 + 80 + 105 + 130)/5 = 90 = 10 + 110/(α - 1). Thus α = 19/8 = 2.375.
Alternately, the distribution function for the data truncated and shifted by 10 is:
G(x) = {F(x+10) - F(10)} / {1 - F(10)} = {(100/110)^α - (100/(110+x))^α} / (100/110)^α
= 1 - {110/(110+x)}^α. This is a Pareto Distribution, with parameters α and 110,
and mean 110/(α - 1). The truncated and shifted data is the losses minus the deductible of 10.
Setting the mean of the truncated and shifted distribution equal to the observed mean payment of
80: 80 = 110/(α - 1). Thus α = 19/8 = 2.375.
Comment: The average claim for the data truncated and shifted by 10 is e(10), the mean excess
loss at 10. For the Pareto Distribution: e(x) = (x+θ) / (α-1), so
e(10) = (10 + 100) / (α-1) = 110 / (α-1).

21.2. D. After truncation and shifting, one gets the same Exponential Distribution, due to the
memoryless property of the Exponential. Therefore, matching means:
θ = ($22,000 + $28,000 + $39,000 + $51,000 + $80,000 + $107,000)/6 = $54,500.
Alternately, the mean after truncation and shifting is the mean excess loss, e(10000).
For the Exponential, e(x) = θ, so we set θ = observed mean = $54,500.

21.3. C. After truncation and shifting at 10,000, one gets:


G(x) = {F(x+10000) - F(10000)} / S(10000)
= {(1 + 10000/θ)^-4 - (1 + (x+10000)/θ)^-4} / (1 + 10000/θ)^-4 = 1 - {1 + x/(θ + 10000)}^-4,
which is a Pareto Distribution with α = 4 and θʼ = θ + 10000.
This second Pareto has mean: (θ + 10000)/(4-1).
Set this equal to the observed mean of 54,500 and solve for θ = 153,500.
Alternately, the mean after truncation and shifting is the mean excess loss, e(10000).
For the Pareto, e(x) = (θ + x)/(α−1). e(10000) = (θ + 10000)/3.
So we set (θ + 10000)/3 = observed mean = $54,500. Therefore, θ = $153,500.

21.4. C. For the LogNormal Distribution, E[X] = exp(µ + σ^2/2) = exp(µ + 1.125) = 3.080 e^µ.

E[X ∧ 10000] = exp(µ + σ^2/2) Φ[(ln(10000) - µ - σ^2)/σ] + 10000 S(10000) =
3.080 e^µ Φ[4.640 - µ/1.5] + 10000 {1 - Φ[6.140 - µ/1.5]}.
S(10000) = 1 - Φ[(ln(10000) - µ)/σ] = 1 - Φ[6.140 - µ/1.5].
The mean after truncation and shifting is the mean excess loss,
e(10000) = (E[X] - E[X ∧ 10000]) / S(10000) =
3.080 e^µ {1 - Φ[4.640 - µ/1.5]} / {1 - Φ[6.140 - µ/1.5]} - 10000.
We want e(10000) = observed mean = $54,500.
One needs to try all the given choices for µ in order to see which one satisfies this equation.

The equation is satisfied for µ ≅ 9.4.


mu Mean LEV[10000] S(10000) e(10000)
9.0 24,959 6612 0.4443 41,295
9.2 30,485 7036 0.4960 47,277
9.4 37,235 7438 0.5517 54,009
9.6 45,479 7809 0.6026 62,512
9.8 55,548 8150 0.6517 72,730
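
The trial-and-error in this solution is easy to script; a minimal Python sketch, assuming scipy is available (σ = 1.5 and d = 10,000 are from the problem):

import numpy as np
from scipy.stats import norm

def e_trunc(mu, sigma=1.5, d=10000.0):
    # mean excess loss e(d) for a LogNormal = mean of the data truncated and shifted at d
    mean = np.exp(mu + sigma**2 / 2)
    S_d = 1 - norm.cdf((np.log(d) - mu) / sigma)
    lev = mean * norm.cdf((np.log(d) - mu - sigma**2) / sigma) + d * S_d   # E[X ∧ d]
    return (mean - lev) / S_d

for mu in [9.0, 9.2, 9.4, 9.6, 9.8]:
    print(mu, round(e_trunc(mu)))   # closest to the observed 54,500 near mu = 9.4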

21.5. C. This is a Single Parameter Pareto, with density f(x) = q 50,000^q x^-(q+1).
Adjusting the density for truncation from above:
g(x) = f(x) / F(150,000) = q 50,000^q x^-(q+1) / {1 - 3^-q}.
The mean of the data truncated from above is the integral of x g(x) from 50,000 to 150,000:

mean = ∫_50,000^150,000 x g(x) dx = ∫_50,000^150,000 {q 50,000^q / (1 - 3^-q)} x^-q dx

= -{q / [(q-1)(1 - 3^-q)]} 50,000^q x^-(q-1), evaluated from x = 50,000 to x = 150,000,

= (q / {(q-1)(1 - 3^-q)}) 50,000^q {50,000^-(q-1) - 150,000^-(q-1)} =

(q / {(q-1)(1 - 3^-q)}) 50,000 {1 - (1/3)^(q-1)}.

The observed average is: 495,000 / 7 = 70,714.
Thus one wants: 70,714 = (q / {(q-1)(1 - 3^-q)}) 50,000 {1 - (1/3)^(q-1)}. ⇒

1.414 = q {1 - (1/3)^(q-1)} / {(q-1)(1 - 3^-q)}.


One can solve for q numerically as about 2.72 or just try values of q:
q q{1 - (1/3)^(q-1)} / { (q-1)(1- (1/3)^q)}
2.6 1.427
2.7 1.416
2.8 1.405

21.6. A. The mean payment is:


($28 + $70 + $105 + $140 + $203 + $224 + $350 + $504 + $665)/9= 254.33.
Prior to the effect of the coinsurance factor, the mean would be: 254.33/0.7 = 363.
After truncation and shifting, one gets the same Exponential Distribution, due to the memoryless
property of the Exponential. Therefore, matching means θ = 363.
Comment: Since the coinsurance factor is applied last, we can remove its effects and convert to a
question involving only a deductible.

21.7. The franchise deductible truncates the data from below at 100.
For ω > 100, the uniform truncated from below is uniform from 100 to ω, with mean (100 + ω)/2.
The mean payment is: (130 + 370 + 500 + 750)/4 = 437.5.
Set 437.5 = (100 + ω)/2. ⇒ ω = 775.
Comment: A potential problem with this method is that the fitted ω can be less than the maximum of
the sample, which is impossible.

21.8. D. The density function for the data truncated from below at 0.6 is g(x) = f(x) / S(0.6).
Integrating f(x) = θ + 2(1 - θ)x we get F(x) = θx + (1 - θ)x^2, 0 ≤ x ≤ 1.
Thus F(0.6) = 0.6θ + (0.36)(1 - θ) = 0.36 + 0.24θ. Thus g(x) = f(x) / S(0.6) = f(x) / (0.64 - 0.24θ).
The mean of the truncated distribution is:

∫_0.6^1 x g(x) dx = ∫_0.6^1 x f(x) dx / (0.64 - 0.24θ) = ∫_0.6^1 {θx + 2(1 - θ)x^2} dx / (0.64 - 0.24θ) =

{θx^2/2 + 2(1 - θ)x^3/3}, evaluated from x = 0.6 to x = 1, divided by (0.64 - 0.24θ),
which is (0.5227 - 0.2027θ) / (0.64 - 0.24θ).

Setting this equal to the observed mean of (0.7 + 0.75 + 0.8 + 0.8)/4 = 0.7625, we have:
(0.5227 - 0.2027θ) / (0.64 - 0.24θ) = 0.7625. Thus θ = 0.0347 / 0.0197 = 1.76.

Section 22, Maximum Likelihood Applied to Truncated Data

One can apply the Method of Maximum Likelihood to truncated data, with the distribution assumed
to fit the data prior to truncation.291 Therefore one has to modify either the distribution or density
function for the effects of truncation.

Data Truncated from Below:

For data truncated from below at a value d, also called left truncated at d, losses of size d or less are
not reported.292

For example, if an insurance policy has a deductible of 1000, then the insurance company would
pay nothing unless a loss exceeds 1000. For a large loss, it would pay 1000 less than the loss.

Exercise: On a policy with a $1000 deductible, there are three losses of size: $700, $3000, $5000.
How much does the insurer pay?
[Solution: The insurer pays 0, 2000, and 4000, for a total of $6000.]

The data for this policy would be reported to the actuary in either of two ways:
• Two losses of sizes $3000 and $5000.
• Two payments of sizes $2000 and $4000.293

In either case, the actuary would not know how many losses there were of size $1000 or less, nor
would he know the exact sizes of any such losses.

For an ordinary deductible of size d, the losses have been truncated from below at d.

For a franchise deductible, the small losses would not be reported, and the insurer would pay all of
large loss. This data is truncated from below at the amount of the franchise deductible.

The Method of Maximum Likelihood can be applied to truncated data in order to fit distributions.

If one assumes that the original ground up losses followed F(x), then one gets the likelihood of the
data truncated from below using either G(x) for grouped data or g(x) for ungrouped data.

G(x) = {F(x) - F(d)} / S(d), x > d.        g(x) = f(x) / S(d), x > d.
291
In a subsequent section, one assumes the truncated data is directly fit by a Single Parameter Pareto Distribution.
292
This is the usual way data from a policy with a deductible of d is reported.
293
The size of payments would be after the deductible has been subtracted.

The log density for data truncated from below at d is: ln (f(x)/S(d)) = ln[f(x)] - ln[S(d)].

For data truncated from below at d, a loss of size x greater than d contributes f(x) / S(d)
to the likelihood, or ln[f(x) / S(d)] to the loglikelihood.

Exercise: Prior to the effects of a 1000 deductible, the losses follow a distribution F(x), with density
f(x). There were two losses reported of sizes 3000 and 5000. Write down the likelihood.
[Solution: The first loss contributes: f(3000)/S(1000).
The second loss contributes: f(5000)/S(1000).
Thus the likelihood is: f(3000) f(5000) / S(1000)2 .
Comment: The solution would be the same if we had instead observed two payments of sizes
2000 and 4000.]

Exercise: In the previous exercise, determine the maximum likelihood fit of an Exponential
Distribution with mean θ.

[Solution: The likelihood is: (e^(-3000/θ)/θ) (e^(-5000/θ)/θ) / (e^(-1000/θ))^2 = e^(-6000/θ)/θ^2.

The loglikelihood is: -6000/θ - 2 ln(θ). Thus, 0 = 6000/θ^2 - 2/θ. ⇒ θ = 6000/2 = 3000.]

In general, the method of maximum likelihood applied to ungrouped data truncated from below
requires that one maximize:294
Σ ln(f(xi)) - N ln(S(d))

For the Exponential Distribution, one can solve in closed form for the method of maximum likelihood
applied to data truncated from below at d as follows:
Σ ln(f(xi)) - N ln(S(d)) = Σ {-ln(θ) -xi / θ} - N(-d/θ).
Taking the derivative of the loglikelihood with respect to θ and setting it equal to zero:

0 = Σ {-1/θ + xi/θ^2} - Nd/θ^2. Therefore Nθ = Σxi - Nd. ⇒ θ = Σ(xi - d)/N = average payment.

Thus for the Exponential Distribution, the method of moments and the method of maximum
likelihood applied to ungrouped data give the same result, as was the case in the absence of
truncation from below.
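
A minimal Python sketch confirming this on the two-loss exercise above (scipy's one-dimensional minimizer is used purely as a numerical check of the closed-form answer):

from math import log
from scipy.optimize import minimize_scalar

losses = [3000.0, 5000.0]   # reported losses, truncated from below at d
d = 1000.0

def negative_loglikelihood(theta):
    # truncated density f(x)/S(d) for an Exponential: ln f(x) = -ln(theta) - x/theta, ln S(d) = -d/theta
    loglik = sum(-log(theta) - x / theta for x in losses) + len(losses) * d / theta
    return -loglik

result = minimize_scalar(negative_loglikelihood, bounds=(100.0, 100000.0), method="bounded")
print(round(result.x))   # about 3000 = (2000 + 4000)/2, the average payment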

For more complicated distributions, in order to maximize the likelihood, one must use numerical
methods.

294
As always, one can maximize either the likelihood or the loglikelihood. When fitting to ungrouped data one works
with the density, while when fitting to grouped data one works with the distribution function.

For example, fitting the Burr distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, via maximum likelihood to
ungrouped data truncated from below at d, one would need to maximize:

Σ ln(f(xi)) - N ln(S(d)) = Σ ln[αγ(xi/θ)^γ (1 + (xi/θ)^γ)^-(α+1) / xi] - N ln[{1/(1 + (d/θ)^γ)}^α] =

Σ {ln(α) + ln(γ) + (γ-1)ln(xi) - γ ln(θ) - (α+1)ln(1 + (xi/θ)^γ)} + Nα ln(1 + (d/θ)^γ) =
(γ-1)Σ ln(xi) - (α+1)Σ ln(1 + (xi/θ)^γ) + N ln(α) + N ln(γ) - Nγ ln(θ) + Nα ln(1 + (d/θ)^γ).
Numerically maximizing this loglikelihood for the ungrouped data in Section 2 truncated from below at
d = 150,000 gives parameters:295 α = 1.142, θ = 2.917 x 10^5, γ = 1.460.
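
A sketch of how such a numerical maximization might be set up in Python (the data below is a small hypothetical placeholder, since the Section 2 data is not reproduced here; with different data the fitted parameters will of course differ from those quoted above):

import numpy as np
from scipy.optimize import minimize

d = 150000.0
# Placeholder data: a few hypothetical losses above the truncation point.
losses = np.array([180000.0, 250000.0, 400000.0, 750000.0, 2000000.0])

def negative_loglikelihood(params):
    # work on the log scale so that alpha, gamma, theta stay positive
    alpha, gamma, theta = np.exp(params)
    u = (losses / theta) ** gamma
    # Burr: ln f(x) = ln(alpha) + ln(gamma) + gamma ln(x/theta) - ln(x) - (alpha+1) ln(1 + (x/theta)^gamma)
    log_f = (np.log(alpha) + np.log(gamma) + gamma * np.log(losses / theta)
             - np.log(losses) - (alpha + 1) * np.log1p(u))
    log_S_d = -alpha * np.log1p((d / theta) ** gamma)   # ln S(d)
    return -(np.sum(log_f) - len(losses) * log_S_d)

result = minimize(negative_loglikelihood, x0=np.log([1.0, 1.5, 300000.0]), method="Nelder-Mead")
print(np.exp(result.x))   # fitted alpha, gamma, theta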

The survival function of the fitted Burr Distribution adjusted for truncation from below at 150,000 is
compared to the empirical survival function (thick) for the data truncated from below at 150,000:

[Graph: fitted Burr survival function (after truncation at 150,000) versus the empirical survival function, both on log scales, for x from 200 thousand to 5 million.]

The Burr distribution seems to fit fairly well.

One should apply appropriate caution in inferring the behavior of the size of loss distribution
below the truncation point from a curve fit to data above a truncation point.

295
One would use the exact same numerical techniques to maximize this loglikelihood as discussed in the absence
of truncation from below.

Data Truncated and Shifted From Below:

Similarly one could apply the Method of Maximum Likelihood to data truncated and shifted from
below. If one assumes that the original ground up losses followed F(x), then one gets the likelihood
of the data truncated and shifting using either G(x) for grouped data or g(x) for ungrouped data.

G(x) = {F(x + d) - F(d)} / S(d), x > 0.        g(x) = f(x + d) / S(d), x > 0.

Exercise: Assume the losses in the grouped data in Section 3 were truncated and shifted from
below at 50,000. Then how would this data have been recorded?
[Solution:
Interval ($000) Number of Accidents Loss on Accidents in the Interval ($000)
0-25 254 2,603
25-50 57 2,043
50 - ∞ 33 2,645 ]

If one assumed the losses prior to modification followed a distribution F, then the likelihood of this
data truncated and shifted from below at 50,000 would be:
{G(25,000) - G(0)}^254 {G(50,000) - G(25,000)}^57 {1 - G(50,000)}^33 =
{F(75,000) - F(50,000)}^254 {F(100,000) - F(75,000)}^57 {1 - F(100,000)}^33 / S(50,000)^344.

Comparing Apples to Apples:

Let us assume we have data that has been truncated from below at 500 by a deductible.
Then there are an unknown number of small losses missing from the data. There is no way to adjust
the data to what it would have been in the absence of a deductible.

If we assume that the ground up losses prior to the effect of the deductible would have followed
distribution function F(x), then the data truncated from below would follow instead distribution function:
G(x) = {F(x) - F(500)} / S(500), x > 500.

Thus when fitting to truncated data, we adjust the distribution so we are comparing a truncated
distribution to truncated data.296 We wish to compare apples to apples, rather than apples to
oranges. As will be discussed subsequently, when fitting to data censored from above, we need to
adjust the distribution function to what it would have been after the effects of censoring.

296
As discussed in the next section, the Single Parameter Pareto Distribution is an exception.

Data Truncated from Above:

As discussed in a prior section, when data is truncated from above at the value L, losses of size
greater than L are not in the reported data base. The distribution function and the probability density
functions are revised as follows:

G(x) = F(x) / F(L), x ≤ L.

g(x) = f(x) / F(L), x ≤ L.

The loglikelihood for grouped data truncated from above at a value L is:
Σni ln[G(ci+1) - G(ci)] =
Σni {ln[(F(ci+1) - F(ci))/F(L)]} = Σni ln(F(ci+1) - F(ci)) - Σni ln[F(L)] =
Σni ln[F(ci+1) - F(ci)] - n ln[F(L)].
where ni are the observed number of losses in the ith interval [ci, ci+1], and Σni = n.

For example, assume the grouped data in Section 3 were truncated from above at 20,000.
Then the data would have been recorded as:
Interval ($000) Number of Accidents Loss on Accidents in the Interval ($000)
0-5 2208 5,974
5 -10 2247 16,725
10-15 1701 21,071
15-20 1220 21,127

Thus the loglikelihood would be:


(2208) ln[F(5000)] + (2247) ln[F(10000) - F(5000)] + (1701) ln[F(15000) - F(10000)]
+ (1220) ln[F(20000) - F(15000)] - (2208+2247+1701+1220) ln[F(20000)].

This could be maximized by the usual numerical techniques in order to solve for the fitted
parameter(s) of the assumed distribution.
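
A minimal Python sketch of this grouped loglikelihood, here assuming a ground-up Exponential (any other candidate distribution function could be substituted for F; the maximum likelihood θ found this way need not equal the method of moments value from the prior section):

import numpy as np
from scipy.optimize import minimize_scalar

boundaries = np.array([0.0, 5000.0, 10000.0, 15000.0, 20000.0])   # truncated from above at L = 20,000
counts = np.array([2208, 2247, 1701, 1220])

def negative_loglikelihood(theta):
    F = 1 - np.exp(-boundaries / theta)    # Exponential distribution function at the interval boundaries
    probs = np.diff(F) / F[-1]             # G(c_{i+1}) - G(c_i), adjusted for truncation from above at L
    return -np.sum(counts * np.log(probs))

result = minimize_scalar(negative_loglikelihood, bounds=(1000.0, 200000.0), method="bounded")
print(result.x)   # the maximum likelihood theta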

Data Truncated from Below and Above:297

For data truncated from below at d and truncated from above at L, the distribution and density
functions are revised as follows:
G(x) = {F(x) - F(d)} / {F(L) - F(d)}, d < x ≤ L.        g(x) = f(x) / {F(L) - F(d)}, d < x ≤ L.

For ungrouped data, the likelihood to be maximized is the product of these densities at the
observed points: ∏ f(xi) / {F(L) - F(d)}. Each term in the product has in the denominator the
probability remaining after truncation.

Frequency Distributions:

Frequency Distributions can be truncated from below and/or truncated from above.
For example, assume that the time lag to settle a claim is distributed via a Poisson Distribution with
mean λ: f(t) = e^-λ λ^t/t!, t = 0, 1, 2, ...
If for some reason we only know how many claims were settled at time lags up to 3, then the data is
truncated from above:
g(t) = f(t) / {f(0) + f(1) + f(2) + f(3)} = (λ^t/t!) / (1 + λ + λ^2/2 + λ^3/6), t = 0, 1, 2, 3.

Exercise: You observe the following numbers of claims settled at time lags 0, 1, 2, and 3:
7, 10, 15, 8.
What is the loglikelihood for a Poisson with λ = 4?
[Solution: g(t) = 0.04225 (4^t/t!). g(0) = 0.0423, g(1) = 0.1690, g(2) = 0.3380, g(3) = 0.4507.
The loglikelihood is: 7 ln(g(0)) + 10 ln(g(1)) + 15 ln(g(2)) + 8 ln(g(3)) = -62.6.]
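
A minimal Python sketch reproducing this loglikelihood and then maximizing it numerically (it should recover the λ ≅ 2.04 mentioned below):

from math import factorial, log
from scipy.optimize import minimize_scalar

counts = [7, 10, 15, 8]   # claims settled at time lags 0, 1, 2, 3

def loglikelihood(lam):
    # Poisson truncated from above at lag 3: g(t) = (lam^t / t!) / (1 + lam + lam^2/2 + lam^3/6)
    denom = sum(lam**t / factorial(t) for t in range(4))
    return sum(n * log((lam**t / factorial(t)) / denom) for t, n in enumerate(counts))

print(round(loglikelihood(4.0), 1))   # -62.6, as in the exercise
result = minimize_scalar(lambda lam: -loglikelihood(lam), bounds=(0.1, 10.0), method="bounded")
print(round(result.x, 2))             # about 2.04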

297
See 4B, 5/99, Q.25 and the related 4B, 11/99, Q.6.

Below is a graph of the loglikelihood as a function of λ. The maximum likelihood is for λ = 2.04.

[Graph: loglikelihood as a function of λ, for λ from 1 to 5, with its maximum near λ = 2.04.]

Normally, one would use numerical methods in order to solve for the maximum likelihood fit.
However, with only 2 time intervals one can solve algebraically.

Exercise: You observe 7 claims settled at a time lag of 0, and 10 claims settled at a time lag of 1.
The number of claims settled at lags of 2 or more are unknown.
What is the maximum likelihood Poisson?
[Solution: g(0) = f(0) / {f(0) + f(1)} = 1/(1 + λ). g(1) = f(1) / {f(0) + f(1)} = λ/(1 + λ).
loglikelihood = 7 ln[g(0))] + 10 ln[g(1)] = -7 ln(1+λ) + 10 ln(λ) - 10 ln(1+λ).

Setting the partial derivative with respect to λ equal to zero: -17/(1+λ) + 10/λ = 0. ⇒ λ = 10/7.]

One could also solve if for example the data were only available for lags 1 and 2.
In this case, the data has been truncated from below as well as from above.

Exercise: You observe 9 claims settled at a time lag of 1, and 12 claims settled at a time lag of 2.
The number of claims settled at other lags are unknown. What is the maximum likelihood Poisson?
[Solution: g(1) = f(1) / {f(1) + f(2)} = e^-λ λ / (e^-λ λ + e^-λ λ^2/2) = 2/(2 + λ).
g(2) = f(2) / {f(1) + f(2)} = λ/(2 + λ).
loglikelihood = 9 ln[g(1))] + 12 ln[g(2)] = 9 ln(2) - 9 ln(2+λ) + 12 ln(λ) - 12 ln(2+λ).

Setting the partial derivative with respect to λ equal to 0, -21/(2 + λ) + 12/λ = 0. ⇒ λ = 8/3.]

One can combine the loglikelihoods from different years, assuming the expected settlement pattern
is the same for each year. For example, assume the following claims settlement activity for a book
of claims as of the end of 1999:298

Number of Claims Settled


Year            Year Settled
Reported        1997         1998         1999
1997            Unknown      13           11
1998                         15           12
1999                                      14

Assume the expected claim settlement pattern is Poisson (with the lag in years), with the same λ
for each report year. Then the loglikelihood is the sum of the loglikelihoods from each year.

For report year 1997, with data for only lags 1 and 2, the loglikelihood is:
13 ln[g(1)] + 11 ln[g(2)] = 13 ln[2/(2 + λ)] + 11 ln[λ/(2 + λ)] =
13 ln(2) + 11 ln(λ) - 24ln(2 + λ).

For report year 1998, with data for only lags 0 and 1, the loglikelihood is:299
15 ln[g(0)] + 12 ln[g(1)] = 15 ln[1/(1 + λ)] + 12 ln[λ/(1 + λ)] = 12 ln(λ) - 27 ln(1 + λ).
For report year 1999, the loglikelihood is: 14 ln[g(0)] = 14 ln(1) = 0.

The total loglikelihood is: 13 ln(2) + 11 ln(λ) - 24ln(2 + λ) + 12 ln(λ) - 27 ln(1 + λ).

Setting the derivative of the loglikelihood with respect to λ equal to zero:


23/λ - 24/(2 + λ) - 27/(1 + λ) = 0. ⇒ 28λ^2 + 9λ - 46 = 0.
λ = 1.13, is the maximum likelihood Poisson.

The Geometric Distribution, has the same memoryless property as the Exponential Distribution.
Therefore, under truncation from below, one gets something that looks similar to the original
distribution.

Exercise: What is the density for a Geometric Distribution, after truncation from below at zero?
[Solution: f(t) = βt/(1 + β)t+1. g(t) = f(t)/(1 - f(0)) = {βt/(1 + β)t+1} / {1 - 1/(1 + β)} =
β t-1/(1 + β)t, t = 1, 2, 3 ... ]

298
Similar to 4, 5/01, Q.34. See also somewhat simpler 4, 11/04, Q.32.
299
While for simplicity I have used the same letter g for years 1997 and 1998, the truncation points are different, and
therefore the densities after truncation are different.

Assume the following settlement activity for a book of claims:


Number of Claims Settled
Year            Year Settled
Reported        1            2            3
1               Unknown      123          37
2                            300          111
3                                         282

Exercise: Assume the expected settlement pattern is a Geometric Distribution as per Loss Models.
Given the above settlement activity, what is the contribution to the loglikelihood from report year 2?
[Solution: f(t) = β^t/(1 + β)^(t+1). g(t) = f(t)/{f(0) + f(1)} = {β^t/(1 + β)^(t+1)} / {1/(1 + β) + β/(1 + β)^2} =
β^t (1 + β)^(1-t)/(1 + 2β). g(0) = (1 + β)/(1 + 2β). g(1) = β/(1 + 2β). Thus the contribution to the
loglikelihood is: 300 ln(g(0)) + 111 ln(g(1)) = 300 ln(1 + β) + 111 ln(β) - 411 ln(1 + 2β).]

Exercise: Assume the expected settlement pattern is a Geometric Distribution as per Loss Models.
Given the above settlement activity, what is the contribution to the loglikelihood from report year 1?
[Solution: f(t) = β^t/(1 + β)^(t+1). g(t) = f(t)/{f(1) + f(2)} = {β^t/(1 + β)^(t+1)} / {β/(1 + β)^2 + β^2/(1 + β)^3}.
g(1) = (1 + β)/(1 + 2β). g(2) = β/(1 + 2β). Thus the contribution to the loglikelihood is:
123 ln(g(1)) + 37 ln(g(2)) = 123 ln(1 + β) + 37 ln(β) - 160 ln(1 + 2β).]
Assuming a Geometric Distribution, the overall loglikelihood is:
300 ln(1 + β) + 111 ln(β) - 411 ln(1 + 2β) + 123 ln(1 + β) + 37 ln(β) - 160 ln(1 + 2β) + 0 =
423 ln(1 + β) + 148 ln(β) - 571 ln(1 + 2β).

Setting the derivative of the loglikelihood with respect to β equal to zero:

423/(1 + β) + 148/β - (2)(571)/(1 + 2β) = 0. ⇒ -275β + 148 = 0.


β = 148/275 = 0.538, is the maximum likelihood fit.
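
A minimal Python sketch confirming this fit numerically from the loglikelihood derived above:

from math import log
from scipy.optimize import minimize_scalar

def loglikelihood(beta):
    # total loglikelihood: 423 ln(1 + beta) + 148 ln(beta) - 571 ln(1 + 2 beta)
    return 423*log(1 + beta) + 148*log(beta) - 571*log(1 + 2*beta)

result = minimize_scalar(lambda b: -loglikelihood(b), bounds=(0.01, 10.0), method="bounded")
print(round(result.x, 3))   # about 0.538 = 148/275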

Here is a graph of the loglikelihood as a function of β:

[Graph: loglikelihood as a function of β, for β from 0.4 to 0.8, with its maximum near β = 0.538.]

Some of you would have found it convenient to reparameterize the Geometric Distribution as
f(t) = (1-p)p^t, with p = β/(1+β).300

In that case the loglikelihood would be: 148 ln(p) - 571 ln(1 + p).
The maximum likelihood p = 148/(571 - 148) = 148/423.
This implies β = p/(1 - p) = 148/275, matching the result above.

300
See 4, 5/01, Q.34.

An Example of a Practical Application to Hurricane Data:

As a practical example of fitting to severity data truncated from below, consider the following data on
losses from 164 hurricanes, all adjusted to a common level.301
Losses are in thousands of dollars:
502, 2439, 3954, 5717, 8223, 8796, 9005, 9557, 10278, 10799, 11414, 11446,
11837, 12256, 13296, 13449, 15474, 16208, 17116, 17658, 17866, 17976, 18497,
18825, 18891, 18946, 19065, 19357, 19612, 19929, 20289, 20416, 27091, 28690,
29419, 31069, 31388, 31447, 32860, 34636, 37659, 41277, 41746, 42825, 43577,
47299, 55046, 58145, 58548, 61970, 63152, 63918, 64533, 65024, 65139, 66228,
67732, 69972, 71158, 75739, 75760, 76846, 86278, 87098, 96877, 118642,
122518, 127951, 132787, 133682, 133959, 147702, 155351, 169071, 189781,
189900, 192283, 194630, 194890, 208433, 217219, 224907, 275001, 283869,
293910, 305313, 348405, 356558, 362090, 366142, 368245, 393073, 400501,
403169, 438296, 465074, 484223, 525681, 534237, 547711, 596026, 605316,
643598, 646193, 662658, 668635, 687544, 696402, 775971, 783072, 825054,
836911, 888088, 894836, 923918, 942310, 956927, 970828, 1028039, 1119560,
1163819, 1176396, 1191386, 1270333, 1356989, 1371030, 1378549, 1435127,
1460391, 1624995, 1650468, 1709809, 1755434, 1910703, 1979274, 2087738,
2124106, 2584891, 2728296, 2735157, 2853627, 2949789, 3096434, 3476218,
3686521, 3746855, 3762550, 3912101, 4568366, 4709959, 5432151, 5529261,
5855343, 6265912, 7976601, 8196810, 9816472, 9965606, 10009409, 11518111,
16146375, 16485683, 24486691, 49728840.

While this data has not been truncated from below at a specific value, it has still been restricted to
larger storms. Only those large tropical storms in which winds attain speeds of at least 74 miles per
hour are labeled hurricanes.302 While this is a useful distinction for practical purposes, this dividing line
is somewhat arbitrary. There is a continuum from weak tropical storms, to those just below hurricane
strength, to those hurricanes of category 1, up to those of category 5. However, this data base
excludes any tropical storm which did not make it to hurricane strength. Thus the data has been in
some sense truncated from below.

In order to analyze the severity I have truncated from below at 10,000 (meaning $10 million in
insured losses), leaving 156 severities. Then in each case I will assume the underlying density prior
to truncation was f(x), while the density after truncation is: f(x)/S(10,000).
Therefore, the loglikelihood is: Σ ln f(xi) - N ln[S(10,000)].
301
These are insured losses for hurricanes hitting the continental United States from 1900 to 1999. The reported
losses have been adjusted to a year 2000 level for inflation, changes in per capita wealth (to represent the changes
in property value above the rate of inflation), changes in insurance utilization, and changes in number of housing
units (by county). See “A Macro Validation Dataset for U.S. Hurricane Models”, by Douglas J. Collins and Stephen P.
Lowe, CAS Forum, Winter 2001.
302
I have seen the definition of hurricane vary by one m.p.h.

As a first step I graphed the empirical mean excess loss (mean residual life):

[Graph: empirical mean excess loss e(x) in $ millions, plotted against x in $ millions, for x up to 10,000.]

The mean excess loss is increasing, perhaps a little less than linearly.303 Thus I tried to fit distributions
with increasing mean excess losses. The best fits via maximum likelihood were:

Distribution                      Parameters                                  Negative Loglikelihood

LogNormal                         µ = 11.9113, σ = 2.53218                    2282.56

Weibull                           θ = 224,510, τ = 0.327437                   2279.32

Mixed Weibull-LogNormal           θ = 400,847, τ = 0.370266,                  2271.00
                                  µ = 9.84798, σ = 0.0461508,
                                  p = weight to Weibull = 0.94854

5 Point Mixture of Exponentials   θ1 = 6453, θ2 = 44,138, θ3 = 667,790,       2276.42
                                  θ4 = 3,921,035, θ5 = 18,114,718,
                                  p1 = 0.239005, p2 = 0.256331,
                                  p3 = 0.304175, p4 = 0.172696
303
The Pareto was too heavy-tailed to fit this data, while distributions like the Weibull for τ < 1, and the LogNormal,
whose mean excess losses increase less than linearly, fit well.

Based on the loglikelihoods, the mixture of a Weibull and a LogNormal seems to be the best fit.

Exercise: Use the Loglikelihood Ratio test in order to test the fit of the mixed Weibull-LogNormal.
[Solution: The 2 parameter Weibull Distribution is a special case of the 5 parameter
Weibull-LogNormal Distribution. Thus we compare (2)(2279.32 - 2271.00) = 16.64 to a
Chi-Square Distribution with 5 - 2 = 3 degrees of freedom. Since 16.64 > 12.838, we reject at
1/2% the hypothesis that the simpler Weibull Distribution should be used, and use instead the
mixed Weibull-LogNormal Distribution.
Comment: Since the 2 parameter LogNormal Distribution has a worse loglikelihood than the
Weibull, the mixed Weibull-LogNormal fares even better in comparison to the LogNormal than it did
to the Weibull.]
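
The critical value used in this test can be verified with a couple of lines of Python (assuming scipy is available):

from scipy.stats import chi2
print(chi2.ppf(0.995, df=3))     # 12.838, the 1/2% critical value with 3 degrees of freedom
print(2 * (2279.32 - 2271.00))   # test statistic of 16.64 > 12.838, so reject the simpler Weibull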

One could also apply the Schwarz Bayesian Criterion:

Distribution                       # Parameters   Loglikelihood   Penalty304   Penalized Loglikelihood

LogNormal                          2              -2282.56        5.05         -2287.61
Weibull                            2              -2279.32        5.05         -2284.37
Mixed Weibull-LogNormal            5              -2271.00        12.62        -2283.62
5-Point Mixture of Exponentials    9              -2276.42        22.72        -2299.14

Based on the Schwarz Bayesian Criterion, the best fit is the mixed Weibull-LogNormal, followed
by the Weibull, LogNormal, and the 5 point mixture of Exponentials.
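
A minimal Python sketch reproducing the penalized loglikelihoods in the table above:

from math import log

n = 156
fits = {"LogNormal": (2, -2282.56), "Weibull": (2, -2279.32),
        "Mixed Weibull-LogNormal": (5, -2271.00), "5-Point Mixture of Exponentials": (9, -2276.42)}

for name, (k, loglik) in fits.items():
    penalty = k * log(n) / 2                    # (# of parameters) ln(156) / 2
    print(name, round(loglik - penalty, 2))     # e.g. Weibull: -2279.32 - 5.05 = -2284.37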

One can also compute the Kolmogorov-Smirnov and Anderson-Darling statistics:


Distribution K-S Statistic Anderson-Darling Statistic
LogNormal 6.84% 1.089
Weibull 6.32% 0.610
2 point mixture of Weibull & LogNormal 4.72% 0.257
5 point mixture of Exponentials 3.63% 0.118

Using the K-S statistics, all of these fits are not rejected at 20%, since the critical value for 156 data
points is: 1.07/√156 = 8.57%. The 9 parameter 5-point mixture of Exponentials has a somewhat
better (smaller) K-S statistic than the 5 parameter mixed Weibull-LogNormal.

Using the Anderson-Darling statistics, all of these fits are not rejected at 10%, since the critical value is
1.933. The 9 parameter 5-point mixture of Exponentials has the best (smallest) Anderson-Darling
statistic.

304
penalty = (# of parameters) ln(# points)/2 = (# of parameters) ln(156)/2 = (# of parameters)(2.525).

Here are graphs of the differences between the empirical and fitted distributions, out to $1 billion:305

Empirical Distribution Function minus Maximum Likelihood LogNormal:

[Graph: D(x) plotted against x in $ millions, from 20 to 1000 on a log scale.]

305
With the x-axis on a log scale and only shown up to $1 billion. Since one is likely to be very interested in the
righthand tail, for the two best fits, below are also shown difference graphs from $1 billion to $100 billion in size.

Empirical Distribution Function minus Maximum Likelihood Weibull:

[Graph: D(x) plotted against x in $ millions, from 20 to 1000 on a log scale.]

Empirical Distribution Function minus Maximum Likelihood 2 point mixture of Weibull & LogNormal:

[Graph: D(x) plotted against x in $ millions, from 20 to 1000 on a log scale.]

Empirical Distribution Function minus Maximum Likelihood 5 point mixture of Exponentials:

[Graph: D(x) plotted against x in $ millions, from 20 to 1000 on a log scale.]

Note that the K-S Statistic of 0.0363 for the 5-point mixture of Exponentials, corresponds to the
maximum distance this difference curve gets from the x-axis, either above or below.

Graphs of the differences between the empirical and fitted distributions, out to $100 billion:

Empirical Distribution Function minus Maximum Likelihood 2 point mixture of Weibull & LogNormal:

[Graph: D(x) plotted against x in $ billions, from 2 to 100 on a log scale.]

Empirical Distribution Function minus Maximum Likelihood 5 point mixture of Exponentials:


[Graph: D(x) plotted against x in $ billions, from 2 to 100 on a log scale.]

The nth moment for a distribution truncated from below at 10,000 (meaning $10 million in insured
losses) is:

∫_10,000^∞ x^n f(x) dx / S(10,000).

Using a computer to compute these integrals,306 I obtained the first four moments, which in turn were
used to compute the Coefficient of Variation, Skewness, and Kurtosis:

                    Data      5pt. Exp.    Weibull    LogNormal      Weib-LogN

Mean ($ billion)    1.841     1.849        2.068      4.289          2.019

CV                  2.75      2.87         3.73       22.8           3.24

Skewness            6.46      8.27         17.68      13955          13.01

Kurtosis            55        110          833        1.2 x 10^11    405

The LogNormal is much too heavy-tailed in comparison to the data! The 5-point mixture of
Exponentials is similar to the data, with the mixture of the Weibull and LogNormal less so.307

Based on the empirical data over the last century, the frequency of hurricanes hitting the continental
U.S. with more than $10,000,000 in insured loss (equivalent to 10,000 in the data base) is:
156/100 = 1.56.308 Using any of the fitted distributions, the chance that such a hurricane will have
damage greater than x is: S(x)/S(10,000).

Exercise: Using the fitted Weibull Distribution, what is the expected frequency of hurricanes with
damage > $10 billion?
[Solution: Given a hurricane of size > 10,000, the probability it is of size > 10,000,000 (equivalent
to $10 billion) is: S(10,000,000) / S(10,000) =
exp[-(10,000,000/224,510)^0.327437] / exp[-(10,000/224,510)^0.327437] = 0.0312/0.6970 = 4.48%.
Thus the expected frequency per year is: (4.48%)(1.56) = 6.99%.]
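
This exercise can be reproduced in a few lines of Python (the Weibull parameters are the maximum likelihood fit quoted earlier):

from math import exp

theta, tau = 224510, 0.327437
S = lambda x: exp(-(x / theta) ** tau)   # Weibull survival function
prob = S(10_000_000) / S(10_000)         # chance a hurricane over $10 million exceeds $10 billion
print(prob, 1.56 * prob)                 # about 0.0448, and an annual frequency of about 0.0699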

306
Using Mathematica.
307
The empirical skewness is very sensitive to the presence or absence of a single very large hurricane.
The empirical kurtosis is extremely sensitive to the presence or absence of a single very large hurricane.
308
This point estimate is subject to error.
